[ 
https://issues.apache.org/jira/browse/SOLR-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-830:
--------------------------

    Description: 
as mentioned on the mailing list...

http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--to20111313.html#a20111313
{noformat}
We're seeing strange behavior on one of our slave nodes after replication. 
When the new searcher is created we see FileNotFoundExceptions in the log
and the index is strangely invalid/corrupted.

We may have identified the root cause but wanted to run it by the community. 
We figure there is a bug in the snappuller shell script, line 181:

snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"` 

This line determines the directory name of the latest snapshot to download
to the slave from the master.  The problem with this line is that it can
grab the temporary work directory of a snapshot in progress.  Those temporary
directories are prefixed with "temp" and, as far as I can tell, should never
get pulled from the master, so it's easy to disambiguate.  It seems that this
temp directory, if it exists, will be the newest one, so if present it will be
the one replicated: FAIL.
{noformat}

  was:
as mentioned on the mailing list...

http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--to20111313.html#a20111313
{quote}
We're seeing strange behavior on one of our slave nodes after replication. 
When the new searcher is created we see FileNotFoundExceptions in the log
and the index is strangely invalid/corrupted.

We may have identified the root cause but wanted to run it by the community. 
We figure there is a bug in the snappuller shell script, line 181:

snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"` 

This line determines the directory name of the latest snapshot to download
to the slave from the master.  The problem with this line is that it can
grab the temporary work directory of a snapshot in progress.  Those temporary
directories are prefixed with "temp" and, as far as I can tell, should never
get pulled from the master, so it's easy to disambiguate.  It seems that this
temp directory, if it exists, will be the newest one, so if present it will be
the one replicated: FAIL.
{quote}


> snappuller picks bad snapshot name
> ----------------------------------
>
>                 Key: SOLR-830
>                 URL: https://issues.apache.org/jira/browse/SOLR-830
>             Project: Solr
>          Issue Type: Bug
>          Components: replication
>            Reporter: Hoss Man
>
> as mentioned on the mailing list...
> http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--to20111313.html#a20111313
> {noformat}
> We're seeing strange behavior on one of our slave nodes after replication. 
> When the new searcher is created we see FileNotFoundExceptions in the log
> and the index is strangely invalid/corrupted.
> We may have identified the root cause but wanted to run it by the community. 
> We figure there is a bug in the snappuller shell script, line 181:
> snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
> ${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"` 
> This line determines the directory name of the latest snapshot to download
> to the slave from the master.  The problem with this line is that it can
> grab the temporary work directory of a snapshot in progress.  Those temporary
> directories are prefixed with "temp" and, as far as I can tell, should never
> get pulled from the master, so it's easy to disambiguate.  It seems that this
> temp directory, if it exists, will be the newest one, so if present it will be
> the one replicated: FAIL.
> {noformat}
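
For anyone who wants to reproduce the selection problem without a master/slave
setup, here is a minimal local sketch.  It drops the ssh call and runs the same
pipeline against a scratch directory.  The directory names and timestamps are
made up, and the "temp-snapshot." name for the in-progress directory is an
assumption based on the "temp" prefix described in the report.  The anchored
grep is one possible way to exclude the temp directory, not necessarily the
change that should be committed.

```shell
#!/bin/sh
# Scratch directory standing in for ${master_data_dir} (hypothetical contents).
data_dir=$(mktemp -d)
mkdir "$data_dir/snapshot.20081021120000" \
      "$data_dir/snapshot.20081022120000" \
      "$data_dir/temp-snapshot.20081022130000"  # assumed name of an in-progress snapshot

# Pipeline from the report: the unanchored pattern also matches
# "temp-snapshot.*", and since "t" sorts after "s", sort -r lists it first.
old_pick=$(ls "$data_dir" | grep 'snapshot\.' | grep -v wip | sort -r | head -1)

# Possible fix: anchor the pattern so only names that start with
# "snapshot." are considered.
new_pick=$(ls "$data_dir" | grep '^snapshot\.' | grep -v wip | sort -r | head -1)

echo "old: $old_pick"
echo "new: $new_pick"

rm -rf "$data_dir"
```

With these sample names the original pipeline selects the temp directory, while
the anchored version selects the newest completed snapshot.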

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
