[ 
https://issues.apache.org/jira/browse/SOLR-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643512#action_12643512
 ] 

Hoss Man commented on SOLR-830:
-------------------------------

while i haven't noticed this bug myself, and i'm not very "shell script smart", 
it does seem like it would be a problem ... prior to r529471 snappuller used a 
find command that required filenames to start with "snapshot.", whereas the 
current grep just requires that they contain "snapshot." anywhere in the name.

the original poster had the following suggested fix...

{noformat}
We've tweaked the line to exclude any directories starting with "temp":
snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls ${master_data_dir}|grep 'snapshot\.'|grep -v wip|grep -v temp|sort -r|head -1"`
{noformat}

...my first reaction would be to change the grep to use an anchored regex, but 
i'm not sure how portable that is - so perhaps the "grep -v temp" is the way to 
go.  I'll leave it to people who understand shell scripts better.
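
For comparison, here is a minimal sketch of the anchored-regex variant. The directory names in the listing are hypothetical stand-ins for what "ls ${master_data_dir}" might return on the master (in particular the "temp-snapshot." and "-wip" names are assumptions for illustration), and the pipeline runs locally instead of over ssh. As far as I know, the "^" anchor is part of POSIX basic regular expressions, so it should be as portable as the existing grep.

```shell
#!/bin/sh
# Hypothetical listing standing in for `ls ${master_data_dir}` on the master.
listing='index
snapshot.20081027093000
snapshot.20081028120000
snapshot.20081028130000-wip
temp-snapshot.20081028140000'

# Current behavior: an unanchored match also accepts "temp-snapshot.*",
# which sorts ahead of every "snapshot.*" name under sort -r.
old_name=$(printf '%s\n' "$listing" | grep 'snapshot\.' | grep -v wip | sort -r | head -1)
echo "unanchored: $old_name"

# Anchored variant: ^ keeps only names that START with "snapshot.".
snap_name=$(printf '%s\n' "$listing" | grep '^snapshot\.' | grep -v wip | sort -r | head -1)
echo "anchored:   $snap_name"
```

The anchored form restores the "must start with snapshot." behavior of the old find command without needing a separate "grep -v temp", and it would also reject any other prefix, not just "temp".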


> snappuller picks bad snapshot name
> ----------------------------------
>
>                 Key: SOLR-830
>                 URL: https://issues.apache.org/jira/browse/SOLR-830
>             Project: Solr
>          Issue Type: Bug
>          Components: replication
>            Reporter: Hoss Man
>
> as mentioned on the mailing list...
> http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--to20111313.html#a20111313
> {noformat}
> We're seeing strange behavior on one of our slave nodes after replication. 
> When the new searcher is created we see FileNotFoundExceptions in the log
> and the index is strangely invalid/corrupted.
> We may have identified the root cause but wanted to run it by the community. 
> We figure there is a bug in the snappuller shell script, line 181:
> snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls ${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"`
> This line determines the directory name of the latest snapshot to download
> to the slave from the master.  The problem with this line is that it grabs the
> temporary work directory of a snapshot in progress.  Those temporary
> directories are prefixed with "temp" and as far as I can tell should never
> get pulled from the master, so it's easy to disambiguate.  It seems that this
> temp directory, if it exists, will be the newest one, so if present it will be
> the one replicated: FAIL.
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
