[jira] [Updated] (SOLR-13872) Backup can fail to read index files w/NoSuchFileException during merges (SOLR-11616 regression)

2019-11-13 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-13872:
--
Fix Version/s: 8.4
   master (9.0)
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

>  Backup can fail to read index files w/NoSuchFileException during merges 
> (SOLR-11616 regression)
> 
>
> Key: SOLR-13872
> URL: https://issues.apache.org/jira/browse/SOLR-13872
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Fix For: master (9.0), 8.4
>
> Attachments: SOLR-13872.patch, SOLR-13872.patch, SOLR-13872.patch, 
> SOLR-13872.patch, index_churn.pl
>
>
> SOLR-11616 purports to fix a bug in Solr's backup functionality that causes 
> 'NoSuchFileException' errors when attempting to back up an index while it is 
> undergoing indexing (and segment merging).
> Although SOLR-11616 is marked with "Fix Version: 7.2", it's pretty easy to 
> demonstrate that this bug still exists on master, branch_8x, and even in 7.2 
> - so it seems less like the current problem is a "regression" and more that 
> the original fix didn't work.
> 
> The crux of the problem seems to be concurrency bugs in if/how a commit is 
> "reserved" before attempting to copy the files in that commit to the backup 
> location.  
> A possible workaround, discussed in more depth in the comments below, is to 
> update {{solrconfig.xml}} to explicitly configure the {{SolrDeletionPolicy}} 
> with either the {{maxCommitsToKeep}} or {{maxCommitAge}} options to ensure 
> the commits are kept around long enough for the backup to be created.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-13872) Backup can fail to read index files w/NoSuchFileException during merges (SOLR-11616 regression)

2019-11-11 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-13872:
--
Status: Patch Available  (was: Open)







[jira] [Updated] (SOLR-13872) Backup can fail to read index files w/NoSuchFileException during merges (SOLR-11616 regression)

2019-11-11 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-13872:
--
Attachment: SOLR-13872.patch
Status: Open  (was: Open)

Updated patch with finished tests and all nocommits resolved.

I think this is ready.







[jira] [Updated] (SOLR-13872) Backup can fail to read index files w/NoSuchFileException during merges (SOLR-11616 regression)

2019-11-08 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-13872:
--
Attachment: SOLR-13872.patch
Status: Open  (was: Open)

Updated patch, working away at some of the remaining nocommits...

* removed nocommits related to things spun off into new (linked) jiras
* test additions
* ref-guide note about backups when using softCommit








[jira] [Updated] (SOLR-13872) Backup can fail to read index files w/NoSuchFileException during merges (SOLR-11616 regression)

2019-11-08 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-13872:
--
Attachment: SOLR-13872.patch
Status: Open  (was: Open)

Ok, here's a patch with the API/usage changes I think we should make.

There are still a lot of nocommits, but mostly just related to additional test 
coverage I want to add (and most of that is around named snapshots, since I 
didn't dig into that very deeply and I want to make sure taking backups that way 
is as solid as the "simple" path), but I think the new API & synchronization 
logic is pretty solid.

There are also a few places where I have nocommits related to needing to spin out 
loosely related jiras.  Once I've done that I'll slim down this patch a bit, 
link those jiras, and then finish up the tests.








[jira] [Updated] (SOLR-13872) Backup can fail to read index files w/NoSuchFileException during merges (SOLR-11616 regression)

2019-11-04 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-13872:
--
Attachment: SOLR-13872.patch
  Assignee: Chris M. Hostetter
Status: Open  (was: Open)


Ok, I think I have enough of a handle on what's going on, and what *needs* to 
be going on, to move forward...

First off -- I'm attaching a patch with some new/additional test logic:
* TestStressThreadBackup in particular is a starting point for a robust test 
similar to the manual steps to reproduce that I posted above.  
** It tests using both the ReplicationHandler and the CoreAdmin API for doing 
core backups.  
** It fails easily for me using the ReplicationHandler, but I've never actually 
seen it fail using the CoreAdmin API (which jibes with my previous comment about 
the window of time for the race condition being shorter in that code path).
* The changes to TestCoreBackup and TestReplicationHandler are largely just to 
prove to myself that most of the complexity in the IndexDeletionPolicyWrapper 
as far as allowing callers to pass in an arbitrary commit (instead of the 
"latest" commit that IndexDeletionPolicyWrapper knows about) is really not 
needed (AFAICT ... I may have missed a use case).  
** So most of the "if no latest commit in IndexDeletionPolicyWrapper, then use 
& reserve latest commit from searcher" logic is not needed.
** In fact, because of how the "NRT" readers in use by the SolrIndexSearcher 
work, it's a really bad idea to do this.
*** See additions to TestIndexWriterReader and 
TestCoreBackup.testDemoWhyBackupCodeShouldNeverUseIndexCommitFromSearcher.

So with all this in mind, I'm going to move forward with the basic API changes 
I proposed before, and -- I think -- make the delete() method in the Delegate 
IndexCommit wrappers synchronize on the outer IndexDeletionPolicyWrapper to 
address the main synchronization concerns I had before.  From what I can see so 
far, that, combined with a new (synchronized) method to atomically 
"getAndReserveLatestCommit()", should fix the API flaws (when used properly in 
the caller code, which I'll also work on).
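
To make the locking idea above concrete, here is a minimal, self-contained sketch. It is a toy model, not the code in the attached patch and not Solr's actual IndexDeletionPolicyWrapper: the class CommitReservationSketch, its inner Commit type, and the onCommit() and release() methods are hypothetical names invented for illustration. Only getAndReserveLatestCommit() and delete() correspond to names mentioned in the comment. The sketch also assumes, as a simplification, that a delete() on a reserved commit is simply ignored.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a deletion-policy wrapper; hypothetical, not Solr's real class. */
class CommitReservationSketch {

    /** Hypothetical stand-in for a wrapped commit point. */
    final class Commit {
        private final long generation;
        private boolean deleted = false;

        Commit(long generation) { this.generation = generation; }

        long getGeneration() { return generation; }

        boolean isDeleted() {
            synchronized (CommitReservationSketch.this) { return deleted; }
        }

        /** Takes the OUTER lock, so it cannot interleave with a reservation. */
        void delete() {
            synchronized (CommitReservationSketch.this) {
                if (reservations.getOrDefault(generation, 0) > 0) {
                    return; // still reserved (e.g. by a running backup): ignore
                }
                deleted = true;
            }
        }
    }

    // Reservation counts keyed by commit generation; guarded by "this".
    private final Map<Long, Integer> reservations = new HashMap<>();
    private Commit latest;

    /** Records a new latest commit (hypothetical helper for this sketch). */
    synchronized Commit onCommit(long generation) {
        latest = new Commit(generation);
        return latest;
    }

    /** Looks up and reserves the latest commit in one atomic step. */
    synchronized Commit getAndReserveLatestCommit() {
        if (latest == null) {
            return null;
        }
        reservations.merge(latest.generation, 1, Integer::sum);
        return latest;
    }

    /** Drops one reservation; the commit becomes deletable at zero. */
    synchronized void release(Commit commit) {
        reservations.computeIfPresent(commit.generation, (g, n) -> n > 1 ? n - 1 : null);
    }

    public static void main(String[] args) {
        CommitReservationSketch policy = new CommitReservationSketch();
        policy.onCommit(1L);
        Commit reserved = policy.getAndReserveLatestCommit();
        policy.onCommit(2L);   // a newer commit arrives while the backup runs
        reserved.delete();     // ignored: the commit is still reserved
        System.out.println("deleted while reserved? " + reserved.isDeleted());
        policy.release(reserved);
        reserved.delete();     // now allowed
        System.out.println("deleted after release? " + reserved.isDeleted());
    }
}
```

Because both the reservation and the deletion take the same monitor, there is no window between "find the latest commit" and "reserve it" in which the commit's files could be deleted. The caller still has to pair every reservation with a release, ideally in a finally block.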

I'm also going to try to remove some of the duplicate code paths in 
SnapShooter -- there's no reason why createSnapshot and createSnapAsync should 
look so similar and still be so different -- the async code path should just 
call the same methods as the sync code path, but in a thread.
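
The shape of that refactoring can be sketched roughly as follows. This is an illustration under stated assumptions, not SnapShooter's real code: the class SnapshotSketch, the Result type, the method signatures, and the use of CompletableFuture for the background thread are all hypothetical choices made here. Only the method names createSnapshot and createSnapAsync come from the comment.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

/** Toy stand-in for a snapshot utility; hypothetical, not Solr's SnapShooter. */
class SnapshotSketch {

    /** Hypothetical result type: snapshot name and number of files copied. */
    static final class Result {
        final String name;
        final int fileCount;

        Result(String name, int fileCount) {
            this.name = name;
            this.fileCount = fileCount;
        }
    }

    private final String[] files;

    SnapshotSketch(String... files) {
        this.files = files.clone();
    }

    /** Synchronous path: all of the actual work lives here, and only here. */
    Result createSnapshot(String name) {
        int copied = 0;
        for (String file : files) {
            // A real implementation would copy "file" to the backup location.
            copied++;
        }
        return new Result(name, copied);
    }

    /** Asynchronous path: the same method, run on another thread, then a callback. */
    CompletableFuture<Void> createSnapAsync(String name, Consumer<Result> onDone) {
        return CompletableFuture.supplyAsync(() -> createSnapshot(name)).thenAccept(onDone);
    }

    public static void main(String[] args) {
        SnapshotSketch shooter = new SnapshotSketch("_0.cfs", "segments_1");
        Result sync = shooter.createSnapshot("sync");
        System.out.println(sync.name + " copied " + sync.fileCount + " files");
        shooter.createSnapAsync("async",
                r -> System.out.println(r.name + " copied " + r.fileCount + " files")).join();
    }
}
```

With this layout any fix to the copy logic, including how the commit is reserved and released, is made once and applies to both paths.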




