[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13082213#comment-13082213 ] Simon Willnauer commented on LUCENE-3348: - FYI - I opened LUCENE-3368 to track the failures in 3.x and backported the test together with the fix. IndexWriter applies wrong deletes during concurrent flush-all - Key: LUCENE-3348 URL: https://issues.apache.org/jira/browse/LUCENE-3348 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, fail.txt.bz2, fail2.txt.bz2 Yonik uncovered this with the TestRealTimeGet test: if a flush-all is underway, it is possible for an incoming update to pick a DWPT that is stale, ie, not yet pulled/marked for flushing, yet the DW has cutover to a new deletes queue. If this happens, and the deleted term was also updated in one of the non-stale DWPTs, then the wrong document is deleted and the test fails by detecting the wrong value. There's a 2nd failure mode that I haven't figured out yet, whereby 2 docs are returned when searching by id (there should only ever be 1 doc since the test uses updateDocument which is atomic wrt commit/reopen). Yonik verified the test passes pre-DWPT, so my guess is (but I have yet to verify) this test also passes on 3.x. I'll backport the test to 3.x to be sure. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
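The two symptoms in the description above are a wrong value after an update and two documents returned for one id. Both come down to a delete-by-term being scoped to the wrong set of documents. The sketch below is a deliberately simplified, hypothetical model, not Lucene's implementation. Every document and every buffered delete carries a sequence number, and a delete only removes documents indexed before it. `UpdateOrderingDemo`, `twoUpdates` and the sequence-number scheme are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical, not Lucene code) of updateDocument(term, doc):
 * the delete-by-term it issues removes only documents indexed before it,
 * i.e. documents with a lower sequence number.
 */
final class UpdateOrderingDemo {
    static final class Doc {
        final String id;
        final int val;
        final long seq;
        Doc(String id, int val, long seq) { this.id = id; this.val = val; this.seq = seq; }
    }

    static final class DeleteTerm {
        final String id;
        final long seq;
        DeleteTerm(String id, long seq) { this.id = id; this.seq = seq; }
    }

    /** Documents matching `id` that no delete applies to. */
    static List<Doc> search(List<Doc> docs, List<DeleteTerm> deletes, String id) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (!d.id.equals(id)) continue;
            boolean deleted = false;
            for (DeleteTerm del : deletes) {
                // A delete only covers docs indexed strictly before it.
                if (del.id.equals(id) && d.seq < del.seq) { deleted = true; break; }
            }
            if (!deleted) hits.add(d);
        }
        return hits;
    }

    /**
     * Two updates of id:25 (val=39 at seq 1, then val=43 at seq 2). The
     * parameters say which sequence number each update's delete ends up with.
     */
    static List<Doc> twoUpdates(long firstDeleteSeq, long secondDeleteSeq) {
        List<Doc> docs = List.of(new Doc("25", 39, 1), new Doc("25", 43, 2));
        List<DeleteTerm> deletes = List.of(new DeleteTerm("25", firstDeleteSeq),
                                           new DeleteTerm("25", secondDeleteSeq));
        return search(docs, deletes, "25");
    }

    public static void main(String[] args) {
        System.out.println(twoUpdates(1, 2).size()); // prints 1: correct scoping
        System.out.println(twoUpdates(1, 1).size()); // prints 2: second delete scoped too early
        System.out.println(twoUpdates(3, 2).size()); // prints 0: first delete scoped too late
    }
}
```

With correct scoping exactly one doc survives (val=43). If the second update's delete is scoped too early, the older doc also survives, and a search by id returns 2 hits. If the first update's delete is scoped too late, it removes the newer doc as well, and nothing is found.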
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081553#comment-13081553 ] Simon Willnauer commented on LUCENE-3348: - Committed to trunk in revision 1155278. I backported the test to 3.x and it failed. Maybe something is wrong with the test though, I will dig! Here is the failure:
{noformat}
[junit] - Standard Output ---
[junit] FAIL: hits id:34 val=-38
[junit] docID=43 id:-34 foundVal=38
[junit] READER3: FAILED: unexpected exception
[junit] java.lang.AssertionError: id=34 reader=ReadOnlyDirectoryReader(segments_q _2l(3.4):cv62/13 _2p(3.4):Cv6 _2o(3.4):cv47) totalHits=2
[junit] at org.junit.Assert.fail(Assert.java:91)
[junit] at org.apache.lucene.index.TestStressNRT$2.run(TestStressNRT.java:345)
[junit] FAIL: hits id:25 val=39
[junit] docID=24 id:25 foundVal=39
[junit] docID=85 id:25 foundVal=43
[junit] READER1: FAILED: unexpected exception
[junit] java.lang.AssertionError: id=25 reader=ReadOnlyDirectoryReader(segments_q _2l(3.4):cv62/13 _2p(3.4):Cv6 _2o(3.4):cv47) totalHits=2
[junit] at org.junit.Assert.fail(Assert.java:91)
[junit] at org.apache.lucene.index.TestStressNRT$2.run(TestStressNRT.java:345)
[junit] - ---
[junit] - Standard Error -
[junit] NOTE: reproduce with: ant test -Dtestcase=TestStressNRT -Dtestmethod=test -Dtests.seed=-78c35b20c01ed2f8:-292d76adf99900e2:3f7c8696906a10c7
[junit] NOTE: reproduce with: ant test -Dtestcase=TestStressNRT -Dtestmethod=test -Dtests.seed=-78c35b20c01ed2f8:-292d76adf99900e2:3f7c8696906a10c7
[junit] The following exceptions were thrown by threads:
[junit] *** Thread: READER3 ***
[junit] java.lang.RuntimeException: java.lang.AssertionError: id=34 reader=ReadOnlyDirectoryReader(segments_q _2l(3.4):cv62/13 _2p(3.4):Cv6 _2o(3.4):cv47) totalHits=2
[junit] at org.apache.lucene.index.TestStressNRT$2.run(TestStressNRT.java:360)
[junit] Caused by: java.lang.AssertionError: id=34 reader=ReadOnlyDirectoryReader(segments_q _2l(3.4):cv62/13 _2p(3.4):Cv6 _2o(3.4):cv47) totalHits=2
[junit] at org.junit.Assert.fail(Assert.java:91)
[junit] at org.apache.lucene.index.TestStressNRT$2.run(TestStressNRT.java:345)
[junit] *** Thread: READER1 ***
[junit] java.lang.RuntimeException: java.lang.AssertionError: id=25 reader=ReadOnlyDirectoryReader(segments_q _2l(3.4):cv62/13 _2p(3.4):Cv6 _2o(3.4):cv47) totalHits=2
[junit] at org.apache.lucene.index.TestStressNRT$2.run(TestStressNRT.java:360)
[junit] Caused by: java.lang.AssertionError: id=25 reader=ReadOnlyDirectoryReader(segments_q _2l(3.4):cv62/13 _2p(3.4):Cv6 _2o(3.4):cv47) totalHits=2
[junit] at org.junit.Assert.fail(Assert.java:91)
[junit] at org.apache.lucene.index.TestStressNRT$2.run(TestStressNRT.java:345)
[junit] NOTE: test params are: locale=fr_BE, timezone=EET
[junit] NOTE: all tests run in this JVM:
[junit] [TestCharFilter, TestClassicAnalyzer, TestKeywordAnalyzer, TestStandardAnalyzer, TestBinaryDocument, TestAtomicUpdate, TestConcurrentMergeScheduler, TestDeletionPolicy, TestDirectoryReader, TestDoc, TestLazyProxSkipping, TestMultiLevelSkipList, TestPerSegmentDeletes, TestSameTokenSamePosition, TestStressNRT]
[junit] NOTE: Linux 2.6.35-30-generic amd64/Sun Microsystems Inc. 1.6.0_26 (64-bit)/cpus=12,threads=1,free=286656336,total=352714752
[junit] - ---
{noformat}
IndexWriter applies wrong deletes during concurrent flush-all - Key: LUCENE-3348 URL: https://issues.apache.org/jira/browse/LUCENE-3348 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, fail.txt.bz2, fail2.txt.bz2
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081038#comment-13081038 ] Yonik Seeley commented on LUCENE-3348: -- Tricky stuff... great job of tracking all these concurrency issues down! I tweaked the test (more variability in number of threads, etc) and it's been running for 2 hours w/ no failures. IndexWriter applies wrong deletes during concurrent flush-all - Key: LUCENE-3348 URL: https://issues.apache.org/jira/browse/LUCENE-3348 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, fail.txt.bz2, fail2.txt.bz2
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081166#comment-13081166 ] Simon Willnauer commented on LUCENE-3348: - I am planning to commit this tomorrow if nobody objects IndexWriter applies wrong deletes during concurrent flush-all - Key: LUCENE-3348 URL: https://issues.apache.org/jira/browse/LUCENE-3348 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, fail.txt.bz2, fail2.txt.bz2
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081187#comment-13081187 ] Mark Miller commented on LUCENE-3348: - +1 - thanks for ferreting these concurrency issues out! IndexWriter applies wrong deletes during concurrent flush-all - Key: LUCENE-3348 URL: https://issues.apache.org/jira/browse/LUCENE-3348 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, fail.txt.bz2, fail2.txt.bz2
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13079814#comment-13079814 ] Simon Willnauer commented on LUCENE-3348: - I think I now know what is causing the failure here. In IW#prepareCommit(Map) we release the full flush (docWriter.finishFullFlush(success);) before we apply the deletes. This means that another thread can start flushing and freeze and push its global deletes into the BufferedDeleteStream before we call IW#maybeApplyDeletes(). If a flush is fast enough (small segment) and something else causes the committing thread to wait on the IW in order to apply the deletes, a delete packet that does not belong to the commit could sneak in. In IW#getReader this is already handled correctly. IndexWriter applies wrong deletes during concurrent flush-all - Key: LUCENE-3348 URL: https://issues.apache.org/jira/browse/LUCENE-3348 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.4, 4.0 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, fail.txt.bz2, fail2.txt.bz2
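The ordering problem described in the comment above can be illustrated with a single-threaded, hypothetical model. None of these names are Lucene's. While the committing thread still holds the full flush, another thread's flush has to wait. The `interleaving` callback marks the point where the committer may stall before applying deletes. Releasing first lets a foreign delete packet into the commit. Applying first keeps it out, which is how the comment describes getReader as already behaving.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Single-threaded, hypothetical model of the prepareCommit ordering problem.
 * While the committer holds the full flush, another thread's flush must wait;
 * once the full flush is released, that flush may push its frozen delete
 * packet into the shared stream.
 */
final class CommitOrderingDemo {
    private final List<String> deleteStream = new ArrayList<>(); // frozen delete packets
    private final List<String> waiting = new ArrayList<>();      // flushes blocked on the full flush
    private boolean fullFlushHeld;

    /** Another thread's small, fast flush. */
    void otherThreadFlush(String packet) {
        if (fullFlushHeld) waiting.add(packet); // must wait for the full flush to finish
        else deleteStream.add(packet);
    }

    private void finishFullFlush() {
        fullFlushHeld = false;
        deleteStream.addAll(waiting); // blocked flushes may proceed now
        waiting.clear();
    }

    /**
     * Returns the packets this commit applies. `interleaving` runs where the
     * committing thread may stall before it gets to apply its deletes.
     */
    List<String> prepareCommit(boolean releaseBeforeApply, Runnable interleaving) {
        fullFlushHeld = true;
        deleteStream.add("commit");                  // freeze and push this commit's deletes
        if (releaseBeforeApply) finishFullFlush();   // problematic order: release first
        interleaving.run();
        List<String> applied = new ArrayList<>(deleteStream); // apply deletes
        deleteStream.clear();
        if (!releaseBeforeApply) finishFullFlush();  // fixed order: release after applying
        return applied;
    }

    /** Packets still waiting to be applied by a later flush or commit. */
    List<String> pending() { return new ArrayList<>(deleteStream); }

    public static void main(String[] args) {
        CommitOrderingDemo buggy = new CommitOrderingDemo();
        System.out.println(buggy.prepareCommit(true, () -> buggy.otherThreadFlush("other")));  // [commit, other]
        CommitOrderingDemo fixed = new CommitOrderingDemo();
        System.out.println(fixed.prepareCommit(false, () -> fixed.otherThreadFlush("other"))); // [commit]
        System.out.println(fixed.pending());                                                   // [other]
    }
}
```

In the fixed order the commit applies only its own packet; the other thread's packet stays in the stream for a later flush to apply.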
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080045#comment-13080045 ] Michael McCandless commented on LUCENE-3348: I think you are right! I will fix prepareCommit to match getReader and re-beast. IndexWriter applies wrong deletes during concurrent flush-all - Key: LUCENE-3348 URL: https://issues.apache.org/jira/browse/LUCENE-3348 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.4, 4.0 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, fail.txt.bz2, fail2.txt.bz2
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080210#comment-13080210 ] Simon Willnauer commented on LUCENE-3348: -
bq. Patch, incorporating Simon's last suggestion. I think this fixes the concurrency bugs – beasting for 2703 iterations so far and no failure!
awesome! :)
bq. Not quite committable – lots of added SOPs. I'll be out next week so won't get to this until I'm back so feel free to clean it up and commit!
Mike, I will assign this to me and get it committable next week. Thanks, have a good time :)
IndexWriter applies wrong deletes during concurrent flush-all - Key: LUCENE-3348 URL: https://issues.apache.org/jira/browse/LUCENE-3348 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.4, 4.0 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, fail.txt.bz2, fail2.txt.bz2
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080226#comment-13080226 ] Michael McCandless commented on LUCENE-3348: Thanks Simon! Also I didn't implement your suggestion above (putting the new code into DWTC.obtainAndLock), but I think we should! IndexWriter applies wrong deletes during concurrent flush-all - Key: LUCENE-3348 URL: https://issues.apache.org/jira/browse/LUCENE-3348 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, fail.txt.bz2, fail2.txt.bz2
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073535#comment-13073535 ] Michael McCandless commented on LUCENE-3348: Thanks Simon; I'll make both of those fixes. Unfortunately there is still at least one more thread safety issue that I'm trying to track down... beasting uncovered a good seed. IndexWriter applies wrong deletes during concurrent flush-all - Key: LUCENE-3348 URL: https://issues.apache.org/jira/browse/LUCENE-3348 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.4, 4.0 Attachments: LUCENE-3348.patch, LUCENE-3348.patch
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073556#comment-13073556 ] Simon Willnauer commented on LUCENE-3348: -
bq. Unfortunately there is still at least one more thread safety issue that I'm trying to track down... beasting uncovered a good seed.
argh! can you post it here?
simon
IndexWriter applies wrong deletes during concurrent flush-all - Key: LUCENE-3348 URL: https://issues.apache.org/jira/browse/LUCENE-3348 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.4, 4.0 Attachments: LUCENE-3348.patch, LUCENE-3348.patch
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073588#comment-13073588 ] Michael McCandless commented on LUCENE-3348: Here's what I run with the while1 tester in luceneutil: {{TestStressNRT -iters 3 -verbose -seed -6208047570437556381:-3138230871915238634}} I think what's special about the seed is that maxBufferedDocs is 3, so we are doing tons of segment flushing. I dumbed down the test somewhat (turned off merging entirely, only 1 reader thread, up to 5 writer threads), and it still fails. IndexWriter applies wrong deletes during concurrent flush-all - Key: LUCENE-3348 URL: https://issues.apache.org/jira/browse/LUCENE-3348 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.4, 4.0 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073678#comment-13073678 ] Simon Willnauer commented on LUCENE-3348: -
Mike, I cannot reproduce this failure. What exactly is failing there? Maybe you can put the output in a text file and attach it?
Regarding the latest patch, I think we can call DWFlushControl#addFlushableState() from DWFlushControl#markForFullFlush() and use a global list to collect the DWPTs for the full flush. I think we should move the getAndLock call into DWFlushControl, as something like DWFlushControl#obtainAndLock(); this would allow us to make the check and the DWFlushControl#addFlushableState() method private to DWFC. Further, we can simplify the deleteQueue check a little: since we have already obtained a ThreadState, we don't need to unlock the state again after calling addFlushableState(), something like this:
{code}
ThreadState obtainAndLock() {
  final ThreadState perThread = perThreadPool.getAndLock(Thread.currentThread(), documentsWriter);
  if (perThread.isActive()
      && perThread.perThread.deleteQueue != documentsWriter.deleteQueue) {
    // There is a flush-all in process and this DWPT is
    // now stale -- enroll it for flush and try for
    // another DWPT:
    addFlushableState(perThread);
  }
  // No unlock here: the ThreadState is returned to the caller still locked.
  return perThread;
}
{code}
Possibly we are spending too much time in the full flush, since we lock every ThreadState at least once even though some indexing threads might have already helped out by swapping out DWPT instances. I think we could collect the ThreadStates already swapped out during a full flush and only check the ones that have not been processed yet?
IndexWriter applies wrong deletes during concurrent flush-all - Key: LUCENE-3348 URL: https://issues.apache.org/jira/browse/LUCENE-3348 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.4, 4.0 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch
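A runnable, much-simplified model of Simon's suggestions may help. It is a hypothetical sketch, not DWFlushControl's code; `FullFlushModel`, `Writer`, `DeleteQueue` and `markRemainingForFullFlush` are invented names. Each state's writer is bound to the delete queue that was current when the writer was created. `obtainAndLock` enrolls a stale writer and keeps the lock instead of unlocking and retrying. The full flush then skips states that an indexing thread has already swapped.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical, simplified model; names are illustrative, not Lucene's. */
final class FullFlushModel {
    static final class DeleteQueue {
        final int generation;
        DeleteQueue(int generation) { this.generation = generation; }
    }
    static final class Writer {
        final DeleteQueue queue; // queue that was current when this writer was created
        Writer(DeleteQueue queue) { this.queue = queue; }
    }
    static final class ThreadState extends ReentrantLock {
        Writer writer; // only replaced while this state's lock is held
        ThreadState(Writer writer) { this.writer = writer; }
    }

    private final List<ThreadState> states = new ArrayList<>();
    private final List<Writer> flushQueue = new ArrayList<>();
    private final Set<ThreadState> swapped = new HashSet<>(); // swapped during this full flush
    private DeleteQueue current = new DeleteQueue(0);

    FullFlushModel(int threadStates) {
        for (int i = 0; i < threadStates; i++) states.add(new ThreadState(new Writer(current)));
    }

    /** Enroll the stale writer for flushing and install a fresh one; caller holds s's lock. */
    private synchronized void addFlushableState(ThreadState s) {
        flushQueue.add(s.writer);
        s.writer = new Writer(current);
        swapped.add(s);
    }

    private synchronized boolean isStale(ThreadState s) { return s.writer.queue != current; }

    /** Like obtainAndLock(): the state is returned locked and never bound to a stale queue. */
    ThreadState obtainAndLock(int slot) {
        ThreadState s = states.get(slot);
        s.lock();
        if (isStale(s)) addFlushableState(s); // no unlock/retry needed: we keep the lock
        return s;
    }

    /** Cut over to a new delete queue, starting a full flush. */
    synchronized void beginFullFlush() {
        current = new DeleteQueue(current.generation + 1);
        swapped.clear();
    }

    /** Lock only states no indexing thread has swapped yet; returns how many were locked. */
    int markRemainingForFullFlush() {
        int locked = 0;
        for (ThreadState s : states) {
            synchronized (this) {
                if (swapped.contains(s)) continue; // already handled by an indexing thread
            }
            s.lock();
            try {
                locked++;
                if (isStale(s)) addFlushableState(s);
            } finally {
                s.unlock();
            }
        }
        return locked;
    }

    synchronized int flushQueueSize() { return flushQueue.size(); }

    public static void main(String[] args) {
        FullFlushModel m = new FullFlushModel(4);
        m.beginFullFlush();
        m.obtainAndLock(1).unlock();                        // an indexing thread swaps state 1 itself
        System.out.println(m.markRemainingForFullFlush()); // 3
        System.out.println(m.flushQueueSize());            // 4
    }
}
```

Here the indexing thread handled state 1 itself, so the full flush only had to lock the remaining three states, yet all four stale writers ended up enrolled for flushing. Lock ordering is always a state's lock first, then the model's monitor, which keeps the sketch deadlock-free.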
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073804#comment-13073804 ] Michael McCandless commented on LUCENE-3348: OK, I attached the output of a failure -- it's 400K lines. Search for the AssertionError, where id:26 couldn't find either a doc or a tombstone. IndexWriter applies wrong deletes during concurrent flush-all - Key: LUCENE-3348 URL: https://issues.apache.org/jira/browse/LUCENE-3348 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.4, 4.0 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, fail.txt.bz2
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073332#comment-13073332 ] Simon Willnauer commented on LUCENE-3348: - mike, patch looks good. some thoughts: * can we factor out the while(true){.. getAndLock(thread, DW) .. } to prevent this code duplication? * you throw a NPE if the DWPT is null, yet this is handled by ThreadState#isActive() and calls ensureOpen() to throw consistent exception when IW is closed like further down you see: {code} if (!perThread.isActive()) { ensureOpen(); assert false: perThread is not active but we are still open; } {code} I think this will also solve the deadlock issue you describing above, no? thanks for taking care of this, another proof that concurrency is not easy :) IndexWriter applies wrong deletes during concurrent flush-all - Key: LUCENE-3348 URL: https://issues.apache.org/jira/browse/LUCENE-3348 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.4, 4.0 Attachments: LUCENE-3348.patch, LUCENE-3348.patch Yonik uncovered this with the TestRealTimeGet test: if a flush-all is underway, it is possible for an incoming update to pick a DWPT that is stale, ie, not yet pulled/marked for flushing, yet the DW has cutover to a new deletes queue. If this happens, and the deleted term was also updated in one of the non-stale DWPTs, then the wrong document is deleted and the test fails by detecting the wrong value. There's a 2nd failure mode that I haven't figured out yet, whereby 2 docs are returned when searching by id (there should only ever be 1 doc since the test uses updateDocument which is atomic wrt commit/reopen). Yonik verified the test passes pre-DWPT, so my guess is (but I have yet to verify) this test also passes on 3.x. I'll backport the test to 3.x to be sure. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
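Simon's two review points (one shared retry loop, and an isActive()/ensureOpen() check instead of an NPE) can be sketched together as a single helper. This is a minimal model under stated assumptions: `DeleteQueue`, `PerThread`, `ThreadState`, the single-state "pool", `getAndLock()` and `obtainAndLock()` are hypothetical stand-ins, not Lucene's actual DocumentsWriter classes. A DWPT is treated as stale when it was created against an older deletes queue than the one the writer has cut over to.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the factored-out retry loop. None of these types are
// Lucene's real DocumentsWriter/ThreadState classes.
class ObtainAndLockSketch {

  /** Stand-in for one generation of the global deletes queue. */
  static final class DeleteQueue {}

  /** Stand-in for a DWPT: remembers the deletes queue it was created against. */
  static final class PerThread {
    final DeleteQueue deleteQueue;
    PerThread(DeleteQueue deleteQueue) { this.deleteQueue = deleteQueue; }
  }

  /** Stand-in for ThreadState: a lock guarding one DWPT (null once closed). */
  static final class ThreadState extends ReentrantLock {
    PerThread dwpt;
    boolean isActive() { return dwpt != null; }
  }

  volatile DeleteQueue currentQueue = new DeleteQueue();
  volatile boolean closed = false;
  // A single-state "pool" keeps the sketch short; a real pool holds many states.
  final ThreadState state = new ThreadState();

  ObtainAndLockSketch() { state.dwpt = new PerThread(currentQueue); }

  void ensureOpen() {
    if (closed) throw new IllegalStateException("this IndexWriter is closed");
  }

  /** Hypothetical pool lookup: picks a state and locks it. */
  ThreadState getAndLock() {
    state.lock();
    return state;
  }

  /**
   * The one shared loop: returns a locked, active state whose DWPT belongs to
   * the current deletes queue. The caller indexes and then unlocks.
   */
  ThreadState obtainAndLock() {
    while (true) {
      ThreadState s = getAndLock();
      if (!s.isActive()) {
        s.unlock();
        ensureOpen(); // consistent exception when closed, instead of an NPE
        throw new AssertionError("state is not active but we are still open");
      }
      if (s.dwpt.deleteQueue == currentQueue) {
        return s;
      }
      // Stale DWPT: flush-all already cut over to a new deletes queue. This
      // sketch just swaps in a fresh DWPT; real code would hand the old one to flush.
      s.dwpt = new PerThread(currentQueue);
      s.unlock();
    }
  }
}
```

With such a helper, each call site would reduce to `ThreadState s = obtainAndLock(); try { /* index */ } finally { s.unlock(); }`, so the staleness and closed-writer handling lives in one place.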
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072962#comment-13072962 ] Jason Rutherglen commented on LUCENE-3348:

Sorry to add my opinion to this, however I think that while non-blocking deletes are quite fancy, it seems they are open to various bugs such as this. Is there a compelling reason non-locking is used, eg, performance?
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072968#comment-13072968 ] Simon Willnauer commented on LUCENE-3348:

bq. Sorry to add my opinion to this, however I think that while non-blocking deletes are quite fancy, it seems they are open to various bugs such as this. Is there a compelling reason non-locking is used, eg, performance?

Jason, this issue is unrelated to non-blocking deletes. The bug here is in concurrent flush, which is indeed the main performance factor in DWPT.
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072509#comment-13072509 ] Simon Willnauer commented on LUCENE-3348:

Mike, the patch looks good. One little thing: you should check whether the DWPT is already pending before calling #setFlushPending(DWPT).
{quote}
I think it'd be better to somehow, up front in flush-all, mark all current DWPTs as stale, pull them out of rotation, so that the thread pool would never return such a stale DWPT.
{quote}
The problem here is that you would need to lock all the states that are selected for flushing, while at the same time an indexing thread could lock such a DWPT and index a document into it, which causes exactly the problem this issue tries to solve. If you synchronized the thread pool's getAndLock method this could work, but with the non-blocking approach I think this is the only way to prevent it.
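Simon's "check if the DWPT is already pending" point matters because marking the same DWPT twice would double-count it in whatever flush accounting the writer keeps. A minimal sketch, assuming a hypothetical flush control that tracks pending bytes and a pending count (`FlushPendingSketch`, `PerThread` and `setFlushPendingIfNeeded` are illustrative names, not Lucene's API). It uses `AtomicBoolean.compareAndSet` so the check and the update happen as one atomic step; a `synchronized` check-then-set would work equally well.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical flush control; not Lucene's DocumentsWriterFlushControl.
class FlushPendingSketch {

  /** Stand-in for a DWPT with its RAM usage and a flush-pending flag. */
  static final class PerThread {
    final long bytesUsed;
    final AtomicBoolean flushPending = new AtomicBoolean(false);
    PerThread(long bytesUsed) { this.bytesUsed = bytesUsed; }
  }

  final AtomicLong pendingBytes = new AtomicLong();
  final AtomicLong numPending = new AtomicLong();

  /**
   * Marks the DWPT as flush-pending at most once. compareAndSet folds the
   * "already pending?" check and the update into one atomic step, so two
   * threads racing on the same DWPT cannot both account for it.
   */
  boolean setFlushPendingIfNeeded(PerThread dwpt) {
    if (!dwpt.flushPending.compareAndSet(false, true)) {
      return false; // someone already marked it; nothing to account for
    }
    pendingBytes.addAndGet(dwpt.bytesUsed);
    numPending.incrementAndGet();
    return true;
  }
}
```

Because the method reports whether it changed anything, a flush-all pass can simply skip DWPTs that another thread (for example, one that hit the RAM buffer limit) already claimed.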
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072523#comment-13072523 ] Michael McCandless commented on LUCENE-3348:

OK, I'll make sure it's not already pending.
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072599#comment-13072599 ] Michael McCandless commented on LUCENE-3348:

The 2nd bug seems to be caused by a commit() running concurrently with a getReader(): the flush-all done for the getReader() makes a newly flushed segment visible in the SegmentInfos just before commit clones the SegmentInfos, while the buffered deletes have not yet been fully applied to that new segment (and to the segments before it).

You can see it in IW.prepareCommit: we call flush(true, true) and then startCommit without any sync, so a concurrent getReader can sneak a change into the segmentInfos in between, which makes an updateDocument appear non-atomic.
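The diagnosis above comes down to prepareCommit() flushing and then cloning the SegmentInfos in two unsynchronized steps, so another full flush can interleave between them. The model below is a sketch under simplifying assumptions: a "segment" is just a flag recording whether buffered deletes were applied, and `FullFlushLockSketch`, `flushAll()` and the lock name `fullFlushLock` are chosen for illustration only. It shows one way to close the window: hold one shared lock across getReader()'s full flush and across prepareCommit()'s flush-plus-clone, so a commit snapshot can never contain a half-processed segment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the commit/getReader race; not Lucene's IndexWriter.
class FullFlushLockSketch {

  /** Stand-in for a flushed segment: only tracks whether deletes were applied. */
  static final class Segment {
    volatile boolean deletesApplied;
  }

  private final Object fullFlushLock = new Object(); // serializes whole full flushes
  private final List<Segment> segmentInfos = new ArrayList<>();

  /** Publishes a new segment, then applies buffered deletes: two separate steps. */
  private void flushAll() {
    Segment s = new Segment();
    synchronized (this) { segmentInfos.add(s); }
    // Window: s is visible in segmentInfos but its deletes are not applied yet.
    s.deletesApplied = true;
  }

  /** Like getReader(): a full flush performed under the shared lock. */
  void getReader() {
    synchronized (fullFlushLock) {
      flushAll();
    }
  }

  /**
   * Like prepareCommit(): flush and snapshot under the same lock, so no
   * concurrent getReader() can publish a half-processed segment in between.
   */
  List<Segment> prepareCommit() {
    synchronized (fullFlushLock) {
      flushAll();
      synchronized (this) { return new ArrayList<>(segmentInfos); }
    }
  }
}
```

Both paths take `fullFlushLock` before the writer's own monitor, so the lock order is consistent and the two cannot deadlock. The cost is that getReader() and commit serialize against each other; how much that matters depends on how often they overlap.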