[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all

2011-08-10 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082213#comment-13082213
 ] 

Simon Willnauer commented on LUCENE-3348:
-

FYI - I opened LUCENE-3368 to track the failures in 3.x and backported the test 
together with the fix.

 IndexWriter applies wrong deletes during concurrent flush-all
 -

 Key: LUCENE-3348
 URL: https://issues.apache.org/jira/browse/LUCENE-3348
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, 
 LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, fail.txt.bz2, 
 fail2.txt.bz2


 Yonik uncovered this with the TestRealTimeGet test: if a flush-all is
 underway, it is possible for an incoming update to pick a DWPT that is
 stale, i.e., not yet pulled/marked for flushing, even though the DW has
 already cut over to a new deletes queue.  If this happens, and the deleted
 term was also updated in one of the non-stale DWPTs, then the wrong document
 is deleted and the test fails by detecting the wrong value.
 There's a 2nd failure mode that I haven't figured out yet, whereby 2
 docs are returned when searching by id (there should only ever be 1
 doc since the test uses updateDocument which is atomic wrt
 commit/reopen).
 Yonik verified the test passes pre-DWPT, so my guess is (but I
 have yet to verify) this test also passes on 3.x.  I'll backport
 the test to 3.x to be sure.
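
The staleness condition in the description can be illustrated with a small, self-contained toy model. This is only a sketch and not Lucene's code: DeleteQueue, Dwpt, ThreadState and FlushControl are hypothetical stand-ins. The one idea taken from the issue is that a DWPT counts as stale when the deletes queue it was created with is no longer the queue the DW currently uses.

```java
import java.util.ArrayList;
import java.util.List;

public class StaleDwptSketch {
    /** Hypothetical stand-in for a deletes queue; only object identity matters here. */
    static final class DeleteQueue {}

    /** Hypothetical stand-in for a DWPT, bound to the queue it was created with. */
    static final class Dwpt {
        final DeleteQueue deleteQueue;
        Dwpt(DeleteQueue deleteQueue) { this.deleteQueue = deleteQueue; }
    }

    /** Hypothetical slot that hands a DWPT to an indexing thread. */
    static final class ThreadState {
        Dwpt dwpt;
        ThreadState(Dwpt dwpt) { this.dwpt = dwpt; }
    }

    /** Hypothetical flush control that owns the current queue. */
    static final class FlushControl {
        DeleteQueue currentQueue = new DeleteQueue();
        final List<Dwpt> pendingFlush = new ArrayList<>();

        /** A flush-all cuts over to a brand new deletes queue. */
        void startFullFlush() { currentQueue = new DeleteQueue(); }

        /** Stale means: still bound to a queue from before the cut-over. */
        boolean isStale(ThreadState state) {
            return state.dwpt.deleteQueue != currentQueue;
        }

        /** Enrolls a stale DWPT for flushing and swaps in a fresh one. */
        ThreadState obtain(ThreadState state) {
            if (isStale(state)) {
                pendingFlush.add(state.dwpt);
                state.dwpt = new Dwpt(currentQueue);
            }
            return state;
        }
    }

    public static void main(String[] args) {
        FlushControl fc = new FlushControl();
        ThreadState state = new ThreadState(new Dwpt(fc.currentQueue));
        fc.startFullFlush();
        System.out.println("stale before obtain: " + fc.isStale(state));
        fc.obtain(state);
        System.out.println("stale after obtain: " + fc.isStale(state));
        System.out.println("enrolled for flush: " + fc.pendingFlush.size());
    }
}
```

In this model an update that goes through obtain() never indexes into a DWPT bound to the old queue, which is the property the failing test depends on.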

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all

2011-08-09 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081553#comment-13081553
 ] 

Simon Willnauer commented on LUCENE-3348:
-

Committed to trunk in revision 1155278.
I backported the test to 3.x and it failed. Maybe something is wrong with the 
test though, I will dig! Here is the failure: 

{noformat}
[junit] - Standard Output ---
[junit] FAIL: hits id:34 val=-38
[junit]   docID=43 id:-34 foundVal=38
[junit] READER3: FAILED: unexpected exception
[junit] java.lang.AssertionError: id=34 
reader=ReadOnlyDirectoryReader(segments_q _2l(3.4):cv62/13 _2p(3.4):Cv6 
_2o(3.4):cv47) totalHits=2
[junit] at org.junit.Assert.fail(Assert.java:91)
[junit] at 
org.apache.lucene.index.TestStressNRT$2.run(TestStressNRT.java:345)
[junit] FAIL: hits id:25 val=39
[junit]   docID=24 id:25 foundVal=39
[junit]   docID=85 id:25 foundVal=43
[junit] READER1: FAILED: unexpected exception
[junit] java.lang.AssertionError: id=25 
reader=ReadOnlyDirectoryReader(segments_q _2l(3.4):cv62/13 _2p(3.4):Cv6 
_2o(3.4):cv47) totalHits=2
[junit] at org.junit.Assert.fail(Assert.java:91)
[junit] at 
org.apache.lucene.index.TestStressNRT$2.run(TestStressNRT.java:345)
[junit] -  ---
[junit] - Standard Error -
[junit] NOTE: reproduce with: ant test -Dtestcase=TestStressNRT 
-Dtestmethod=test 
-Dtests.seed=-78c35b20c01ed2f8:-292d76adf99900e2:3f7c8696906a10c7
[junit] NOTE: reproduce with: ant test -Dtestcase=TestStressNRT 
-Dtestmethod=test 
-Dtests.seed=-78c35b20c01ed2f8:-292d76adf99900e2:3f7c8696906a10c7
[junit] The following exceptions were thrown by threads:
[junit] *** Thread: READER3 ***
[junit] java.lang.RuntimeException: java.lang.AssertionError: id=34 
reader=ReadOnlyDirectoryReader(segments_q _2l(3.4):cv62/13 _2p(3.4):Cv6 
_2o(3.4):cv47) totalHits=2
[junit] at 
org.apache.lucene.index.TestStressNRT$2.run(TestStressNRT.java:360)
[junit] Caused by: java.lang.AssertionError: id=34 
reader=ReadOnlyDirectoryReader(segments_q _2l(3.4):cv62/13 _2p(3.4):Cv6 
_2o(3.4):cv47) totalHits=2
[junit] at org.junit.Assert.fail(Assert.java:91)
[junit] at 
org.apache.lucene.index.TestStressNRT$2.run(TestStressNRT.java:345)
[junit] *** Thread: READER1 ***
[junit] java.lang.RuntimeException: java.lang.AssertionError: id=25 
reader=ReadOnlyDirectoryReader(segments_q _2l(3.4):cv62/13 _2p(3.4):Cv6 
_2o(3.4):cv47) totalHits=2
[junit] at 
org.apache.lucene.index.TestStressNRT$2.run(TestStressNRT.java:360)
[junit] Caused by: java.lang.AssertionError: id=25 
reader=ReadOnlyDirectoryReader(segments_q _2l(3.4):cv62/13 _2p(3.4):Cv6 
_2o(3.4):cv47) totalHits=2
[junit] at org.junit.Assert.fail(Assert.java:91)
[junit] at 
org.apache.lucene.index.TestStressNRT$2.run(TestStressNRT.java:345)
[junit] NOTE: test params are: locale=fr_BE, timezone=EET
[junit] NOTE: all tests run in this JVM:
[junit] [TestCharFilter, TestClassicAnalyzer, TestKeywordAnalyzer, 
TestStandardAnalyzer, TestBinaryDocument, TestAtomicUpdate, 
TestConcurrentMergeScheduler, TestDeletionPolicy, TestDirectoryReader, TestDoc, 
TestLazyProxSkipping, TestMultiLevelSkipList, TestPerSegmentDeletes, 
TestSameTokenSamePosition, TestStressNRT]
[junit] NOTE: Linux 2.6.35-30-generic amd64/Sun Microsystems Inc. 1.6.0_26 
(64-bit)/cpus=12,threads=1,free=286656336,total=352714752
[junit] -  ---

{noformat}

 IndexWriter applies wrong deletes during concurrent flush-all
 -

 Key: LUCENE-3348
 URL: https://issues.apache.org/jira/browse/LUCENE-3348
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, 
 LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, fail.txt.bz2, 
 fail2.txt.bz2



[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all

2011-08-08 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081038#comment-13081038
 ] 

Yonik Seeley commented on LUCENE-3348:
--

Tricky stuff... great job of tracking all these concurrency issues down!
I tweaked the test (more variability in number of threads, etc) and it's been 
running for 2 hours w/ no failures.

 IndexWriter applies wrong deletes during concurrent flush-all
 -

 Key: LUCENE-3348
 URL: https://issues.apache.org/jira/browse/LUCENE-3348
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, 
 LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, fail.txt.bz2, 
 fail2.txt.bz2





[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all

2011-08-08 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081166#comment-13081166
 ] 

Simon Willnauer commented on LUCENE-3348:
-

I am planning to commit this tomorrow if nobody objects

 IndexWriter applies wrong deletes during concurrent flush-all
 -

 Key: LUCENE-3348
 URL: https://issues.apache.org/jira/browse/LUCENE-3348
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, 
 LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, fail.txt.bz2, 
 fail2.txt.bz2





[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all

2011-08-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081187#comment-13081187
 ] 

Mark Miller commented on LUCENE-3348:
-

+1 - thanks for ferreting these concurrency issues out!

 IndexWriter applies wrong deletes during concurrent flush-all
 -

 Key: LUCENE-3348
 URL: https://issues.apache.org/jira/browse/LUCENE-3348
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, 
 LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, fail.txt.bz2, 
 fail2.txt.bz2





[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all

2011-08-05 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079814#comment-13079814
 ] 

Simon Willnauer commented on LUCENE-3348:
-

I think I now know what is causing the failure here. In IW#prepareCommit(Map) 
we release the full flush (docWriter.finishFullFlush(success);) before we apply 
the deletes. This means that another thread can start flushing and freeze & 
push its global deletes into the BufferedDeleteStream before we call 
IW#maybeApplyDeletes(). If a flush is fast enough (small segment) and something 
else causes the committing thread to wait on the IW before it applies the 
deletes, a delete packet that does not belong to the commit could sneak in. In 
IW#getReader this is already handled correctly. 
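
The ordering problem described above can be replayed deterministically in a toy model. This is a sketch under stated assumptions, not IndexWriter's code: ToyWriter, its boolean gate and the packet names are hypothetical, and the gate merely stands in for whatever the full flush holds to keep other flushes from pushing delete packets.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitOrderSketch {
    /** Hypothetical writer: a stream of pushed delete packets plus a full-flush gate. */
    static final class ToyWriter {
        private final List<String> deleteStream = new ArrayList<>();
        private boolean fullFlushActive;

        /** Starts the full flush and pushes the packet that belongs to the commit. */
        void startFullFlush() {
            fullFlushActive = true;
            deleteStream.add("commit-packet");
        }

        void finishFullFlush() { fullFlushActive = false; }

        /** Another thread's flush; it can only push once the full flush is released. */
        boolean tryConcurrentFlush() {
            if (fullFlushActive) {
                return false;
            }
            deleteStream.add("foreign-packet");
            return true;
        }

        /** Applies and drains everything currently in the stream. */
        List<String> applyDeletes() {
            List<String> applied = new ArrayList<>(deleteStream);
            deleteStream.clear();
            return applied;
        }
    }

    /** The order reported as buggy: release the full flush, then apply deletes. */
    static List<String> commitReleaseFirst() {
        ToyWriter w = new ToyWriter();
        w.startFullFlush();
        w.finishFullFlush();
        w.tryConcurrentFlush(); // the other thread sneaks in here
        return w.applyDeletes();
    }

    /** The safe order: apply deletes while the full flush is still held. */
    static List<String> commitApplyFirst() {
        ToyWriter w = new ToyWriter();
        w.startFullFlush();
        w.tryConcurrentFlush(); // rejected, the gate is still closed
        List<String> applied = w.applyDeletes();
        w.finishFullFlush();
        return applied;
    }

    public static void main(String[] args) {
        System.out.println("release first: " + commitReleaseFirst());
        System.out.println("apply first:   " + commitApplyFirst());
    }
}
```

Only the release-first order ends up applying a packet that does not belong to the commit.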



 IndexWriter applies wrong deletes during concurrent flush-all
 -

 Key: LUCENE-3348
 URL: https://issues.apache.org/jira/browse/LUCENE-3348
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, 
 LUCENE-3348.patch, fail.txt.bz2, fail2.txt.bz2





[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all

2011-08-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080045#comment-13080045
 ] 

Michael McCandless commented on LUCENE-3348:


I think you are right!  I will fix prepareCommit to match getReader and 
re-beast.

 IndexWriter applies wrong deletes during concurrent flush-all
 -

 Key: LUCENE-3348
 URL: https://issues.apache.org/jira/browse/LUCENE-3348
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, 
 LUCENE-3348.patch, fail.txt.bz2, fail2.txt.bz2





[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all

2011-08-05 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080210#comment-13080210
 ] 

Simon Willnauer commented on LUCENE-3348:
-

bq. Patch, incorporating Simon's last suggestion. I think this fixes the 
concurrency bugs – beasting for 2703 iterations so far and no failure!
awesome! :)

bq. Not quite committable – lots of added SOPs. I'll be out next week so won't 
get to this until I'm back so feel free to clean it up and commit!

Mike, I will assign this to myself and get it committable next week. 

Thanks, have a good time :)

 IndexWriter applies wrong deletes during concurrent flush-all
 -

 Key: LUCENE-3348
 URL: https://issues.apache.org/jira/browse/LUCENE-3348
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, 
 LUCENE-3348.patch, LUCENE-3348.patch, fail.txt.bz2, fail2.txt.bz2





[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all

2011-08-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080226#comment-13080226
 ] 

Michael McCandless commented on LUCENE-3348:


Thanks Simon!  Also I didn't implement your suggestion above (putting the new 
code into DWFC.obtainAndLock), but I think we should!

 IndexWriter applies wrong deletes during concurrent flush-all
 -

 Key: LUCENE-3348
 URL: https://issues.apache.org/jira/browse/LUCENE-3348
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, 
 LUCENE-3348.patch, LUCENE-3348.patch, fail.txt.bz2, fail2.txt.bz2





[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all

2011-08-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073535#comment-13073535
 ] 

Michael McCandless commented on LUCENE-3348:


Thanks Simon; I'll make both of those fixes.

Unfortunately there is still at least one more thread safety issue that I'm 
trying to track down... beasting uncovered a good seed.

 IndexWriter applies wrong deletes during concurrent flush-all
 -

 Key: LUCENE-3348
 URL: https://issues.apache.org/jira/browse/LUCENE-3348
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3348.patch, LUCENE-3348.patch





[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all

2011-08-01 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073556#comment-13073556
 ] 

Simon Willnauer commented on LUCENE-3348:
-

bq. Unfortunately there is still at least one more thread safety issue that I'm 
trying to track down... beasting uncovered a good seed.

argh! can you post it here?

simon


 IndexWriter applies wrong deletes during concurrent flush-all
 -

 Key: LUCENE-3348
 URL: https://issues.apache.org/jira/browse/LUCENE-3348
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3348.patch, LUCENE-3348.patch





[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all

2011-08-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073588#comment-13073588
 ] 

Michael McCandless commented on LUCENE-3348:


Here's what I run with the while1 tester in luceneutil:
{{TestStressNRT -iters 3 -verbose -seed -6208047570437556381:-3138230871915238634}}

I think what's special about the seed is that maxBufferedDocs is 3, so we are 
doing tons of segment flushing.  I dumbed the test down somewhat (turned off 
merging entirely, only 1 reader thread, up to 5 writer threads), and it still 
fails.
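
The "beasting" workflow used in this thread (rerun a randomized test until some seed fails, then replay that seed) can be sketched generically. This is not the luceneutil while1 tester: firstFailingSeed and the stand-in trial below are hypothetical, and the trial only pretends that a tiny buffer size exposes a bug.

```java
import java.util.OptionalLong;
import java.util.Random;
import java.util.function.LongPredicate;

public class BeastSketch {
    /**
     * Runs the trial up to maxIters times, each time with a fresh seed drawn
     * from seedSource, and returns the first seed for which the trial fails.
     */
    static OptionalLong firstFailingSeed(LongPredicate trialPasses, Random seedSource, int maxIters) {
        for (int i = 0; i < maxIters; i++) {
            long seed = seedSource.nextLong();
            if (!trialPasses.test(seed)) {
                return OptionalLong.of(seed);
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        // Stand-in trial (an assumption for illustration): derive a buffer size
        // from the seed and pretend that very small buffers make the run fail.
        LongPredicate trial = seed -> {
            int maxBufferedDocs = 2 + new Random(seed).nextInt(50);
            return maxBufferedDocs > 3;
        };
        OptionalLong bad = firstFailingSeed(trial, new Random(42), 10_000);
        System.out.println(bad.isPresent() ? "failing seed: " + bad.getAsLong() : "no failure found");
    }
}
```

Because every randomized parameter is derived from the seed, replaying a reported seed reproduces the same configuration. Thread interleavings are still not deterministic, which is why a failing seed may need many iterations to fail again.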

 IndexWriter applies wrong deletes during concurrent flush-all
 -

 Key: LUCENE-3348
 URL: https://issues.apache.org/jira/browse/LUCENE-3348
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch





[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all

2011-08-01 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073678#comment-13073678
 ] 

Simon Willnauer commented on LUCENE-3348:
-

Mike, I cannot reproduce this failure. What exactly is failing there? Maybe 
you can put the output in a text file and attach it?

Regarding the latest patch, I think we can call 
DWFlushControl#addFlushableState() from DWFlushControl#markForFullFlush() and 
use a global list to collect the DWPTs for the full flush. 

I think we should move the getAndLock call into DWFlushControl, as something 
like DWFlushControl#obtainAndLock(). This would allow us to make the check and 
the DWFlushControl#addFlushableState() method private to DWFC. Further, we can 
simplify the deleteQueue check a little: since we have already obtained a 
ThreadState, we don't need to unlock the state again after calling 
addFlushableState(). Something like this:

{code}
  ThreadState obtainAndLock() {
    final ThreadState perThread = perThreadPool.getAndLock(
        Thread.currentThread(), documentsWriter);
    if (perThread.isActive()
        && perThread.perThread.deleteQueue != documentsWriter.deleteQueue) {
      // There is a flush-all in process and this DWPT is
      // now stale -- enroll it for flush and try for
      // another DWPT:
      addFlushableState(perThread);
    }
    return perThread;
  }
{code}

We may be spending too much time in full flush, since we lock every 
ThreadState at least once even though some indexing threads might already have 
helped out by swapping out their DWPT instances. I think we can collect the 
ThreadStates that were already swapped out during a full flush and only check 
the ones that have not been processed. 
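
A minimal sketch of that bookkeeping idea, assuming a simple identity set guarded by the flush control's monitor. Everything here is hypothetical and for illustration only (ThreadState, markSwapped, fullFlush); it is not the DWFlushControl API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

public class FullFlushTrackingSketch {
    /** Hypothetical per-thread slot; counts how often the full flush had to lock it. */
    static final class ThreadState {
        final ReentrantLock lock = new ReentrantLock();
        int timesLockedByFullFlush;
    }

    /** States whose DWPT an indexing thread has already swapped out. */
    private final Set<ThreadState> alreadySwapped =
        Collections.newSetFromMap(new IdentityHashMap<ThreadState, Boolean>());

    /** Called by an indexing thread after it swapped out a stale DWPT itself. */
    synchronized void markSwapped(ThreadState state) {
        alreadySwapped.add(state);
    }

    /** Locks only the states nobody has processed yet; returns how many were locked. */
    int fullFlush(List<ThreadState> all) {
        List<ThreadState> remaining = new ArrayList<>();
        synchronized (this) {
            for (ThreadState s : all) {
                if (!alreadySwapped.contains(s)) {
                    remaining.add(s);
                }
            }
            alreadySwapped.clear(); // reset the bookkeeping for the next full flush
        }
        for (ThreadState s : remaining) {
            s.lock.lock();
            try {
                s.timesLockedByFullFlush++; // stand-in for checking and swapping the DWPT
            } finally {
                s.lock.unlock();
            }
        }
        return remaining.size();
    }

    public static void main(String[] args) {
        FullFlushTrackingSketch fc = new FullFlushTrackingSketch();
        List<ThreadState> states = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            states.add(new ThreadState());
        }
        fc.markSwapped(states.get(0));
        System.out.println("locked by full flush: " + fc.fullFlush(states));
    }
}
```

A state that gets swapped after the snapshot is taken is still locked once by the full flush in this sketch, which costs time but not correctness.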


 IndexWriter applies wrong deletes during concurrent flush-all
 -

 Key: LUCENE-3348
 URL: https://issues.apache.org/jira/browse/LUCENE-3348
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch





[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all

2011-08-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073804#comment-13073804
 ] 

Michael McCandless commented on LUCENE-3348:


OK I attached output of a failure -- it's 400K lines.  Search for the 
AssertionError, where id:26 couldn't find a doc nor tombstone.

 IndexWriter applies wrong deletes during concurrent flush-all
 -

 Key: LUCENE-3348
 URL: https://issues.apache.org/jira/browse/LUCENE-3348
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch, 
 fail.txt.bz2





[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all

2011-07-31 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073332#comment-13073332
 ] 

Simon Willnauer commented on LUCENE-3348:
-

Mike, the patch looks good. Some thoughts:

* Can we factor out the while(true){.. getAndLock(thread, DW) .. } loop to 
avoid this code duplication?
* You throw an NPE if the DWPT is null, yet this case is already handled by the 
ThreadState#isActive() check, which calls ensureOpen() to throw a consistent 
exception when the IW is closed, as you can see further down:
{code}
if (!perThread.isActive()) {
  ensureOpen();
  assert false : "perThread is not active but we are still open";
}
{code}

I think this will also solve the deadlock issue you are describing above, no?

Thanks for taking care of this; another proof that concurrency is not easy :)
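
A minimal sketch of the refactoring suggested in the first bullet, under stated assumptions: `Slot`, `SlotPool`, `generation` and `obtainCurrent()` are hypothetical stand-ins, not Lucene's ThreadState or thread pool classes, and the real getAndLock(thread, DW) takes arguments that are omitted here. The retry loop is written once; a caller gets back a locked slot that belongs to the current generation, stale slots are marked flush-pending and skipped, and an inactive slot produces one consistent exception.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a per-thread indexing state; not Lucene's ThreadState.
final class Slot extends ReentrantLock {
  volatile long generation;       // generation of the deletes queue this slot writes against
  volatile boolean active = true; // false once the owning writer is closed
  volatile boolean flushPending;

  Slot(long generation) { this.generation = generation; }
}

// Hypothetical pool; the real getAndLock(thread, DW) has a different signature.
final class SlotPool {
  private final List<Slot> slots;
  private final AtomicLong cursor = new AtomicLong(); // simple round-robin selection
  final AtomicLong currentGeneration = new AtomicLong();

  SlotPool(List<Slot> slots) { this.slots = slots; }

  Slot getAndLock() {
    Slot s = slots.get((int) (cursor.getAndIncrement() % slots.size()));
    s.lock();
    return s;
  }

  /**
   * The retry loop, written once instead of at every call site.
   * Assumes a concurrent flusher eventually replaces stale slots;
   * otherwise this loop spins when every slot is stale.
   */
  Slot obtainCurrent() {
    while (true) {
      Slot s = getAndLock();
      if (!s.active) {
        s.unlock();
        throw new IllegalStateException("pool is closed"); // one consistent exception
      }
      if (s.generation == currentGeneration.get()) {
        return s; // returned locked; the caller must unlock it
      }
      s.flushPending = true; // stale: hand it to the flusher and try another slot
      s.unlock();
    }
  }
}
```

Call sites then reduce to `Slot s = pool.obtainCurrent(); try { ... } finally { s.unlock(); }`.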



 IndexWriter applies wrong deletes during concurrent flush-all
 -

 Key: LUCENE-3348
 URL: https://issues.apache.org/jira/browse/LUCENE-3348
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3348.patch, LUCENE-3348.patch





[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all

2011-07-29 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072962#comment-13072962
 ] 

Jason Rutherglen commented on LUCENE-3348:
--

Sorry to add my opinion to this, however I think that while non-blocking 
deletes are quite fancy, it seems they are open to various bugs such as this.  
Is there a compelling reason non-locking is used, eg, performance?

 IndexWriter applies wrong deletes during concurrent flush-all
 -

 Key: LUCENE-3348
 URL: https://issues.apache.org/jira/browse/LUCENE-3348
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3348.patch





[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all

2011-07-29 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072968#comment-13072968
 ] 

Simon Willnauer commented on LUCENE-3348:
-

bq. Sorry to add my opinion to this, however I think that while non-blocking 
deletes are quite fancy, it seems they are open to various bugs such as this. 
Is there a compelling reason non-locking is used, eg, performance?

Jason, this issue is unrelated to non-blocking deletes. The bug here is in 
concurrent flush, which is indeed the main performance factor in DWPT.

 IndexWriter applies wrong deletes during concurrent flush-all
 -

 Key: LUCENE-3348
 URL: https://issues.apache.org/jira/browse/LUCENE-3348
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3348.patch





[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all

2011-07-28 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072509#comment-13072509
 ] 

Simon Willnauer commented on LUCENE-3348:
-

Mike, the patch looks good. One little thing: you should check whether the DWPT 
is already pending before calling #setFlushPending(DWPT).

{quote}
I think it'd be better to somehow, up
front in flush-all, mark all current DWPTs as stale, pull them out of
rotation, so that the thread pool would never return such a stale
DWPT.
{quote}
The problem here is that you need to lock all the states that are selected for 
flushing. At the same time, an indexing thread could lock such a DWPT and index 
a document, which causes the problem this issue tries to solve. If you sync the 
thread pool's getAndLock method this could work, but in the non-blocking approach 
I think this is the only way to prevent this.
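
On the first point (checking whether a DWPT is already pending), a minimal sketch of why the guard matters, assuming the flush bookkeeping moves a writer's bytes from an "active" counter to a "flushing" counter when it is marked pending. `FlushAccounting`, `Writer` and `markFlushPendingIfNeeded` are hypothetical names for illustration, not the signatures used in the patch.

```java
// Hypothetical accounting sketch; names are illustrative, not Lucene's API.
final class FlushAccounting {
  static final class Writer {
    final long bytesUsed;
    boolean flushPending; // only read or written while holding the FlushAccounting monitor

    Writer(long bytesUsed) { this.bytesUsed = bytesUsed; }
  }

  private long activeBytes;
  private long flushBytes;

  synchronized void register(Writer w) {
    activeBytes += w.bytesUsed;
  }

  /** Moves the writer's bytes from "active" to "flushing" at most once. */
  synchronized boolean markFlushPendingIfNeeded(Writer w) {
    if (w.flushPending) {
      return false; // already pending: moving the bytes again would double-count them
    }
    w.flushPending = true;
    activeBytes -= w.bytesUsed;
    flushBytes += w.bytesUsed;
    return true;
  }

  synchronized long activeBytes() { return activeBytes; }

  synchronized long flushBytes() { return flushBytes; }
}
```

Without the guard, a stale writer that is marked pending by both the flush-all and an indexing thread would be subtracted from the active total twice in this model.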

 IndexWriter applies wrong deletes during concurrent flush-all
 -

 Key: LUCENE-3348
 URL: https://issues.apache.org/jira/browse/LUCENE-3348
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3348.patch





[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all

2011-07-28 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072523#comment-13072523
 ] 

Michael McCandless commented on LUCENE-3348:


OK I'll make sure it's not already pending.



 IndexWriter applies wrong deletes during concurrent flush-all
 -

 Key: LUCENE-3348
 URL: https://issues.apache.org/jira/browse/LUCENE-3348
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3348.patch





[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all

2011-07-28 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072599#comment-13072599
 ] 

Michael McCandless commented on LUCENE-3348:


The 2nd bug seems to occur because a commit() is running concurrently with a 
getReader(): the flush-all being done for the getReader() makes a newly 
flushed segment visible to the SegmentInfos just before commit clones the 
SegmentInfos, and the buffered deletes have not been fully processed yet at that 
point for that new segment (and for segments before it).

You can see it in IW.prepareCommit: we call flush(true, true) and then 
startCommit w/o any sync, so in between a concurrent getReader can sneak in a 
change to the segmentInfos so that an updateDocument appears non-atomic.
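
One possible way to close that window, sketched under assumptions: hold a single lock across both the flush and the clone of the segment list, so that a flush-all running for a reader cannot publish a segment in between. `SegmentRegistry`, `Snapshot` and `fullFlushLock` are hypothetical names for this model; it is not IndexWriter's actual code and may differ from the committed fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the race described above; not IndexWriter's actual code.
final class SegmentRegistry {
  /** Immutable view handed to a commit. */
  static final class Snapshot {
    final List<String> segments;
    final int segmentsWithDeletesApplied;

    Snapshot(List<String> segments, int segmentsWithDeletesApplied) {
      this.segments = segments;
      this.segmentsWithDeletesApplied = segmentsWithDeletesApplied;
    }

    /** True when every segment in the snapshot has had its buffered deletes applied. */
    boolean consistent() {
      return segments.size() == segmentsWithDeletesApplied;
    }
  }

  private final Object fullFlushLock = new Object();
  private final List<String> segments = new ArrayList<>();
  private int segmentsWithDeletesApplied;

  /** Flush-all, e.g. on behalf of a reader reopen: publish first, apply deletes second. */
  void flushAll(String newSegment) {
    synchronized (fullFlushLock) {
      segments.add(newSegment);     // the segment becomes visible here ...
      segmentsWithDeletesApplied++; // ... and its buffered deletes are resolved here
    }
  }

  /** Commit path: flush and clone under the same lock, so no flush-all can interleave. */
  Snapshot flushAndSnapshot(String newSegment) {
    synchronized (fullFlushLock) {
      flushAll(newSegment); // Java monitors are re-entrant, so this does not deadlock
      return new Snapshot(new ArrayList<>(segments), segmentsWithDeletesApplied);
    }
  }
}
```

In this model a snapshot can never contain a segment whose deletes are still outstanding, because both steps of `flushAll` and the copy in `flushAndSnapshot` run under the same monitor.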

 IndexWriter applies wrong deletes during concurrent flush-all
 -

 Key: LUCENE-3348
 URL: https://issues.apache.org/jira/browse/LUCENE-3348
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3348.patch

