[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-07-01 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13696930#comment-13696930
 ] 

ASF subversion and git services commented on SOLR-4926:
---

Commit 1498554 from [~yo...@apache.org]
[ https://svn.apache.org/r1498554 ]

SOLR-4926: SolrDeletionPolicy.getLatestCommit can be null before IW is used

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4

 Attachments: SOLR-4926.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-07-01 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13696939#comment-13696939
 ] 

ASF subversion and git services commented on SOLR-4926:
---

Commit 1498559 from [~yo...@apache.org]
[ https://svn.apache.org/r1498559 ]

SOLR-4926: SolrDeletionPolicy.getLatestCommit can be null before IW is used

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4

 Attachments: SOLR-4926.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-26 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13693987#comment-13693987
 ] 

Yonik Seeley commented on SOLR-4926:


So one thing I see is that when the old IndexWriter is closed, a new segment 
file is written out.

{code}
  2 58339 T257 C69 P58203 oasu.DefaultSolrCoreState.newIndexWriter Closing old 
IndexWriter... core=collection1 
dir=org.apache.lucene.store.MMapDirectory@/opt/code/lusolr_clean5/solr/build/solr-core/test/J0/org.apache.solr.cloud.ChaosMonkeySafeLeaderTest-1372252751336/jetty12/index
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@42062bad
  2 58341 T257 C69 P58203 oasu.DefaultSolrCoreState.newIndexWriter 
PREFILES=[_0.fdx, segments_1, _0.cfs, _0.si, _0.cfe, _0.fdt, segments.gen, 
segments_4, _0_1.del, write.lock]


  2 58622 T257 C69 P58203 oasc.SolrDeletionPolicy.onCommit 
SolrDeletionPolicy.onCommit: commits DEBUG_INFO: num=2
  2
commit{dir=/opt/code/lusolr_clean5/solr/build/solr-core/test/J0/org.apache.solr.cloud.ChaosMonkeySafeLeaderTest-1372252751336/jetty12/index,segFN=segments_1,generation=1,filenam
es=[segments_1]}
  2
commit{dir=/opt/code/lusolr_clean5/solr/build/solr-core/test/J0/org.apache.solr.cloud.ChaosMonkeySafeLeaderTest-1372252751336/jetty12/index,segFN=segments_2,generation=2,filenam
es=[_0.cfs, _0.cfe, _0_1.del, segments_2, _0.si]}


  2 58630 T257 C69 P58203 oasu.DefaultSolrCoreState.newIndexWriter 
POSTFILES=[segments_2, _0.cfs, _0.si, _0.cfe, segments.gen, segments_4, 
_0_1.del]

{code}

It looks like this new segments file actually references all of the files we 
just replicated, which is of course wrong!


 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-26 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13693993#comment-13693993
 ] 

Mark Miller commented on SOLR-4926:
---

Cool - this fits what I am seeing as well. I also checked to see that you could 
open a writer on the index we replicated and you can no problem. I was 
struggling to find what could cause the corruption from there - was suspecting 
the copy to the other dir somehow, but the writer close makes a lot of sense.

And explains the mysterious segments_2 file when I only expected segments_1 and 
segments_3.

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-26 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13693999#comment-13693999
 ] 

Yonik Seeley commented on SOLR-4926:


OK, looks like the close of the old IW is actually changing some of the segment 
files (the number in parens is the file size).
Now the question is why...

{code}
  2 57648 T256 C30 P58775 oasu.DefaultSolrCoreState.showFiles PREFILES 
_0.cfe(266)_0.cfs(4375)_0.fdt(0)_0.fdx(0)_0.si(239)_0_1.del(32)segments.gen(20)segments_1(45)segments_4(99)write.lock(0)

  2 57887 T256 C30 P58775 oasu.DefaultSolrCoreState.showFiles POSTFILES 
_0.cfe(266)_0.cfs(2993)_0.si(239)_0_1.del(31)segments.gen(20)segments_2(70)segments_4(99)
{code}


 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-26 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694042#comment-13694042
 ] 

Yonik Seeley commented on SOLR-4926:


Running with a local modification that does a commit at the start of snappuller 
(as well as getting the latest commit point from the deletion policy).  
Crossing my fingers... 12 successes in a row so far.

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-26 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694055#comment-13694055
 ] 

Yonik Seeley commented on SOLR-4926:


bq. I have this competing approach attached 

Yeah, I like the rollback right before... safer in the event that somehow the 
IW acquires more changes.  I was initially worried about code that uses the IW 
to reopen the reader, but that code (openSearcher) doesn't force open a writer 
and has a fallback to reopen on the reader.

I'll start looping the chaos test with your patch and see how it goes.

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4

 Attachments: SOLR-4926.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-26 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694211#comment-13694211
 ] 

Mark Miller commented on SOLR-4926:
---

The commit bot got knocked out because JIRA started responding that it was not 
authorized.

committed:
5x 1497020
4x 1497022

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4

 Attachments: SOLR-4926.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-26 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694213#comment-13694213
 ] 

Mark Miller commented on SOLR-4926:
---

Yonik follow up with

4x 1497055 get latest commit from deletion policy
5x 1497054 get latest commit from deletion policy

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4

 Attachments: SOLR-4926.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-25 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13693090#comment-13693090
 ] 

Mark Miller commented on SOLR-4926:
---

Just to update:

bq. is somehow using a different Directory

I was confused - better logging shows it's the same directory, just with 
completely different files in it. So I'm still confused.

I see the index getting replicated into the directory and the right files in 
the directory.
I see the IndexWriter open on the directory and all the right files in the 
directory.
I see the IndexReader reopened with that IndexWriter - and boom, it's got a 
different set of files. It started with an index of gen1, replicated an index 
of gen 3, and ends up with a gen 2 index with totally different files.

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-25 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13693265#comment-13693265
 ] 

Mark Miller commented on SOLR-4926:
---

It turns out this was all being caused by:

{noformat}
[junit4:junit4]   2 org.apache.lucene.index.CorruptIndexException: invalid 
deletion count: 38 (resource: 
ChecksumIndexInput(RAMInputStream(name=segments_3)))
[junit4:junit4]   2at 
org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:340)
[junit4:junit4]   2at 
org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:374)
{noformat}

That causes the opening IW to roll back to gen 2 and remove the gen 3 files.

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-24 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13692346#comment-13692346
 ] 

Yonik Seeley commented on SOLR-4926:


I started looking for failed assertions, and ran across one not related to 
index corruption.  This is prob not related to our current issues though.

{code}
  2 68103 T325 oasc.SolrException.log ERROR java.lang.AssertionError
  2at 
org.apache.solr.update.SolrCoreState.decrefSolrCoreState(SolrCoreState.java:62)
  2at org.apache.solr.core.SolrCore.close(SolrCore.java:1059)
  2at 
org.apache.solr.handler.admin.CoreAdminHandler$2.run(CoreAdminHandler.java:814)
  2
  2 68104 T206 C40 P47070 oasc.SolrException.log ERROR java.lang.AssertionError
  2at 
org.apache.solr.update.SolrCoreState.decrefSolrCoreState(SolrCoreState.java:62)
  2at org.apache.solr.core.SolrCore.close(SolrCore.java:1059)
  2at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:389)
{code}


 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-20 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13689992#comment-13689992
 ] 

Mark Miller commented on SOLR-4926:
---

Still digging into RecoveryZkTest fails with compound file format...

Still only seems to happen with RamDir.

Oddly, it seems that the searcher opened after the replication is somehow using 
a different Directory than the one the replication goes to and the writer is 
reopened on. Even though path are against the same path. I don't know how this 
can happen yet - and it would seem this may not be the same issue that affected 
the safe leader test considering it looks hard coded to standard directory.

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688146#comment-13688146
 ] 

Yonik Seeley commented on SOLR-4926:


I hacked the lucene IWC and MergePolicy classes to never use compound format, 
and then started ChaosMonkeySafeLeaderTest tests in a loop.
11 passes in a row so far, so it definitely looks like these failures are 
related to the compound file format.

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688150#comment-13688150
 ] 

Uwe Schindler commented on SOLR-4926:
-

How does this test depend on CFS or not?

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688151#comment-13688151
 ] 

Yonik Seeley commented on SOLR-4926:


bq. How does this test depend on CFS or not?

That's the million dollar question :-)  It does not, explicitly, but it seems 
like the use of CFS somehow causes replication to fail.

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688339#comment-13688339
 ] 

Mark Miller commented on SOLR-4926:
---

bq. the use of CFS somehow causes replication to fail

Yeah, this is what I'm seeing - I just caught a really good sample case with 
decent logging.

The recovering replica commits on the leader and that leader then has 126 docs 
to replicate.

16 documents end up on the relica after the replication - 110 short.

The leader is on gen 3, the replica on gen 1.

Perhaps a red herring, but in the many cases of this I've looked at, oddly, no 
buffered docs are ever replayed after that - though I have seen buffered docs 
replayed in those same runs when the replication did not fail. Weird 
observation.

Anyway, I need to turn on more replication level logging I think.

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688440#comment-13688440
 ] 

Mark Miller commented on SOLR-4926:
---

Reviewing some more sample fails of RecoveryZkTest:

It actually looks like after the replication we end up with one commit point 
back - eg we are trying to replicate gen 3 and replica moves from gen 1 to gen 
2.

- Mark

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688522#comment-13688522
 ] 

Mark Miller commented on SOLR-4926:
---

In the case where the slave is on gen 2, it did just download the files for gen 
3 - so it seems we are not picking up the latest commit point somehow..

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688807#comment-13688807
 ] 

Mark Miller commented on SOLR-4926:
---

I've been focusing on the RecoveryZkTest case.

Every fail I've looked at has used the RAM dir. Odd because the safe leader 
test that fails is hard coded to not use ramdir I think. RecoveryZkTest also 
uses mock dir, but I don't think the safe leader test does because of the hard 
coding to standard dir.

Anyway, more on what I'm seeing from the RecoveryZkTest fails:

we replicate gen 3 files, we reopen the writer and then the searcher using that 
writer - we get an index of gen 2 - the files from the searcher's directory 
don't contain the newly replicated files, just the gen 2 index files.

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-18 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686775#comment-13686775
 ] 

Mark Miller commented on SOLR-4926:
---

It looks like replication thinks it's successful, then buffered replays are 
done - but only the buffered replays work - the replication is adding no docs. 
Somehow I think the compound file format stuff affected this, but no clue how 
yet.

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Critical
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-18 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686814#comment-13686814
 ] 

Mark Miller commented on SOLR-4926:
---

Hmm...that may have just been what that one case looked like - looking at 
another case now that may not match. More digging...

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-18 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686821#comment-13686821
 ] 

Yonik Seeley commented on SOLR-4926:


In some of the fails, I'm seeing some errors of this form:
{code}
  2 Caused by: org.apache.solr.common.SolrException: Error opening Reader
  2at 
org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:174)
  2at 
org.apache.solr.search.SolrIndexSearcher.init(SolrIndexSearcher.java:185)
  2at 
org.apache.solr.search.SolrIndexSearcher.init(SolrIndexSearcher.java:181)
  2at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1487)
  2... 15 more
  2 Caused by: java.lang.AssertionError: liveDocs.count()=4 info.docCount=6 
info.getDelCount()=6
  2at 
org.apache.lucene.codecs.lucene40.Lucene40LiveDocsFormat.readLiveDocs(Lucene40LiveDocsFormat.java:92)
{code}

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-13 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13683037#comment-13683037
 ] 

Mark Miller commented on SOLR-4926:
---

I've got some theories, but nothing hard. I do think this was caused very 
recently. I'm going to use fullmetaljenkins tomorrow to track down the issue.

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Critical
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org