[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.
[ https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13699337#comment-13699337 ] Mark Miller commented on SOLR-4933: --- This seems to have solved things for me. org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error. Key: SOLR-4933 URL: https://issues.apache.org/jira/browse/SOLR-4933 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Mark Miller Assignee: Shalin Shekhar Mangar Fix For: 5.0, 4.4 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.
[ https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697545#comment-13697545 ] Shalin Shekhar Mangar commented on SOLR-4933: - I was wrong. The split is not retried automatically by the overseer because the exception from coreadmin is just added to the response and not really thrown in OverseerCollectionProcessor. I'll take a stab at fixing the test.
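To make the distinction in the comment above concrete, here is a minimal sketch of why a failure that is only recorded in a response never triggers an exception-driven retry. Every name in it (`ResponseVsThrowSketch`, `failingSubRequest`, `attemptsMade`, and so on) is hypothetical and chosen for illustration; none of this is taken from OverseerCollectionProcessor.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration only; this is not Solr code.
class ResponseVsThrowSketch {

    /** Stand-in for a sub-request that always fails. */
    static void failingSubRequest() {
        throw new IllegalStateException("We are not the leader");
    }

    /** Style 1: the error is only recorded in the response, so the call returns normally. */
    static Map<String, Object> recordFailure() {
        Map<String, Object> response = new LinkedHashMap<>();
        try {
            failingSubRequest();
            response.put("success", true);
        } catch (RuntimeException e) {
            response.put("failure", e.getMessage());
        }
        return response;
    }

    /** Style 2: the error propagates, so an outer retry loop can observe it. */
    static Map<String, Object> propagateFailure() {
        Map<String, Object> response = new LinkedHashMap<>();
        failingSubRequest();
        response.put("success", true);
        return response;
    }

    /** Runs an exception-driven retry loop and reports how many attempts it made. */
    static int attemptsMade(Supplier<Map<String, Object>> operation, int maxAttempts) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            try {
                operation.get();
                return attempts; // no exception seen, so the loop assumes success
            } catch (RuntimeException e) {
                // exception seen, so the loop tries again
            }
        }
        return attempts;
    }
}
```

With `recordFailure` the loop stops after one attempt even though the response carries a failure entry; with `propagateFailure` it uses all of its attempts.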
[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.
[ https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697769#comment-13697769 ] ASF subversion and git services commented on SOLR-4933: --- Commit 1498923 from sha...@apache.org [ https://svn.apache.org/r1498923 ] SOLR-4933: Retry splitshard three times before giving up
[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.
[ https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697780#comment-13697780 ] ASF subversion and git services commented on SOLR-4933: --- Commit 1498928 from sha...@apache.org [ https://svn.apache.org/r1498928 ] SOLR-4933: Retry splitshard three times before giving up
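The commit message describes a bounded retry. The sketch below shows the general shape of such a loop, assuming the split request can be modelled as a `Supplier` that throws a `RuntimeException` on failure. The class and method names are hypothetical and the actual change in r1498923/r1498928 may look quite different.

```java
import java.util.function.Supplier;

// Hypothetical illustration of "retry three times before giving up"; not the committed code.
class SplitRetrySketch {

    static final int MAX_ATTEMPTS = 3;

    /** Calls the action up to MAX_ATTEMPTS times and rethrows the last failure if all fail. */
    static <T> T callWithRetries(Supplier<T> action) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed
    }
}
```

A real caller would probably also pause between attempts and retry only on errors known to be transient; both are left out here to keep the sketch short.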
[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.
[ https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697354#comment-13697354 ] Mark Miller commented on SOLR-4933: --- This fails constantly on my Jenkins and pretty often on the other Jenkins machines - what change has to be made to the test? I can take a crack at it. bq. The split itself will be retried by the Overseer Collection Processor again but the test does not take that into account. It seems like this is not just a test problem then - it's the call to submit the cmd through the collections API that returns a failure - if the overseer is going to retry, I don't think that command should return a failure. It should probably wait until the split is done (taking retries into account) and then return the result once it actually knows it. Otherwise it claims the call failed, but it may succeed on the retry.
[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.
[ https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697453#comment-13697453 ] ASF subversion and git services commented on SOLR-4933: --- Commit 1498763 from [~markrmil...@gmail.com] [ https://svn.apache.org/r1498763 ] SOLR-4933: if shard split fails with 500, wait a while to see if it succeeds on a retry
[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.
[ https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697456#comment-13697456 ] ASF subversion and git services commented on SOLR-4933: --- Commit 1498764 from [~markrmil...@gmail.com] [ https://svn.apache.org/r1498764 ] SOLR-4933: if shard split fails with 500, wait a while to see if it succeeds on a retry
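The "wait a while to see if it succeeds" approach amounts to polling a condition until a deadline passes. A minimal sketch of that pattern follows, assuming the "split has completed" check can be expressed as a `BooleanSupplier`. The names `WaitForSplitSketch` and `waitFor` are hypothetical, and the test change in r1498763/r1498764 may be implemented differently.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration of polling with a timeout; not the committed test code.
class WaitForSplitSketch {

    /**
     * Polls the condition until it holds or the timeout elapses.
     * Returns true if the condition was observed, false on timeout or interruption.
     */
    static boolean waitFor(BooleanSupplier done, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (done.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the condition never became true in time
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
    }
}
```

In a test, the condition might check that the sub-shards have appeared in the cluster state after the initial request returned a 500; that check is an assumption here and is not shown.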
[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.
[ https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697459#comment-13697459 ] Mark Miller commented on SOLR-4933: --- side note: just hit a rare fail here - like 1 out of 40:
{noformat}
java.lang.AssertionError: Wrong doc count on shard1_1 expected:85 but was:84
	at __randomizedtesting.SeedInfo.seed([9515158A26713D9D:85D3018DA6DAAC74]:0)
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.failNotEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:128)
	at org.junit.Assert.assertEquals(Assert.java:472)
	at org.apache.solr.cloud.ShardSplitTest.checkDocCountsAndShardStates(ShardSplitTest.java:235)
	at org.apache.solr.cloud.ShardSplitTest.doTest(ShardSplitTest.java:163)
	at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:835)
{noformat}
[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.
[ https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686888#comment-13686888 ] Shalin Shekhar Mangar commented on SOLR-4933: - I marked SOLR-4929 as duplicate to have all comments in one issue. It only happens on slow machines I think. I have never been able to reproduce it on my box. If this happens in a real production environment then the leader may be on a different box so we'll need to go and create the sub shard cores again (on the new leader box) so failing the split is correct. The split itself will be retried by the Overseer Collection Processor again but the test does not take that into account.
[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.
[ https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686892#comment-13686892 ] Mark Miller commented on SOLR-4933: --- bq. The split itself will be retried by the Overseer Collection Processor again but the test does not take that into account. Oh, okay - so the fix is really just fixing the test. Is it the same thing with the chaos monkey shard split test?
[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.
[ https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686895#comment-13686895 ] Shalin Shekhar Mangar commented on SOLR-4933: - bq. Is it the same thing with the chaos monkey shard split test? Yes though there are other (separate) issues with the chaos monkey test. We need to start killing the overseer in there.
[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.
[ https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686090#comment-13686090 ] Mark Miller commented on SOLR-4933: ---
{noformat}
1 tests failed.
REGRESSION: org.apache.solr.cloud.ShardSplitTest.testDistribSearch

Error Message:
Server at http://127.0.0.1:41393/fo/l returned non ok status:500, message:Server Error

Stack Trace:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://127.0.0.1:41393/fo/l returned non ok status:500, message:Server Error
	at __randomizedtesting.SeedInfo.seed([B325BB39D1A2EAEE:32C33521A6FD8AD2]:0)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:385)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
	at org.apache.solr.cloud.ShardSplitTest.splitShard(ShardSplitTest.java:228)
	at org.apache.solr.cloud.ShardSplitTest.doTest(ShardSplitTest.java:150)
{noformat}
[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.
[ https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686097#comment-13686097 ] Mark Miller commented on SOLR-4933: --- On a first pass the 500 error *looks* like it's coming from... A sub shard that is created on the split command has just become a leader - it says it has no replicas during the sync phase. At around the same time, a request to wait on seeing a certain state fails because the node that it is made to complains it is not the leader.
{noformat}
oasc.OverseerCollectionProcessor.processResponse ERROR Error from shard: 127.0.0.1:41393/fo/l
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: We are not the leader
{noformat}
The wait that fails here seems to be:
{noformat}
// wait for parent leader to acknowledge the sub-shard core
log.info("Asking parent leader to wait for: " + subShardName + " to be alive on: " + nodeName);
CoreAdminRequest.WaitForState cmd = new CoreAdminRequest.WaitForState();
cmd.setCoreName(subShardName);
cmd.setNodeName(nodeName);
cmd.setCoreNodeName(nodeName + "_" + subShardName);
cmd.setState(ZkStateReader.ACTIVE);
cmd.setCheckLive(true);
cmd.setOnlyIfLeader(true);
sendShardRequest(nodeName, new ModifiableSolrParams(cmd.getParams()));
{noformat}
There are a variety of reasons the leader might be briefly changing. There may be more to dig up here, but it also looks like it might be a good idea to be willing to retry on this type of error.