[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.

2013-07-03 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13699337#comment-13699337
 ] 

Mark Miller commented on SOLR-4933:
---

This seems to have solved things for me.

 org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 
 error.
 

 Key: SOLR-4933
 URL: https://issues.apache.org/jira/browse/SOLR-4933
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Shalin Shekhar Mangar
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.

2013-07-02 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697545#comment-13697545
 ] 

Shalin Shekhar Mangar commented on SOLR-4933:
-

I was wrong. The split is not retried automatically by the overseer because the 
exception from coreadmin is just added to the response and not really thrown in 
OverseerCollectionProcessor. I'll take a stab at fixing the test.
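A minimal sketch of the distinction described above, under a simplified model. `ShardResult` and `SplitResponses` are hypothetical names, not Solr classes. When a core admin failure is only recorded in the response, the call returns normally and nothing retries. Turning the recorded failures into an exception is one way to make them visible to retry logic in the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-shard core admin result; not a Solr class.
final class ShardResult {
    final String shard;
    final String error; // null when the core admin call succeeded

    ShardResult(String shard, String error) {
        this.shard = shard;
        this.error = error;
    }
}

final class SplitResponses {
    // Failures are only recorded in the returned list, so the caller sees a
    // normal return and exception-driven retry logic never fires.
    static List<String> collectFailures(List<ShardResult> results) {
        List<String> failures = new ArrayList<>();
        for (ShardResult r : results) {
            if (r.error != null) {
                failures.add(r.shard + ": " + r.error);
            }
        }
        return failures;
    }

    // Alternative: surface any recorded failure as an exception so that a
    // caller which retries on exceptions actually gets the chance to retry.
    static void throwIfFailed(List<ShardResult> results) {
        List<String> failures = collectFailures(results);
        if (!failures.isEmpty()) {
            throw new IllegalStateException("Split failed: " + failures);
        }
    }
}
```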




[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.

2013-07-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697769#comment-13697769
 ] 

ASF subversion and git services commented on SOLR-4933:
---

Commit 1498923 from sha...@apache.org
[ https://svn.apache.org/r1498923 ]

SOLR-4933: Retry splitshard three times before giving up
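The commit message only says that the split is retried three times before giving up. The following is a generic sketch of that pattern, not the committed code. It assumes the split call can be wrapped in a `Supplier` that throws a `RuntimeException` on a 500, and `Retry.withRetries` is a hypothetical helper.

```java
import java.util.function.Supplier;

final class Retry {
    // Runs op up to maxAttempts times. Any RuntimeException triggers another
    // attempt; after the last failed attempt the most recent one is rethrown.
    static <T> T withRetries(int maxAttempts, Supplier<T> op) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // non-null: every iteration either returned or set it
    }
}
```

Rethrowing the last failure keeps the original 500 visible once all attempts are used up.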




[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.

2013-07-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697780#comment-13697780
 ] 

ASF subversion and git services commented on SOLR-4933:
---

Commit 1498928 from sha...@apache.org
[ https://svn.apache.org/r1498928 ]

SOLR-4933: Retry splitshard three times before giving up




[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.

2013-07-01 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697354#comment-13697354
 ] 

Mark Miller commented on SOLR-4933:
---

This fails constantly on my Jenkins and pretty often on the other Jenkins 
machines - what change has to be made to the test? I can take a crack at it.

bq. The split itself will be retried by the Overseer Collection Processor again 
but the test does not take that into account.

It seems like this is not just a test problem, then - it's the call that 
submits the command through the Collections API that returns a failure. If the 
overseer is going to retry, I don't think that command should return a failure. 
It should probably wait until the split is done (taking retries into account) 
and only then return the result, once it actually knows it. Otherwise it claims 
the call failed when it may succeed on the retry.
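A minimal sketch of the behavior proposed above. It assumes the processor runs each split attempt as a hypothetical `BooleanSupplier` and retries internally, and `SplitSubmitter` is an illustrative name, not part of the Collections API. The future is completed only with the final outcome, so the caller never sees a transient failure that a later retry would have fixed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class SplitSubmitter {
    // Runs up to maxAttempts split attempts asynchronously and completes the
    // future with true on the first success, or false once all attempts fail.
    static CompletableFuture<Boolean> submit(BooleanSupplier attempt, int maxAttempts) {
        return CompletableFuture.supplyAsync(() -> {
            for (int i = 0; i < maxAttempts; i++) {
                if (attempt.getAsBoolean()) {
                    return true;
                }
            }
            return false;
        });
    }

    // Blocks until the final outcome is known. orTimeout (Java 9+) fails the
    // future if no answer arrives in time; join then throws CompletionException.
    static boolean awaitResult(CompletableFuture<Boolean> result, long timeoutSeconds) {
        return result.orTimeout(timeoutSeconds, TimeUnit.SECONDS).join();
    }
}
```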

 




[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.

2013-07-01 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697453#comment-13697453
 ] 

ASF subversion and git services commented on SOLR-4933:
---

Commit 1498763 from [~markrmil...@gmail.com]
[ https://svn.apache.org/r1498763 ]

SOLR-4933: if shard split fails with 500, wait a while to see if it succeeds on 
a retry
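A sketch of the test-side wait. It assumes the test can express "the split eventually succeeded" as a condition. For example, both sub-shards might show up as active in the cluster state, but that check is left abstract here. `WaitFor.waitFor` is a hypothetical helper.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class WaitFor {
    // Polls condition every pollMillis until it returns true (-> true) or
    // timeoutMillis elapses (-> false). Uses nanoTime for an overflow-safe deadline.
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }
}
```

In the test, this would be called only after catching the 500 from the split request, and the assertion would be made on its result.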




[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.

2013-07-01 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697456#comment-13697456
 ] 

ASF subversion and git services commented on SOLR-4933:
---

Commit 1498764 from [~markrmil...@gmail.com]
[ https://svn.apache.org/r1498764 ]

SOLR-4933: if shard split fails with 500, wait a while to see if it succeeds on 
a retry




[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.

2013-07-01 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697459#comment-13697459
 ] 

Mark Miller commented on SOLR-4933:
---

Side note: just hit a rare failure here - roughly 1 run out of 40:

{noformat}
java.lang.AssertionError: Wrong doc count on shard1_1 expected:<85> but was:<84>
at __randomizedtesting.SeedInfo.seed([9515158A26713D9D:85D3018DA6DAAC74]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.apache.solr.cloud.ShardSplitTest.checkDocCountsAndShardStates(ShardSplitTest.java:235)
at org.apache.solr.cloud.ShardSplitTest.doTest(ShardSplitTest.java:163)
at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:835)
{noformat}





[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.

2013-06-18 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686888#comment-13686888
 ] 

Shalin Shekhar Mangar commented on SOLR-4933:
-

I marked SOLR-4929 as a duplicate to have all comments in one issue.

It only happens on slow machines I think. I have never been able to reproduce 
it on my box.

If this happens in a real production environment, the new leader may be on a 
different box, so we'll need to create the sub shard cores again (on the new 
leader's box); failing the split is therefore correct. The split itself will be 
retried by the Overseer Collection Processor again but the test does not take 
that into account.




[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.

2013-06-18 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686892#comment-13686892
 ] 

Mark Miller commented on SOLR-4933:
---

bq. The split itself will be retried by the Overseer Collection Processor again 
but the test does not take that into account.

Oh, okay - so the fix is really just fixing the test.

Is it the same thing with the chaos monkey shard split test?






[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.

2013-06-18 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686895#comment-13686895
 ] 

Shalin Shekhar Mangar commented on SOLR-4933:
-

bq. Is it the same thing with the chaos monkey shard split test?

Yes, though there are other (separate) issues with the chaos monkey test. We 
need to start killing the overseer in there.




[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.

2013-06-17 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686090#comment-13686090
 ] 

Mark Miller commented on SOLR-4933:
---

{noformat}
1 tests failed.
REGRESSION:  org.apache.solr.cloud.ShardSplitTest.testDistribSearch

Error Message:
Server at http://127.0.0.1:41393/fo/l returned non ok status:500, message:Server Error

Stack Trace:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://127.0.0.1:41393/fo/l returned non ok status:500, message:Server Error
at __randomizedtesting.SeedInfo.seed([B325BB39D1A2EAEE:32C33521A6FD8AD2]:0)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:385)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at org.apache.solr.cloud.ShardSplitTest.splitShard(ShardSplitTest.java:228)
at org.apache.solr.cloud.ShardSplitTest.doTest(ShardSplitTest.java:150)
{noformat}




[jira] [Commented] (SOLR-4933) org.apache.solr.cloud.ShardSplitTest.testDistribSearch fails often with a 500 error.

2013-06-17 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686097#comment-13686097
 ] 

Mark Miller commented on SOLR-4933:
---

On a first pass the 500 error *looks* like it's coming from...

A sub shard created by the split command has just become leader - it reports 
that it has no replicas during the sync phase.

At around the same time, a request to wait for a certain state fails because 
the node it is sent to complains that it is not the leader.

{noformat}
oasc.OverseerCollectionProcessor.processResponse ERROR Error from shard: 127.0.0.1:41393/fo/l
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: We are not the leader
{noformat}

The wait that fails here seems to be:
{noformat}
// wait for parent leader to acknowledge the sub-shard core
log.info("Asking parent leader to wait for: " + subShardName + " to be alive on: " + nodeName);
CoreAdminRequest.WaitForState cmd = new CoreAdminRequest.WaitForState();
cmd.setCoreName(subShardName);
cmd.setNodeName(nodeName);
cmd.setCoreNodeName(nodeName + "_" + subShardName);
cmd.setState(ZkStateReader.ACTIVE);
cmd.setCheckLive(true);
cmd.setOnlyIfLeader(true);
sendShardRequest(nodeName, new ModifiableSolrParams(cmd.getParams()));
{noformat}

There are a variety of reasons the leader might briefly change. There may be 
more to dig up here, but it also looks like a good idea to retry this request 
on this type of error.
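A sketch of that idea. It assumes the request is wrapped in a `Supplier` that throws a `RuntimeException` carrying the server's message. `LeaderAwareRetry` is hypothetical, and matching on the "We are not the leader" text from the log above is an assumption, not an established contract. Only the leader-change error is retried, with a growing pause, and any other error propagates immediately.

```java
import java.util.function.Supplier;

final class LeaderAwareRetry {
    // Message text seen in the log above; relying on it is an assumption.
    static final String NOT_LEADER = "We are not the leader";

    static boolean isLeaderChange(RuntimeException e) {
        String msg = e.getMessage();
        return msg != null && msg.contains(NOT_LEADER);
    }

    // Retries only leader-change failures, sleeping backoffMillis * attempt
    // between tries; other errors, or the final failed attempt, are rethrown.
    static <T> T call(Supplier<T> request, int maxAttempts, long backoffMillis) {
        for (int attempt = 1; ; attempt++) {
            try {
                return request.get();
            } catch (RuntimeException e) {
                if (!isLeaderChange(e) || attempt >= maxAttempts) {
                    throw e;
                }
            }
            try {
                Thread.sleep(backoffMillis * attempt);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for leader", ie);
            }
        }
    }
}
```

Retrying only the transient leader-change error avoids masking genuine failures such as a bad core name.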
