[jira] [Commented] (SOLR-3561) Error during deletion of shard/core

2016-10-04 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546190#comment-15546190
 ] 

Per Steffensen commented on SOLR-3561:
--

I originally created the ticket. I am not against closing it. I do not know if 
the problem still exists (in some shape), but a lot of things have changed 
since, so someone will have to bring up the problem again if it is still a 
problem.

> Error during deletion of shard/core
> ---
>
>                 Key: SOLR-3561
>                 URL: https://issues.apache.org/jira/browse/SOLR-3561
>             Project: Solr
>          Issue Type: Bug
>          Components: multicore, replication (java), SolrCloud
>    Affects Versions: 4.0-ALPHA
>         Environment: Solr trunk (4.0-SNAPSHOT) from 29/2-2012
>            Reporter: Per Steffensen
>            Assignee: Mark Miller
>             Fix For: 4.9, 6.0
>
>
> Running several Solr servers in a Cloud cluster (zkHost set on the Solr 
> servers).
> Several collections with several slices and one replica for each slice (each 
> slice has two shards).
> Basically we want to let our system delete an entire collection. We do this by 
> trying to delete each and every shard under the collection. Each shard is 
> deleted one by one, by doing CoreAdmin UNLOAD requests against the relevant 
> Solr server:
> {code}
> CoreAdminRequest request = new CoreAdminRequest();
> request.setAction(CoreAdminAction.UNLOAD);
> request.setCoreName(shardName);
> CoreAdminResponse resp = request.process(new CommonsHttpSolrServer(solrUrl));
> {code}
> The delete/unload succeeds, but in roughly 10% of the cases we get errors on 
> the involved Solr servers, right around the time the shards/cores are deleted, 
> and we end up in a situation where ZK still claims (forever) that the deleted 
> shard is still present and active.
> From here the issue is more easily explained by a concrete example:
> - 7 Solr servers involved
> - Several collections, among others one called "collection_2012_04", 
> consisting of 28 slices, 56 shards (remember 1 replica for each slice) named 
> "collection_2012_04_sliceX_shardY" for all pairs in {X:1..28}x{Y:1,2}
> - Each Solr server running 8 shards, e.g. Solr server #1 is running shard 
> "collection_2012_04_slice1_shard1" and Solr server #7 is running shard 
> "collection_2012_04_slice1_shard2" belonging to the same slice "slice1".
> When we decide to delete the collection "collection_2012_04" we go through 
> all 56 shards and delete/unload them one by one, including 
> "collection_2012_04_slice1_shard1" and "collection_2012_04_slice1_shard2". At 
> some point during or shortly after all this deletion we see the following 
> exceptions in solr.log on Solr server #7:
> {code}
> Aug 1, 2012 12:02:50 AM org.apache.solr.common.SolrException log
> SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: core not found:collection_2012_04_slice1_shard1
> request: http://solr_server_1:8983/solr/admin/cores?action=PREPRECOVERY&core=collection_2012_04_slice1_shard1&nodeName=solr_server_7%3A8983_solr&coreNodeName=solr_server_7%3A8983_solr_collection_2012_04_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.solr.common.SolrExceptionPropagationHelper.decodeFromMsg(SolrExceptionPropagationHelper.java:29)
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:445)
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
>     at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:188)
>     at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:285)
>     at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)
> Aug 1, 2012 12:02:50 AM org.apache.solr.common.SolrException log
> SEVERE: Recovery failed - trying again...
> Aug 1, 2012 12:02:51 AM org.apache.solr.cloud.LeaderElector$1 process
> WARNING:
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>     at java.util.ArrayList.get(ArrayList.java:322)
>     at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:96)
>     at org.apache.solr.cloud.LeaderElector.access$000(LeaderElector.java:57)
>     at org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:121)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507)
> Aug 1, 2012 12:02:51 AM org.apache.solr.cloud.LeaderElector$1 process
> {code}
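> I'm not sure exactly how to interpret this, but it seems to me that some 
> recovery job tries to recover collection_2012_04_slice1_shard2 on Solr server 
> #7 from collection_2012_04_slice1_shard1 on Solr server #1, but fails because 
> Solr server #1 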
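
The naming scheme and the one-by-one unload described in the quoted report can be sketched roughly as follows. This is only an illustration, not the reporter's code: the class `ShardUnloadUrls` and its methods are hypothetical names, and the sketch only builds the CoreAdmin request URLs (using the `/solr/admin/cores` path that appears in the log above and the standard `action=UNLOAD&core=...` parameters) without sending anything. Which server hosts which core is not derived here; the base URL passed in is an assumption.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper (hypothetical; not part of Solr or SolrJ). */
final class ShardUnloadUrls {

    /** Enumerates core names of the form <collection>_slice<X>_shard<Y>. */
    static List<String> coreNames(String collection, int slices, int shardsPerSlice) {
        List<String> names = new ArrayList<>();
        for (int slice = 1; slice <= slices; slice++) {
            for (int shard = 1; shard <= shardsPerSlice; shard++) {
                names.add(collection + "_slice" + slice + "_shard" + shard);
            }
        }
        return names;
    }

    /** Builds the CoreAdmin UNLOAD URL for one core, e.g. base http://host:8983/solr. */
    static String unloadUrl(String solrBaseUrl, String coreName) {
        try {
            return solrBaseUrl + "/admin/cores?action=UNLOAD&core="
                    + URLEncoder.encode(coreName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every JVM is required to support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // 28 slices x 2 shards per slice, as in the example from the report.
        List<String> names = coreNames("collection_2012_04", 28, 2);
        System.out.println(names.size() + " cores to unload");
        // Assumption: the first core lives on solr_server_1, as stated in the report.
        System.out.println(unloadUrl("http://solr_server_1:8983/solr", names.get(0)));
    }
}
```

Issuing these requests strictly one after another mirrors the procedure in the report; the sketch makes no attempt to avoid the race with recovery that the report describes.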

[jira] [Commented] (SOLR-3561) Error during deletion of shard/core

2016-10-04 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15544742#comment-15544742
 ] 

Cao Manh Dat commented on SOLR-3561:


[~arafalov] Based on the comments from [~markrmil...@gmail.com], I think we can 
close this ticket now.

[jira] [Commented] (SOLR-3561) Error during deletion of shard/core

2016-09-29 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534920#comment-15534920
 ] 

Cao Manh Dat commented on SOLR-3561:


So can this issue be closed?

[jira] [Commented] (SOLR-3561) Error during deletion of shard/core

2012-10-25 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484629#comment-13484629
 ] 

Mark Miller commented on SOLR-3561:
---

It's very likely this could have been SOLR-3939.

[jira] [Commented] (SOLR-3561) Error during deletion of shard/core

2012-10-25 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484711#comment-13484711
 ] 

Mark Miller commented on SOLR-3561:
---

And/Or SOLR-3994

[jira] [Commented] (SOLR-3561) Error during deletion of shard/core

2012-10-25 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484712#comment-13484712
 ] 

Mark Miller commented on SOLR-3561:
---

{noformat}
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:571)
    at java.util.ArrayList.get(ArrayList.java:349)
    at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:95)
{noformat}

This is actually likely fine and unrelated - it's something that can happen on 
shutdown and should not be a problem. I've updated it so that a more 
appropriate message is logged.
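
For readers unfamiliar with the trace: "Index: 0, Size: 0" is what `ArrayList.get(0)` reports when the list is empty. The following is a minimal sketch of that failure and of a guard that turns it into an ordinary "nothing there" result. It is purely illustrative and is not Solr's `LeaderElector` code; the class `EmptyListGuard`, the method `firstCandidate`, and the idea that the list holds election node names are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical illustration; this is not Solr's LeaderElector code. */
final class EmptyListGuard {

    /** Returns the first entry, or empty when the list has no entries (e.g. during shutdown). */
    static Optional<String> firstCandidate(List<String> electionNodes) {
        if (electionNodes.isEmpty()) {
            // A caller could log a clearer message here instead of letting get(0) throw.
            return Optional.empty();
        }
        return Optional.of(electionNodes.get(0));
    }

    public static void main(String[] args) {
        List<String> none = new ArrayList<>();
        try {
            none.get(0); // unguarded access on an empty list
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unguarded access failed: " + e.getMessage());
        }
        System.out.println("guarded access found a candidate: "
                + firstCandidate(none).isPresent());
    }
}
```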

[jira] [Commented] (SOLR-3561) Error during deletion of shard/core

2012-10-11 Thread Rob Speer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474421#comment-13474421
 ] 

Rob Speer commented on SOLR-3561:
-

I'm still seeing this error, consistently.

I'm currently running two Solr processes on one machine to test sharding. If I 
ever delete all the cores of a collection (and even if I explicitly delete the 
collection using the cloud admin), it shows an error like this first:

{noformat}
INFO: Unregistering core testdb-shard6-rep2 from cloudstate.
Oct 11, 2012 2:42:11 PM org.apache.solr.core.SolrCore close
INFO: [testdb-shard6-rep2]  CLOSING SolrCore org.apache.solr.core.SolrCore@7a0ec60b
Oct 11, 2012 2:42:11 PM org.apache.solr.core.SolrCore closeSearcher
INFO: [testdb-shard6-rep2] Closing main searcher on request.
Oct 11, 2012 2:42:11 PM org.apache.solr.update.DirectUpdateHandler2 close
INFO: closing DirectUpdateHandler2{commits=14,autocommits=0,soft autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=44,cumulative_deletesById=0,cumulative_deletesByQuery=8,cumulative_errors=0}
Oct 11, 2012 2:42:11 PM org.apache.solr.cloud.RecoveryStrategy close
WARNING: Stopping recovery for core testdb-shard6-rep2 zkNodeName=panama:8983_solr_testdb-shard6-rep2
Oct 11, 2012 2:42:11 PM org.apache.solr.cloud.LeaderElector$1 process
WARNING: 
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:571)
    at java.util.ArrayList.get(ArrayList.java:349)
    at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:95)
    at org.apache.solr.cloud.LeaderElector.access$000(LeaderElector.java:57)
    at org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:125)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
{noformat}

Afterward it repeats this error over and over:

{noformat}
SEVERE: Error while trying to recover.
java.lang.RuntimeException: No registered leader was found, collection:lumi-test_pipeline-test slice:shard2
    at org.apache.solr.common.cloud.ZkStateReader.getLeaderProps(ZkStateReader.java:428)
    at org.apache.solr.common.cloud.ZkStateReader.getLeaderProps(ZkStateReader.java:414)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:297)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:211)
Oct 11, 2012 11:30:11 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
SEVERE: Recovery failed - trying again...
Oct 11, 2012 11:30:11 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
{noformat}
