Re: Shards stuck in down state after splitting shard - How can we recover from a failed SPLITSHARD?
Thanks for your reply, Anshum. I took a look at clusterstate.json, and it seems the sub-shards are stuck in the construction state while the other shards are still active. I'm able to query my index again (that seems to have been an unrelated issue), but I'd still like to remove these stuck sub-shards and recreate them (or fix the existing ones). -- View this message in context: http://lucene.472066.n3.nabble.com/Shards-stuck-in-down-state-after-splitting-shard-How-can-we-recover-from-a-failed-SPLITSHARD-tp4107297p4107620.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Shards stuck in down state after splitting shard - How can we recover from a failed SPLITSHARD?
My apologies, I forgot to paste the output of clusterstate.json to my last post. Here it is:

[zk: localhost:2181(CONNECTED) 1] get /clusterstate.json
{collection1:{
  shards:{
    shard1:{ range:8000-d554, state:active, replicas:{10.0.0.229:8443_solr_collection1:{ state:active, base_url:http://10.0.0.229:8443/solr, core:collection1, node_name:10.0.0.229:8443_solr, leader:true}}},
    shard2:{ range:d555-2aa9, state:active, replicas:{10.0.0.5:8443_solr_collection1:{ state:active, base_url:http://10.0.0.5:8443/solr, core:collection1, node_name:10.0.0.5:8443_solr, leader:true}}},
    shard3:{ range:2aaa-7fff, state:active, replicas:{10.0.0.246:8443_solr_collection1:{ state:active, base_url:http://10.0.0.246:8443/solr, core:collection1, node_name:10.0.0.246:8443_solr, leader:true}}},
    shard1_0:{ range:8000-aaa9, state:construction, parent:shard1, replicas:{10.0.0.229:8443_solr_collection1_shard1_0_replica1:{ state:down, base_url:http://10.0.0.229:8443/solr, core:collection1_shard1_0_replica1, node_name:10.0.0.229:8443_solr}}},
    shard1_1:{ range:-d554, state:construction, parent:shard1, replicas:{10.0.0.229:8443_solr_collection1_shard1_1_replica1:{ state:down, base_url:http://10.0.0.229:8443/solr, core:collection1_shard1_1_replica1, node_name:10.0.0.229:8443_solr}}},
    shard2_0:{ range:d555-fffe, state:construction, parent:shard2, replicas:{10.0.0.5:8443_solr_collection1_shard2_0_replica1:{ state:down, base_url:http://10.0.0.5:8443/solr, core:collection1_shard2_0_replica1, node_name:10.0.0.5:8443_solr, leader:true}}},
    shard2_1:{ range:-2aa9, state:construction, parent:shard2, replicas:{10.0.0.5:8443_solr_collection1_shard2_1_replica1:{ state:down, base_url:http://10.0.0.5:8443/solr, core:collection1_shard2_1_replica1, node_name:10.0.0.5:8443_solr, leader:true, maxShardsPerNode:1, router:{name:compositeId}, replicationFactor:1, autoCreated:true}}
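For anyone who wants to pick the stuck sub-shards out of a cluster state like this without reading it by eye, here is a minimal Python sketch. It assumes the cluster state has already been fetched and parsed into a dict; the archive stripped the JSON quotes from the output in this thread, so the sample below is a hand-quoted excerpt with the replicas left out. The function name stuck_subshards is hypothetical, not part of Solr.

```python
def stuck_subshards(clusterstate, collection):
    """Return (sub_shard, parent, state) for each shard that has a parent and is not active."""
    shards = clusterstate[collection]["shards"]
    stuck = []
    for name, info in sorted(shards.items()):
        # Sub-shards created by a split carry a "parent" entry.
        if "parent" in info and info.get("state") != "active":
            stuck.append((name, info["parent"], info["state"]))
    return stuck


# Hand-quoted excerpt of the cluster state from this thread (assumption: replicas omitted).
sample = {
    "collection1": {
        "shards": {
            "shard1": {"range": "8000-d554", "state": "active"},
            "shard2": {"range": "d555-2aa9", "state": "active"},
            "shard1_0": {"range": "8000-aaa9", "state": "construction", "parent": "shard1"},
            "shard2_0": {"range": "d555-fffe", "state": "construction", "parent": "shard2"},
        }
    }
}

for shard, parent, state in stuck_subshards(sample, "collection1"):
    print(f"{shard} (parent {parent}) is in state {state}")
```

The check only looks at shard-level state; replica-level state (down, recovering) would need a second pass over each shard's replicas.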
Re: Shards stuck in down state after splitting shard - How can we recover from a failed SPLITSHARD?
Looking at this, it doesn't look like the operation completed. Also, the parent shard seems to be intact and ideally should have served the results. Until splitting and replication complete, the sub-shards don't go active (and the parent shard doesn't go inactive). Can you give me more information on this? What version of Solr are you using? Exceptions and messages from the logs would also be needed to get more context.

On Fri, Dec 20, 2013 at 7:58 AM, cwhi chris.whi...@gmail.com wrote: My apologies, I forgot to paste the output of clusterstate.json to my last post. Here it is: [...]

-- Anshum Gupta http://www.anshumgupta.net
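The rule described in this reply (sub-shards go active and the parent goes inactive only once the split has finished) can be written down as a small check. This is a sketch under that description only, not Solr's own logic; the function name split_status and the returned labels are made up for illustration, and the shard states are taken from the cluster state posted in this thread.

```python
def split_status(shards, parent):
    """Classify a split of `parent` using shard-level states only."""
    subs = {name: info for name, info in shards.items() if info.get("parent") == parent}
    if not subs:
        return "no split in progress"
    parent_state = shards[parent]["state"]
    all_subs_active = all(info["state"] == "active" for info in subs.values())
    if all_subs_active and parent_state == "inactive":
        return "split complete"
    if parent_state == "active" and not all_subs_active:
        # Parent still owns its hash range, so it should keep serving queries.
        return "split incomplete, parent still serving"
    return "inconsistent"


# Shard states as posted in this thread (replicas and ranges omitted).
shards = {
    "shard1": {"state": "active"},
    "shard3": {"state": "active"},
    "shard1_0": {"state": "construction", "parent": "shard1"},
    "shard1_1": {"state": "construction", "parent": "shard1"},
}

print(split_status(shards, "shard1"))
print(split_status(shards, "shard3"))
```

For the state posted here, shard1 falls into the "split incomplete" case, which matches the observation that the parent looks intact.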
Re: Shards stuck in down state after splitting shard - How can we recover from a failed SPLITSHARD?
Thanks again for your replies. I'm using Solr 4.6. I just tried splitting another shard so I could grab the exceptions from the logs, and here is the log output. http://pastebin.com/7uC5PQsa I noticed a few obvious exceptions that might have caused this to fail, such as this:

ERROR - 2013-12-20 20:18:24.231; org.apache.solr.core.CoreContainer; Unable to create core: collection1_shard3_1_replica1
java.lang.RuntimeException: java.io.IOException: Error opening /configs/config1/stopwords.txt
    at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:169)
    at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
    at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
    at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:254)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:590)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:498)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Error opening /configs/config1/stopwords.txt
    at org.apache.solr.cloud.ZkSolrResourceLoader.openResource(ZkSolrResourceLoader.java:83)
    at org.apache.lucene.analysis.util.AbstractAnalysisFactory.getLines(AbstractAnalysisFactory.java:255)
    at org.apache.lucene.analysis.util.AbstractAnalysisFactory.getWordSet(AbstractAnalysisFactory.java:243)
    at org.apache.lucene.analysis.core.StopFilterFactory.inform(StopFilterFactory.java:99)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:655)
    at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:167)
    ... 35 more

That exception claims that it can't read stopwords.txt, but the file is definitely present locally at solr/conf/stopwords.txt, and it's present in zookeeper at /configs/config1/stopwords.txt (I just checked with zkCli.cmd).
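When a log holds many traces like this one, it can help to pull out just the failing core and the innermost cause. The following is a rough Python sketch, assuming the log is available as one entry per line in the usual Java stack-trace layout; the function name summarize_core_failure is hypothetical, and the sample lines are copied from the trace in this message.

```python
import re


def summarize_core_failure(log_lines):
    """Return (core_name, root_cause) for the last core-creation error, or None if there is none."""
    core = None
    root_cause = None
    for raw in log_lines:
        line = raw.strip()
        match = re.search(r"Unable to create core: (\S+)", line)
        if match:
            core = match.group(1)
            root_cause = None  # a new error starts; forget the previous cause
        elif core is not None and line.startswith("Caused by: "):
            # Later "Caused by" lines are deeper causes, so keep the last one.
            root_cause = line[len("Caused by: "):]
    return (core, root_cause) if core is not None else None


sample_log = [
    "ERROR - 2013-12-20 20:18:24.231; org.apache.solr.core.CoreContainer; "
    "Unable to create core: collection1_shard3_1_replica1",
    "java.lang.RuntimeException: java.io.IOException: Error opening /configs/config1/stopwords.txt",
    "    at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:169)",
    "Caused by: java.io.IOException: Error opening /configs/config1/stopwords.txt",
    "    at org.apache.solr.cloud.ZkSolrResourceLoader.openResource(ZkSolrResourceLoader.java:83)",
]

print(summarize_core_failure(sample_log))
```

This only summarizes what the log already says; it does not explain why the resource could not be opened.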
Shards stuck in down state after splitting shard - How can we recover from a failed SPLITSHARD?
I called SPLITSHARD on a shard in an existing SolrCloud instance, where the shard had ~1 million documents in it. It has been about 3 hours since the split completed, and the sub-shards are still stuck in a Down state. They are reported as down in localhost/solr/#/~cloud, and I'm unable to query my index. How can we recover from a failed SPLITSHARD operation?
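The post does not show the exact request that was sent, so as an illustration only, here is a small Python sketch that builds a Collections API SPLITSHARD URL. The base URL, collection name, and shard name are assumptions taken from the cluster state posted elsewhere in this thread, and the helper collections_api_url is hypothetical. The sketch only builds the URL string and does not send anything.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, **params):
    """Build a Collections API URL such as .../admin/collections?action=SPLITSHARD&..."""
    query = urlencode({"action": action, **params})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


# Assumed values, taken from the cluster state shown in this thread.
url = collections_api_url(
    "http://10.0.0.229:8443/solr",
    "SPLITSHARD",
    collection="collection1",
    shard="shard1",
)
print(url)
```

Keeping the URL construction in one helper makes it easy to log the exact request next to the Solr log output when a split fails.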
Re: Shards stuck in down state after splitting shard - How can we recover from a failed SPLITSHARD?
Hi, Is the parent shard currently active? What does clusterstate.json say? A sub-shard can be stuck in the down state while it is trying to recover, but as far as I remember, the sub-shards only get marked active (and the parent goes inactive) once recovery and replication (for as many replicas as the parent shard has) are completed.

On Wed, Dec 18, 2013 at 10:01 AM, cwhi chris.whi...@gmail.com wrote: I called SPLITSHARD on a shard in an existing SolrCloud instance, where the shard had ~1 million documents in it. [...]

-- Anshum Gupta http://www.anshumgupta.net