Re: Shards stuck in down state after splitting shard - How can we recover from a failed SPLITSHARD?

2013-12-20 Thread cwhi
Thanks for your reply, Anshum. I took a look at clusterstate.json, and it
seems the sub-shards are stuck in the construction state while the parent
shards are still active. I'm able to query my index again (that seems to have
been an unrelated issue), but I'd still like to remove these stuck sub-shards
and recreate them (or fix the existing ones).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Shards-stuck-in-down-state-after-splitting-shard-How-can-we-recover-from-a-failed-SPLITSHARD-tp4107297p4107620.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Shards stuck in down state after splitting shard - How can we recover from a failed SPLITSHARD?

2013-12-20 Thread cwhi
My apologies, I forgot to paste the output of clusterstate.json into my last
post. Here it is:

[zk: localhost:2181(CONNECTED) 1] get /clusterstate.json
{"collection1":{
    "shards":{
      "shard1":{
        "range":"8000-d554",
        "state":"active",
        "replicas":{"10.0.0.229:8443_solr_collection1":{
            "state":"active",
            "base_url":"http://10.0.0.229:8443/solr",
            "core":"collection1",
            "node_name":"10.0.0.229:8443_solr",
            "leader":"true"}}},
      "shard2":{
        "range":"d555-2aa9",
        "state":"active",
        "replicas":{"10.0.0.5:8443_solr_collection1":{
            "state":"active",
            "base_url":"http://10.0.0.5:8443/solr",
            "core":"collection1",
            "node_name":"10.0.0.5:8443_solr",
            "leader":"true"}}},
      "shard3":{
        "range":"2aaa-7fff",
        "state":"active",
        "replicas":{"10.0.0.246:8443_solr_collection1":{
            "state":"active",
            "base_url":"http://10.0.0.246:8443/solr",
            "core":"collection1",
            "node_name":"10.0.0.246:8443_solr",
            "leader":"true"}}},
      "shard1_0":{
        "range":"8000-aaa9",
        "state":"construction",
        "parent":"shard1",
        "replicas":{"10.0.0.229:8443_solr_collection1_shard1_0_replica1":{
            "state":"down",
            "base_url":"http://10.0.0.229:8443/solr",
            "core":"collection1_shard1_0_replica1",
            "node_name":"10.0.0.229:8443_solr"}}},
      "shard1_1":{
        "range":"-d554",
        "state":"construction",
        "parent":"shard1",
        "replicas":{"10.0.0.229:8443_solr_collection1_shard1_1_replica1":{
            "state":"down",
            "base_url":"http://10.0.0.229:8443/solr",
            "core":"collection1_shard1_1_replica1",
            "node_name":"10.0.0.229:8443_solr"}}},
      "shard2_0":{
        "range":"d555-fffe",
        "state":"construction",
        "parent":"shard2",
        "replicas":{"10.0.0.5:8443_solr_collection1_shard2_0_replica1":{
            "state":"down",
            "base_url":"http://10.0.0.5:8443/solr",
            "core":"collection1_shard2_0_replica1",
            "node_name":"10.0.0.5:8443_solr",
            "leader":"true"}}},
      "shard2_1":{
        "range":"-2aa9",
        "state":"construction",
        "parent":"shard2",
        "replicas":{"10.0.0.5:8443_solr_collection1_shard2_1_replica1":{
            "state":"down",
            "base_url":"http://10.0.0.5:8443/solr",
            "core":"collection1_shard2_1_replica1",
            "node_name":"10.0.0.5:8443_solr",
            "leader":"true"}}}},
    "maxShardsPerNode":"1",
    "router":{"name":"compositeId"},
    "replicationFactor":"1",
    "autoCreated":"true"}}
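
As a rough illustration of how to read this output, the sketch below walks a clusterstate.json document and lists every shard still in the construction state, together with its parent and any replica cores marked down. The embedded sample is a trimmed version of the dump above (ranges left out, replica keys shortened, quoting assumed), and the function name find_stuck_shards is made up for this example.

```python
import json

# Trimmed sample modeled on the dump above. Ranges are omitted, replica keys
# are shortened, and the quoting is an assumption about the original output.
SAMPLE = """
{"collection1": {"shards": {
  "shard1":   {"state": "active",
               "replicas": {"r1": {"state": "active", "core": "collection1"}}},
  "shard1_0": {"state": "construction", "parent": "shard1",
               "replicas": {"r2": {"state": "down",
                                   "core": "collection1_shard1_0_replica1"}}},
  "shard1_1": {"state": "construction", "parent": "shard1",
               "replicas": {"r3": {"state": "down",
                                   "core": "collection1_shard1_1_replica1"}}}
}}}
"""


def find_stuck_shards(cluster_state):
    """Return sorted (collection, shard, parent, down_cores) tuples for
    every shard whose state is 'construction'."""
    stuck = []
    for coll_name, coll in cluster_state.items():
        for shard_name, shard in coll.get("shards", {}).items():
            if shard.get("state") != "construction":
                continue
            # Collect the cores of replicas that are reported as down.
            down = sorted(
                replica["core"]
                for replica in shard.get("replicas", {}).values()
                if replica.get("state") == "down"
            )
            stuck.append((coll_name, shard_name, shard.get("parent"), down))
    return sorted(stuck)


if __name__ == "__main__":
    for row in find_stuck_shards(json.loads(SAMPLE)):
        print(row)
```

Run against the full dump, this kind of check would flag shard1_0, shard1_1, shard2_0, and shard2_1, which matches what the cloud view reports.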





Re: Shards stuck in down state after splitting shard - How can we recover from a failed SPLITSHARD?

2013-12-20 Thread Anshum Gupta
Looking at this, it doesn't look like the operation completed. Also, the
parent shard seems to be intact and ideally should have served the results.
Until splitting and replication complete, the sub-shards don't go active
(and the parent shard doesn't go inactive).

Can you give me more information on this? What version of Solr are you
using? Exceptions and messages from the logs are also needed to get more
context.



-- 

Anshum Gupta
http://www.anshumgupta.net


Re: Shards stuck in down state after splitting shard - How can we recover from a failed SPLITSHARD?

2013-12-20 Thread cwhi
Thanks again for your replies. I'm using Solr 4.6. I just tried splitting
another shard so I could grab the exceptions from the logs, and here is the
log output: http://pastebin.com/7uC5PQsa

I noticed a few obvious exceptions that might have caused this to fail,
such as this:

ERROR - 2013-12-20 20:18:24.231; org.apache.solr.core.CoreContainer; Unable to create core: collection1_shard3_1_replica1
java.lang.RuntimeException: java.io.IOException: Error opening /configs/config1/stopwords.txt
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:169)
    at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
    at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
    at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:254)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:590)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:498)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Error opening /configs/config1/stopwords.txt
    at org.apache.solr.cloud.ZkSolrResourceLoader.openResource(ZkSolrResourceLoader.java:83)
    at org.apache.lucene.analysis.util.AbstractAnalysisFactory.getLines(AbstractAnalysisFactory.java:255)
    at org.apache.lucene.analysis.util.AbstractAnalysisFactory.getWordSet(AbstractAnalysisFactory.java:243)
    at org.apache.lucene.analysis.core.StopFilterFactory.inform(StopFilterFactory.java:99)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:655)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:167)
    ... 35 more


That exception claims that stopwords.txt can't be opened, but the file is
definitely present locally at solr/conf/stopwords.txt, and it's present in
ZooKeeper at /configs/config1/stopwords.txt (I just checked with zkCli.cmd).
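
When a log contains many traces like the one above, it can help to reduce each failed core to its root cause. The following is a minimal sketch, assuming the log keeps each "Unable to create core" message on a single line and that the last "Caused by" line before the next error is the one of interest. The sample text is an abbreviated copy of the excerpt above, and core_failures is a hypothetical helper name.

```python
import re

# Abbreviated sample modeled on the log excerpt above (most frames dropped).
SAMPLE_LOG = """\
ERROR - 2013-12-20 20:18:24.231; org.apache.solr.core.CoreContainer; Unable to create core: collection1_shard3_1_replica1
java.lang.RuntimeException: java.io.IOException: Error opening /configs/config1/stopwords.txt
\tat org.apache.solr.core.CoreContainer.create(CoreContainer.java:590)
Caused by: java.io.IOException: Error opening /configs/config1/stopwords.txt
\tat org.apache.solr.cloud.ZkSolrResourceLoader.openResource(ZkSolrResourceLoader.java:83)
"""

CORE_ERROR = re.compile(r"Unable to create core: (\S+)")
CAUSED_BY = re.compile(r"^Caused by: (.+)$")


def core_failures(log_text):
    """Map each core that failed to create to the last 'Caused by' message
    that follows it (None if the trace has no 'Caused by' line)."""
    failures = {}
    current = None
    for line in log_text.splitlines():
        match = CORE_ERROR.search(line)
        if match:
            # A new core-creation error starts here.
            current = match.group(1)
            failures[current] = None
            continue
        match = CAUSED_BY.match(line)
        if match and current is not None:
            failures[current] = match.group(1)
    return failures


if __name__ == "__main__":
    for core, cause in core_failures(SAMPLE_LOG).items():
        print(core, "->", cause)
```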






Shards stuck in down state after splitting shard - How can we recover from a failed SPLITSHARD?

2013-12-18 Thread cwhi
I called SPLITSHARD on a shard in an existing SolrCloud instance, where the
shard had ~1 million documents in it. It's been about 3 hours since the
split completed, and the sub-shards are still stuck in a Down state. They
are reported as down in localhost/solr/#/~cloud, and I'm unable to query my
index.
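
For reference, a SPLITSHARD call of the kind described here is a Collections API request with action=SPLITSHARD plus the collection and shard names. The sketch below only builds such a URL; the host, port, collection, and shard values are placeholders, since the post does not say which ones were used, and splitshard_url is a name invented for this example.

```python
from urllib.parse import urlencode


def splitshard_url(base_url, collection, shard):
    """Build a Collections API SPLITSHARD request URL (does not send it)."""
    query = urlencode(
        {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    )
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


if __name__ == "__main__":
    # Placeholder values, not taken from the original post.
    print(splitshard_url("http://localhost:8983/solr", "collection1", "shard1"))
```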

How can we recover from a failed SPLITSHARD operation?





Re: Shards stuck in down state after splitting shard - How can we recover from a failed SPLITSHARD?

2013-12-18 Thread Anshum Gupta
Hi,

Is the parent shard currently active? What does the clusterstate.json say?
The sub-shards could be stuck in the down state while they are trying to
recover, but as far as I remember, the sub-shards only get marked active (and
the parent goes inactive) once recovery and replication (for as many replicas
as the parent shard has) are completed.
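
The rule described above can be written down as a small check over the shard states found in clusterstate.json. This is only a sketch of the behavior as recalled in this message, not Solr's own logic; the input shape (a mapping from shard name to a dict with "state" and optional "parent") and the name split_status are assumptions made for the example.

```python
def split_status(shards, parent):
    """Classify the split of `parent` from a {name: {"state", "parent"}} map."""
    subs = [s for s in shards.values() if s.get("parent") == parent]
    if not subs:
        return "no sub-shards"
    parent_state = shards[parent]["state"]
    sub_states = [s["state"] for s in subs]
    # Finished: the parent was retired and every sub-shard took over.
    if parent_state == "inactive" and all(st == "active" for st in sub_states):
        return "complete"
    # Unfinished: the parent still serves and no sub-shard is active yet.
    if parent_state == "active" and all(st != "active" for st in sub_states):
        return "incomplete"
    return "mixed"


if __name__ == "__main__":
    shards = {
        "shard1": {"state": "active"},
        "shard1_0": {"state": "construction", "parent": "shard1"},
        "shard1_1": {"state": "construction", "parent": "shard1"},
    }
    print(split_status(shards, "shard1"))
```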



-- 

Anshum Gupta
http://www.anshumgupta.net