Use shards.tolerant=true to return documents that are available in the shards that are still alive.
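
A minimal sketch of such a request, in Python for illustration. The host http://localhost:8983/solr and the collection name collection1 are assumptions and do not come from this thread; the helper names are hypothetical. The check for partialResults in the responseHeader reflects how I understand Solr to flag incomplete results, so verify it against your version.

```python
from urllib.parse import urlencode


def tolerant_select_url(base_url, collection, query="*:*"):
    """Build a /select URL that asks Solr to tolerate unavailable shards."""
    params = {
        "q": query,
        "wt": "json",
        "indent": "true",
        # Return results from the shards that are still alive instead of a 503.
        "shards.tolerant": "true",
    }
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


def is_partial(response):
    """Return True if a parsed JSON response is marked as incomplete.

    Assumption: Solr sets responseHeader.partialResults when a shard
    could not be reached during a tolerant request.
    """
    header = response.get("responseHeader", {})
    return bool(header.get("partialResults", False))


# Hypothetical host and collection name, for illustration only.
url = tolerant_select_url("http://localhost:8983/solr", "collection1")
print(url)
```

Fetch the URL with any HTTP client, parse the JSON body, and pass it to is_partial to find out whether documents from a dead shard are missing.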

Typically people set up ZooKeeper outside of Solr so that Solr nodes can be added or removed easily, independent of ZooKeeper. It also isolates ZK from large GC pauses due to Solr's garbage. See http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A7

Depending on your use case, 2-3 replicas might be okay. We don't have enough information to answer that question.

On Sat, Jun 22, 2013 at 10:40 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:
> Thanks Anshum.
>
> Sure, creating a replica will make it failure resistant, but death of one
> shard should not make the whole cluster unusable.
>
> 1/3rd of the keys hosted in the killed shard should be unavailable but others
> should be available. Right?
>
> Also, any suggestions on the recommended size of zk and solr cluster size and
> configuration?
>
> Example: 3 shards with 3 replicas and 3 zk processes running on the same solr
> mode sounds acceptable? (Total of 6 VMs)
>
> Thanks,
> -Utkarsh
>
> On Jun 22, 2013, at 4:20 AM, Anshum Gupta <ans...@anshumgupta.net> wrote:
>
>> You need to have at least 1 replica from each shard for the SolrCloud setup
>> to work for you.
>> When you kill 1 shard, you essentially are taking away 1/3 of the range of
>> shard key.
>>
>>
>> On Sat, Jun 22, 2013 at 4:31 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I am testing a 3 node solrcloud cluster with 3 shards. 3 zk nodes are
>>> running in a different process in the same machines.
>>>
>>> I wanted to know the recommended size of a solrcloud cluster (min zk
>>> nodes?)
>>>
>>> This is the SolrCloud dump: https://gist.github.com/utkarsh2012/5840455
>>>
>>> And, I am not sure if I am hitting this frustrating bug or this is just a
>>> configuration error from my side. When I kill any *one* of the nodes, the
>>> whole cluster stops responding and I get this request when I query any one
>>> of the two alive nodes.
>>>
>>> {
>>>   "responseHeader":{
>>>     "status":503,
>>>     "QTime":2,
>>>     "params":{
>>>       "indent":"true",
>>>       "q":"*:*",
>>>       "wt":"json"}},
>>>   "error":{
>>>     "msg":"no servers hosting shard: ",
>>>     "code":503}}
>>>
>>>
>>> I see this exception:
>>> 952399 [qtp516992923-74] ERROR org.apache.solr.core.SolrCore –
>>> org.apache.solr.common.SolrException: no servers hosting shard:
>>>     at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149)
>>>     at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>>>     at java.lang.Thread.run(Thread.java:662)
>>>
>>>
>>> --
>>> Thanks,
>>> -Utkarsh
>>
>>
>> --
>>
>> Anshum Gupta
>> http://www.anshumgupta.net

--
Regards,
Shalin Shekhar Mangar.