I'm getting the same error. I followed the SolrCloud examples and it didn't work. Here's basically what I've done:

EXPERIMENT 1: start shards and index documents, search for documents in all replicas

# Starting Shards

- Shard1 Leader (with zookeeper)
java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900 -DnumShards=2 -jar start.jar

- Shard1 Replica (with zookeeper)
java -Djetty.port=7574 -DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar

- Shard2 Leader (with zookeeper)
java -Djetty.port=8900 -DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar

- Shard2 Replica
java -Djetty.port=7500 -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar

clusterstate.json: http://dl.dropbox.com/u/7570330/clusterstate.txt

# Indexing sample document
java -jar post.jar hd.xml

# search in all Shards:
number of results found: 2
Note: all shards have the same result

EXPERIMENT 2: Kill current Shard1 Leader, expect Shard1 Replica to become leader, search should still work and results return (is that right?)

# Killing Shard2 Leader

Shard2 Replica logs:
...
Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.PrepRequestProcessor pRequest INFO: Processed session termination for sessionid: 0x3bbe3403c00001
Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.PrepRequestProcessor pRequest INFO: Got user-level KeeperException when processing sessionid:0x3bbe3403c00000 type:delete cxid:0x4dea zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/collections/collection1/leaders/shard1 Error:KeeperErrorCode = NoNode for /collections/collection1/leaders/shard1
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess INFO: Running the leader process.
Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.PrepRequestProcessor pRequest INFO: Got user-level KeeperException when processing sessionid:0x3bbe3403c00000 type:create cxid:0x4dec zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/overseer Error:KeeperErrorCode = NodeExists for /overseer
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader INFO: Checking if I should try and be the leader.
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader INFO: My last published State was Active, it's okay to be the leader.
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess INFO: I may be the new leader - try and sync
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.SyncStrategy sync INFO: Sync replicas to http://Gustavos-MacBook-Pro.local:8900/solr/collection1/
Dec 21, 2012 11:57:39 AM org.apache.solr.update.PeerSync sync INFO: PeerSync: core=collection1 url=http://Gustavos-MacBook-Pro.local:8900/solr START replicas=[http://Gustavos-MacBook-Pro.local:8983/solr/collection1/] nUpdates=100
Dec 21, 2012 11:57:39 AM org.apache.solr.update.PeerSync sync INFO: PeerSync: core=collection1 url=http://Gustavos-MacBook-Pro.local:8900/solr DONE. We have no versions. sync failed.
Dec 21, 2012 11:57:39 AM org.apache.solr.common.SolrException log SEVERE: Sync Failed
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.ShardLeaderElectionContext rejoinLeaderElection INFO: There is a better leader candidate than us - going back into recovery
Dec 21, 2012 11:57:39 AM org.apache.solr.update.DefaultSolrCoreState doRecovery INFO: Running recovery - first canceling any ongoing recovery
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.RecoveryStrategy run INFO: Starting recovery process. core=collection1 recoveringAfterStartup=false
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.RecoveryStrategy doRecovery INFO: Attempting to PeerSync from http://Gustavos-MacBook-Pro.local:8983/solr/collection1/ core=collection1 - recoveringAfterStartup=false
Dec 21, 2012 11:57:39 AM org.apache.solr.update.PeerSync sync INFO: PeerSync: core=collection1 url=http://Gustavos-MacBook-Pro.local:8900/solr START replicas=[http://Gustavos-MacBook-Pro.local:8983/solr/collection1/] nUpdates=100
Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.PrepRequestProcessor pRequest INFO: Got user-level KeeperException when processing sessionid:0x3bbe3403c00000 type:delete cxid:0x4df3 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/collections/collection1/leaders/shard1 Error:KeeperErrorCode = NoNode for /collections/collection1/leaders/shard1
Dec 21, 2012 11:57:39 AM org.apache.solr.update.PeerSync sync WARNING: no frame of reference to tell of we've missed updates
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.RecoveryStrategy doRecovery INFO: PeerSync Recovery was not successful - trying replication. core=collection1
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.RecoveryStrategy doRecovery INFO: Starting Replication Recovery. core=collection1
Dec 21, 2012 11:57:39 AM org.apache.solr.client.solrj.impl.HttpClientUtil createClient INFO: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess INFO: Running the leader process.
Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.PrepRequestProcessor pRequest INFO: Got user-level KeeperException when processing sessionid:0x3bbe3403c00000 type:create cxid:0x4df4 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/overseer Error:KeeperErrorCode = NodeExists for /overseer
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=180000
Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.quorum.LearnerHandler run SEVERE: Unexpected exception causing shutdown while sock still open
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
    at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:416)
Dec 21, 2012 11:57:39 AM org.apache.zookeeper.ClientCnxn$SendThread run INFO: Unable to read additional data from server sessionid 0x3bbe3403c00000, likely server has closed socket, closing socket connection and attempting reconnect
Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker run WARNING: Connection broken for id 0, my id = 2, error = java.io.IOException: Channel eof
Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.quorum.LearnerHandler run WARNING: ******* GOODBYE /127.0.0.1:58549 ********
Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker run WARNING: Interrupting SendWorker
Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker run WARNING: Interrupted while waiting for message on queue
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1961)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2038)
    at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:347)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:622)
Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker run WARNING: Send worker leaving thread
Dec 21, 2012 11:57:40 AM org.apache.zookeeper.ClientCnxn$SendThread startConnect INFO: Opening socket connection to server localhost/127.0.0.1:9900
Dec 21, 2012 11:57:40 AM org.apache.zookeeper.ClientCnxn$SendThread primeConnection INFO: Socket connection established to localhost/127.0.0.1:9900, initiating session
Dec 21, 2012 11:57:40 AM org.apache.zookeeper.server.NIOServerCnxn$Factory run INFO: Accepted socket connection from /127.0.0.1:58930
Dec 21, 2012 11:57:40 AM org.apache.zookeeper.server.NIOServerCnxn readConnectRequest INFO: Client attempting to renew session 0x3bbe3403c00000 at /127.0.0.1:58930
Dec 21, 2012 11:57:40 AM org.apache.zookeeper.server.NIOServerCnxn finishSessionInit INFO: Established session 0x3bbe3403c00000 with negotiated timeout 15000 for client /127.0.0.1:58930
Dec 21, 2012 11:57:40 AM org.apache.zookeeper.ClientCnxn$SendThread readConnectResult INFO: Session establishment complete on server localhost/127.0.0.1:9900, sessionid = 0x3bbe3403c00000, negotiated timeout = 15000
Dec 21, 2012 11:57:40 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=179497
Dec 21, 2012 11:57:40 AM org.apache.solr.common.cloud.ZkStateReader updateClusterState INFO: Updating cloud state from ZooKeeper...
Dec 21, 2012 11:57:40 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=178996
Dec 21, 2012 11:57:41 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=178494
Dec 21, 2012 11:57:41 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=177992
Dec 21, 2012 11:57:42 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=177491
Dec 21, 2012 11:57:42 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=176989
Dec 21, 2012 11:57:43 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=176488
Dec 21, 2012 11:57:43 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=175986
Dec 21, 2012 11:57:44 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=175484
...

Notes:
- The Shard1 replica doesn't become the leader and keeps logging "Waiting until we see more replicas up"
- Search results for all Shards:

<lst name="error">
  <str name="msg">no servers hosting shard:</str>
  <int name="code">503</int>
</lst>

Dump: http://dl.dropbox.com/u/7570330/dump.txt

What am I doing wrong?
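
To see how far the "Waiting until we see more replicas up" countdown in the log above has progressed, the timeoutin values can be pulled out of the log. This is only a rough sketch: it assumes timeoutin is in milliseconds (the values drop by about 500 between entries logged half a second apart), and both the function name and the solr.log file name are placeholders of mine, not anything provided by Solr.

```shell
# Print the remaining wait, in whole seconds, for every log line that
# contains "timeoutin=<number>". Reads log text on stdin.
# Assumption: timeoutin is reported in milliseconds.
remaining_wait_seconds() {
  grep -o 'timeoutin=[0-9][0-9]*' |   # keep only the timeoutin=NNN tokens
    sed 's/timeoutin=//' |            # strip the key, leaving the number
    while read -r ms; do
      echo $((ms / 1000))             # integer division: ms -> seconds
    done
}

# Usage (solr.log is a hypothetical file holding the replica's console output):
#   remaining_wait_seconds < solr.log | tail -n 1
```

Run against the excerpt above, the last value printed would be 175, i.e. the replica would still have roughly three minutes of waiting left at the point where the log was cut off.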