You're hitting https://issues.apache.org/jira/browse/SOLR-3939
The luck of hashing must have left the guy trying to become the leader without any docs. Due to SOLR-3939, a node with an empty index cannot become the leader.

- Mark

On Dec 21, 2012, at 1:41 PM, gumatias <gust...@matias.com> wrote:

> I'm getting the same error. I followed the SolrCloud examples and it didn't
> work.. here's basically what I've done:
>
> EXPERIMENT 1: start shards and index documents, search for documents in all
> replicas
>
> # Starting Shards
> - Shard1 Leader (with zookeeper)
> java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900 -DnumShards=2 -jar start.jar
> - Shard1 Replica (with zookeeper)
> java -Djetty.port=7574 -DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar
> - Shard2 Leader (with zookeeper)
> java -Djetty.port=8900 -DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar
> - Shard2 Replica
> java -Djetty.port=7500 -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar
>
> clusterstate.json: http://dl.dropbox.com/u/7570330/clusterstate.txt
>
> # Indexing sample document
> java -jar post.jar hd.xml
>
> # search in all Shards: number of results found: 2
> Note: all shards have the same result
>
> EXPERIMENT 2: Kill current Shard1 Leader, expect Shard1 Replica to become
> leader, search should still work and results return (is that right?)
>
> # Killing Shard2 Leader
>
> Shard2 Replica logs:
> ...
> Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.PrepRequestProcessor pRequest
> INFO: Processed session termination for sessionid: 0x3bbe3403c00001
> Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.PrepRequestProcessor pRequest
> INFO: Got user-level KeeperException when processing sessionid:0x3bbe3403c00000 type:delete cxid:0x4dea zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/collections/collection1/leaders/shard1 Error:KeeperErrorCode = NoNode for /collections/collection1/leaders/shard1
> Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
> INFO: Running the leader process.
> Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.PrepRequestProcessor pRequest
> INFO: Got user-level KeeperException when processing sessionid:0x3bbe3403c00000 type:create cxid:0x4dec zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/overseer Error:KeeperErrorCode = NodeExists for /overseer
> Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
> INFO: Checking if I should try and be the leader.
> Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
> INFO: My last published State was Active, it's okay to be the leader.
> Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
> INFO: I may be the new leader - try and sync
> Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.SyncStrategy sync
> INFO: Sync replicas to http://Gustavos-MacBook-Pro.local:8900/solr/collection1/
> Dec 21, 2012 11:57:39 AM org.apache.solr.update.PeerSync sync
> INFO: PeerSync: core=collection1 url=http://Gustavos-MacBook-Pro.local:8900/solr START replicas=[http://Gustavos-MacBook-Pro.local:8983/solr/collection1/] nUpdates=100
> Dec 21, 2012 11:57:39 AM org.apache.solr.update.PeerSync sync
> INFO: PeerSync: core=collection1 url=http://Gustavos-MacBook-Pro.local:8900/solr DONE. We have no versions. sync failed.
> Dec 21, 2012 11:57:39 AM org.apache.solr.common.SolrException log
> SEVERE: Sync Failed
> Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.ShardLeaderElectionContext rejoinLeaderElection
> INFO: There is a better leader candidate than us - going back into recovery
> Dec 21, 2012 11:57:39 AM org.apache.solr.update.DefaultSolrCoreState doRecovery
> INFO: Running recovery - first canceling any ongoing recovery
> Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.RecoveryStrategy run
> INFO: Starting recovery process. core=collection1 recoveringAfterStartup=false
> Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
> INFO: Attempting to PeerSync from http://Gustavos-MacBook-Pro.local:8983/solr/collection1/ core=collection1 - recoveringAfterStartup=false
> Dec 21, 2012 11:57:39 AM org.apache.solr.update.PeerSync sync
> INFO: PeerSync: core=collection1 url=http://Gustavos-MacBook-Pro.local:8900/solr START replicas=[http://Gustavos-MacBook-Pro.local:8983/solr/collection1/] nUpdates=100
> Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.PrepRequestProcessor pRequest
> INFO: Got user-level KeeperException when processing sessionid:0x3bbe3403c00000 type:delete cxid:0x4df3 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/collections/collection1/leaders/shard1 Error:KeeperErrorCode = NoNode for /collections/collection1/leaders/shard1
> Dec 21, 2012 11:57:39 AM org.apache.solr.update.PeerSync sync
> WARNING: no frame of reference to tell of we've missed updates
> Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
> INFO: PeerSync Recovery was not successful - trying replication. core=collection1
> Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
> INFO: Starting Replication Recovery. core=collection1
> Dec 21, 2012 11:57:39 AM org.apache.solr.client.solrj.impl.HttpClientUtil createClient
> INFO: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
> INFO: Running the leader process.
> Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.PrepRequestProcessor pRequest
> INFO: Got user-level KeeperException when processing sessionid:0x3bbe3403c00000 type:create cxid:0x4df4 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/overseer Error:KeeperErrorCode = NodeExists for /overseer
> Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
> INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=180000
> Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.quorum.LearnerHandler run
> SEVERE: Unexpected exception causing shutdown while sock still open
> java.io.EOFException
>     at java.io.DataInputStream.readInt(DataInputStream.java:375)
>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>     at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
>     at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>     at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:416)
> Dec 21, 2012 11:57:39 AM org.apache.zookeeper.ClientCnxn$SendThread run
> INFO: Unable to read additional data from server sessionid 0x3bbe3403c00000, likely server has closed socket, closing socket connection and attempting reconnect
> Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker run
> WARNING: Connection broken for id 0, my id = 2, error = java.io.IOException: Channel eof
> Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.quorum.LearnerHandler run
> WARNING: ******* GOODBYE /127.0.0.1:58549 ********
> Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker run
> WARNING: Interrupting SendWorker
> Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker run
> WARNING: Interrupted while waiting for message on queue
> java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1961)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2038)
>     at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:347)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:622)
> Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker run
> WARNING: Send worker leaving thread
> Dec 21, 2012 11:57:40 AM org.apache.zookeeper.ClientCnxn$SendThread startConnect
> INFO: Opening socket connection to server localhost/127.0.0.1:9900
> Dec 21, 2012 11:57:40 AM org.apache.zookeeper.ClientCnxn$SendThread primeConnection
> INFO: Socket connection established to localhost/127.0.0.1:9900, initiating session
> Dec 21, 2012 11:57:40 AM org.apache.zookeeper.server.NIOServerCnxn$Factory run
> INFO: Accepted socket connection from /127.0.0.1:58930
> Dec 21, 2012 11:57:40 AM org.apache.zookeeper.server.NIOServerCnxn readConnectRequest
> INFO: Client attempting to renew session 0x3bbe3403c00000 at /127.0.0.1:58930
> Dec 21, 2012 11:57:40 AM org.apache.zookeeper.server.NIOServerCnxn finishSessionInit
> INFO: Established session 0x3bbe3403c00000 with negotiated timeout 15000 for client /127.0.0.1:58930
> Dec 21, 2012 11:57:40 AM org.apache.zookeeper.ClientCnxn$SendThread readConnectResult
> INFO: Session establishment complete on server localhost/127.0.0.1:9900, sessionid = 0x3bbe3403c00000, negotiated timeout = 15000
> Dec 21, 2012 11:57:40 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
> INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=179497
> Dec 21, 2012 11:57:40 AM org.apache.solr.common.cloud.ZkStateReader updateClusterState
> INFO: Updating cloud state from ZooKeeper...
> Dec 21, 2012 11:57:40 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
> INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=178996
> Dec 21, 2012 11:57:41 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
> INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=178494
> Dec 21, 2012 11:57:41 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
> INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=177992
> Dec 21, 2012 11:57:42 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
> INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=177491
> Dec 21, 2012 11:57:42 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
> INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=176989
> Dec 21, 2012 11:57:43 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
> INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=176488
> Dec 21, 2012 11:57:43 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
> INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=175986
> Dec 21, 2012 11:57:44 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
> INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=175484
> ...
>
> Notes:
>
> - Shard1 replica doesnt become the leader and keeps "waiting until see more
> replicas up"
>
> - Search results for all Shards:
> <lst name="error">
> <str name="msg">no servers hosting shard:</str>
> <int name="code">503</int>
> </lst>
>
> Dump: http://dl.dropbox.com/u/7570330/dump.txt
>
> What am I doing wrong?
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Solrcloud-not-reachable-and-after-restart-just-a-no-servers-hosting-shard-tp4009786p4028623.html
> Sent from the Solr - User mailing list archive at Nabble.com.
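
On the "luck of hashing" point: a rough back-of-the-envelope sketch of how likely it is that some shard ends up with no documents when only a handful are indexed. This is not Solr's routing code. It simply assumes each document lands on a shard uniformly and independently, and the function name `prob_some_shard_empty` is made up for illustration.

```python
from math import comb


def prob_some_shard_empty(num_docs: int, num_shards: int) -> float:
    """Probability that at least one shard receives no documents.

    Assumption (not Solr's actual router): every document is assigned to
    one of `num_shards` shards uniformly at random, independently of the
    other documents.
    """
    # Inclusion-exclusion over the number k of shards forced to be empty
    # gives the probability that every shard gets at least one document.
    p_all_nonempty = sum(
        (-1) ** k * comb(num_shards, k) * ((num_shards - k) / num_shards) ** num_docs
        for k in range(num_shards + 1)
    )
    return 1.0 - p_all_nonempty


if __name__ == "__main__":
    # Two documents spread over two shards, as in the quoted experiment
    # (the search there found 2 results).
    print(prob_some_shard_empty(2, 2))
```

Under that assumption, two documents over two shards leave one shard empty about half the time, so an empty replica trying to take over as leader after indexing a tiny sample file is not surprising.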