Does this sound like the scenario that happened: by removing the index dir from replica 2 I also removed the tlog from which ZooKeeper extracts the versions of the two replicas and decides which one should be elected leader. As replica 2 had no tlog, ZooKeeper had no way to compare the two registered replicas, so it just picked one of them arbitrarily to lead, resulting in the election of empty replicas.
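For what it's worth, here is a minimal SolrJ 4.x sketch of how I understand those versions could be checked by hand: if I read the PeerSync log below correctly (nUpdates=100), each replica is asked for its most recent update-log versions through the realtime-get handler (/get with getVersions). The host and core URLs are the ones from the logs; I haven't run this exact code, so treat it as an illustration only, not a description of what Solr actually does internally.

import java.util.List;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.ModifiableSolrParams;

public class CompareReplicaVersions {

    // Ask a single core for the versions recorded in its tlog, the same data
    // PeerSync compares (assuming the /get handler is configured, as in the
    // default solrconfig.xml).
    @SuppressWarnings("unchecked")
    static List<Long> recentVersions(String coreUrl) throws SolrServerException {
        HttpSolrServer server = new HttpSolrServer(coreUrl);
        try {
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("qt", "/get");          // realtime-get handler
            params.set("getVersions", "100");  // last 100 versions, like nUpdates=100
            params.set("distrib", "false");    // ask this core only
            QueryResponse rsp = server.query(params);
            return (List<Long>) rsp.getResponse().get("versions");
        } finally {
            server.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // A replica whose data dir (and tlog) was wiped returns an empty list,
        // which matches the "We have no versions" line in the log below.
        System.out.println("replica1: "
                + recentVersions("http://solr-prod32:8080/solr/raw_shard1_replica1"));
        System.out.println("replica2: "
                + recentVersions("http://solr-prod02:8080/solr/raw_shard1_replica2"));
    }
}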
How does ZooKeeper compare the two tlogs to know which one is more recent? Does it not rely on the version number shown in the admin UI?

On Wed, Aug 14, 2013 at 11:00 AM, Manuel Le Normand <manuel.lenorm...@gmail.com> wrote:

> Hello,
> My solr cluster runs on RH Linux with a tomcat7 servlet container.
> NumOfShards=40, replicationFactor=2, 40 servers, each holding 2 replicas. Solr 4.3
>
> For experimental reasons I split my cluster into 2 sub-clusters, each
> containing a single replica of each shard.
> When connecting these sub-clusters back together the sync failed (more than 100
> docs indexed per shard), so a replication process started on sub-cluster #2.
> Due to the transient storage needed for the replication process, I
> removed all the index from sub-cluster #2 before connecting it back, then I
> connected sub-cluster #2's servers in 3-4 bulks to avoid high disk load.
> The first bulk replications worked well, but after a while an internal
> script pkilled all the solr instances, some while replicating. After
> starting the servlet back up I discovered the disaster - on part of the
> replicas that were in a replicating stage there was a wrong ZooKeeper
> leader election - good-state replicas (sub-cluster 1) replicated from empty
> replicas (sub-cluster 2), ending up removing all documents in these shards!!
>
> These are the logs from solr-prod32 (sub-cluster #2 - bad state) - the
> shard1_replica1 is elected to be leader although it was not the leader before the
> replication process (and shouldn't have the higher version number):
>
> 2013-08-13 13:39:15.838 [INFO ] org.apache.solr.cloud.ShardLeaderElectionContext Enough replicas found to continue.
> 2013-08-13 13:39:15.838 [INFO ] org.apache.solr.cloud.ShardLeaderElectionContext I may be the new leader - try and sync
> 2013-08-13 13:39:15.839 [INFO ] org.apache.solr.cloud.SyncStrategy Sync replicas to http://solr-prod32:8080/solr/raw_shard1_replica1/
> 2013-08-13 13:39:15.841 [INFO ] org.apache.solr.client.solrj.impl.HttpClientUtil Creating new http client, config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
> 2013-08-13 13:39:15.844 [INFO ] org.apache.solr.update.PeerSync PeerSync: core=raw_shard1_replica1 url=http://solr-prod32:8080/solr START replicas=[http://solr-prod02:8080/solr/raw_shard1_replica2/] nUpdates=100
> 2013-08-13 13:39:15.847 [INFO ] org.apache.solr.update.PeerSync PeerSync: core=raw_shard1_replica1 url=http://solr-prod32:8080/solr DONE. We have no versions. sync failed.
> 2013-08-13 13:39:15.847 [INFO ] org.apache.solr.cloud.SyncStrategy Leader's attempt to sync with shard failed, moving to the next candidate
> 2013-08-13 13:39:15.847 [INFO ] org.apache.solr.cloud.ShardLeaderElectionContext We failed sync, but we have no versions - we can't sync in that case - we were active before, so become leader anyway
> 2013-08-13 13:39:15.847 [INFO ] org.apache.solr.cloud.ShardLeaderElectionContext I am the new leader: http://solr-prod32:8080/solr/raw_shard1_replica1/
> 2013-08-13 13:39:15.847 [INFO ] org.apache.solr.common.cloud.SolrZkClient makePath: /collections/raw/leaders/shard1
> 2013-08-13 13:39:17.423 [INFO ] org.apache.solr.common.cloud.ZkStateReader A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating...
> (live nodes size: 40)
>
> While in solr-prod02 (sub-cluster #1 - good state) I get:
>
> 2013-08-13 13:39:15.671 [INFO ] org.apache.solr.cloud.ZkController publishing core=raw_shard1_replica2 state=down
> 2013-08-13 13:39:15.671 [INFO ] org.apache.solr.cloud.ZkController numShards not found on descriptor - reading it from system property
> 2013-08-13 13:39:15.673 [INFO ] org.apache.solr.core.CoreContainer registering core: raw_shard1_replica2
> 2013-08-13 13:39:15.673 [INFO ] org.apache.solr.cloud.ZkController Register replica - core:raw_shard1_replica2 address: http://solr-prod02:8080/solr collection:raw shard:shard1
> 2013-08-13 13:39:17.423 [INFO ] org.apache.solr.common.cloud.ZkStateReader A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 40)
> 2013-08-13 13:39:17.480 [INFO ] org.apache.solr.cloud.ZkController We are http://solr-prod02:8080/solr/raw_shard1_replica2/ and leader is http://solr-prod32:8080/solr/raw_shard1_replica1/
> 2013-08-13 13:39:17.481 [INFO ] org.apache.solr.cloud.ZkController No LogReplay needed for core=raw_shard1_replica2
> 2013-08-13 13:39:17.481 [INFO ] org.apache.solr.cloud.ZkController Core needs to recover:raw_shard1_replica2
> 2013-08-13 13:39:17.481 [INFO ] org.apache.solr.update.DefaultSolrCoreState Running recovery - first canceling any ongoing recovery
> 2013-08-13 13:39:17.485 [INFO ] org.apache.solr.common.cloud.ZkStateReader Updating cloud state from ZooKeeper...
> 2013-08-13 13:39:17.485 [INFO ] org.apache.solr.cloud.RecoveryStrategy Starting recovery process. core=raw_shard1_replica2
>
> Why was the leader elected wrongly??
>
> Thanks
>
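And just to check that I read the election correctly, here is a rough, hypothetical paraphrase of the decision the ShardLeaderElectionContext lines above spell out ("We failed sync, but we have no versions - we can't sync in that case - we were active before, so become leader anyway"). It is only my reading of those log messages, not the actual Solr 4.3 source:

import java.util.Collections;
import java.util.List;

class LeaderElectionSketch {

    // Paraphrase of the logged decision; names and structure are my own.
    static boolean shouldBecomeLeader(boolean peerSyncSucceeded,
                                      List<Long> myRecentVersions,
                                      boolean wasActiveBefore) {
        if (peerSyncSucceeded) {
            // Normal case: this replica proved it is as up to date as its peers.
            return true;
        }
        if (myRecentVersions.isEmpty()) {
            // "We have no versions": nothing in the tlog to compare, so the sync
            // result is inconclusive, and a previously active replica takes the
            // leadership anyway - which is how the wiped raw_shard1_replica1
            // ended up leading.
            return wasActiveBefore;
        }
        // Sync genuinely failed against peers that do have versions: step aside.
        return false;
    }

    public static void main(String[] args) {
        // The scenario from the logs: sync failed, empty tlog, replica was active before.
        System.out.println(shouldBecomeLeader(false, Collections.<Long>emptyList(), true)); // prints true
    }
}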