Forgot to mention: Solr is 4.2 and ZooKeeper is 3.4.5. I do not do manual commits; I rely on a softCommit every second and an autoCommit every 3 minutes.
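For reference, the commit settings described above would look roughly like this in solrconfig.xml (a minimal sketch with only the stated intervals; the rest of the updateHandler config is assumed):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit every 3 minutes; don't reopen searchers on hard commit -->
  <autoCommit>
    <maxTime>180000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit every second for near-real-time visibility -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>
```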
The problem happened again: lots of errors in the logs and no description. The cluster state changed; on shard 2 the replica became the leader, and the former leader went into recovery mode.

The errors happened when:

1. Shard1 tried to forward an update to Shard2, and this was the initial error from Shard2: "ClusterState says we are the leader, but locally we don't think so"

2. Shard2 forwarded the update to Replica2 and got: org.apache.solr.common.SolrException: Request says it is coming from leader, but we are the leader

Please see the attachments:

Topology: <http://lucene.472066.n3.nabble.com/file/n4061839/Topology_new.png>
Shard1: <http://lucene.472066.n3.nabble.com/file/n4061839/Shard1_new.png>
Replica1: <http://lucene.472066.n3.nabble.com/file/n4061839/Replica1_new.png>
Shard2: <http://lucene.472066.n3.nabble.com/file/n4061839/Shard2_new.png>
Replica2: <http://lucene.472066.n3.nabble.com/file/n4061839/Replica2_new.png>

All the errors in the screenshots appear whenever the server load gets higher. As soon as I started a few more queue workers, the load went up and the cluster became unstable. So I have doubts about reliability: could any docs be lost during these errors, or should I just ignore them?

I understand that 4 Solr instances and 3 ZooKeeper nodes could be too many for a single machine; there might not be enough resources, etc. But even so, it should not cause anything like this. The worst case should be a timeout error when Solr is not responding, and my queue processors can handle that by resending the request after a while.

--
View this message in context: http://lucene.472066.n3.nabble.com/ColrCloud-IOException-occured-when-talking-to-server-at-tp4061831p4061839.html
Sent from the Solr - User mailing list archive at Nabble.com.
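The resend-after-a-while behavior the queue processors rely on can be sketched like this (a hypothetical helper, not the actual worker code; `send` stands in for whatever call posts the update to Solr):

```python
import time


def send_with_retry(send, doc, attempts=3, delay=1.0):
    """Try to send a document to Solr, retrying on failure.

    'send' is any callable that raises an exception on error
    (e.g. a timeout while Solr is not responding). Waits with
    exponential backoff between attempts; re-raises if all fail.
    """
    for attempt in range(attempts):
        try:
            return send(doc)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries, surface the error
            time.sleep(delay * (2 ** attempt))  # back off before resending
```

This handles the "worst case" timeout scenario cleanly, but it cannot help with the leader-election errors above, since those happen after the request is accepted by the cluster.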