Hello there,

we have the following setup:

SolrCloud 4.4.0 (3 nodes, physical machines)
Zookeeper 3.4.5 (3 nodes, physical machines)

We have a number of rather small collections (~10K or ~100K documents each) 
that we would like to load onto all Solr instances (numShards=1, 
replication_factor=3) and access through the local network interface, as the 
load balancing is done in the layers above.

We can live with rebuilding entire collections whenever we need to (and we 
actually do this in the test phase): we load a new collection, switch the 
collection alias to it, and remove the old collection.
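
For reference, here is a minimal sketch of that rebuild-and-swap cycle, 
expressed as the Collections API requests it would involve. It only builds the 
URLs and does not send them. The base URL, the alias name and the collection 
names are hypothetical placeholders, and the exact parameters (for example a 
config name) may differ in our real setup. Note that the API parameter is 
spelled replicationFactor.

```python
from urllib.parse import urlencode

# Hypothetical base URL of one Solr node; adjust to the real installation.
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{SOLR_BASE}/admin/collections?{query}"


def swap_urls(alias, old_collection, new_collection, replicas=3):
    """Return the three requests of one rebuild-and-swap cycle, in order."""
    return [
        # 1. Create the new collection: one shard, one replica per node.
        collections_api_url("CREATE", name=new_collection, numShards=1,
                            replicationFactor=replicas),
        # 2. Repoint the alias (after documents were indexed into new_collection).
        collections_api_url("CREATEALIAS", name=alias,
                            collections=new_collection),
        # 3. Drop the old collection once nothing reads it any more.
        collections_api_url("DELETE", name=old_collection),
    ]


if __name__ == "__main__":
    # Alias and collection names are made up for illustration.
    for url in swap_urls("products", "products_v1", "products_v2"):
        print(url)
```

Indexing the documents into the new collection happens between steps 1 and 2 
and is left out here.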

We stumbled across the following problem: as soon as each of the three Solr 
nodes has become the leader for at least one collection, restarting any node 
makes it completely unresponsive (requests time out), both through the admin 
interface and for replication. If we restart all Solr nodes, the cluster ends 
up in some kind of deadlock, and the only remedy we have found is a clean Solr 
installation, removing the ZooKeeper data, and re-posting the collections.

Apparently, the leader is waiting for the replicas to come up, and they try to 
synchronize but time out on HTTP requests, so everything ends up in some kind 
of deadlock, possibly related to:

https://issues.apache.org/jira/browse/SOLR-5240

Eventually (after a few minutes), the leader takes over and marks the 
collections "active", but it remains blocked on the HTTP interface, so the 
other nodes cannot synchronize.

In further tests, we loaded 4 collections with numShards=1 and 
replication_factor=2. By chance, one node became the leader for all 4 
collections. Restarting a node that was not the leader went without problems, 
but when we restarted the leader, the following happened:
- the leader shut down, and the other nodes became leaders of 2 collections each
- the leader started up, 3 collections on it became "active", one collection 
remained "down", and the node became unresponsive, timing out on HTTP requests.
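
To make the leader distribution and the stuck replicas easier to report, a 
small script along these lines could summarize the cluster state. This is only 
a sketch: it assumes the JSON has the shape of /clusterstate.json in ZooKeeper 
as we see it on 4.x (collections, then shards, then replicas carrying "state", 
"node_name" and an optional "leader" flag). The sample document and node names 
below are made up for illustration and are not output from our cluster.

```python
import json
from collections import Counter

# Made-up sample in the assumed shape of clusterstate.json (not real output).
SAMPLE = """
{
  "coll_a": {"shards": {"shard1": {"replicas": {
    "core_node1": {"state": "active", "node_name": "node1:8983_solr",
                   "leader": "true"},
    "core_node2": {"state": "active", "node_name": "node2:8983_solr"}
  }}}},
  "coll_b": {"shards": {"shard1": {"replicas": {
    "core_node1": {"state": "down", "node_name": "node1:8983_solr"},
    "core_node2": {"state": "active", "node_name": "node2:8983_solr",
                   "leader": "true"}
  }}}}
}
"""


def summarize(cluster_state):
    """Count shard leaders per node and list replicas that are not active."""
    leaders = Counter()
    not_active = []
    for coll, coll_data in cluster_state.items():
        for shard, shard_data in coll_data.get("shards", {}).items():
            for replica in shard_data.get("replicas", {}).values():
                node = replica["node_name"]
                # The leader flag is stored as the string "true".
                if replica.get("leader") == "true":
                    leaders[node] += 1
                if replica.get("state") != "active":
                    not_active.append((coll, shard, node, replica.get("state")))
    return leaders, not_active


if __name__ == "__main__":
    leaders, not_active = summarize(json.loads(SAMPLE))
    print("leaders per node:", dict(leaders))
    print("replicas not active:", not_active)
```

Running it before and after a restart would show which node holds which 
leaderships and which collection stays "down".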

As this behavior is completely unexpected for a cluster solution, I wonder 
whether anybody else has experienced the same problems, or whether we are 
doing something entirely wrong.

Best regards

-- 
 
Vladimir Veljkovic
Senior Java Developer

Boxalino AG

vladimir.veljko...@boxalino.com 
www.boxalino.com 


