Hello all,
We have hit this problem twice in 4 days, on only one of our 14 servers (2 shards, 7 replicas) running Solr 4.4: after a successful reconnection to ZooKeeper (triggered by "Connection expired - starting a new one"), the core sometimes stays down and never comes back, and we have to restart the Solr instance to bring it back alive.

Most of the time there is no problem reconnecting to ZK; sometimes a LogReplay or recovery process happens and quickly brings the core alive again. But a core occasionally going down without any error in the logs (either on the core itself or on the leader) is worrying.

From the logs (for the problematic server and the leader: http://pastebin.com/CvcEQtwe ), it looks like the core is happy to publish itself as 'down' as soon as the connection to ZK is reestablished, but then never tries to go back to 'alive'.

Has anyone already seen this kind of behaviour? The only vaguely related post I found is this one:
http://lucene.472066.n3.nabble.com/SolrCloud-looses-connection-to-Zookeeper-but-stays-down-td4093083.html
But it seems to be caused by something different, as we have no WARN in the logs when the core goes down. I have not seen any related bugfix in the 4.5 release either.

What is surprising is that only this problematic server has more than 30 "zkClient has disconnected" lines in its logs for the past 6 days (0 for all the other servers). We did not find any difference between this server (solr-16) and the others; the network interface does not show any errors, for example.

1) Could we increase zkClientTimeout from 15000 to 60000 to avoid having so many ZK disconnects?
2) Are there other ways to help trace and solve why a single server is affected by expiration of its ZooKeeper connection?
3) It seems there is a bug in some cases of the reconnection process that prevents the core from going back alive?
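
To compare servers on question 2, here is a minimal sketch that counts the "zkClient has disconnected" lines per day in a log. It assumes each log line starts with a YYYY-MM-DD date, which may not match the actual log4j pattern in use; the function name count_disconnects_per_day is made up for this example.

```python
import re
from collections import Counter

# Assumption: every log line begins with an ISO-style date (YYYY-MM-DD).
# Adjust the pattern if the real log layout differs.
DATE_PREFIX = re.compile(r"^(\d{4}-\d{2}-\d{2})")

# The message text quoted from the Solr logs in this post.
MARKER = "zkClient has disconnected"


def count_disconnects_per_day(lines):
    """Return a Counter mapping a date string to the number of disconnect lines."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        match = DATE_PREFIX.match(line)
        # Lines without a recognizable date are grouped under "unknown".
        day = match.group(1) if match else "unknown"
        counts[day] += 1
    return counts
```

Passing an open log file (or any iterable of lines) from each server to count_disconnects_per_day and comparing the per-day totals should show whether the disconnects on solr-16 cluster around particular days or are spread evenly.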
André

--
André Bois-Crettez
Software Architect
Search Developer
http://www.kelkoo.com/