Hi,

This morning one of the two nodes of our SolrCloud cluster went down. I've tried many ways to recover it, but to no avail. I unloaded all cores on the failed node, emptied the data directory, and reloaded them, hoping the node would sync from scratch; the core is still marked as down and no data is downloaded.
I get a lot of messages like the following in the log:

WARN - 2016-01-22 14:14:28.535; [ ] org.eclipse.jetty.http.HttpParser; badMessage: 400 Unknown Version for HttpChannelOverHttp@e795880{r=0,c=false,a=IDLE,uri=-}
WARN - 2016-01-22 14:15:02.559; [ ] org.eclipse.jetty.http.HttpParser; badMessage: 400 Unknown Version for HttpChannelOverHttp@727c5f10{r=0,c=false,a=IDLE,uri=-}
WARN - 2016-01-22 14:15:02.580; [ ] org.eclipse.jetty.http.HttpParser; badMessage: java.lang.IllegalStateException: too much data after closed for HttpChannelOverHttp@727c5f10{r=0,c=true,a=COMPLETED,uri=null}
ERROR - 2016-01-22 14:15:03.496; [ ] org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Unexpected method type: [... truncated json data, presumably from a document being updated ...]POST

etc.

I've restarted all ZooKeeper and SolrCloud nodes and reloaded all cores on the main node (that is, the one that seems to work). After some research, I saw that the zk timeout is set to 60 sec. in the Solr config, while at least one entry in the GC logs mentions a pause of 134 sec. However, I noticed that the ZooKeeper logs state a negotiated timeout of 30 sec...

Here are my questions:
- Are the log entries shown above related to the zk session timeout, or should I look elsewhere?
- How can I make sure the timeout negotiated with zk matches the value from the Solr config?
- What parameter(s) would allow me to reduce GC pause times, presumably at the expense of more frequent collections?

Thanks!
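For context on the 60 s vs. 30 s discrepancy, here is roughly where the two values live (file names and property names are from a typical Solr 5.x install; the concrete values below are illustrative assumptions, not our actual config). ZooKeeper does not simply accept the timeout the client asks for: the negotiated session timeout is clamped to the server's [minSessionTimeout, maxSessionTimeout] range, which defaults to 2x and 20x tickTime when unset.

```shell
# --- Solr side: the session timeout the client *requests* ---
# solr.in.sh (can also be passed as -DzkClientTimeout=... or set via
# <int name="zkClientTimeout"> in solr.xml)
ZK_CLIENT_TIMEOUT="60000"   # 60 s, as described above

# --- ZooKeeper side: the range the server will *grant* (zoo.cfg) ---
# The negotiated timeout is clamped to [minSessionTimeout, maxSessionTimeout];
# unset, these default to 2*tickTime and 20*tickTime. A negotiated value of
# 30 s therefore suggests maxSessionTimeout (or a small tickTime) is capping
# the 60 s request on the server side.
tickTime=2000
maxSessionTimeout=60000     # raise this so a 60 s request can be honored
```

Checking zoo.cfg on the ZooKeeper servers for an explicit maxSessionTimeout (or an unusual tickTime) would confirm whether this is where the 30 s comes from.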
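Regarding the GC question, a sketch of the kind of tuning commonly applied to Solr 5.x, assuming the bundled bin/solr scripts are used (the GC_TUNE variable in solr.in.sh is picked up by them; the flag values here are illustrative, not recommendations):

```shell
# solr.in.sh -- illustrative only; appropriate values depend on heap size
# and indexing/query load. A 134 s pause usually points to a full GC on a
# large heap (or to swapping); a pause-target-driven collector trades some
# throughput for more frequent but much shorter stop-the-world pauses.
GC_TUNE="-XX:+UseG1GC \
  -XX:MaxGCPauseMillis=250 \
  -XX:+ParallelRefProcEnabled \
  -XX:+PerfDisableSharedMem"
```

It would also be worth ruling out swapping and an oversized heap before tuning collector flags, since no GC setting can keep pauses short if the heap is being paged out.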