After zk restart SOLR can't update its clusterstate.json

2014-08-28 Thread Ugo Matrangolo
Hi,

Just after we finished restarting our zk cluster, SOLR started to fail with
tons of zk events.

We shut down all the nodes and restarted them one by one, but it looks like
the clusterstate.json does not get updated properly.

Example:

core_node11 {
  state: active,
  base_url: http://10.140.4.161:9765,
  core: sku_shard1_replica11,

SOLR on the above node is actually down :/ and correctly does not appear in
the live_nodes.

Any clue?


Ugo
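
The mismatch described above (a replica still marked active in
clusterstate.json while its node is absent from live_nodes) can be checked
mechanically. What follows is only a minimal sketch, assuming the usual
Solr 4.x layout of clusterstate.json (collection, then "shards", then
"replicas") and assuming each replica entry carries a "node_name" field that
matches the child names under /live_nodes. The collection name "sku", the
"/solr" context, and the second replica in the sample are hypothetical and
are not taken from the cluster in this thread.

```python
import json


def stale_active_replicas(clusterstate_json, live_nodes):
    """Return (collection, shard, replica, node_name) tuples for replicas that
    clusterstate.json marks as active although their node is not live."""
    live = set(live_nodes)
    state = json.loads(clusterstate_json)
    stale = []
    for coll_name, coll in state.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                node = replica.get("node_name")  # assumed field, see lead-in
                if replica.get("state") == "active" and node not in live:
                    stale.append((coll_name, shard_name, replica_name, node))
    return stale


# Hypothetical sample data shaped like a Solr 4.x clusterstate.json.
sample_state = json.dumps({
    "sku": {
        "shards": {
            "shard1": {
                "replicas": {
                    "core_node11": {
                        "state": "active",
                        "base_url": "http://10.140.4.161:9765/solr",
                        "core": "sku_shard1_replica11",
                        "node_name": "10.140.4.161:9765_solr",
                    },
                    "core_node12": {
                        "state": "active",
                        "base_url": "http://10.140.3.25:9765/solr",
                        "core": "sku_shard1_replica12",
                        "node_name": "10.140.3.25:9765_solr",
                    },
                }
            }
        }
    }
})

# Only the second node is live, so core_node11 is reported as stale.
sample_live_nodes = ["10.140.3.25:9765_solr"]
result = stale_active_replicas(sample_state, sample_live_nodes)
print(result)
```

In practice the two inputs would be the content of /clusterstate.json and the
children of /live_nodes as read from ZooKeeper.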


Re: After zk restart SOLR can't update its clusterstate.json

2014-08-28 Thread Ugo Matrangolo
Just adding some info:

when I do:

curl -v 'http://10.140.3.25:9765/zookeeper?wt=json'

it takes ages to come back and on the Admin UI I can't see the Cloud Graph.

Ugo


On Fri, Aug 29, 2014 at 12:52 AM, Ugo Matrangolo ugo.matrang...@gmail.com
wrote:

 Hi,

 just after we finished to restart our zk cluster SOLR started to fail with
 tons of zk events.

 We shut down all the nodes and restarted them one by one but looks like
 the clusterstate.json does not get updated properly.

 Example:

 core_node11 {
   state: active,
   base_url: http://10.140.4.161:9765,
   core: sku_shard1_replica11,

 SOLR on the above node is actually down :/ and correctly does not appear
 in the live_nodes.

 any clue ?


 Ugo





Re: After zk restart SOLR can't update its clusterstate.json

2014-08-28 Thread Shawn Heisey
On 8/28/2014 5:52 PM, Ugo Matrangolo wrote:
 just after we finished to restart our zk cluster SOLR started to fail with
 tons of zk events.
 
 We shut down all the nodes and restarted them one by one but looks like the
 clusterstate.json does not get updated properly.

On IRC, you mentioned you were on 4.7.2.

I wonder if maybe the overseer queue is not being processed?  Can you
look in that section of zookeeper?

The big overseer queue bug (SOLR-5811) was fixed in 4.7.1, but I know
there was at least one more bug fixed in 4.8 or later.

Thanks,
Shawn
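
One way to look at the overseer queue mentioned above is to count the children
of the queue znodes. This is a rough sketch, not a definitive procedure: it
assumes the Solr 4.x paths /overseer/queue and /overseer/queue-work, and it
accepts any client object exposing get_children(path), as kazoo's KazooClient
does. The FakeZk class and its contents are hypothetical stand-ins so the
sketch runs without a live ZooKeeper.

```python
def overseer_queue_depths(zk, paths=("/overseer/queue", "/overseer/queue-work")):
    """Return {path: number of queued items}, or None where the znode
    could not be read (for example because it does not exist)."""
    depths = {}
    for path in paths:
        try:
            depths[path] = len(zk.get_children(path))
        except Exception:
            # kazoo raises NoNodeError for a missing znode; the stand-in
            # below raises KeyError. Both are reported as None.
            depths[path] = None
    return depths


class FakeZk:
    """Hypothetical in-memory stand-in for a ZooKeeper client."""

    def __init__(self, tree):
        self.tree = tree

    def get_children(self, path):
        return self.tree[path]


# Hypothetical queue contents, for illustration only.
fake = FakeZk({
    "/overseer/queue": ["qn-0000000001", "qn-0000000002"],
    "/overseer/queue-work": [],
})
depths = overseer_queue_depths(fake)
print(depths)

# Against a real ensemble (kazoo installed, addresses are placeholders):
#   from kazoo.client import KazooClient
#   zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
#   zk.start()
#   print(overseer_queue_depths(zk))
#   zk.stop()
```

A count that keeps growing, or never drains while state changes are pending,
would be consistent with the queue not being processed.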