Zookeeper outage recap questions

2010-07-01 Thread Travis Crawford
Hey zookeepers - We just experienced a total zookeeper outage, and here's a quick post-mortem of the issue, and some questions about preventing it going forward. Quick overview of the setup: - RHEL5 2.6.18 kernel - Zookeeper 3.3.0 - ulimit raised to 65k files - 3 cluster members - 4-5k

Re: Zookeeper outage recap questions

2010-07-01 Thread Flavio Junqueira
Hi Travis, Do you think it would be possible for you to open a jira and upload your logs? Thanks, -Flavio On Jul 1, 2010, at 8:13 AM, Travis Crawford wrote: Hey zookeepers - We just experienced a total zookeeper outage, and here's a quick post-mortem of the issue, and some questions about preventing it

Re: Zookeeper outage recap questions

2010-07-01 Thread Patrick Hunt
Hi Travis, as Flavio suggested would be great to get the logs. A few questions: 1) how did you eventually recover, restart the zk servers? 2) was the cluster losing quorum during this time? leader re-election? 3) Any chance this could have been initially triggered by a long GC pause on one

Solr Cloud/ Solr integration with zookeeper

2010-07-01 Thread Rakhi Khatwani
Hi, I want to use Solr Cloud. I downloaded the code from the trunk and successfully executed the examples as shown in the wiki. But when I try the same with multicore, I cannot access http://localhost:8983/solr/collection1/admin/zookeeper.jsp; it says page not found. Following is my

Re: Zookeeper outage recap questions

2010-07-01 Thread Travis Crawford
I've moved this thread to: https://issues.apache.org/jira/browse/ZOOKEEPER-801 --travis On Thu, Jul 1, 2010 at 12:37 AM, Patrick Hunt ph...@apache.org wrote: Hi Travis, as Flavio suggested would be great to get the logs. A few questions: 1) how did you eventually recover, restart the

Re: Guaranteed message delivery until session timeout?

2010-07-01 Thread Mahadev Konar
When a connection loss happens, all the watches are triggered saying that the connection loss occurred. But on a reconnect the watches are reset automagically on the new server: a watch will be fired if the change has already happened, or will simply be set again otherwise! I hope that answers your question. Thanks mahadev On
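
As a rough illustration of the behaviour described in the reply above, here is a toy in-memory model in Python. It is not the ZooKeeper client API: `ToyServer`, `ToyClient`, the event names and the zxid comparison are all assumptions made up for this sketch, and a single `ToyServer` object stands in for the replicated state that any server in the ensemble would hold.

```python
class ToyServer:
    """Hypothetical stand-in for the ensemble state: path -> (data, zxid of last change)."""

    def __init__(self):
        self.zxid = 0
        self.nodes = {}

    def set_data(self, path, data):
        # Every write gets the next transaction id.
        self.zxid += 1
        self.nodes[path] = (data, self.zxid)


class ToyClient:
    """Hypothetical client that remembers its watches across a disconnect."""

    def __init__(self):
        self.watches = {}   # path -> callback(event, path)
        self.last_zxid = 0  # newest transaction id this client has seen

    def watch(self, server, path, callback):
        self.watches[path] = callback
        self.last_zxid = server.zxid

    def on_connection_loss(self):
        # All watchers are told about the disconnect; the watches are kept.
        for path, callback in self.watches.items():
            callback("ConnectionLoss", path)

    def on_reconnect(self, server):
        # Re-register each watch on the server we reconnected to.
        for path, callback in list(self.watches.items()):
            _, changed_at = server.nodes.get(path, (None, 0))
            if changed_at > self.last_zxid:
                # The change happened while we were away: fire now (one-shot).
                del self.watches[path]
                callback("NodeDataChanged", path)
            # Otherwise the watch simply stays set on the new server.
        self.last_zxid = server.zxid


server = ToyServer()
server.set_data("/a", "1")
server.set_data("/b", "1")

events = []
client = ToyClient()
client.watch(server, "/a", lambda event, path: events.append((event, path)))
client.watch(server, "/b", lambda event, path: events.append((event, path)))

client.on_connection_loss()   # both watchers hear about the disconnect
server.set_data("/a", "2")    # change made while the client is disconnected
client.on_reconnect(server)   # "/a" fires, "/b" stays registered

for event in events:
    print(event)
```

In this sketch the watch on `/a` fires on reconnect because the node changed during the disconnect, while the watch on `/b` remains registered and would fire on a later change.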