Re: Recovery issue - how to debug?
Hi Hao,

As Vishal already asked, how are you determining whether the writes are being received? Also, what was the status of C2 when you checked for these writes? Do you have the output of echo stat | nc localhost port?

How long did you wait before concluding that C2 did not receive the writes? What was the status of C2 (again, echo stat | nc localhost port) when you saw that C2 had received the writes?

Thanks
mahadev

On 4/18/10 10:54 PM, Dr Hao He h...@softtouchit.com wrote:

I have a ZooKeeper cluster E1 with 3 nodes: A, B, and C. I stopped C and did some writes on E1. Both A and B received the writes. I then started C and, after a short while, C also received the writes.

All seemed to go well, so I replicated the setup to another cluster E2 with exactly 3 nodes: A2, B2, and C2. I stopped C2 and did some writes on E2. A2 received the writes. I then started C2. However, no matter how long I waited, C2 never received the writes. I then did more writes on E2. Then C2 received all the writes, including the old writes made while it was down.

How do I find out what was wrong with the E2 setup? I am running 3.2.2 on all nodes.

Regards,
Dr Hao He
XPE - the truly SOA platform
h...@softtouchit.com
http://softtouchit.com
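The `echo stat | nc localhost port` check Mahadev asks for can also be scripted so that all three nodes are queried at once. The following is only a rough sketch, not an official tool: it assumes the `stat` reply contains `Mode:` and `Zxid:` lines (as it does in the 3.x releases I have seen), and the function names `four_letter_word`, `parse_stat` and `find_lagging` are made up for this example.

```python
import socket


def four_letter_word(host, port, command="stat", timeout=5.0):
    """Send a four-letter command to a ZooKeeper server and return its reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_stat(text):
    """Extract the 'Key: value' lines of a stat reply into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and per-client lines, which start with "/address:port".
        if not line or line.startswith("/"):
            continue
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def find_lagging(stats_by_node):
    """Return the names of nodes whose Zxid is behind the highest Zxid seen."""
    zxids = {
        name: int(fields["Zxid"], 16)  # Zxid is printed in hex, e.g. 0x100000003
        for name, fields in stats_by_node.items()
        if "Zxid" in fields
    }
    if not zxids:
        return []
    newest = max(zxids.values())
    return sorted(name for name, zxid in zxids.items() if zxid < newest)
```

For example, building `{name: parse_stat(four_letter_word(host, 2181)) for name, host in nodes.items()}` for A2, B2 and C2 (hostnames and the port 2181 are placeholders for your own setup) and passing it to `find_lagging` would list any node whose last reported Zxid trails the others. A follower that is briefly behind is normal, so it is worth repeating the check a few times before concluding that C2 is stuck.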
Re: Recovery issue - how to debug?
Usually the server logs will shed light on such issues. If we had access to them, it would be easier to tell what happened rather than speculate.

Patrick

On 04/19/2010 09:22 AM, Mahadev Konar wrote:

Hi Hao,

As Vishal already asked, how are you determining whether the writes are being received? Also, what was the status of C2 when you checked for these writes? Do you have the output of echo stat | nc localhost port?

How long did you wait before concluding that C2 did not receive the writes? What was the status of C2 (again, echo stat | nc localhost port) when you saw that C2 had received the writes?

Thanks
mahadev

On 4/18/10 10:54 PM, Dr Hao He h...@softtouchit.com wrote:

I have a ZooKeeper cluster E1 with 3 nodes: A, B, and C. I stopped C and did some writes on E1. Both A and B received the writes. I then started C and, after a short while, C also received the writes.

All seemed to go well, so I replicated the setup to another cluster E2 with exactly 3 nodes: A2, B2, and C2. I stopped C2 and did some writes on E2. A2 received the writes. I then started C2. However, no matter how long I waited, C2 never received the writes. I then did more writes on E2. Then C2 received all the writes, including the old writes made while it was down.

How do I find out what was wrong with the E2 setup? I am running 3.2.2 on all nodes.

Regards,
Dr Hao He
XPE - the truly SOA platform
h...@softtouchit.com
http://softtouchit.com
Re: Recovery issue - how to debug?
On Mon, Apr 19, 2010 at 2:15 PM, Ted Dunning ted.dunn...@gmail.com wrote:

Can you attach the screenshot to the JIRA issue? The mailing list strips these things.

Oops. Updated JIRA: https://issues.apache.org/jira/browse/ZOOKEEPER-744

--travis

On Mon, Apr 19, 2010 at 1:18 PM, Travis Crawford traviscrawf...@gmail.com wrote:

Filed: https://issues.apache.org/jira/browse/ZOOKEEPER-744

Attached is a screenshot of some JMX output in Ganglia. It is currently implemented using a -javaagent tool I happened to find. Having a simple non-Java way to fetch monitoring stats and publish them to an external monitoring system would be awesome, and probably reusable by others.
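One non-Java approach along the lines Travis describes is to poll the four-letter `stat` command mentioned earlier in this thread and turn its numeric lines into metrics. This is only a sketch under assumptions: it assumes the reply has lines such as `Received:`, `Sent:`, `Outstanding:` and `Latency min/avg/max:`, the helper names are invented for illustration, and the step that actually publishes to Ganglia or another system is deliberately left out.

```python
import socket


def fetch_stat(host, port, timeout=5.0):
    """Return the raw reply to the 'stat' four-letter command."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stat")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def stat_to_metrics(text):
    """Turn a stat reply into a flat {metric_name: number} dict."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("/"):  # skip per-client lines
            continue
        key, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue
        if key == "Latency min/avg/max":
            parts = value.split("/")
            try:
                numbers = [float(p) for p in parts]
            except ValueError:
                continue
            if len(numbers) == 3:
                for name, number in zip(("min", "avg", "max"), numbers):
                    metrics["latency_" + name] = number
        elif value.isdigit():
            # Plain counters such as "Received: 10" become "received": 10.
            metrics[key.lower().replace(" ", "_")] = int(value)
    return metrics


def format_metrics(metrics, prefix="zookeeper"):
    """Render metrics as sorted 'prefix.name value' lines for a publisher."""
    return ["%s.%s %s" % (prefix, name, metrics[name]) for name in sorted(metrics)]
```

A cron job or small loop could call `format_metrics(stat_to_metrics(fetch_stat("localhost", 2181)))` (host and port are placeholders) and hand the resulting lines to whatever publishing command the monitoring system provides. Non-numeric fields such as `Mode` and `Zxid` are skipped here on purpose, since most metric systems expect numbers.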