Re: Recovery issue - how to debug?

2010-04-19 Thread Mahadev Konar
Hi Hao,
  As Vishal already asked, how are you determining whether the writes are
being received? Also, what was the status of C2 when you checked for these
writes? Do you have the output of echo stat | nc localhost port?

How long did you wait before concluding that C2 did not receive the writes?
And what was the status of C2 (again, echo stat | nc localhost port) when
you saw that C2 had received them?
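Here is a minimal Python sketch of the same `echo stat | nc localhost port` probe, for anyone who wants to script the check. It assumes the `stat` reply contains `Key: value` lines such as `Mode: follower` and `Zxid: 0x...`. The function names `four_letter_word` and `mode_and_zxid` are hypothetical helpers, not part of any ZooKeeper client library.

```python
import socket


def four_letter_word(host, port, cmd="stat", timeout=5.0):
    """Send a four-letter command and return the server's reply as text.

    Roughly what `echo stat | nc host port` does: write the command,
    then read until the server closes the connection.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # EOF: the server has finished its reply
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def mode_and_zxid(reply):
    """Extract the node's role and last zxid from a `stat` reply.

    Assumed line format: `Mode: follower` and `Zxid: 0x...`.
    Returns None for any field that is missing.
    """
    mode = zxid = None
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Mode":
            mode = value
        elif key == "Zxid":
            zxid = int(value, 16)  # "0x100000004" -> 4294967300
    return mode, zxid
```

Calling `mode_and_zxid(four_letter_word(host, 2181))` for each of A2, B2 and C2 (2181 is only the conventional default client port; substitute your own) should show whether C2 has rejoined the ensemble as a follower and whether its zxid has caught up with the other two nodes.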

Thanks
mahadev


On 4/18/10 10:54 PM, Dr Hao He h...@softtouchit.com wrote:

 I have a ZooKeeper cluster E1 with 3 nodes: A, B, and C.
 
 I stopped C and did some writes on E1.  Both A and B received the writes.  I
 then started C, and after a short while C also received the writes.
 
 All seemed to go well, so I replicated the setup to another cluster, E2, with
 exactly 3 nodes: A2, B2, and C2.
 
 I stopped C2 and did some writes on E2.  A2 received the writes.  I then
 started C2.  However, no matter how long I waited, C2 never received the
 writes.
 
 I then did more writes on E2.  After that, C2 received all the writes,
 including the old ones made while it was down.
 
 How do I find out what was wrong with the E2 setup?
 
 I am running 3.2.2 on all nodes.
 
 Regards,
 
 Dr Hao He
 
 XPE - the truly SOA platform
 
 h...@softtouchit.com
 http://softtouchit.com
 
 



Re: Recovery issue - how to debug?

2010-04-19 Thread Patrick Hunt
Usually the server logs will shed light on such issues. If we had access
to them, it might be easier to speculate.


Patrick

On 04/19/2010 09:22 AM, Mahadev Konar wrote:



Re: Recovery issue - how to debug?

2010-04-19 Thread Travis Crawford
On Mon, Apr 19, 2010 at 2:15 PM, Ted Dunning ted.dunn...@gmail.com wrote:
 Can you attach the screenshot to the JIRA issue?  The mailing list strips
 these things.

Oops. Updated the JIRA issue:

https://issues.apache.org/jira/browse/ZOOKEEPER-744

--travis



 On Mon, Apr 19, 2010 at 1:18 PM, Travis Crawford
 traviscrawf...@gmail.com wrote:

 Filed:

    https://issues.apache.org/jira/browse/ZOOKEEPER-744

 Attached is a screenshot of some JMX output in Ganglia; it's currently
 implemented using a -javaagent tool I happened to find. Having a
 simple, non-Java way to fetch monitoring stats and publish them to an
 external monitoring system would be awesome, and probably reusable by
 others.
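As a rough illustration of the non-Java approach described above, the plain-text `stat` reply can be turned into flat numeric metrics with a few lines of Python. This is a sketch under assumptions. It expects `Key: value` lines such as `Received: 10` and `Latency min/avg/max: 0/1/5`. The metric names it produces (`packets_received`, `is_leader`, and so on) are invented for illustration. Publishing them to Ganglia or another collector is left to whatever tool you already use.

```python
def stat_metrics(reply):
    """Convert the numeric lines of a ZooKeeper `stat` reply into a flat dict.

    Assumed line format (not verified against every release):
    `Received: 10`, `Latency min/avg/max: 0/1/5`, `Mode: leader`.
    Lines that do not match a known key are ignored.
    """
    # Hypothetical metric names chosen for illustration.
    counters = {
        "Received": "packets_received",
        "Sent": "packets_sent",
        "Outstanding": "outstanding_requests",
        "Node count": "znode_count",
    }
    metrics = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key in counters:
            metrics[counters[key]] = int(value)
        elif key == "Latency min/avg/max":
            # Parsed as floats in case a release reports a fractional average.
            low, avg, high = (float(v) for v in value.split("/"))
            metrics.update(latency_min=low, latency_avg=avg, latency_max=high)
        elif key == "Mode":
            # Encode the server role as 0/1 so it can be graphed.
            metrics["is_leader"] = 1 if value == "leader" else 0
    return metrics
```

For example, the text captured from `echo stat | nc localhost port` could be passed to `stat_metrics`, and each resulting name/value pair handed to the monitoring system. Parsing the plain-text reply keeps the poller independent of JMX, at the cost of breaking if the output format changes between releases.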