Re: About symbol table of Zookeeper c client

2010-09-08 Thread Qian Ye
Hi mahadev,
thx for the information, I'm running ZooKeeper 3.3.0.

On Sat, Sep 4, 2010 at 1:22 AM, Mahadev Konar maha...@yahoo-inc.com wrote:

 Patrick and Qian,
  This was fixed in

 https://issues.apache.org/jira/browse/ZOOKEEPER-604

 I have marked ZOOKEEPER-295 as referencing ZOOKEEPER-604.

 Qian, what version of zookeeper are you running?

 Thanks
 mahadev




 On 9/3/10 9:51 AM, Patrick Hunt ph...@apache.org wrote:

  This is a long-standing issue slated for 4.0
  https://issues.apache.org/jira/browse/ZOOKEEPER-295
 
  Mahadev had done some work to reduce the exported symbols as part of 3.3,
  perhaps this slipped through the net?
 
  Mahadev - can we address this using the current mechanism?
 
  Patrick
 
  On Thu, Sep 2, 2010 at 7:37 AM, Qian Ye yeqian@gmail.com wrote:
 
  Hi all:
 
  I'm writing an application in C which needs to link both memcached's lib
  and zookeeper's c client lib. I found a symbol table conflict, because
  both libs provide an implementation (recordio.h/c) of the function
  htonll. It seems that some functions of the zookeeper c client, which
  can be accessed externally but are used internally, have simple names. I
  think this will cause symbol table conflicts from time to time, and we
  should do something about it, e.g. add a specific prefix to these
  functions.
 
  thx
 
  --
  With Regards!
 
  Ye, Qian
 
 




-- 
With Regards!

Ye, Qian


About symbol table of Zookeeper c client

2010-09-02 Thread Qian Ye
Hi all:

I'm writing an application in C which needs to link both memcached's lib and
zookeeper's c client lib. I found a symbol table conflict, because both libs
provide an implementation (recordio.h/c) of the function htonll. It seems that
some functions of the zookeeper c client, which can be accessed externally but
are used internally, have simple names. I think this will cause symbol table
conflicts from time to time, and we should do something about it, e.g. add a
specific prefix to these functions.

thx

-- 
With Regards!

Ye, Qian


[jira] Commented: (ZOOKEEPER-797) c client source with AI_ADDRCONFIG cannot be compiled with early glibc

2010-06-29 Thread Qian Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883603#action_12883603
 ] 

Qian Ye commented on ZOOKEEPER-797:
---

Hi guys, do I need to add any tests for this patch?

 c client source with AI_ADDRCONFIG cannot be compiled with early glibc
 --

 Key: ZOOKEEPER-797
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-797
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.3.1
 Environment: linux 2.6.9
Reporter: Qian Ye
 Attachments: ZOOKEEPER-797.patch


 c client source with AI_ADDRCONFIG cannot be compiled with early glibc 
 (before 2.3.3)
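The usual guard for this kind of portability problem looks roughly like the sketch below; the real change is whatever ZOOKEEPER-797.patch contains, so treat this only as an illustration of the technique:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Early glibc (before 2.3.3) does not define AI_ADDRCONFIG. Falling back
 * to 0 just disables the address-configuration filtering, so the same
 * source still compiles, and behaves reasonably, on old systems. */
#ifndef AI_ADDRCONFIG
#define AI_ADDRCONFIG 0
#endif

int main(void) {
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof hints);
    hints.ai_family   = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    /* Safe on both old and new glibc thanks to the guard above. */
    hints.ai_flags    = AI_PASSIVE | AI_ADDRCONFIG;
    int rc = getaddrinfo(NULL, "2181", &hints, &res);  /* port only, no DNS */
    printf("getaddrinfo rc=%d\n", rc);  /* may be nonzero on hosts with no
                                           configured addresses */
    if (res) freeaddrinfo(res);
    return 0;
}
```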

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-797) c client source with AI_ADDRCONFIG cannot be compiled with early glibc

2010-06-28 Thread Qian Ye (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Ye updated ZOOKEEPER-797:
--

Status: Patch Available  (was: Open)

 c client source with AI_ADDRCONFIG cannot be compiled with early glibc
 --

 Key: ZOOKEEPER-797
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-797
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.3.1
 Environment: linux 2.6.9
Reporter: Qian Ye
 Attachments: ZOOKEEPER-797.patch


 c client source with AI_ADDRCONFIG cannot be compiled with early glibc 
 (before 2.3.3)




[jira] Created: (ZOOKEEPER-797) c client source with AI_ADDRCONFIG cannot be compiled with early glibc

2010-06-28 Thread Qian Ye (JIRA)
c client source with AI_ADDRCONFIG cannot be compiled with early glibc
--

 Key: ZOOKEEPER-797
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-797
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.3.1
 Environment: linux 2.6.9
Reporter: Qian Ye
 Attachments: ZOOKEEPER-797.patch

c client source with AI_ADDRCONFIG cannot be compiled with early glibc (before 
2.3.3)




[jira] Updated: (ZOOKEEPER-797) c client source with AI_ADDRCONFIG cannot be compiled with early glibc

2010-06-28 Thread Qian Ye (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Ye updated ZOOKEEPER-797:
--

Attachment: ZOOKEEPER-797.patch

 c client source with AI_ADDRCONFIG cannot be compiled with early glibc
 --

 Key: ZOOKEEPER-797
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-797
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.3.1
 Environment: linux 2.6.9
Reporter: Qian Ye
 Attachments: ZOOKEEPER-797.patch


 c client source with AI_ADDRCONFIG cannot be compiled with early glibc 
 (before 2.3.3)




[jira] Commented: (ZOOKEEPER-779) C Client should check the connectivity to the hosts in zookeeper_init

2010-05-25 Thread Qian Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12871474#action_12871474
 ] 

Qian Ye commented on ZOOKEEPER-779:
---

OK Patrick. However, I'm really busy these days; it may take a week or two 
before I can get it done.

 C Client should check the connectivity to the hosts in zookeeper_init
 -

 Key: ZOOKEEPER-779
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-779
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.3.1
Reporter: Qian Ye
 Attachments: ZOOKEEPER-779.patch, ZOOKEEPER-779.patch


 In some scenarios, whether the client can connect to the zookeeper servers is 
 used as a logic condition. If the client cannot connect to the servers, the 
 program should take another branch. However, the current zookeeper_init cannot 
 tell whether the client can connect to a server or not, which can confuse 
 some users. I think we should check the connectivity to the hosts in 
 zookeeper_init, so we can tell whether the hosts are available at that time 
 or not.




[jira] Commented: (ZOOKEEPER-779) C Client should check the connectivity to the hosts in zookeeper_init

2010-05-24 Thread Qian Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12870545#action_12870545
 ] 

Qian Ye commented on ZOOKEEPER-779:
---

By checking connectivity, I mean checking whether the client can connect to a 
zookeeper server listed in the parameters. In my usage, zookeeper is used to 
store some meta information. The logic flow of my app is: if it can connect to 
zookeeper, obtain the meta info from zookeeper; otherwise, obtain it from a 
local file. Because the connection to the zookeeper server is not established 
when zookeeper_init returns (mt version), I used to make my app sleep a few 
seconds to make sure the connection was established; however, if the host list 
contains some invalid server addresses, the sleep time is hard to estimate. I 
cannot use the initialization method in load_gen.c, because in some situations 
I want my app to read the meta info from a local file by giving a wrong host 
to zookeeper_init.

In a word, I just want zookeeper_init to check whether at least one zookeeper 
server in the host list is available at connect time. I have made a patch for 
this issue; would you like to check it out?

Anyway, a strategy pattern for the connection would be great; I think we 
should do that.

 C Client should check the connectivity to the hosts in zookeeper_init
 -

 Key: ZOOKEEPER-779
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-779
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.3.1
Reporter: Qian Ye

 In some scenarios, whether the client can connect to the zookeeper servers is 
 used as a logic condition. If the client cannot connect to the servers, the 
 program should take another branch. However, the current zookeeper_init cannot 
 tell whether the client can connect to a server or not, which can confuse 
 some users. I think we should check the connectivity to the hosts in 
 zookeeper_init, so we can tell whether the hosts are available at that time 
 or not.




[jira] Commented: (ZOOKEEPER-779) C Client should check the connectivity to the hosts in zookeeper_init

2010-05-24 Thread Qian Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12870946#action_12870946
 ] 

Qian Ye commented on ZOOKEEPER-779:
---

Thx Patrick, I see your point. Here is some explanation. A temporary glitch 
at startup will not lead to any harmful result in my system. This kind of 
glitch is recorded in the log file, so a monitoring process will notice it. 
Moreover, the absence of the local meta file, or its being out of sync, will 
not lead to any harmful result either. In a word, my system should be able to 
keep running without zookeeper providing the latest meta info.

I will attach my patch soon :-)

 C Client should check the connectivity to the hosts in zookeeper_init
 -

 Key: ZOOKEEPER-779
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-779
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.3.1
Reporter: Qian Ye

 In some scenarios, whether the client can connect to the zookeeper servers is 
 used as a logic condition. If the client cannot connect to the servers, the 
 program should take another branch. However, the current zookeeper_init cannot 
 tell whether the client can connect to a server or not, which can confuse 
 some users. I think we should check the connectivity to the hosts in 
 zookeeper_init, so we can tell whether the hosts are available at that time 
 or not.




[jira] Updated: (ZOOKEEPER-779) C Client should check the connectivity to the hosts in zookeeper_init

2010-05-24 Thread Qian Ye (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Ye updated ZOOKEEPER-779:
--

Attachment: ZOOKEEPER-779.patch

 C Client should check the connectivity to the hosts in zookeeper_init
 -

 Key: ZOOKEEPER-779
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-779
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.3.1
Reporter: Qian Ye
 Attachments: ZOOKEEPER-779.patch


 In some scenarios, whether the client can connect to the zookeeper servers is 
 used as a logic condition. If the client cannot connect to the servers, the 
 program should take another branch. However, the current zookeeper_init cannot 
 tell whether the client can connect to a server or not, which can confuse 
 some users. I think we should check the connectivity to the hosts in 
 zookeeper_init, so we can tell whether the hosts are available at that time 
 or not.




[jira] Updated: (ZOOKEEPER-779) C Client should check the connectivity to the hosts in zookeeper_init

2010-05-24 Thread Qian Ye (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Ye updated ZOOKEEPER-779:
--

Status: Patch Available  (was: Open)

do the connectivity check in zookeeper_init

 C Client should check the connectivity to the hosts in zookeeper_init
 -

 Key: ZOOKEEPER-779
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-779
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.3.1
Reporter: Qian Ye
 Attachments: ZOOKEEPER-779.patch


 In some scenarios, whether the client can connect to the zookeeper servers is 
 used as a logic condition. If the client cannot connect to the servers, the 
 program should take another branch. However, the current zookeeper_init cannot 
 tell whether the client can connect to a server or not, which can confuse 
 some users. I think we should check the connectivity to the hosts in 
 zookeeper_init, so we can tell whether the hosts are available at that time 
 or not.




[jira] Updated: (ZOOKEEPER-779) C Client should check the connectivity to the hosts in zookeeper_init

2010-05-24 Thread Qian Ye (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Ye updated ZOOKEEPER-779:
--

Attachment: ZOOKEEPER-779.patch

 C Client should check the connectivity to the hosts in zookeeper_init
 -

 Key: ZOOKEEPER-779
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-779
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.3.1
Reporter: Qian Ye
 Attachments: ZOOKEEPER-779.patch, ZOOKEEPER-779.patch


 In some scenarios, whether the client can connect to the zookeeper servers is 
 used as a logic condition. If the client cannot connect to the servers, the 
 program should take another branch. However, the current zookeeper_init cannot 
 tell whether the client can connect to a server or not, which can confuse 
 some users. I think we should check the connectivity to the hosts in 
 zookeeper_init, so we can tell whether the hosts are available at that time 
 or not.




[jira] Updated: (ZOOKEEPER-779) C Client should check the connectivity to the hosts in zookeeper_init

2010-05-24 Thread Qian Ye (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Ye updated ZOOKEEPER-779:
--

Status: Patch Available  (was: Open)

 C Client should check the connectivity to the hosts in zookeeper_init
 -

 Key: ZOOKEEPER-779
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-779
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.3.1
Reporter: Qian Ye
 Attachments: ZOOKEEPER-779.patch, ZOOKEEPER-779.patch


 In some scenarios, whether the client can connect to the zookeeper servers is 
 used as a logic condition. If the client cannot connect to the servers, the 
 program should take another branch. However, the current zookeeper_init cannot 
 tell whether the client can connect to a server or not, which can confuse 
 some users. I think we should check the connectivity to the hosts in 
 zookeeper_init, so we can tell whether the hosts are available at that time 
 or not.




[jira] Updated: (ZOOKEEPER-779) C Client should check the connectivity to the hosts in zookeeper_init

2010-05-24 Thread Qian Ye (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Ye updated ZOOKEEPER-779:
--

Status: Open  (was: Patch Available)

 C Client should check the connectivity to the hosts in zookeeper_init
 -

 Key: ZOOKEEPER-779
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-779
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.3.1
Reporter: Qian Ye
 Attachments: ZOOKEEPER-779.patch, ZOOKEEPER-779.patch


 In some scenarios, whether the client can connect to the zookeeper servers is 
 used as a logic condition. If the client cannot connect to the servers, the 
 program should take another branch. However, the current zookeeper_init cannot 
 tell whether the client can connect to a server or not, which can confuse 
 some users. I think we should check the connectivity to the hosts in 
 zookeeper_init, so we can tell whether the hosts are available at that time 
 or not.




[jira] Created: (ZOOKEEPER-779) C Client should check the connectivity to the hosts in zookeeper_init

2010-05-22 Thread Qian Ye (JIRA)
C Client should check the connectivity to the hosts in zookeeper_init
-

 Key: ZOOKEEPER-779
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-779
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.3.1
Reporter: Qian Ye


In some scenarios, whether the client can connect to the zookeeper servers is 
used as a logic condition. If the client cannot connect to the servers, the 
program should take another branch. However, the current zookeeper_init cannot 
tell whether the client can connect to a server or not, which can confuse some 
users. I think we should check the connectivity to the hosts in zookeeper_init, 
so we can tell whether the hosts are available at that time or not.




[jira] Commented: (ZOOKEEPER-591) The C Client cannot exit properly in some situation

2010-03-11 Thread Qian Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12844391#action_12844391
 ] 

Qian Ye commented on ZOOKEEPER-591:
---

This patch works for me, thx mahadev

 The C Client cannot exit properly in some situation
 ---

 Key: ZOOKEEPER-591
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-591
 Project: Zookeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.2.1
 Environment: Linux db-passport-test05.vm 2.6.9_5-4-0-5 #1 SMP Tue Apr 
 14 15:56:24 CST 2009 x86_64 x86_64 x86_64 GNU/Linux 
Reporter: Qian Ye
Assignee: Mahadev konar
Priority: Blocker
 Fix For: 3.3.0

 Attachments: ZOOKEEPER-591.patch, ZOOKEEPER-591.patch, 
 ZOOKEEPER-591.patch, ZOOKEEPER-591.patch, zootest.c


 The following code produces a situation where the C Client cannot exit 
 properly:
 #include "include/zookeeper.h"
 void default_zoo_watcher(zhandle_t *zzh, int type, int state, const char 
 *path, void* context){
     int zrc = 0;
     struct String_vector str_vec = {0, NULL};
     printf("in the default_zoo_watcher\n");
     zrc = zoo_wget_children(zzh, "/mytest", default_zoo_watcher, NULL, 
         &str_vec);
     printf("zoo_wget_children, error: %d\n", zrc);
     return;
 }
 int main()
 {
     int zrc = 0;
     int buff_len = 10;
     char buff[10] = "hello";
     char path[512];
     struct Stat stat;
     struct String_vector str_vec = {0, NULL};
     zhandle_t *zh = zookeeper_init("10.81.20.62:2181", NULL, 3, 0, 0, 0);
     zrc = zoo_create(zh, "/mytest", buff, 10, &ZOO_OPEN_ACL_UNSAFE, 0, path, 
         512);
     printf("zoo_create, error: %d\n", zrc);
     zrc = zoo_wget_children(zh, "/mytest", default_zoo_watcher, NULL, 
         &str_vec);
     printf("zoo_wget_children, error: %d\n", zrc);
     zrc = zoo_create(zh, "/mytest/test1", buff, 10, &ZOO_OPEN_ACL_UNSAFE, 0, 
         path, 512);
     printf("zoo_create, error: %d\n", zrc);
     zrc = zoo_wget_children(zh, "/mytest", default_zoo_watcher, NULL, 
         &str_vec);
     printf("zoo_wget_children, error: %d\n", zrc);
     zrc = zoo_delete(zh, "/mytest/test1", -1);
     printf("zoo_delete, error: %d\n", zrc);
     zookeeper_close(zh);
     return 0;
 }
 Running this code causes the program to hang at zookeeper_close(zh); (line 
 38). Using gdb to attach to the process, I found that the main thread is 
 waiting for the do_completion thread to finish,
 (gdb) bt
 #0  0x00302b806ffb in pthread_join () from /lib64/tls/libpthread.so.0
 #1  0x0040de3b in adaptor_finish (zh=0x515b60) at src/mt_adaptor.c:219
 #2  0x004060ba in zookeeper_close (zh=0x515b60) at 
 src/zookeeper.c:2100
 #3  0x0040220b in main ()
 and the thread which handles the zoo_wget_children (in default_zoo_watcher) 
 is waiting for sc->cond.
 (gdb) thread 2
 [Switching to thread 2 (Thread 1094719840 (LWP 25093))]#0  0x00302b8089aa 
 in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib64/tls/libpthread.so.0
 (gdb) bt
 #0  0x00302b8089aa in pthread_cond_wait@@GLIBC_2.3.2 () from 
 /lib64/tls/libpthread.so.0
 #1  0x0040d88b in wait_sync_completion (sc=0x5167f0) at 
 src/mt_adaptor.c:82
 #2  0x004082c9 in zoo_wget_children (zh=0x515b60, path=0x40ebc0 
 "/mytest", watcher=0x401fd8 <default_zoo_watcher>, watcherCtx=Variable 
 "watcherCtx" is not available.)
 at src/zookeeper.c:2884
 #3  0x00402037 in default_zoo_watcher ()
 #4  0x0040d664 in deliverWatchers (zh=0x515b60, type=4, state=3, 
 path=0x515100 "/mytest", list=0x5177d8) at src/zk_hashtable.c:274
 #5  0x00403861 in process_completions (zh=0x515b60) at 
 src/zookeeper.c:1631
 #6  0x0040e1b5 in do_completion (v=Variable "v" is not available.) at 
 src/mt_adaptor.c:333
 #7  0x00302b80610a in start_thread () from /lib64/tls/libpthread.so.0
 #8  0x00302afc6003 in clone () from /lib64/tls/libc.so.6
 #9  0x0000000000000000 in ?? ()
 here, a deadlock presents.




[jira] Commented: (ZOOKEEPER-591) The C Client cannot exit properly in some situation

2010-03-10 Thread Qian Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12843870#action_12843870
 ] 

Qian Ye commented on ZOOKEEPER-591:
---

The process still hangs there, Mahadev.
(gdb) info thread
  2 Thread 1094719840 (LWP 31877)  0x00302b8089aa in 
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
  1 Thread 182894113888 (LWP 31875)  0x00302b806ffb in pthread_join () from 
/lib64/tls/libpthread.so.0
(gdb) thread 1
[Switching to thread 1 (Thread 182894113888 (LWP 31875))]#0  0x00302b806ffb 
in pthread_join () from /lib64/tls/libpthread.so.0
(gdb) bt
#0  0x00302b806ffb in pthread_join () from /lib64/tls/libpthread.so.0
#1  0x0040de5b in adaptor_finish (zh=0x515b60) at src/mt_adaptor.c:218
#2  0x004060da in zookeeper_close (zh=0x515b60) at src/zookeeper.c:2109
#3  0x0040220b in main ()
(gdb) thread 2
[Switching to thread 2 (Thread 1094719840 (LWP 31877))]#0  0x00302b8089aa 
in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/tls/libpthread.so.0
(gdb) bt
#0  0x00302b8089aa in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/tls/libpthread.so.0
#1  0x0040d8ab in wait_sync_completion (sc=0x5167f0) at 
src/mt_adaptor.c:82
#2  0x004082e9 in zoo_wget_children (zh=0x515b60, path=0x40ebe0 
"/mytest", watcher=0x401fd8 <default_zoo_watcher>, watcherCtx=Variable 
"watcherCtx" is not available.)
at src/zookeeper.c:2889
#3  0x00402037 in default_zoo_watcher ()
#4  0x0040d684 in deliverWatchers (zh=0x515b60, type=4, state=3, 
path=0x515100 "/mytest", list=0x2a95700b08) at src/zk_hashtable.c:271
#5  0x00403771 in process_completions (zh=0x515b60) at 
src/zookeeper.c:1623
#6  0x0040e1d5 in do_completion (v=Variable "v" is not available.) at 
src/mt_adaptor.c:332
#7  0x00302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#8  0x00302afc6003 in clone () from /lib64/tls/libc.so.6
#9  0x0000000000000000 in ?? ()

I applied the patch to the C client source code, version 3.2.2, not the 
working copy. I think this won't make any difference, right?

 The C Client cannot exit properly in some situation
 ---

 Key: ZOOKEEPER-591
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-591
 Project: Zookeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.2.1
 Environment: Linux db-passport-test05.vm 2.6.9_5-4-0-5 #1 SMP Tue Apr 
 14 15:56:24 CST 2009 x86_64 x86_64 x86_64 GNU/Linux 
Reporter: Qian Ye
Assignee: Mahadev konar
Priority: Blocker
 Fix For: 3.3.0

 Attachments: ZOOKEEPER-591.patch, ZOOKEEPER-591.patch


 The following code produces a situation where the C Client cannot exit 
 properly:
 #include "include/zookeeper.h"
 void default_zoo_watcher(zhandle_t *zzh, int type, int state, const char 
 *path, void* context){
     int zrc = 0;
     struct String_vector str_vec = {0, NULL};
     printf("in the default_zoo_watcher\n");
     zrc = zoo_wget_children(zzh, "/mytest", default_zoo_watcher, NULL, 
         &str_vec);
     printf("zoo_wget_children, error: %d\n", zrc);
     return;
 }
 int main()
 {
     int zrc = 0;
     int buff_len = 10;
     char buff[10] = "hello";
     char path[512];
     struct Stat stat;
     struct String_vector str_vec = {0, NULL};
     zhandle_t *zh = zookeeper_init("10.81.20.62:2181", NULL, 3, 0, 0, 0);
     zrc = zoo_create(zh, "/mytest", buff, 10, &ZOO_OPEN_ACL_UNSAFE, 0, path, 
         512);
     printf("zoo_create, error: %d\n", zrc);
     zrc = zoo_wget_children(zh, "/mytest", default_zoo_watcher, NULL, 
         &str_vec);
     printf("zoo_wget_children, error: %d\n", zrc);
     zrc = zoo_create(zh, "/mytest/test1", buff, 10, &ZOO_OPEN_ACL_UNSAFE, 0, 
         path, 512);
     printf("zoo_create, error: %d\n", zrc);
     zrc = zoo_wget_children(zh, "/mytest", default_zoo_watcher, NULL, 
         &str_vec);
     printf("zoo_wget_children, error: %d\n", zrc);
     zrc = zoo_delete(zh, "/mytest/test1", -1);
     printf("zoo_delete, error: %d\n", zrc);
     zookeeper_close(zh);
     return 0;
 }
 Running this code causes the program to hang at zookeeper_close(zh); (line 
 38). Using gdb to attach to the process, I found that the main thread is 
 waiting for the do_completion thread to finish,
 (gdb) bt
 #0  0x00302b806ffb in pthread_join () from /lib64/tls/libpthread.so.0
 #1  0x0040de3b in adaptor_finish (zh=0x515b60) at src/mt_adaptor.c:219
 #2  0x004060ba in zookeeper_close (zh=0x515b60) at 
 src/zookeeper.c:2100
 #3  0x0040220b in main ()
 and the thread which handles the zoo_wget_children (in default_zoo_watcher) 
 is waiting for sc->cond.
 (gdb) thread 2
 [Switching to thread 2 (Thread 1094719840 (LWP 25093))]#0  0x00302b8089aa 
 in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib64/tls/libpthread.so.0
 (gdb) bt
 #0

Re: Google Summer of Code

2010-03-10 Thread Qian Ye
Hi Henry,

I think we should add two kinds of interfaces to the server:
1. An interface which returns the clients that have set watchers on a specific
znode of the data tree. This kind of interface can be really helpful for
administrators.
2. An interface which returns a list of the servers in a zookeeper cluster.

Maybe the students can help with this work.

thx~

On Wed, Mar 10, 2010 at 4:46 AM, Gustavo Niemeyer gust...@niemeyer.netwrote:

 Hi Henry,

  There is a wiki page here:
  http://wiki.apache.org/hadoop/ZooKeeper/SoC2010Ideas that requires that
 you
  sign up to edit. Please post your project ideas up there - I've left one
 as
  an example. You can also mail me directly and I'll post them myself. On
  Friday I'll tidy up the page and send in an application to Google.

 Thanks a lot for organizing this.

 The key things I'd like to see moving forward, as was discussed before
 in the mailing list, are:

 - Encryption of communication between servers
 - Encryption of communication between servers and clients
 - Dynamic cluster membership changes

 I don't know how well these fit in GSoC.

 --
 Gustavo Niemeyer
 http://niemeyer.net
 http://niemeyer.net/blog
 http://niemeyer.net/identi.ca
 http://niemeyer.net/twitter




-- 
With Regards!

Ye, Qian


[jira] Commented: (ZOOKEEPER-591) The C Client cannot exit properly in some situation

2010-03-09 Thread Qian Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12843443#action_12843443
 ] 

Qian Ye commented on ZOOKEEPER-591:
---

Hi Mahadev, the patch doesn't work :-(; the deadlock still exists.

(gdb) info thread
  2 Thread 1094719840 (LWP 13889)  0x00302b8089aa in 
pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/tls/libpthread.so.0
  1 Thread 182894113888 (LWP 13887)  0x00302b806ffb in pthread_join () from 
/lib64/tls/libpthread.so.0
(gdb) thread 1
[Switching to thread 1 (Thread 182894113888 (LWP 13887))]#0  0x00302b806ffb 
in pthread_join ()
   from /lib64/tls/libpthread.so.0
(gdb) bt
#0  0x00302b806ffb in pthread_join () from /lib64/tls/libpthread.so.0
#1  0x0040de2b in adaptor_finish (zh=0x515b60) at src/mt_adaptor.c:218
#2  0x004060aa in zookeeper_close (zh=0x515b60) at src/zookeeper.c:2086
#3  0x0040220b in main ()
(gdb) thread 2
[Switching to thread 2 (Thread 1094719840 (LWP 13889))]#0  0x00302b8089aa 
in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
(gdb) bt 
#0  0x00302b8089aa in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/tls/libpthread.so.0
#1  0x0040d87b in wait_sync_completion (sc=0x517850) at 
src/mt_adaptor.c:82
#2  0x004082b9 in zoo_wget_children (zh=0x515b60, path=0x40eba0 
"/mytest", watcher=0x401fd8 <default_zoo_watcher>, watcherCtx=Variable 
"watcherCtx" is not available.) at src/zookeeper.c:2866
#3  0x00402037 in default_zoo_watcher ()
#4  0x0040d654 in deliverWatchers (zh=0x515b60, type=4, state=3, 
path=0x516920 "/mytest", list=0x5177d8) at src/zk_hashtable.c:271
#5  0x00403871 in process_completions (zh=0x515b60) at 
src/zookeeper.c:1620
#6  0x0040e1a5 in do_completion (v=Variable "v" is not available.) at 
src/mt_adaptor.c:332
#7  0x00302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#8  0x00302afc6003 in clone () from /lib64/tls/libc.so.6
#9  0x0000000000000000 in ?? ()


 The C Client cannot exit properly in some situation
 ---

 Key: ZOOKEEPER-591
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-591
 Project: Zookeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.2.1
 Environment: Linux db-passport-test05.vm 2.6.9_5-4-0-5 #1 SMP Tue Apr 
 14 15:56:24 CST 2009 x86_64 x86_64 x86_64 GNU/Linux 
Reporter: Qian Ye
Assignee: Mahadev konar
Priority: Critical
 Fix For: 3.3.0

 Attachments: ZOOKEEPER-591.patch


 The following code produce a situation, where the C Client can not exit 
 properly,
 #include include/zookeeper.h
 void default_zoo_watcher(zhandle_t *zzh, int type, int state, const char 
 *path, void* context){
 int zrc = 0;
 struct String_vector str_vec = {0, NULL};
 printf(in the default_zoo_watcher\n);
 zrc = zoo_wget_children(zzh, /mytest, default_zoo_watcher, NULL, 
 str_vec);
 printf(zoo_wget_children, error: %d\n, zrc);
 return;
 }
 int main()
 {
 int zrc = 0;
 int buff_len = 10; 
 char buff[10] = hello;
 char path[512];
 struct Stat stat;
 struct String_vector str_vec = {0, NULL};
 zhandle_t *zh = zookeeper_init(10.81.20.62:2181, NULL, 3, 0, 0, 0); 
 zrc = zoo_create(zh, /mytest, buff, 10, ZOO_OPEN_ACL_UNSAFE, 0, path, 
 512);
 printf(zoo_create, error: %d\n, zrc);
 zrc = zoo_wget_children(zh, /mytest, default_zoo_watcher, NULL, 
 str_vec);
 printf(zoo_wget_children, error: %d\n, zrc);
 zrc = zoo_create(zh, /mytest/test1, buff, 10, ZOO_OPEN_ACL_UNSAFE, 0, 
 path, 512);
 printf(zoo_create, error: %d\n, zrc);
 zrc = zoo_wget_children(zh, /mytest, default_zoo_watcher, NULL, 
 str_vec);
 printf(zoo_wget_children, error: %d\n, zrc);
 zrc = zoo_delete(zh, /mytest/test1, -1);
 printf(zoo_delete, error: %d\n, zrc);
 zookeeper_close(zh);
 return 0;
 }
 running this code can cause the program hang at zookeeper_close(zh);(line 
 38). using gdb to attach the process, I found that the main thread is waiting 
 for do_completion thread to finish,
 (gdb) bt
 #0  0x00302b806ffb in pthread_join () from /lib64/tls/libpthread.so.0
 #1  0x0040de3b in adaptor_finish (zh=0x515b60) at src/mt_adaptor.c:219
 #2  0x004060ba in zookeeper_close (zh=0x515b60) at 
 src/zookeeper.c:2100
 #3  0x0040220b in main ()
 and the thread which handles the zoo_wget_children (in the 
 default_zoo_watcher) is waiting for sc->cond:
 (gdb) thread 2
 [Switching to thread 2 (Thread 1094719840 (LWP 25093))]#0  0x00302b8089aa 
 in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib64/tls/libpthread.so.0
 (gdb) bt
 #0  0x00302b8089aa in pthread_cond_wait@@GLIBC_2.3.2 () from 
 /lib64/tls/libpthread.so.0
 #1  0x0040d88b

Re: About load balance in Zookeeper server

2010-03-03 Thread Qian Ye
thx mahadev :-)

On Thu, Mar 4, 2010 at 4:05 AM, Mahadev Konar maha...@yahoo-inc.com wrote:

 Hi Qian,
  I am not sure if I did respond to your email or not. Sorry, too many
 emails
 I am catching up on. You are right that if you specify just a single host
 then the client would not be able to switch to another server. There have
 been some ideas around Dynamic configuration and storing zookeeper ensemble
 information on the zookeeper cluster itself.

 http://issues.apache.org/jira/browse/ZOOKEEPER-338
 http://issues.apache.org/jira/browse/ZOOKEEPER-107
 http://issues.apache.org/jira/browse/ZOOKEEPER-390

 This might answer some of the problems you mention, but they are all being
 worked upon!

 Thanks
 mahadev


 On 3/1/10 6:09 PM, Qian Ye yeqian@gmail.com wrote:

  Thanks Mahadev, I see what you mean.
 
   Here is another question: the client needs a list of Zookeeper servers to
   initialize the handle, and there is no API for the client to get
  awareness
   of all the Zookeeper servers in one cluster. That means, if I only
  provide
   one Zookeeper server in the client's host list, the client would not
  switch
   to another available Zookeeper server when the given one failed. I
   think this strategy is flawed. The client should be able to find out all
  the
   Zookeeper servers in the cluster. Is there any compromise for this issue?
 
  thanks
 
  On Tue, Mar 2, 2010 at 7:29 AM, Mahadev Konar maha...@yahoo-inc.com
 wrote:
 
  HI Qian,
   You are right that we do not have any way of handling clients dynamically
   so that every server has a balanced load. This requires a careful design since we
  would not want client connections to keep flipping around and also
 maintain
  stability as much as we can. We have had some discussions about it but
  nothing concrete has materialized yet.
 
  We do have checks in place that prevent more than a certain number of
  connections (default 10) from the same ip address. This is to keep too
 many
  zookeeper client instances from the same client bogging down the
 zookeeper
  service. Also, we have throttling for number of outstanding requests
 from
  clients (currently set to 1000 by default). This allows zookeeper
 service
  to
   throttle zookeeper clients. This throttling isn't done on a per-client
  basis
  but is just a check to not bring down the zookeeper service because of
 some
  misbehaved client.
  Any other checks that you specifically were thinking of?
 
  Thanks
  mahadev
 
  On 2/28/10 10:18 PM, Qian Ye yeqian@gmail.com wrote:
 
  Hi guys:
 
   As I know, when a client connects to Zookeeper servers, it chooses a
   server randomly (without zoo_deterministic_conn_order on), and then the
   client talks to that server until a failure happens. It seems that the
   zookeeper server cannot handle client connections dynamically according
   to the load of the server. If some flaw of a client made the client
   connect to Zookeeper servers frequently, it may prevent other normal
   clients from getting service from Zookeeper, right? So, is there any
   method to resolve these two practical problems:
  
   1. Handle and apportion clients dynamically, so every server would have
   a balanced load.
   2. Some kind of frequency controller, which sets a threshold on the
   frequency of requests from a client, to prevent server resources from
   being exhausted by a few clients.
 
  --
  With Regards!
 
  Ye, Qian
 
 
 




-- 
With Regards!

Ye, Qian


Re: About load balance in Zookeeper server

2010-03-01 Thread Qian Ye
Thanks Mahadev, I see what you mean.

Here is another question: the client needs a list of Zookeeper servers to
initialize the handle, and there is no API for the client to get awareness
of all the Zookeeper servers in one cluster. That means, if I only provide
one Zookeeper server in the client's host list, the client would not switch
to another available Zookeeper server when the given one failed. I
think this strategy is flawed. The client should be able to find out all the
Zookeeper servers in the cluster. Is there any compromise for this issue?

thanks

On Tue, Mar 2, 2010 at 7:29 AM, Mahadev Konar maha...@yahoo-inc.com wrote:

 HI Qian,
  You are right that we do not have any way of handling clients dynamically
  so that every server has a balanced load. This requires a careful design since we
 would not want client connections to keep flipping around and also maintain
 stability as much as we can. We have had some discussions about it but
 nothing concrete has materialized yet.

 We do have checks in place that prevent more than a certain number of
 connections (default 10) from the same ip address. This is to keep too many
 zookeeper client instances from the same client bogging down the zookeeper
 service. Also, we have throttling for number of outstanding requests from
 clients (currently set to 1000 by default). This allows zookeeper service
 to
  throttle zookeeper clients. This throttling isn't done on a per-client basis
 but is just a check to not bring down the zookeeper service because of some
 misbehaved client.
 Any other checks that you specifically were thinking of?

 Thanks
 mahadev

 On 2/28/10 10:18 PM, Qian Ye yeqian@gmail.com wrote:

  Hi guys:
 
   As I know, when a client connects to Zookeeper servers, it chooses a
   server randomly (without zoo_deterministic_conn_order on), and then the
   client talks to that server until a failure happens. It seems that the
   zookeeper server cannot handle client connections dynamically according
   to the load of the server. If some flaw of a client made the client
   connect to Zookeeper servers frequently, it may prevent other normal
   clients from getting service from Zookeeper, right? So, is there any
   method to resolve these two practical problems:
  
   1. Handle and apportion clients dynamically, so every server would have
   a balanced load.
   2. Some kind of frequency controller, which sets a threshold on the
   frequency of requests from a client, to prevent server resources from
   being exhausted by a few clients.
 
  --
  With Regards!
 
  Ye, Qian




-- 
With Regards!

Ye, Qian


About load balance in Zookeeper server

2010-02-28 Thread Qian Ye
Hi guys:

As I know, when a client connects to Zookeeper servers, it chooses a
server randomly (without zoo_deterministic_conn_order on), and then the
client talks to that server until a failure happens. It seems that the
zookeeper server cannot handle client connections dynamically according
to the load of the server. If some flaw of a client made the client connect
to Zookeeper servers frequently, it may prevent other normal clients from
getting service from Zookeeper, right? So, is there any method to resolve
these two practical problems:

1. Handle and apportion clients dynamically, so every server would have a
balanced load.
2. Some kind of frequency controller, which sets a threshold on the
frequency of requests from a client, to prevent server resources from being
exhausted by a few clients.

--
With Regards!

Ye, Qian
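For reference, the per-IP connection cap Mahadev mentions above is exposed as a server configuration option in 3.3-era releases. This is a sketch; the option names are assumptions to verify against the administrator's guide for your version:

```
# zoo.cfg fragment -- option names assume a 3.3-era server.
# Cap on concurrent connections from a single client IP (the "default 10"):
maxClientCnxns=10

# The outstanding-request throttle (default 1000) is a JVM system property
# on the server, e.g.:
#   java -Dzookeeper.globalOutstandingLimit=1000 ...
```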


[jira] Commented: (ZOOKEEPER-662) Too many CLOSE_WAIT socket state on a server

2010-02-24 Thread Qian Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12837696#action_12837696
 ] 

Qian Ye commented on ZOOKEEPER-662:
---

This has not happened again yet. Is there anything we can do to find the 
reason? When this kind of thing occurs, it really puts our system at risk.

 Too many CLOSE_WAIT socket state on a server
 

 Key: ZOOKEEPER-662
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-662
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.2.1
 Environment: Linux 2.6.9
Reporter: Qian Ye
 Fix For: 3.3.0

 Attachments: zookeeper.log.2010020105, zookeeper.log.2010020106


 I have a zookeeper cluster with 5 servers, zookeeper version 3.2.1. Here is 
 the content of the configuration file, zoo.cfg:
 ==
 # The number of milliseconds of each tick
 tickTime=2000
 # The number of ticks that the initial 
 # synchronization phase can take
 initLimit=5
 # The number of ticks that can pass between 
 # sending a request and getting an acknowledgement
 syncLimit=2
 # the directory where the snapshot is stored.
 dataDir=./data/
 # the port at which the clients will connect
 clientPort=8181
 # zookeeper cluster list
 server.100=10.23.253.43:8887:
 server.101=10.23.150.29:8887:
 server.102=10.23.247.141:8887:
 server.200=10.65.20.68:8887:
 server.201=10.65.27.21:8887:
 =
 Before the problem happened, the server.200 was the leader. Yesterday 
 morning, I found that there were many sockets in the CLOSE_WAIT state on 
 the clientPort (8181); the total was about 120. Because of these 
 CLOSE_WAIT sockets, the server.200 could not accept more connections from 
 the clients. The only thing I could do in this situation was restart the 
 server.200, at about 2010-02-01 06:06:35. The related log is attached to 
 the issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-662) Too many CLOSE_WAIT socket state on a server

2010-02-24 Thread Qian Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838157#action_12838157
 ] 

Qian Ye commented on ZOOKEEPER-662:
---

Thx Patrick, this situation might be a consequence of a network switch 
adjustment. The effect of the adjustment, as I know, is that two Zookeeper 
servers lost connection to the other three Zookeeper servers. This connection 
loss lasted several minutes. I have tried to reproduce it, but haven't 
succeeded yet. I will keep an eye on this issue and let you know if I get any 
more information about it. Thank you.

 Too many CLOSE_WAIT socket state on a server
 

 Key: ZOOKEEPER-662
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-662
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.2.1
 Environment: Linux 2.6.9
Reporter: Qian Ye
 Fix For: 3.3.0

 Attachments: zookeeper.log.2010020105, zookeeper.log.2010020106


 I have a zookeeper cluster with 5 servers, zookeeper version 3.2.1. Here is 
 the content of the configuration file, zoo.cfg:
 ==
 # The number of milliseconds of each tick
 tickTime=2000
 # The number of ticks that the initial 
 # synchronization phase can take
 initLimit=5
 # The number of ticks that can pass between 
 # sending a request and getting an acknowledgement
 syncLimit=2
 # the directory where the snapshot is stored.
 dataDir=./data/
 # the port at which the clients will connect
 clientPort=8181
 # zookeeper cluster list
 server.100=10.23.253.43:8887:
 server.101=10.23.150.29:8887:
 server.102=10.23.247.141:8887:
 server.200=10.65.20.68:8887:
 server.201=10.65.27.21:8887:
 =
 Before the problem happened, the server.200 was the leader. Yesterday 
 morning, I found that there were many sockets in the CLOSE_WAIT state on 
 the clientPort (8181); the total was about 120. Because of these 
 CLOSE_WAIT sockets, the server.200 could not accept more connections from 
 the clients. The only thing I could do in this situation was restart the 
 server.200, at about 2010-02-01 06:06:35. The related log is attached to 
 the issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-662) Too many CLOSE_WAIT socket state on a server

2010-02-04 Thread Qian Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829920#action_12829920
 ] 

Qian Ye commented on ZOOKEEPER-662:
---

Hi Patrick, the c clients all run in a Linux environment; the kernels are 
2.6.9. Some of the servers are 32-bit machines and some are 64-bit. It 
seems that the client on the server 10.81.14.81 has some problem which causes 
it to fail frequently. Because there is a monitor app which restarts the c 
client when it fails, the client on 10.81.14.81 keeps restarting and 
connecting to the zookeeper servers frequently.

You mentioned that some of the responses to the stat request didn't reach the 
client; that looks like the behavior of a TCP connection with the SO_LINGER 
option on. In that kind of situation, the server only puts the response on the 
wire and closes; however, the response packet may be discarded, and the TCP/IP 
stack wouldn't re-send it. Is that the scenario we met here?

 Too many CLOSE_WAIT socket state on a server
 

 Key: ZOOKEEPER-662
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-662
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.2.1
 Environment: Linux 2.6.9
Reporter: Qian Ye
 Fix For: 3.3.0

 Attachments: zookeeper.log.2010020105, zookeeper.log.2010020106


 I have a zookeeper cluster with 5 servers, zookeeper version 3.2.1. Here is 
 the content of the configuration file, zoo.cfg:
 ==
 # The number of milliseconds of each tick
 tickTime=2000
 # The number of ticks that the initial 
 # synchronization phase can take
 initLimit=5
 # The number of ticks that can pass between 
 # sending a request and getting an acknowledgement
 syncLimit=2
 # the directory where the snapshot is stored.
 dataDir=./data/
 # the port at which the clients will connect
 clientPort=8181
 # zookeeper cluster list
 server.100=10.23.253.43:8887:
 server.101=10.23.150.29:8887:
 server.102=10.23.247.141:8887:
 server.200=10.65.20.68:8887:
 server.201=10.65.27.21:8887:
 =
 Before the problem happened, the server.200 was the leader. Yesterday 
 morning, I found that there were many sockets in the CLOSE_WAIT state on 
 the clientPort (8181); the total was about 120. Because of these 
 CLOSE_WAIT sockets, the server.200 could not accept more connections from 
 the clients. The only thing I could do in this situation was restart the 
 server.200, at about 2010-02-01 06:06:35. The related log is attached to 
 the issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-662) Too many CLOSE_WAIT socket state on a server

2010-02-02 Thread Qian Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12828916#action_12828916
 ] 

Qian Ye commented on ZOOKEEPER-662:
---

I'm using the c client, and there is also a monitor process using "echo stat | 
nc zookeeper 8181" every 20 seconds to get the status of the servers. If the 
monitor process fails to get a valid reply, it sends an sms alarm to my 
cell phone. When the problem happened, I received such an alarm; it said 
"connection refused". I haven't found the backlog for the client port in the 
source code. If it uses the default value of 128, then so many CLOSE_WAIT 
states would prevent the kernel from accepting new connections, right?

P.S. I cannot tell why the client keeps reconnecting with the same error; I 
will take a look at it and append more information if I can find something.


 Too many CLOSE_WAIT socket state on a server
 

 Key: ZOOKEEPER-662
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-662
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.2.1
 Environment: Linux 2.6.9
Reporter: Qian Ye
 Fix For: 3.3.0

 Attachments: zookeeper.log.2010020105, zookeeper.log.2010020106


 I have a zookeeper cluster with 5 servers, zookeeper version 3.2.1. Here is 
 the content of the configuration file, zoo.cfg:
 ==
 # The number of milliseconds of each tick
 tickTime=2000
 # The number of ticks that the initial 
 # synchronization phase can take
 initLimit=5
 # The number of ticks that can pass between 
 # sending a request and getting an acknowledgement
 syncLimit=2
 # the directory where the snapshot is stored.
 dataDir=./data/
 # the port at which the clients will connect
 clientPort=8181
 # zookeeper cluster list
 server.100=10.23.253.43:8887:
 server.101=10.23.150.29:8887:
 server.102=10.23.247.141:8887:
 server.200=10.65.20.68:8887:
 server.201=10.65.27.21:8887:
 =
 Before the problem happened, the server.200 was the leader. Yesterday 
 morning, I found that there were many sockets in the CLOSE_WAIT state on 
 the clientPort (8181); the total was about 120. Because of these 
 CLOSE_WAIT sockets, the server.200 could not accept more connections from 
 the clients. The only thing I could do in this situation was restart the 
 server.200, at about 2010-02-01 06:06:35. The related log is attached to 
 the issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (ZOOKEEPER-662) Too many CLOSE_WAIT socket state on a server

2010-02-01 Thread Qian Ye (JIRA)
Too many CLOSE_WAIT socket state on a server


 Key: ZOOKEEPER-662
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-662
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.2.1
 Environment: Linux 2.6.9
Reporter: Qian Ye
 Fix For: 3.3.0


I have a zookeeper cluster with 5 servers, zookeeper version 3.2.1. Here is the 
content of the configuration file, zoo.cfg:

==
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=5
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=2
# the directory where the snapshot is stored.
dataDir=./data/
# the port at which the clients will connect
clientPort=8181

# zookeeper cluster list
server.100=10.23.253.43:8887:
server.101=10.23.150.29:8887:
server.102=10.23.247.141:8887:
server.200=10.65.20.68:8887:
server.201=10.65.27.21:8887:
=

Before the problem happened, the server.200 was the leader. Yesterday morning, 
I found that there were many sockets in the CLOSE_WAIT state on the 
clientPort (8181); the total was about 120. Because of these CLOSE_WAIT 
sockets, the server.200 could not accept more connections from the clients. 
The only thing I could do in this situation was restart the server.200, at 
about 2010-02-01 06:06:35. The related log is attached to the issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-662) Too many CLOSE_WAIT socket state on a server

2010-02-01 Thread Qian Ye (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Ye updated ZOOKEEPER-662:
--

Attachment: zookeeper.log.2010020106
zookeeper.log.2010020105

related log to this issue

 Too many CLOSE_WAIT socket state on a server
 

 Key: ZOOKEEPER-662
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-662
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.2.1
 Environment: Linux 2.6.9
Reporter: Qian Ye
 Fix For: 3.3.0

 Attachments: zookeeper.log.2010020105, zookeeper.log.2010020106


 I have a zookeeper cluster with 5 servers, zookeeper version 3.2.1. Here is 
 the content of the configuration file, zoo.cfg:
 ==
 # The number of milliseconds of each tick
 tickTime=2000
 # The number of ticks that the initial 
 # synchronization phase can take
 initLimit=5
 # The number of ticks that can pass between 
 # sending a request and getting an acknowledgement
 syncLimit=2
 # the directory where the snapshot is stored.
 dataDir=./data/
 # the port at which the clients will connect
 clientPort=8181
 # zookeeper cluster list
 server.100=10.23.253.43:8887:
 server.101=10.23.150.29:8887:
 server.102=10.23.247.141:8887:
 server.200=10.65.20.68:8887:
 server.201=10.65.27.21:8887:
 =
 Before the problem happened, the server.200 was the leader. Yesterday 
 morning, I found that there were many sockets in the CLOSE_WAIT state on 
 the clientPort (8181); the total was about 120. Because of these 
 CLOSE_WAIT sockets, the server.200 could not accept more connections from 
 the clients. The only thing I could do in this situation was restart the 
 server.200, at about 2010-02-01 06:06:35. The related log is attached to 
 the issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version

2010-01-23 Thread Qian Ye (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Ye updated ZOOKEEPER-612:
--

Release Note:   (was: fix a semicolon mistake)
  Status: Patch Available  (was: Open)

 Make Zookeeper C client can be compiled by gcc of early version
 ---

 Key: ZOOKEEPER-612
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-612
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.2.1
 Environment: Linux
Reporter: Qian Ye
Assignee: Qian Ye
 Fix For: 3.3.0

 Attachments: patch, patch, ZOOKEEPER-612.patch, ZOOKEEPER-612.patch, 
 ZOOKEEPER-612.patch


 The original C Client, version 3.2.1, cannot be compiled successfully by 
 early versions of gcc, due to some declaration restrictions. To compile the 
 source code on servers with an early gcc, I made some modifications to the 
 original source. What's more, some extra code was added to make the client 
 compatible with the hosts list format "ip1:port1, ip2:port2...". There is 
 often a space after this kind of comma.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version

2010-01-23 Thread Qian Ye (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Ye updated ZOOKEEPER-612:
--

Status: Open  (was: Patch Available)

update the patch

 Make Zookeeper C client can be compiled by gcc of early version
 ---

 Key: ZOOKEEPER-612
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-612
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.2.1
 Environment: Linux
Reporter: Qian Ye
Assignee: Qian Ye
 Fix For: 3.3.0

 Attachments: patch, patch, ZOOKEEPER-612.patch, ZOOKEEPER-612.patch, 
 ZOOKEEPER-612.patch


 The original C Client, version 3.2.1, cannot be compiled successfully by 
 early versions of gcc, due to some declaration restrictions. To compile the 
 source code on servers with an early gcc, I made some modifications to the 
 original source. What's more, some extra code was added to make the client 
 compatible with the hosts list format "ip1:port1, ip2:port2...". There is 
 often a space after this kind of comma.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version

2010-01-23 Thread Qian Ye (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Ye updated ZOOKEEPER-612:
--

Attachment: ZOOKEEPER-612.patch

 Make Zookeeper C client can be compiled by gcc of early version
 ---

 Key: ZOOKEEPER-612
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-612
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.2.1
 Environment: Linux
Reporter: Qian Ye
Assignee: Qian Ye
 Fix For: 3.3.0

 Attachments: patch, patch, ZOOKEEPER-612.patch, ZOOKEEPER-612.patch, 
 ZOOKEEPER-612.patch, ZOOKEEPER-612.patch


 The original C Client, version 3.2.1, cannot be compiled successfully by 
 early versions of gcc, due to some declaration restrictions. To compile the 
 source code on servers with an early gcc, I made some modifications to the 
 original source. What's more, some extra code was added to make the client 
 compatible with the hosts list format "ip1:port1, ip2:port2...". There is 
 often a space after this kind of comma.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version

2010-01-23 Thread Qian Ye (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Ye updated ZOOKEEPER-612:
--

Release Note: update the patch, hope it works this time
  Status: Patch Available  (was: Open)

 Make Zookeeper C client can be compiled by gcc of early version
 ---

 Key: ZOOKEEPER-612
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-612
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.2.1
 Environment: Linux
Reporter: Qian Ye
Assignee: Qian Ye
 Fix For: 3.3.0

 Attachments: patch, patch, ZOOKEEPER-612.patch, ZOOKEEPER-612.patch, 
 ZOOKEEPER-612.patch, ZOOKEEPER-612.patch


 The original C Client, version 3.2.1, cannot be compiled successfully by 
 early versions of gcc, due to some declaration restrictions. To compile the 
 source code on servers with an early gcc, I made some modifications to the 
 original source. What's more, some extra code was added to make the client 
 compatible with the hosts list format "ip1:port1, ip2:port2...". There is 
 often a space after this kind of comma.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.




[jira] Commented: (ZOOKEEPER-650) Servers cannot join in quorum

2010-01-20 Thread Qian Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12802819#action_12802819
 ] 

Qian Ye commented on ZOOKEEPER-650:
---

Hi all, some more information about this problem: I found that the state of the 
election ports on the three working servers is strange. For example, on the 
server 10.23.150.29:

$ netstat -anp | grep 
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp0  0 0.0.0.0:0.0.0.0:*   
LISTEN  -   
tcp9  0 10.23.150.29:   10.23.253.43:23933  
CLOSE_WAIT  -   
tcp   157577  0 10.23.150.29:   10.65.27.21:10482   
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23929  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23672  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23671  
CLOSE_WAIT  -   
tcp   41  0 10.23.150.29:   10.23.247.141:10790 
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23669  
CLOSE_WAIT  -   
tcp  136  0 10.23.150.29:   10.23.247.141:10791 
ESTABLISHED -   
tcp9  0 10.23.150.29:   10.23.253.43:23668  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23667  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23923  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23666  
CLOSE_WAIT  -   
tcp   73  0 10.23.150.29:   10.23.247.141:10786 
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23664  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23663  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23662  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23661  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23660  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23659  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23656  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23651  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23648  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23647  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23646  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23643  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23642  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23640  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23639  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23638  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23637  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23636  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23635  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23634  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23633  
CLOSE_WAIT  -   
tcp9  0 10.23.150.29:   10.23.253.43:23630  
CLOSE_WAIT  -   
tcp6  0 10.23.150.29:   10.23.253.43:23620  
CLOSE_WAIT  -   
tcp  617  0 10.23.150.29:   10.65.27.21:28984   
CLOSE_WAIT  -   
tcp0  0 10.23.150.29:10593  10.23.253.43:   
CLOSE_WAIT  -   
tcp   51  0 10.23.150.29:   10.23.253.43:23712  
CLOSE_WAIT  -   
tcp9  0

[jira] Updated: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version

2010-01-18 Thread Qian Ye (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Ye updated ZOOKEEPER-612:
--

Attachment: ZOOKEEPER-612.patch

New patch against trunk

 Make Zookeeper C client can be compiled by gcc of early version
 ---

 Key: ZOOKEEPER-612
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-612
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.2.1
 Environment: Linux
Reporter: Qian Ye
Assignee: Qian Ye
 Fix For: 3.3.0

 Attachments: patch, patch, ZOOKEEPER-612.patch, ZOOKEEPER-612.patch


 The original C Client, version 3.2.1, cannot be compiled successfully by 
 early versions of gcc, due to some declaration restrictions. To compile the 
 source code on servers with an early gcc, I made some modifications to the 
 original source. What's more, some extra code was added to make the client 
 compatible with the hosts list format "ip1:port1, ip2:port2...". There is 
 often a space after this kind of comma.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version

2010-01-17 Thread Qian Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801347#action_12801347
 ] 

Qian Ye commented on ZOOKEEPER-612:
---

Is that because I didn't make the patch based on the latest svn trunk version? 
Should I make a new patch based on it?

 Make Zookeeper C client can be compiled by gcc of early version
 ---

 Key: ZOOKEEPER-612
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-612
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.2.1
 Environment: Linux
Reporter: Qian Ye
Assignee: Qian Ye
 Fix For: 3.3.0

 Attachments: patch, patch, ZOOKEEPER-612.patch


 The original C Client, version 3.2.1, cannot be compiled successfully by 
 early versions of gcc, due to some declaration restrictions. To compile the 
 source code on servers with an early gcc, I made some modifications to the 
 original source. What's more, some extra code was added to make the client 
 compatible with the hosts list format "ip1:port1, ip2:port2...". There is 
 often a space after this kind of comma.




[jira] Commented: (ZOOKEEPER-628) the ephemeral node wouldn't disapper due to session close error

2010-01-07 Thread Qian Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797870#action_12797870
 ] 

Qian Ye commented on ZOOKEEPER-628:
---

err... sorry, I wasn't aware of these comments.
The logs below the WARN level were not recorded due to a wrong 
log4j.properties, and the data directory and snapshots contain some sensitive 
information, so I'm sorry that I cannot upload them. :-(

 the ephemeral node wouldn't disapper due to session close error
 ---

 Key: ZOOKEEPER-628
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-628
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.2.1
 Environment: Linux 2.6.9 x86_64
Reporter: Qian Ye

 I find a very strange scenario today, I'm not sure how it happen, I just 
 found it like this. Maybe you can give me some information about it, my 
 Zookeeper Server is version 3.2.1.
 My Zookeeper cluster contains three servers, with ip: 
 10.81.12.144,10.81.12.145,10.81.12.141. I wrote a client to create ephemeral 
 node under znode: se/diserver_tc. The client runs on the server with ip 
 10.81.13.173. The client can create a ephemeral node on zookeeper server and 
 write the host ip (10.81.13.173) in to the node as its data. There is only 
 one client process can be running at a time, because the client will listen 
 to a certain port.
 It is strange that I found there were two ephemeral node with the ip 
 10.81.13.173 under znode se/diserver_tc.
 se/diserver_tc/diserver_tc67
 STAT:
 czxid: 124554079820
 mzxid: 124554079820
 ctime: 1260609598547
 mtime: 1260609598547
 version: 0
 cversion: 0
 aversion: 0
 ephemeralOwner: 226627854640480810
 dataLength: 92
 numChildren: 0
 pzxid: 124554079820
 se/diserver_tc/diserver_tc95
 STAT:
 czxid: 128849019107
 mzxid: 128849019107
 ctime: 1260772197356
 mtime: 1260772197356
 version: 0
 cversion: 0
 aversion: 0
 ephemeralOwner: 154673159808876591
 dataLength: 92
 numChildren: 0
 pzxid: 128849019107
 There are TWO with different session id! And after I kill the client process 
 on the server 10.81.13.173, the se/diserver_tc/diserver_tc95 node 
 disappear, but the se/diserver_tc/diserver_tc67 stay the same. That 
 means it is not my coding mistake to create the node twice. I checked several 
 times and I'm sure that there is no another client instance running. And I 
 use the 'stat' command to check the three zookeeper servers, and there is no 
 client from 10.81.13.173,
 $echo stat | nc 10.81.12.144 2181   
 Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
 Clients:
  /10.81.13.173:35676[1](queued=0,recved=0,sent=0) # it is caused by the nc 
 process
 Latency min/avg/max: 0/3/254
 Received: 11081
 Sent: 0
 Outstanding: 0
 Zxid: 0x1e01f5
 Mode: follower
 Node count: 32
 $ echo stat | nc 10.81.12.141 2181
 Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
 Clients:
  /10.81.12.152:58110[1](queued=0,recved=10374,sent=0)
  /10.81.13.173:35677[1](queued=0,recved=0,sent=0) # it is caused by the nc 
 process
 Latency min/avg/max: 0/0/37
 Received: 37128
 Sent: 0
 Outstanding: 0
 Zxid: 0x1e01f5
 Mode: follower
 Node count: 26
 $ echo stat | nc 10.81.12.145 2181
 Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
 Clients:
  /10.81.12.153:19130[1](queued=0,recved=10624,sent=0)
  /10.81.13.173:35678[1](queued=0,recved=0,sent=0) # it is caused by the nc 
 process
 Latency min/avg/max: 0/2/213
 Received: 26700
 Sent: 0
 Outstanding: 0
 Zxid: 0x1e01f5
 Mode: leader
 Node count: 26
 The three 'stat' commands show different Node count! 




[jira] Updated: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version

2010-01-06 Thread Qian Ye (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Ye updated ZOOKEEPER-612:
--

Attachment: ZOOKEEPER-612.patch

yes Patrick, reasonable tips, a patch for this is attached, thx

 Make Zookeeper C client can be compiled by gcc of early version
 ---

 Key: ZOOKEEPER-612
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-612
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.2.1
 Environment: Linux
Reporter: Qian Ye
Assignee: Qian Ye
 Fix For: 3.3.0

 Attachments: patch, patch, ZOOKEEPER-612.patch


 The original C Client, Version 3.2.1, cannot be compiled successfully by the 
 gcc of early version, due some declaration restriction. To compile the source 
 code on the server with gcc of early version, I made some modification on the 
 original source. What's more, some extra codes are added to make the client 
 be compatible with the hosts list format: ip1:port1, ip2:port2... There is 
 often a space after this kind of comma.




[jira] Updated: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version

2009-12-25 Thread Qian Ye (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Ye updated ZOOKEEPER-612:
--

Release Note: fix a semicolon mistake
  Status: Patch Available  (was: Open)

 Make Zookeeper C client can be compiled by gcc of early version
 ---

 Key: ZOOKEEPER-612
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-612
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.2.1
 Environment: Linux
Reporter: Qian Ye
 Attachments: patch


 The original C Client, Version 3.2.1, cannot be compiled successfully by the 
 gcc of early version, due some declaration restriction. To compile the source 
 code on the server with gcc of early version, I made some modification on the 
 original source. What's more, some extra codes are added to make the client 
 be compatible with the hosts list format: ip1:port1, ip2:port2... There is 
 often a space after this kind of comma.




[jira] Updated: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version

2009-12-25 Thread Qian Ye (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Ye updated ZOOKEEPER-612:
--

Status: Open  (was: Patch Available)

 Make Zookeeper C client can be compiled by gcc of early version
 ---

 Key: ZOOKEEPER-612
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-612
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.2.1
 Environment: Linux
Reporter: Qian Ye
 Attachments: patch


 The original C Client, Version 3.2.1, cannot be compiled successfully by the 
 gcc of early version, due some declaration restriction. To compile the source 
 code on the server with gcc of early version, I made some modification on the 
 original source. What's more, some extra codes are added to make the client 
 be compatible with the hosts list format: ip1:port1, ip2:port2... There is 
 often a space after this kind of comma.




[jira] Updated: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version

2009-12-25 Thread Qian Ye (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Ye updated ZOOKEEPER-612:
--

Attachment: patch

fix a semicolon mistake

 Make Zookeeper C client can be compiled by gcc of early version
 ---

 Key: ZOOKEEPER-612
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-612
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.2.1
 Environment: Linux
Reporter: Qian Ye
 Attachments: patch, patch


 The original C Client, Version 3.2.1, cannot be compiled successfully by the 
 gcc of early version, due some declaration restriction. To compile the source 
 code on the server with gcc of early version, I made some modification on the 
 original source. What's more, some extra codes are added to make the client 
 be compatible with the hosts list format: ip1:port1, ip2:port2... There is 
 often a space after this kind of comma.




A very strange scenario, may due to some bug on the server side

2009-12-15 Thread Qian Ye
Hi guys:

I found a very strange scenario today. I'm not sure how it happened; I just
found it like this. Maybe you can give me some information about it. My
Zookeeper Server is version 3.2.1.

My Zookeeper cluster contains three servers, with ips
10.81.12.144, 10.81.12.145, and 10.81.12.141. I wrote a client that creates an ephemeral
node under the znode se/diserver_tc. The client runs on the host with ip
10.81.13.173. The client creates an ephemeral node on the zookeeper server and
writes the host ip (10.81.13.173) into the node as its data. Only
one client process can be running at a time, because the client listens
on a fixed port.

It is strange that I found there were two ephemeral nodes with the ip
10.81.13.173 under the znode se/diserver_tc.
se/diserver_tc/diserver_tc67
STAT:
czxid: 124554079820
mzxid: 124554079820
ctime: 1260609598547
mtime: 1260609598547
version: 0
cversion: 0
aversion: 0
ephemeralOwner: 226627854640480810
dataLength: 92
numChildren: 0
pzxid: 124554079820

se/diserver_tc/diserver_tc95
STAT:
czxid: 128849019107
mzxid: 128849019107
ctime: 1260772197356
mtime: 1260772197356
version: 0
cversion: 0
aversion: 0
ephemeralOwner: 154673159808876591
dataLength: 92
numChildren: 0
pzxid: 128849019107

There are TWO nodes with different session ids! And after I killed the client process
on the server 10.81.13.173, the se/diserver_tc/diserver_tc95 node
disappeared, but se/diserver_tc/diserver_tc67 stayed the same.
That means it is not a coding mistake of mine that created the node twice. I checked
several times and I'm sure that no other client instance is running.
I also used the 'stat' command to check the three zookeeper servers, and there
is no client from 10.81.13.173:

$ echo stat | nc 10.81.12.144 2181
Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
Clients:
 /10.81.13.173:35676[1](queued=0,recved=0,sent=0) # it is caused by the nc process

Latency min/avg/max: 0/3/254
Received: 11081
Sent: 0
Outstanding: 0
Zxid: 0x1e01f5
Mode: follower
Node count: 32

$ echo stat | nc 10.81.12.141 2181
Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
Clients:
 /10.81.12.152:58110[1](queued=0,recved=10374,sent=0)
 /10.81.13.173:35677[1](queued=0,recved=0,sent=0) # it is caused by the nc process

Latency min/avg/max: 0/0/37
Received: 37128
Sent: 0
Outstanding: 0
Zxid: 0x1e01f5
Mode: follower
Node count: 26

$ echo stat | nc 10.81.12.145 2181
Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
Clients:
 /10.81.12.153:19130[1](queued=0,recved=10624,sent=0)
 /10.81.13.173:35678[1](queued=0,recved=0,sent=0) # it is caused by the nc process

Latency min/avg/max: 0/2/213
Received: 26700
Sent: 0
Outstanding: 0
Zxid: 0x1e01f5
Mode: leader
Node count: 26

The three 'stat' commands show different Node counts! I just cannot understand
how this happened; can anyone give me some explanation?


-- 
With Regards!

Ye, Qian
Made in Zhejiang University


[jira] Commented: (ZOOKEEPER-628) the ephemeral node wouldn't disapper due to session close error

2009-12-15 Thread Qian Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791121#action_12791121
 ] 

Qian Ye commented on ZOOKEEPER-628:
---

P.S.
se/diserver_tc/diserver_tc67 only appears on the server 10.81.12.144, 
the one with the largest Node count


 the ephemeral node wouldn't disapper due to session close error
 ---

 Key: ZOOKEEPER-628
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-628
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.2.1
 Environment: Linux 2.6.9 x86_64
Reporter: Qian Ye

 I find a very strange scenario today, I'm not sure how it happen, I just 
 found it like this. Maybe you can give me some information about it, my 
 Zookeeper Server is version 3.2.1.
 My Zookeeper cluster contains three servers, with ip: 
 10.81.12.144,10.81.12.145,10.81.12.141. I wrote a client to create ephemeral 
 node under znode: se/diserver_tc. The client runs on the server with ip 
 10.81.13.173. The client can create a ephemeral node on zookeeper server and 
 write the host ip (10.81.13.173) in to the node as its data. There is only 
 one client process can be running at a time, because the client will listen 
 to a certain port.
 It is strange that I found there were two ephemeral node with the ip 
 10.81.13.173 under znode se/diserver_tc.
 se/diserver_tc/diserver_tc67
 STAT:
 czxid: 124554079820
 mzxid: 124554079820
 ctime: 1260609598547
 mtime: 1260609598547
 version: 0
 cversion: 0
 aversion: 0
 ephemeralOwner: 226627854640480810
 dataLength: 92
 numChildren: 0
 pzxid: 124554079820
 se/diserver_tc/diserver_tc95
 STAT:
 czxid: 128849019107
 mzxid: 128849019107
 ctime: 1260772197356
 mtime: 1260772197356
 version: 0
 cversion: 0
 aversion: 0
 ephemeralOwner: 154673159808876591
 dataLength: 92
 numChildren: 0
 pzxid: 128849019107
 There are TWO with different session id! And after I kill the client process 
 on the server 10.81.13.173, the se/diserver_tc/diserver_tc95 node 
 disappear, but the se/diserver_tc/diserver_tc67 stay the same. That 
 means it is not my coding mistake to create the node twice. I checked several 
 times and I'm sure that there is no another client instance running. And I 
 use the 'stat' command to check the three zookeeper servers, and there is no 
 client from 10.81.13.173,
 $echo stat | nc 10.81.12.144 2181   
 Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
 Clients:
  /10.81.13.173:35676[1](queued=0,recved=0,sent=0) # it is caused by the nc 
 process
 Latency min/avg/max: 0/3/254
 Received: 11081
 Sent: 0
 Outstanding: 0
 Zxid: 0x1e01f5
 Mode: follower
 Node count: 32
 $ echo stat | nc 10.81.12.141 2181
 Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
 Clients:
  /10.81.12.152:58110[1](queued=0,recved=10374,sent=0)
  /10.81.13.173:35677[1](queued=0,recved=0,sent=0) # it is caused by the nc 
 process
 Latency min/avg/max: 0/0/37
 Received: 37128
 Sent: 0
 Outstanding: 0
 Zxid: 0x1e01f5
 Mode: follower
 Node count: 26
 $ echo stat | nc 10.81.12.145 2181
 Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
 Clients:
  /10.81.12.153:19130[1](queued=0,recved=10624,sent=0)
  /10.81.13.173:35678[1](queued=0,recved=0,sent=0) # it is caused by the nc 
 process
 Latency min/avg/max: 0/2/213
 Received: 26700
 Sent: 0
 Outstanding: 0
 Zxid: 0x1e01f5
 Mode: leader
 Node count: 26
 The three 'stat' commands show different Node count! 




Re: A very strange scenario, may due to some bug on the server side

2009-12-15 Thread Qian Ye
I have opened a jira for the issue:
https://issues.apache.org/jira/browse/ZOOKEEPER-628
and my zookeeper version is 3.2.1; the se/diserver_tc/diserver_tc67
node only appears on the server 10.81.12.144.

I will attach the additional information to the jira.

thx

On Wed, Dec 16, 2009 at 1:57 AM, Patrick Hunt ph...@apache.org wrote:

 You might also try the dump command on all 3 servers (similar to the
 stat command; it's a four-letter word) and look at its output -- it includes
 information on ephemeral nodes.

 Patrick


 Qian Ye wrote:

 Hi guys:

 I find a very strange scenario today, I'm not sure how it happen, I just
 found it like this. Maybe you can give me some information about it, my
 Zookeeper Server is version 3.2.1.

 My Zookeeper cluster contains three servers, with ip:
 10.81.12.144,10.81.12.145,10.81.12.141. I wrote a client to create
 ephemeral
 node under znode: *se/diserver_tc*. The client runs on the server with ip
 10.81.13.173. The client can create a ephemeral node on zookeeper server
 and
 write the host ip (10.81.13.173) in to the node as its data. There is only
 one client process can be running at a time, because the client will
 listen
 to a certain port.

 It is strange that I found there were two ephemeral node with the ip
 10.81.13.173 under znode se/diserver_tc.
 *se/diserver_tc/diserver_tc67*
 STAT:
czxid: 124554079820
mzxid: 124554079820
ctime: 1260609598547
mtime: 1260609598547
version: 0
cversion: 0
aversion: 0
ephemeralOwner: 226627854640480810
dataLength: 92
numChildren: 0
pzxid: 124554079820

 *se/diserver_tc/diserver_tc95
 *STAT:
czxid: 128849019107
mzxid: 128849019107
ctime: 1260772197356
mtime: 1260772197356
version: 0
cversion: 0
aversion: 0
ephemeralOwner: 154673159808876591
dataLength: 92
numChildren: 0
pzxid: 128849019107*
 *
 There are TWO with different session id! And after I kill the client
 process
 on the server 10.81.13.173, the *se/diserver_tc/diserver_tc95
 *node
 disappear, but the *se/diserver_tc/diserver_tc67 *stay the same.
 That means it is not my coding mistake to create the node twice. I checked
 several times and I'm sure that there is no another client instance
 running.
 And I use the 'stat' command to check the three zookeeper servers, and
 there
 is no client from 10.81.13.173,

 $echo stat | nc 10.81.12.144 2181
 Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
 Clients:
  /10.81.13.173:35676[1](queued=0,recved=0,sent=0) *# it is caused by the
 nc
 process*

 Latency min/avg/max: 0/3/254
 Received: 11081
 Sent: 0
 Outstanding: 0
 Zxid: 0x1e01f5
 Mode: follower
 *Node count: 32
 *
 $ echo stat | nc 10.81.12.141 2181
 Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
 Clients:
  /10.81.12.152:58110[1](queued=0,recved=10374,sent=0)
  /10.81.13.173:35677[1](queued=0,recved=0,sent=0) *# it is caused by the
 nc
 process*

 Latency min/avg/max: 0/0/37
 Received: 37128
 Sent: 0
 Outstanding: 0
 Zxid: 0x1e01f5
 Mode: follower
 *Node count: 26*

 $ echo stat | nc 10.81.12.145 2181
 Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
 Clients:
  /10.81.12.153:19130[1](queued=0,recved=10624,sent=0)
  /10.81.13.173:35678[1](queued=0,recved=0,sent=0) *# it is caused by the
 nc
 process*

 Latency min/avg/max: 0/2/213
 Received: 26700
 Sent: 0
 Outstanding: 0
 Zxid: 0x1e01f5
 Mode: leader
 *Node count: 26*

 The three 'stat' commands show different Node count! Just cannot
 understand
 how it happened, can anyone give me some explanation about it?





-- 
With Regards!

Ye, Qian
Made in Zhejiang University


[jira] Commented: (ZOOKEEPER-628) the ephemeral node wouldn't disapper due to session close error

2009-12-15 Thread Qian Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791134#action_12791134
 ] 

Qian Ye commented on ZOOKEEPER-628:
---

the 'dump' information for the three servers:

$ echo dump | nc 10.81.12.141 2181 
SessionTracker dump: 
org.apache.zookeeper.server.quorum.followersessiontrac...@5d684e26
ephemeral nodes dump:
Sessions with Ephemerals (3):
0x3258bdcc1c10068:
/se/fras/fras79
/se/fras_tc/fras_tc77
0x225826b5ab2002f:
/se/diserver/diserver000114
/se/diserver_tc/diserver_tc94
0x125826b5c1f0019:
/se/diserver_tc/diserver_tc98
/se/diserver/diserver000118

$ echo dump | nc 10.81.12.144 2181 
SessionTracker dump: 
org.apache.zookeeper.server.quorum.followersessiontrac...@62ebcdbb
ephemeral nodes dump:
Sessions with Ephemerals (7):
0x3258bdcc1c10068:
/se/fras/fras79
/se/fras_tc/fras_tc77
0x3258bc635750001:
/se/diserver_tc/diserver_tc76
0x32524d5440e022a:
/se/diserver_tc/diserver_tc67
0x3258bc63575:
/se/fras/fras49
/se/fras_tc/fras_tc49
0x225826b5ab2002f:
/se/diserver/diserver000114
/se/diserver_tc/diserver_tc94
0x125826b5c1f0019:
/se/diserver_tc/diserver_tc98
/se/diserver/diserver000118
0x225826b5ab20011:
/se/diserver_tc/diserver_tc81
/se/diserver/diserver000107

$ echo dump | nc 10.81.12.145 2181 
SessionTracker dump: 
Session Sets (9):
0 expire at Wed Dec 16 10:05:08 CST 2009:
0 expire at Wed Dec 16 10:05:10 CST 2009:
0 expire at Wed Dec 16 10:05:14 CST 2009:
0 expire at Wed Dec 16 10:05:18 CST 2009:
0 expire at Wed Dec 16 10:05:20 CST 2009:
0 expire at Wed Dec 16 10:05:24 CST 2009:
1 expire at Wed Dec 16 10:05:28 CST 2009:
82615565794869273
1 expire at Wed Dec 16 10:05:30 CST 2009:
226741136511795304
1 expire at Wed Dec 16 10:05:34 CST 2009:
154673159808876591

ephemeral nodes dump:
Sessions with Ephemerals (3):
0x3258bdcc1c10068:
/se/fras/fras79
/se/fras_tc/fras_tc77
0x225826b5ab2002f:
/se/diserver/diserver000114
/se/diserver_tc/diserver_tc94
0x125826b5c1f0019:
/se/diserver_tc/diserver_tc98
/se/diserver/diserver000118

It seems that the server 10.81.12.144 still keeps lots of sessions which 
should have expired 

 the ephemeral node wouldn't disapper due to session close error
 ---

 Key: ZOOKEEPER-628
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-628
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.2.1
 Environment: Linux 2.6.9 x86_64
Reporter: Qian Ye

 I find a very strange scenario today, I'm not sure how it happen, I just 
 found it like this. Maybe you can give me some information about it, my 
 Zookeeper Server is version 3.2.1.
 My Zookeeper cluster contains three servers, with ip: 
 10.81.12.144,10.81.12.145,10.81.12.141. I wrote a client to create ephemeral 
 node under znode: se/diserver_tc. The client runs on the server with ip 
 10.81.13.173. The client can create a ephemeral node on zookeeper server and 
 write the host ip (10.81.13.173) in to the node as its data. There is only 
 one client process can be running at a time, because the client will listen 
 to a certain port.
 It is strange that I found there were two ephemeral node with the ip 
 10.81.13.173 under znode se/diserver_tc.
 se/diserver_tc/diserver_tc67
 STAT:
 czxid: 124554079820
 mzxid: 124554079820
 ctime: 1260609598547
 mtime: 1260609598547
 version: 0
 cversion: 0
 aversion: 0
 ephemeralOwner: 226627854640480810
 dataLength: 92
 numChildren: 0
 pzxid: 124554079820
 se/diserver_tc/diserver_tc95
 STAT:
 czxid: 128849019107
 mzxid: 128849019107
 ctime: 1260772197356
 mtime: 1260772197356
 version: 0
 cversion: 0
 aversion: 0
 ephemeralOwner: 154673159808876591
 dataLength: 92
 numChildren: 0
 pzxid: 128849019107
 There are TWO with different session id! And after I kill the client process 
 on the server 10.81.13.173, the se/diserver_tc/diserver_tc95 node 
 disappear, but the se/diserver_tc/diserver_tc67 stay the same. That 
 means it is not my coding mistake to create the node twice. I checked several 
 times and I'm sure that there is no another client instance running. And I 
 use the 'stat' command to check the three zookeeper servers, and there is no 
 client from 10.81.13.173,
 $echo stat | nc 10.81.12.144 2181   
 Zookeeper version

Re: A very strange scenario, may due to some bug on the server side

2009-12-15 Thread Qian Ye
Sorry, my friend wrote a wrong log4j.properties that only records logs above
the WARN level. Will it help if I correct the log4j.properties and restart
the zookeeper server on 10.81.12.144? Will the information about session
0x32524d5440e022a be recorded that way?

On Wed, Dec 16, 2009 at 1:46 AM, Mahadev Konar maha...@yahoo-inc.com wrote:

 Hi Qian,
  This is quite weird. Are you sure the version is 3.2.1?
   If yes, please create a jira for this.

  Also, can you extract the server logs for the session


  ephemeralOwner: 226627854640480810

 And post it on a jira? Ephemeral Owner is the session id. You can convert
 the above number to hex and look through the logs to see what happened to
 this session and post the logs on the jira. Looks like the session close
 for
 the session (226627854640480810) wasn't successful (a bug mostly). So we
 need to trace back on what happened on a close of this session and why it
 did not close.

 Grepping all the server logs for the session id (0x32524d5440e022a, this is the
 hex of the above decimal number) might give us some insight into this.
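[Editor's note: the decimal-to-hex conversion described here can be sketched as below; `session_id_hex` is a hypothetical helper name, not part of the ZooKeeper API:]

```c
#include <stdint.h>
#include <stdio.h>

/* Format a decimal session id (the ephemeralOwner value printed by
 * client tools) in the 0x... hex form used by the server logs, e.g.
 * 226627854640480810 -> 0x32524d5440e022a. */
static void session_id_hex(uint64_t sid, char *out, size_t outlen)
{
    snprintf(out, outlen, "0x%llx", (unsigned long long)sid);
}
```

The hex string is what you grep for in the server logs, as suggested in this thread.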


 Thanks
 mahadev

 On 12/15/09 7:44 AM, Benjamin Reed br...@yahoo-inc.com wrote:

  does  se/diserver_tc/diserver_tc67 appear on all three servers?
 
  ben
 
  Qian Ye wrote:
  Hi guys:
 
  I find a very strange scenario today, I'm not sure how it happen, I just
  found it like this. Maybe you can give me some information about it, my
  Zookeeper Server is version 3.2.1.
 
  My Zookeeper cluster contains three servers, with ip:
  10.81.12.144,10.81.12.145,10.81.12.141. I wrote a client to create
 ephemeral
  node under znode: *se/diserver_tc*. The client runs on the server with
 ip
  10.81.13.173. The client can create a ephemeral node on zookeeper server
 and
  write the host ip (10.81.13.173) in to the node as its data. There is
 only
  one client process can be running at a time, because the client will
 listen
  to a certain port.
 
  It is strange that I found there were two ephemeral node with the ip
  10.81.13.173 under znode se/diserver_tc.
  *se/diserver_tc/diserver_tc67*
  STAT:
  czxid: 124554079820
  mzxid: 124554079820
  ctime: 1260609598547
  mtime: 1260609598547
  version: 0
  cversion: 0
  aversion: 0
  ephemeralOwner: 226627854640480810
  dataLength: 92
  numChildren: 0
  pzxid: 124554079820
 
  *se/diserver_tc/diserver_tc95
  *STAT:
  czxid: 128849019107
  mzxid: 128849019107
  ctime: 1260772197356
  mtime: 1260772197356
  version: 0
  cversion: 0
  aversion: 0
  ephemeralOwner: 154673159808876591
  dataLength: 92
  numChildren: 0
  pzxid: 128849019107*
  *
  There are TWO with different session id! And after I kill the client
 process
  on the server 10.81.13.173, the *se/diserver_tc/diserver_tc95
 *node
  disappear, but the *se/diserver_tc/diserver_tc67 *stay the same.
  That means it is not my coding mistake to create the node twice. I
 checked
  several times and I'm sure that there is no another client instance
 running.
  And I use the 'stat' command to check the three zookeeper servers, and
 there
  is no client from 10.81.13.173,
 
  $echo stat | nc 10.81.12.144 2181
  Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
  Clients:
   /10.81.13.173:35676[1](queued=0,recved=0,sent=0) *# it is caused by
 the nc
  process*
 
  Latency min/avg/max: 0/3/254
  Received: 11081
  Sent: 0
  Outstanding: 0
  Zxid: 0x1e01f5
  Mode: follower
  *Node count: 32
  *
  $ echo stat | nc 10.81.12.141 2181
  Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
  Clients:
   /10.81.12.152:58110[1](queued=0,recved=10374,sent=0)
   /10.81.13.173:35677[1](queued=0,recved=0,sent=0) *# it is caused by
 the nc
  process*
 
  Latency min/avg/max: 0/0/37
  Received: 37128
  Sent: 0
  Outstanding: 0
  Zxid: 0x1e01f5
  Mode: follower
  *Node count: 26*
 
  $ echo stat | nc 10.81.12.145 2181
  Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
  Clients:
   /10.81.12.153:19130[1](queued=0,recved=10624,sent=0)
   /10.81.13.173:35678[1](queued=0,recved=0,sent=0) *# it is caused by
 the nc
  process*
 
  Latency min/avg/max: 0/2/213
  Received: 26700
  Sent: 0
  Outstanding: 0
  Zxid: 0x1e01f5
  Mode: leader
  *Node count: 26*
 
  The three 'stat' commands show different Node count! Just cannot
 understand
  how it happened, can anyone give me some explanation about it?
 
 
 
 




-- 
With Regards!

Ye, Qian
Made in Zhejiang University


The C Client cause core dump in some situation

2009-12-14 Thread Qian Ye
Hi guys:

I encountered a problem today: the Zookeeper C Client (version 3.2.0)
core dumped after it reconnected and performed some operations on a zookeeper
server that had just been restarted. The gdb information is:

(gdb) bt
#0  0x00302af71900 in memcpy () from /lib64/tls/libc.so.6
#1  0x0047bfe4 in ia_deserialize_string (ia=Variable ia is not
available.) at src/recordio.c:270
#2  0x0047ed20 in deserialize_CreateResponse (in=0x9cd870,
tag=0x50a74e reply, v=0x409ffe70) at generated/zookeeper.jute.c:679
#3  0x0047a1d0 in zookeeper_process (zh=0x9c8c70, events=Variable
events is not available.) at src/zookeeper.c:1895
#4  0x004815e6 in do_io (v=Variable v is not available.) at
src/mt_adaptor.c:310
#5  0x00302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#6  0x00302afc6003 in clone () from /lib64/tls/libc.so.6
#7  0x in ?? ()
(gdb) f 1
#1  0x0047bfe4 in ia_deserialize_string (ia=Variable ia is not
available.) at src/recordio.c:270
270 in src/recordio.c
(gdb) info locals
priv = (struct buff_struct *) 0x9cd8d0
*len = -1*
rc = Variable rc is not available.

According to the source code,
int ia_deserialize_string(struct iarchive *ia, const char *name, char **s)
{
    struct buff_struct *priv = ia->priv;
    int32_t len;
    int rc = ia_deserialize_int(ia, "len", &len);
    if (rc < 0)
        return rc;
    if ((priv->len - priv->off) < len) {
        return -E2BIG;
    }
    *s = malloc(len+1);
    if (!*s) {
        return -ENOMEM;
    }
    memcpy(*s, priv->buffer + priv->off, len);
    (*s)[len] = '\0';
    priv->off += len;
    return 0;
}

the variable len is set by ia_deserialize_int, and the returned len isn't
checked, so the client segfaults when trying to memcpy -1 bytes of
data.
I'm not sure why the client got len = -1 when deserializing the
response from the server, and I'm also not sure whether it is a known issue.
Could anyone give me some information about this problem?
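[Editor's note: a minimal sketch of the kind of defensive length check being asked for here. `deserialize_string_checked`, its error codes, and the standalone `buff_struct` are illustrative assumptions modeled on the snippet quoted above, not ZooKeeper's actual fix:]

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Mirror of the buff_struct fields used in the quoted snippet. */
struct buff_struct {
    int32_t len;   /* total bytes in buffer */
    int32_t off;   /* current read offset   */
    char *buffer;
};

/* Validate the length decoded from the wire before malloc/memcpy, so a
 * corrupt len of -1 is rejected instead of being handed to memcpy. */
static int deserialize_string_checked(struct buff_struct *priv,
                                      int32_t len, char **s)
{
    if (len < 0 || len > priv->len - priv->off)
        return -EINVAL;                 /* reject bad wire length */
    *s = malloc((size_t)len + 1);
    if (!*s)
        return -ENOMEM;
    memcpy(*s, priv->buffer + priv->off, (size_t)len);
    (*s)[len] = '\0';
    priv->off += len;
    return 0;
}
```

With this check in place, a len of -1 returns an error to the caller rather than crashing in memcpy.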

-- 
With Regards!

Ye, Qian
Made in Zhejiang University


[jira] Created: (ZOOKEEPER-624) The C Client cause core dump when receive error data from Zookeeper Server

2009-12-14 Thread Qian Ye (JIRA)
The C Client cause core dump when receive error data from Zookeeper Server
--

 Key: ZOOKEEPER-624
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-624
 Project: Zookeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.2.0
 Environment: Linux 2.6.9 x86_64
Reporter: Qian Ye


I encountered a problem today: the Zookeeper C Client (version 3.2.0) core 
dumped after it reconnected and performed some operations on a zookeeper server 
that had just been restarted. The gdb information is:

(gdb) bt
#0  0x00302af71900 in memcpy () from /lib64/tls/libc.so.6
#1  0x0047bfe4 in ia_deserialize_string (ia=Variable ia is not 
available.) at src/recordio.c:270
#2  0x0047ed20 in deserialize_CreateResponse (in=0x9cd870, tag=0x50a74e 
reply, v=0x409ffe70) at generated/zookeeper.jute.c:679
#3  0x0047a1d0 in zookeeper_process (zh=0x9c8c70, events=Variable 
events is not available.) at src/zookeeper.c:1895
#4  0x004815e6 in do_io (v=Variable v is not available.) at 
src/mt_adaptor.c:310
#5  0x00302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#6  0x00302afc6003 in clone () from /lib64/tls/libc.so.6
#7  0x in ?? ()
(gdb) f 1
#1  0x0047bfe4 in ia_deserialize_string (ia=Variable ia is not 
available.) at src/recordio.c:270
270 in src/recordio.c
(gdb) info locals
priv = (struct buff_struct *) 0x9cd8d0
len = -1
rc = Variable rc is not available.

According to the source code,
int ia_deserialize_string(struct iarchive *ia, const char *name, char **s)
{
    struct buff_struct *priv = ia->priv;
    int32_t len;
    int rc = ia_deserialize_int(ia, "len", &len);
    if (rc < 0)
        return rc;
    if ((priv->len - priv->off) < len) {
        return -E2BIG;
    }
    *s = malloc(len+1);
    if (!*s) {
        return -ENOMEM;
    }
    memcpy(*s, priv->buffer + priv->off, len);
    (*s)[len] = '\0';
    priv->off += len;
    return 0;
}

the variable len is set by ia_deserialize_int, and the returned len is never 
checked, so the client segfaults when trying to memcpy -1 bytes of data.
In the source file recordio.c, there are many functions which don't check the 
returned len. They could all cause a segfault in some situations.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: The C Client cause core dump in some situation

2009-12-14 Thread Qian Ye
Hi Mahadev:

I have created a jira for this issue
https://issues.apache.org/jira/browse/ZOOKEEPER-624.
So far, I haven't found a way to reproduce the segfault. I tried the same
operations about 10 times and only produced the core dump once.
I will attach the reproduction steps to the jira if I find them.

Thx


On Tue, Dec 15, 2009 at 1:53 AM, Mahadev Konar maha...@yahoo-inc.comwrote:

 Hi Qian,
  The code that you mention still exists in the trunk and does not check for
 the len before calling memcpy. Please open a jira on this.

 The interesting thing though is that the len is -1. Do you have any test
 case or a test scenario where it can be reproduced? It would be interesting
 to see why this is happening. We should not be getting a -1 len value from
 the server.


 Thanks
 mahadev


 On 12/14/09 6:19 AM, Qian Ye yeqian@gmail.com wrote:

  Hi guys:
 
  I encountered a problem today that the Zookeeper C Client (version 3.2.0)
  core dump when reconnected and did some operations on the zookeeper
 server
  which just restarted. The gdb infomation is like:
 
  (gdb) bt
  #0  0x00302af71900 in memcpy () from /lib64/tls/libc.so.6
  #1  0x0047bfe4 in ia_deserialize_string (ia=Variable ia is not
  available.) at src/recordio.c:270
  #2  0x0047ed20 in deserialize_CreateResponse (in=0x9cd870,
  tag=0x50a74e reply, v=0x409ffe70) at generated/zookeeper.jute.c:679
  #3  0x0047a1d0 in zookeeper_process (zh=0x9c8c70, events=Variable
  events is not available.) at src/zookeeper.c:1895
  #4  0x004815e6 in do_io (v=Variable v is not available.) at
  src/mt_adaptor.c:310
  #5  0x00302b80610a in start_thread () from /lib64/tls/libpthread.so.0
  #6  0x00302afc6003 in clone () from /lib64/tls/libc.so.6
  #7  0x in ?? ()
  (gdb) f 1
  #1  0x0047bfe4 in ia_deserialize_string (ia=Variable ia is not
  available.) at src/recordio.c:270
  270 in src/recordio.c
  (gdb) info locals
  priv = (struct buff_struct *) 0x9cd8d0
  len = -1
  rc = Variable rc is not available.
 
  According to the source code,
  int ia_deserialize_string(struct iarchive *ia, const char *name, char **s)
  {
      struct buff_struct *priv = ia->priv;
      int32_t len;
      int rc = ia_deserialize_int(ia, "len", &len);
      if (rc < 0)
          return rc;
      if ((priv->len - priv->off) < len) {
          return -E2BIG;
      }
      *s = malloc(len+1);
      if (!*s) {
          return -ENOMEM;
      }
      memcpy(*s, priv->buffer+priv->off, len);
      (*s)[len] = '\0';
      priv->off += len;
      return 0;
  }
 
  the variable len is set by ia_deserialize_int, and the returned len
 doesn't
  been checked, so the client segment fault when trying to memcpy -1 byte
  data.
  I'm not sure why the client got the len variable -1 when deserialize the
  response from the server, I'm also not sure whether it is an known issue.
  Could any
  one give me some information about this problem?




-- 
With Regards!

Ye, Qian
Made in Zhejiang University


Re: The C Client cause core dump in some situation

2009-12-14 Thread Qian Ye
Yes, I will try valgrind.

On Tue, Dec 15, 2009 at 1:02 PM, Patrick Hunt ph...@apache.org wrote:

 Did you try using valgrind? That might help reproduce.


 Qian Ye wrote:

 Hi Mahadev:

 I have created a jira for this issue
 https://issues.apache.org/jira/browse/ZOOKEEPER-624.
 And so far, I haven't found the way to reproduce the segment fault. I
 tried
 about 10 times the same operations and only produced the core dump 1 time.
 I would attach the way to the jira if I can find.

 Thx


 On Tue, Dec 15, 2009 at 1:53 AM, Mahadev Konar maha...@yahoo-inc.com
 wrote:

  Hi Qian,
  The code that you mention still exists in the trunk and does not check
 for
 the len before calling memcpy. Please open a jira on this.

 The interesting thing though is that the len is -1. Do you have any test
 case or  a test scenario where it can be reproduced. It would be
 interesting
 to see why this is happening. We should not be getting a -1 len value
 from
 the server.


 Thanks
 mahadev


 On 12/14/09 6:19 AM, Qian Ye yeqian@gmail.com wrote:

  Hi guys:

 I encountered a problem today that the Zookeeper C Client (version
 3.2.0)
 core dump when reconnected and did some operations on the zookeeper

 server

 which just restarted. The gdb infomation is like:

 (gdb) bt
 #0  0x00302af71900 in memcpy () from /lib64/tls/libc.so.6
 #1  0x0047bfe4 in ia_deserialize_string (ia=Variable ia is not
 available.) at src/recordio.c:270
 #2  0x0047ed20 in deserialize_CreateResponse (in=0x9cd870,
 tag=0x50a74e reply, v=0x409ffe70) at generated/zookeeper.jute.c:679
 #3  0x0047a1d0 in zookeeper_process (zh=0x9c8c70,
 events=Variable
 events is not available.) at src/zookeeper.c:1895
 #4  0x004815e6 in do_io (v=Variable v is not available.) at
 src/mt_adaptor.c:310
 #5  0x00302b80610a in start_thread () from
 /lib64/tls/libpthread.so.0
 #6  0x00302afc6003 in clone () from /lib64/tls/libc.so.6
 #7  0x in ?? ()
 (gdb) f 1
 #1  0x0047bfe4 in ia_deserialize_string (ia=Variable ia is not
 available.) at src/recordio.c:270
 270 in src/recordio.c
 (gdb) info locals
 priv = (struct buff_struct *) 0x9cd8d0
 len = -1
 rc = Variable rc is not available.

 According to the source code,
 int ia_deserialize_string(struct iarchive *ia, const char *name, char **s)
 {
     struct buff_struct *priv = ia->priv;
     int32_t len;
     int rc = ia_deserialize_int(ia, "len", &len);
     if (rc < 0)
         return rc;
     if ((priv->len - priv->off) < len) {
         return -E2BIG;
     }
     *s = malloc(len+1);
     if (!*s) {
         return -ENOMEM;
     }
     memcpy(*s, priv->buffer+priv->off, len);
     (*s)[len] = '\0';
     priv->off += len;
     return 0;
 }

 the variable len is set by ia_deserialize_int, and the returned len

 doesn't

 been checked, so the client segment fault when trying to memcpy -1 byte
 data.
 I'm not sure why the client got the len variable -1 when deserialize the
 response from the server, I'm also not sure whether it is an known
 issue.
 Could any
 one give me some information about this problem?







-- 
With Regards!

Ye, Qian
Made in Zhejiang University


[jira] Created: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version

2009-12-07 Thread Qian Ye (JIRA)
Make Zookeeper C client can be compiled by gcc of early version
---

 Key: ZOOKEEPER-612
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-612
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.2.1
 Environment: Linux
Reporter: Qian Ye


The original C Client, version 3.2.1, cannot be compiled successfully by early 
versions of gcc, due to some declaration restrictions. To compile the source 
code on a server with an early gcc, I made some modifications to the original 
source. What's more, some extra code is added to make the client compatible 
with the hosts list format: ip1:port1, ip2:port2... There is often a space 
after this kind of comma.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version

2009-12-07 Thread Qian Ye (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Ye updated ZOOKEEPER-612:
--

Attachment: patch

can be compiled by gcc 2.96

 Make Zookeeper C client can be compiled by gcc of early version
 ---

 Key: ZOOKEEPER-612
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-612
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.2.1
 Environment: Linux
Reporter: Qian Ye
 Attachments: patch


 The original C Client, version 3.2.1, cannot be compiled successfully by 
 early versions of gcc, due to some declaration restrictions. To compile the 
 source code on a server with an early gcc, I made some modifications to the 
 original source. What's more, some extra code is added to make the client 
 compatible with the hosts list format: ip1:port1, ip2:port2... There is 
 often a space after this kind of comma.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (ZOOKEEPER-591) The C Client cannot exit properly in some situation

2009-11-23 Thread Qian Ye (JIRA)
The C Client cannot exit properly in some situation
---

 Key: ZOOKEEPER-591
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-591
 Project: Zookeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.2.1
 Environment: Linux db-passport-test05.vm 2.6.9_5-4-0-5 #1 SMP Tue Apr 
14 15:56:24 CST 2009 x86_64 x86_64 x86_64 GNU/Linux 
Reporter: Qian Ye


The following code produces a situation where the C Client cannot exit 
properly:

#include "include/zookeeper.h"

void default_zoo_watcher(zhandle_t *zzh, int type, int state, const char *path,
                         void *context){
    int zrc = 0;
    struct String_vector str_vec = {0, NULL};
    printf("in the default_zoo_watcher\n");
    zrc = zoo_wget_children(zzh, "/mytest", default_zoo_watcher, NULL,
                            &str_vec);
    printf("zoo_wget_children, error: %d\n", zrc);

    return;
}

int main()
{
    int zrc = 0;
    int buff_len = 10;
    char buff[10] = "hello";
    char path[512];
    struct Stat stat;
    struct String_vector str_vec = {0, NULL};

    zhandle_t *zh = zookeeper_init("10.81.20.62:2181", NULL, 3, 0, 0, 0);
    zrc = zoo_create(zh, "/mytest", buff, 10, &ZOO_OPEN_ACL_UNSAFE, 0,
                     path, 512);
    printf("zoo_create, error: %d\n", zrc);

    zrc = zoo_wget_children(zh, "/mytest", default_zoo_watcher, NULL,
                            &str_vec);
    printf("zoo_wget_children, error: %d\n", zrc);

    zrc = zoo_create(zh, "/mytest/test1", buff, 10, &ZOO_OPEN_ACL_UNSAFE, 0,
                     path, 512);
    printf("zoo_create, error: %d\n", zrc);

    zrc = zoo_wget_children(zh, "/mytest", default_zoo_watcher, NULL,
                            &str_vec);
    printf("zoo_wget_children, error: %d\n", zrc);

    zrc = zoo_delete(zh, "/mytest/test1", -1);

    printf("zoo_delete, error: %d\n", zrc);
    zookeeper_close(zh);
    return 0;
}


Running this code can cause the program to hang at zookeeper_close(zh) (line 
38). Using gdb to attach to the process, I found that the main thread is 
waiting for the do_completion thread to finish,
(gdb) bt
#0  0x00302b806ffb in pthread_join () from /lib64/tls/libpthread.so.0
#1  0x0040de3b in adaptor_finish (zh=0x515b60) at src/mt_adaptor.c:219
#2  0x004060ba in zookeeper_close (zh=0x515b60) at src/zookeeper.c:2100
#3  0x0040220b in main ()

and the thread which handles the zoo_wget_children (in the default_zoo_watcher) 
is waiting for sc->cond. 
(gdb) thread 2
[Switching to thread 2 (Thread 1094719840 (LWP 25093))]#0  0x00302b8089aa 
in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/tls/libpthread.so.0
(gdb) bt
#0  0x00302b8089aa in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/tls/libpthread.so.0
#1  0x0040d88b in wait_sync_completion (sc=0x5167f0) at 
src/mt_adaptor.c:82
#2  0x004082c9 in zoo_wget_children (zh=0x515b60, path=0x40ebc0 
/mytest, watcher=0x401fd8 default_zoo_watcher, watcherCtx=Variable 
watcherCtx is not available.)
at src/zookeeper.c:2884
#3  0x00402037 in default_zoo_watcher ()
#4  0x0040d664 in deliverWatchers (zh=0x515b60, type=4, state=3, 
path=0x515100 /mytest, list=0x5177d8) at src/zk_hashtable.c:274
#5  0x00403861 in process_completions (zh=0x515b60) at 
src/zookeeper.c:1631
#6  0x0040e1b5 in do_completion (v=Variable v is not available.) at 
src/mt_adaptor.c:333
#7  0x00302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#8  0x00302afc6003 in clone () from /lib64/tls/libc.so.6
#9  0x in ?? ()

Here, a deadlock occurs.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (ZOOKEEPER-589) When create a znode, a NULL ACL parameter cannot be accepted

2009-11-22 Thread Qian Ye (JIRA)
When create a znode, a NULL ACL parameter cannot be accepted


 Key: ZOOKEEPER-589
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-589
 Project: Zookeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.2.1
 Environment: Linux db-passport-test05.vm 2.6.9_5-4-0-5 #1 SMP Tue Apr 
14 15:56:24 CST 2009 x86_64 x86_64 x86_64 GNU/Linux

Reporter: Qian Ye


In the comments of the client C API calls associated with creating a znode, 
e.g. zoo_acreate, it is said that if the initial ACL of the node is null, the 
ACL of the parent will be used. However, it doesn't work. When this kind of 
request is executed on the server side, it raises an InvalidACLException. The 
source code shows that the function fixupACL returns false when it gets a 
null ACL. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



How to build Zookeeper Server in Eclipse?

2009-09-23 Thread Qian Ye
Hi all:

Recently, I built a system for resource discovery based on Zookeeper. It
works well, which makes me want to study the internals of the Zookeeper Server.
However, I'm a greenhorn at Java. After loading the source into Eclipse and
reading the code for several days, I still don't know how to build the
project (Eclipse shows lots of errors and warnings). Could anyone help me
out? Is there a guide for building the zookeeper server?

Thanks~

-- 
With Regards!

Ye, Qian
Made in Zhejiang University


[jira] Created: (ZOOKEEPER-515) Zookeeper quorum didn't provide service when restart after an Out of memory crash

2009-08-25 Thread Qian Ye (JIRA)
Zookeeper quorum didn't provide service when restart after an Out of memory 
crash
---

 Key: ZOOKEEPER-515
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-515
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.2.0
 Environment: Linux 2.6.9-52bs-4core #2 SMP Wed Jan 16 14:44:08 EST 
2008 x86_64 x86_64 x86_64 GNU/Linux
Jdk: 1.6.0_14 
Reporter: Qian Ye


The Zookeeper quorum, containing 5 servers, didn't provide service when it was 
restarted after an out-of-memory crash. 

It happened as follows:
1. We built a Zookeeper quorum which contained 5 servers, say 1, 3, 4, 5, 6 
(no 2), and 6 was the leader.
2. We created 18 threads on 6 different servers to set and get data from a 
znode in Zookeeper at the same time. The size of the data is 1MB. The test 
threads did their job as fast as possible, with no pause between operations, 
and they repeated the setting and getting 4000 times. 
3. The Zookeeper leader crashed about 10 mins after the test threads started. 
The leader printed out the log:

2009-08-25 12:00:12,301 - WARN  [NIOServerCxn.Factory:2181:nioserverc...@497] - 
Exception causing close of session 0x523
4223c2dc00b5 due to java.io.IOException: Read error
2009-08-25 12:00:12,318 - WARN  [NIOServerCxn.Factory:2181:nioserverc...@497] - 
Exception causing close of session 0x523
4223c2dc00b6 due to java.io.IOException: Read error
2009-08-25 12:03:44,086 - WARN  [NIOServerCxn.Factory:2181:nioserverc...@497] - 
Exception causing close of session 0x523
4223c2dc00b8 due to java.io.IOException: Read error
2009-08-25 12:04:53,757 - WARN  [NIOServerCxn.Factory:2181:nioserverc...@497] - 
Exception causing close of session 0x523
4223c2dc00b7 due to java.io.IOException: Read error
2009-08-25 12:15:45,151 - FATAL [SyncThread:0:syncrequestproces...@131] - 
Severe unrecoverable error, exiting
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2786)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:71)
at java.io.DataOutputStream.writeInt(DataOutputStream.java:180)
at org.apache.jute.BinaryOutputArchive.writeInt(BinaryOutputArchive.java:55)
at org.apache.zookeeper.txn.SetDataTxn.serialize(SetDataTxn.java:42)
at 
org.apache.zookeeper.server.persistence.Util.marshallTxnEntry(Util.java:262)
at 
org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:154)
at 
org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:268)
at 
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:100)

It is clear that the leader ran out of memory. Then server 4 went down 
almost at the same time, and printed out the log:
2009-08-25 12:15:45,995 - ERROR 
[FollowerRequestProcessor:3:followerrequestproces...@91] - Unexpected exception 
causing
exit
java.net.SocketException: Connection reset
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
at 
org.apache.jute.BinaryOutputArchive.writeBuffer(BinaryOutputArchive.java:119)
at 
org.apache.zookeeper.server.quorum.QuorumPacket.serialize(QuorumPacket.java:51)
at 
org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
at org.apache.zookeeper.server.quorum.Follower.writePacket(Follower.java:97)
at org.apache.zookeeper.server.quorum.Follower.request(Follower.java:399)
at 
org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:86)
2009-08-25 12:15:45,996 - WARN  [NIOServerCxn.Factory:2181:nioserverc...@497] - 
Exception causing close of session 0x423
4ab894330075 due to java.net.SocketException: Broken pipe
2009-08-25 12:15:45,996 - FATAL [SyncThread:3:syncrequestproces...@131] - 
Severe unrecoverable error, exiting
java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at 
org.apache.zookeeper.server.quorum.Follower.writePacket(Follower.java:100)
at 
org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:52)
at 
org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:147)
at 
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:92