Re: About symbol table of Zookeeper c client
Hi Mahadev, thanks for the information, I'm running ZooKeeper 3.3.0.

On Sat, Sep 4, 2010 at 1:22 AM, Mahadev Konar maha...@yahoo-inc.com wrote:

Patrick and Qian, this was fixed in https://issues.apache.org/jira/browse/ZOOKEEPER-604 I have marked ZOOKEEPER-295 as referencing ZOOKEEPER-604. Qian, what version of zookeeper are you running? Thanks mahadev

On 9/3/10 9:51 AM, Patrick Hunt ph...@apache.org wrote:

This is a long-standing issue slated for 4.0: https://issues.apache.org/jira/browse/ZOOKEEPER-295 Mahadev had done some work to reduce the exported symbols as part of 3.3; perhaps this slipped through the net? Mahadev, can we address this using the current mechanism? Patrick

On Thu, Sep 2, 2010 at 7:37 AM, Qian Ye yeqian@gmail.com wrote:

Hi all: I'm writing an application in C which needs to link both memcached's library and ZooKeeper's C client library. I found a symbol table conflict, because both libraries provide an implementation (recordio.h/c) of the function htonll. It seems that some functions of the ZooKeeper C client, which can be accessed externally but are used only internally, have simple names. I think this will cause symbol table conflicts from time to time, and we should do something about it, e.g. add a specific prefix to these functions. Thanks -- With Regards! Ye, Qian

-- With Regards! Ye, Qian
About symbol table of Zookeeper c client
Hi all: I'm writing an application in C which needs to link both memcached's library and ZooKeeper's C client library. I found a symbol table conflict, because both libraries provide an implementation (recordio.h/c) of the function htonll. It seems that some functions of the ZooKeeper C client, which can be accessed externally but are used only internally, have simple names. I think this will cause symbol table conflicts from time to time, and we should do something about it, e.g. add a specific prefix to these functions. Thanks -- With Regards! Ye, Qian
[jira] Commented: (ZOOKEEPER-797) c client source with AI_ADDRCONFIG cannot be compiled with early glibc
[ https://issues.apache.org/jira/browse/ZOOKEEPER-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883603#action_12883603 ] Qian Ye commented on ZOOKEEPER-797: --- Hi guys, do I need to add any tests for this patch? c client source with AI_ADDRCONFIG cannot be compiled with early glibc -- Key: ZOOKEEPER-797 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-797 Project: Zookeeper Issue Type: Improvement Components: c client Affects Versions: 3.3.1 Environment: linux 2.6.9 Reporter: Qian Ye Attachments: ZOOKEEPER-797.patch c client source with AI_ADDRCONFIG cannot be compiled with early glibc (before 2.3.3) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-797) c client source with AI_ADDRCONFIG cannot be compiled with early glibc
[ https://issues.apache.org/jira/browse/ZOOKEEPER-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Ye updated ZOOKEEPER-797: -- Status: Patch Available (was: Open)
[jira] Created: (ZOOKEEPER-797) c client source with AI_ADDRCONFIG cannot be compiled with early glibc
c client source with AI_ADDRCONFIG cannot be compiled with early glibc (before 2.3.3) -- Key: ZOOKEEPER-797 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-797 Project: Zookeeper Issue Type: Improvement Components: c client Affects Versions: 3.3.1 Environment: linux 2.6.9 Reporter: Qian Ye Attachments: ZOOKEEPER-797.patch
[jira] Updated: (ZOOKEEPER-797) c client source with AI_ADDRCONFIG cannot be compiled with early glibc
[ https://issues.apache.org/jira/browse/ZOOKEEPER-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Ye updated ZOOKEEPER-797: -- Attachment: ZOOKEEPER-797.patch
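The usual fix for this class of problem (and plausibly what the attached patch does; the patch itself is not shown here) is to define AI_ADDRCONFIG to 0 when the platform headers lack it, so the flag degrades to a no-op on pre-2.3.3 glibc. A self-contained sketch:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* glibc before 2.3.3 does not define AI_ADDRCONFIG; defining it as 0
   turns the flag into a no-op instead of a compile error. */
#ifndef AI_ADDRCONFIG
#define AI_ADDRCONFIG 0
#endif

int main(void)
{
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags    = AI_ADDRCONFIG;  /* no-op where unsupported */

    /* The host/port values are illustrative (ZooKeeper's default port). */
    int rc = getaddrinfo("127.0.0.1", "2181", &hints, &res);
    printf("getaddrinfo rc=%d\n", rc);
    if (res)
        freeaddrinfo(res);
    return rc == 0 ? 0 : 1;
}
```

This keeps the behavior unchanged on modern systems while restoring compilability on old ones.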
[jira] Commented: (ZOOKEEPER-779) C Client should check the connectivity to the hosts in zookeeper_init
[ https://issues.apache.org/jira/browse/ZOOKEEPER-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871474#action_12871474 ] Qian Ye commented on ZOOKEEPER-779: --- OK Patrick, however I'm really busy these days; it may take a week or two before I can get it done. C Client should check the connectivity to the hosts in zookeeper_init - Key: ZOOKEEPER-779 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-779 Project: Zookeeper Issue Type: Improvement Components: c client Affects Versions: 3.3.1 Reporter: Qian Ye Attachments: ZOOKEEPER-779.patch, ZOOKEEPER-779.patch In some scenarios, whether the client can connect to the ZooKeeper servers is used as a logic condition: if the client cannot connect to the servers, the program should take another branch. However, the current zookeeper_init cannot tell whether the client can connect to a server or not, which can confuse some users. I think we should check the connectivity to the hosts in zookeeper_init, so we can tell whether the hosts are available at that time or not.
[jira] Commented: (ZOOKEEPER-779) C Client should check the connectivity to the hosts in zookeeper_init
[ https://issues.apache.org/jira/browse/ZOOKEEPER-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870545#action_12870545 ] Qian Ye commented on ZOOKEEPER-779: --- By checking connectivity, I mean checking whether the client can connect to a ZooKeeper server listed in the parameters. In my usage, ZooKeeper is used to store some meta information. The logic flow of my app is: if it can connect to ZooKeeper, obtain the meta info from ZooKeeper; otherwise obtain it from a local file. Because the connection to the ZooKeeper server is not yet established when zookeeper_init returns (mt version), I used to make my app sleep a few seconds to make sure the connection was established; however, if the host list contains some invalid server addresses, the sleep time is hard to estimate. I cannot use the initialization method from load_gen.c, because in some situations I want my app to read the meta info from a local file by giving a wrong host to zookeeper_init. In a word, I just want zookeeper_init to check whether at least one ZooKeeper server in the host list is available at connect time. I have made a patch for this issue; would you like to check it out? Anyway, a strategy pattern for connection would be great; I think we should do that. C Client should check the connectivity to the hosts in zookeeper_init - Key: ZOOKEEPER-779 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-779
[jira] Commented: (ZOOKEEPER-779) C Client should check the connectivity to the hosts in zookeeper_init
[ https://issues.apache.org/jira/browse/ZOOKEEPER-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870946#action_12870946 ] Qian Ye commented on ZOOKEEPER-779: --- Thanks Patrick, I see your point. Here is some explanation. A temporary glitch at start time will not lead to any harmful result in my system. This kind of glitch is recorded in the log file, so a monitor process will notice it. Moreover, the absence of the local meta file, or it being out of sync, will not lead to any harmful result either. In a word, my system should be able to keep running without ZooKeeper providing the latest meta info. I will attach my patch soon :-) C Client should check the connectivity to the hosts in zookeeper_init - Key: ZOOKEEPER-779 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-779
[jira] Updated: (ZOOKEEPER-779) C Client should check the connectivity to the hosts in zookeeper_init
[ https://issues.apache.org/jira/browse/ZOOKEEPER-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Ye updated ZOOKEEPER-779: -- Attachment: ZOOKEEPER-779.patch
[jira] Updated: (ZOOKEEPER-779) C Client should check the connectivity to the hosts in zookeeper_init
[ https://issues.apache.org/jira/browse/ZOOKEEPER-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Ye updated ZOOKEEPER-779: -- Status: Patch Available (was: Open) do the connectivity check in zookeeper_init
[jira] Updated: (ZOOKEEPER-779) C Client should check the connectivity to the hosts in zookeeper_init
[ https://issues.apache.org/jira/browse/ZOOKEEPER-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Ye updated ZOOKEEPER-779: -- Attachment: ZOOKEEPER-779.patch
[jira] Updated: (ZOOKEEPER-779) C Client should check the connectivity to the hosts in zookeeper_init
[ https://issues.apache.org/jira/browse/ZOOKEEPER-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Ye updated ZOOKEEPER-779: -- Status: Patch Available (was: Open)
[jira] Updated: (ZOOKEEPER-779) C Client should check the connectivity to the hosts in zookeeper_init
[ https://issues.apache.org/jira/browse/ZOOKEEPER-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Ye updated ZOOKEEPER-779: -- Status: Open (was: Patch Available)
[jira] Created: (ZOOKEEPER-779) C Client should check the connectivity to the hosts in zookeeper_init
C Client should check the connectivity to the hosts in zookeeper_init - Key: ZOOKEEPER-779 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-779 Project: Zookeeper Issue Type: Improvement Components: c client Affects Versions: 3.3.1 Reporter: Qian Ye
[jira] Commented: (ZOOKEEPER-591) The C Client cannot exit properly in some situation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844391#action_12844391 ] Qian Ye commented on ZOOKEEPER-591: --- This patch works for me, thanks Mahadev.

The C Client cannot exit properly in some situation --- Key: ZOOKEEPER-591 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-591 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.2.1 Environment: Linux db-passport-test05.vm 2.6.9_5-4-0-5 #1 SMP Tue Apr 14 15:56:24 CST 2009 x86_64 x86_64 x86_64 GNU/Linux Reporter: Qian Ye Assignee: Mahadev konar Priority: Blocker Fix For: 3.3.0 Attachments: ZOOKEEPER-591.patch, ZOOKEEPER-591.patch, ZOOKEEPER-591.patch, ZOOKEEPER-591.patch, zootest.c

The following code produces a situation where the C Client cannot exit properly:

#include <stdio.h>
#include "include/zookeeper.h"

void default_zoo_watcher(zhandle_t *zzh, int type, int state, const char *path, void *context) {
    int zrc = 0;
    struct String_vector str_vec = {0, NULL};
    printf("in the default_zoo_watcher\n");
    zrc = zoo_wget_children(zzh, "/mytest", default_zoo_watcher, NULL, &str_vec);
    printf("zoo_wget_children, error: %d\n", zrc);
    return;
}

int main() {
    int zrc = 0;
    int buff_len = 10;
    char buff[10] = "hello";
    char path[512];
    struct Stat stat;
    struct String_vector str_vec = {0, NULL};
    zhandle_t *zh = zookeeper_init("10.81.20.62:2181", NULL, 3, 0, 0, 0);
    zrc = zoo_create(zh, "/mytest", buff, 10, &ZOO_OPEN_ACL_UNSAFE, 0, path, 512);
    printf("zoo_create, error: %d\n", zrc);
    zrc = zoo_wget_children(zh, "/mytest", default_zoo_watcher, NULL, &str_vec);
    printf("zoo_wget_children, error: %d\n", zrc);
    zrc = zoo_create(zh, "/mytest/test1", buff, 10, &ZOO_OPEN_ACL_UNSAFE, 0, path, 512);
    printf("zoo_create, error: %d\n", zrc);
    zrc = zoo_wget_children(zh, "/mytest", default_zoo_watcher, NULL, &str_vec);
    printf("zoo_wget_children, error: %d\n", zrc);
    zrc = zoo_delete(zh, "/mytest/test1", -1);
    printf("zoo_delete, error: %d\n", zrc);
    zookeeper_close(zh);
    return 0;
}

Running this code can cause the program to hang at zookeeper_close(zh); (line 38). Using gdb to attach to the process, I found that the main thread is waiting for the do_completion thread to finish:

(gdb) bt
#0 0x00302b806ffb in pthread_join () from /lib64/tls/libpthread.so.0
#1 0x0040de3b in adaptor_finish (zh=0x515b60) at src/mt_adaptor.c:219
#2 0x004060ba in zookeeper_close (zh=0x515b60) at src/zookeeper.c:2100
#3 0x0040220b in main ()

and the thread which handles the zoo_wget_children (in default_zoo_watcher) is waiting on sc->cond:

(gdb) thread 2
[Switching to thread 2 (Thread 1094719840 (LWP 25093))]#0 0x00302b8089aa in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
(gdb) bt
#0 0x00302b8089aa in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
#1 0x0040d88b in wait_sync_completion (sc=0x5167f0) at src/mt_adaptor.c:82
#2 0x004082c9 in zoo_wget_children (zh=0x515b60, path=0x40ebc0 "/mytest", watcher=0x401fd8 default_zoo_watcher, watcherCtx=Variable watcherCtx is not available.) at src/zookeeper.c:2884
#3 0x00402037 in default_zoo_watcher ()
#4 0x0040d664 in deliverWatchers (zh=0x515b60, type=4, state=3, path=0x515100 "/mytest", list=0x5177d8) at src/zk_hashtable.c:274
#5 0x00403861 in process_completions (zh=0x515b60) at src/zookeeper.c:1631
#6 0x0040e1b5 in do_completion (v=Variable v is not available.) at src/mt_adaptor.c:333
#7 0x00302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#8 0x00302afc6003 in clone () from /lib64/tls/libc.so.6
#9 0x in ?? ()

Here, a deadlock presents itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-591) The C Client cannot exit properly in some situation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843870#action_12843870 ] Qian Ye commented on ZOOKEEPER-591: --- The process still hangs there, Mahadev.

(gdb) info thread
2 Thread 1094719840 (LWP 31877) 0x00302b8089aa in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
1 Thread 182894113888 (LWP 31875) 0x00302b806ffb in pthread_join () from /lib64/tls/libpthread.so.0
(gdb) thread 1
[Switching to thread 1 (Thread 182894113888 (LWP 31875))]#0 0x00302b806ffb in pthread_join () from /lib64/tls/libpthread.so.0
(gdb) bt
#0 0x00302b806ffb in pthread_join () from /lib64/tls/libpthread.so.0
#1 0x0040de5b in adaptor_finish (zh=0x515b60) at src/mt_adaptor.c:218
#2 0x004060da in zookeeper_close (zh=0x515b60) at src/zookeeper.c:2109
#3 0x0040220b in main ()
(gdb) thread 2
[Switching to thread 2 (Thread 1094719840 (LWP 31877))]#0 0x00302b8089aa in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
(gdb) bt
#0 0x00302b8089aa in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
#1 0x0040d8ab in wait_sync_completion (sc=0x5167f0) at src/mt_adaptor.c:82
#2 0x004082e9 in zoo_wget_children (zh=0x515b60, path=0x40ebe0 "/mytest", watcher=0x401fd8 default_zoo_watcher, watcherCtx=Variable watcherCtx is not available.) at src/zookeeper.c:2889
#3 0x00402037 in default_zoo_watcher ()
#4 0x0040d684 in deliverWatchers (zh=0x515b60, type=4, state=3, path=0x515100 "/mytest", list=0x2a95700b08) at src/zk_hashtable.c:271
#5 0x00403771 in process_completions (zh=0x515b60) at src/zookeeper.c:1623
#6 0x0040e1d5 in do_completion (v=Variable v is not available.) at src/mt_adaptor.c:332
#7 0x00302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#8 0x00302afc6003 in clone () from /lib64/tls/libc.so.6
#9 0x in ?? ()

I applied the patch to the C client source code version 3.2.2, not the working copy; I think this won't make any difference, right?
The C Client cannot exit properly in some situation --- Key: ZOOKEEPER-591 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-591 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.2.1 Reporter: Qian Ye Assignee: Mahadev konar Priority: Blocker Fix For: 3.3.0 Attachments: ZOOKEEPER-591.patch, ZOOKEEPER-591.patch
Re: Google Summer of Code
Hi Henry, I think we should add two kinds of interface to the server: 1. An interface which returns the clients that have set watches on a specific znode of the data tree; this kind of interface can be really helpful for administrators. 2. An interface which returns the list of servers in a ZooKeeper cluster. Maybe the students can help with this work. Thanks!

On Wed, Mar 10, 2010 at 4:46 AM, Gustavo Niemeyer gust...@niemeyer.net wrote:

Hi Henry, there is a wiki page here: http://wiki.apache.org/hadoop/ZooKeeper/SoC2010Ideas that requires that you sign up to edit. Please post your project ideas up there - I've left one as an example. You can also mail me directly and I'll post them myself. On Friday I'll tidy up the page and send in an application to Google. Thanks a lot for organizing this. The key things I'd like to see moving forward, as was discussed before on the mailing list, are: - Encryption of communication between servers - Encryption of communication between servers and clients - Dynamic cluster membership changes I don't know how well these fit in GSoC. -- Gustavo Niemeyer http://niemeyer.net http://niemeyer.net/blog http://niemeyer.net/identi.ca http://niemeyer.net/twitter

-- With Regards! Ye, Qian
[jira] Commented: (ZOOKEEPER-591) The C Client cannot exit properly in some situation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843443#action_12843443 ] Qian Ye commented on ZOOKEEPER-591: --- Hi Mahadev, the patch doesn't work :-( , the deadlock still exists.

(gdb) info thread
2 Thread 1094719840 (LWP 13889) 0x00302b8089aa in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
1 Thread 182894113888 (LWP 13887) 0x00302b806ffb in pthread_join () from /lib64/tls/libpthread.so.0
(gdb) thread 1
[Switching to thread 1 (Thread 182894113888 (LWP 13887))]#0 0x00302b806ffb in pthread_join () from /lib64/tls/libpthread.so.0
(gdb) bt
#0 0x00302b806ffb in pthread_join () from /lib64/tls/libpthread.so.0
#1 0x0040de2b in adaptor_finish (zh=0x515b60) at src/mt_adaptor.c:218
#2 0x004060aa in zookeeper_close (zh=0x515b60) at src/zookeeper.c:2086
#3 0x0040220b in main ()
(gdb) thread 2
[Switching to thread 2 (Thread 1094719840 (LWP 13889))]#0 0x00302b8089aa in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
(gdb) bt
#0 0x00302b8089aa in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
#1 0x0040d87b in wait_sync_completion (sc=0x517850) at src/mt_adaptor.c:82
#2 0x004082b9 in zoo_wget_children (zh=0x515b60, path=0x40eba0 "/mytest", watcher=0x401fd8 default_zoo_watcher, watcherCtx=Variable watcherCtx is not available.) at src/zookeeper.c:2866
#3 0x00402037 in default_zoo_watcher ()
#4 0x0040d654 in deliverWatchers (zh=0x515b60, type=4, state=3, path=0x516920 "/mytest", list=0x5177d8) at src/zk_hashtable.c:271
#5 0x00403871 in process_completions (zh=0x515b60) at src/zookeeper.c:1620
#6 0x0040e1a5 in do_completion (v=Variable v is not available.) at src/mt_adaptor.c:332
#7 0x00302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#8 0x00302afc6003 in clone () from /lib64/tls/libc.so.6
#9 0x in ?? ()

The C Client cannot exit properly in some situation --- Key: ZOOKEEPER-591 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-591 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.2.1 Reporter: Qian Ye Assignee: Mahadev konar Priority: Critical Fix For: 3.3.0 Attachments: ZOOKEEPER-591.patch
Re: About load balance in Zookeeper server
thx mahadev :-) On Thu, Mar 4, 2010 at 4:05 AM, Mahadev Konar maha...@yahoo-inc.com wrote: Hi Qian, I am not sure if I did respond to your email or not. Sorry, too many emails I am catching up on. You are right that if you specify just a single host then the client would not be able to switch to another server. There have been some ideas around Dynamic configuration and storing zookeeper ensemble information on the zookeeper cluster itself. http://issues.apache.org/jira/browse/ZOOKEEPER-338 http://issues.apache.org/jira/browse/ZOOKEEPER-107 http://issues.apache.org/jira/browse/ZOOKEEPER-390 This might answer some of the problems you mention, but they are all being worked upon! Thanks mahadev On 3/1/10 6:09 PM, Qian Ye yeqian@gmail.com wrote: Thanks Mahadev, I see what you mean. Here is another question, the client need a list of Zookeeper servers to initialize the handler, and there is no API for the client to get awareness of all the Zookeeper servers in one cluster. That means, if I only provide one Zookeeper server in the client's host list, the client would not switch to another available Zookeeper server, when the given one was failed. I think is strategy is flawed. The client should be able to find out all the Zookeeper servers in the cluster. Is there any compromise for this issue? thanks On Tue, Mar 2, 2010 at 7:29 AM, Mahadev Konar maha...@yahoo-inc.com wrote: HI Qian, You are right we do have any way of handling clients dynamically so that every server has balanced load. This requires a careful design since we would not want client connections to keep flipping around and also maintain stability as much as we can. We have had some discussions about it but nothing concrete has materialized yet. We do have checks in place that prevent more than a certain number of connections (default 10) from the same ip address. This is to keep too many zookeeper client instances from the same client bogging down the zookeeper service. 
Also, we have throttling for the number of outstanding requests from clients (currently set to 1000 by default). This allows the zookeeper service to throttle zookeeper clients. This throttling isn't done on a per-client basis but is just a check so that some misbehaving client cannot bring down the zookeeper service. Any other checks that you specifically were thinking of? Thanks mahadev On 2/28/10 10:18 PM, Qian Ye yeqian@gmail.com wrote: Hi guys: As I know, when a client connects to Zookeeper servers, it chooses a server randomly (without zoo_deterministic_conn_order on), and then the client talks to that server until a failure happens. It seems that the zookeeper servers cannot handle client connections dynamically according to the load of each server. If some flaw in a client made it connect to the Zookeeper servers too frequently, it might prevent other normal clients from getting service from Zookeeper, right? So, is there any method to resolve these two practical problems: 1. Handle and apportion clients dynamically, so every server has a balanced load. 2. Some kind of frequency controller, which sets a threshold on the frequency of requests from a client, to prevent server resources from being exhausted by a few clients. -- With Regards! Ye, Qian -- With Regards! Ye, Qian
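The per-IP connection cap Mahadev describes is exposed through the maxClientCnxns setting in zoo.cfg. A minimal sketch of how it might be configured (the value shown is illustrative, not a recommendation):

```
# zoo.cfg fragment: cap concurrent connections from a single IP address.
# Setting it to 0 disables the check entirely.
maxClientCnxns=10
```

This is the check intended to keep one misbehaving host from exhausting the server's connection slots; it does not balance load across the ensemble.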
Re: About load balance in Zookeeper server
Thanks Mahadev, I see what you mean. Here is another question: the client needs a list of Zookeeper servers to initialize the handle, and there is no API for the client to become aware of all the Zookeeper servers in a cluster. That means, if I only provide one Zookeeper server in the client's host list, the client will not switch to another available Zookeeper server when the given one fails. I think this strategy is flawed. The client should be able to find out all the Zookeeper servers in the cluster. Is there any compromise for this issue? thanks On Tue, Mar 2, 2010 at 7:29 AM, Mahadev Konar maha...@yahoo-inc.com wrote: HI Qian, You are right, we do not have any way of handling clients dynamically so that every server has a balanced load. This requires a careful design, since we would not want client connections to keep flipping around, and we also want to maintain stability as much as we can. We have had some discussions about it but nothing concrete has materialized yet. We do have checks in place that prevent more than a certain number of connections (default 10) from the same IP address. This is to keep too many zookeeper client instances from the same host from bogging down the zookeeper service. Also, we have throttling for the number of outstanding requests from clients (currently set to 1000 by default). This allows the zookeeper service to throttle zookeeper clients. This throttling isn't done on a per-client basis but is just a check so that some misbehaving client cannot bring down the zookeeper service. Any other checks that you specifically were thinking of? Thanks mahadev On 2/28/10 10:18 PM, Qian Ye yeqian@gmail.com wrote: Hi guys: As I know, when a client connects to Zookeeper servers, it chooses a server randomly (without zoo_deterministic_conn_order on), and then the client talks to that server until a failure happens. It seems that the zookeeper servers cannot handle client connections dynamically according to the load of each server.
If some flaw in a client made it connect to the Zookeeper servers too frequently, it might prevent other normal clients from getting service from Zookeeper, right? So, is there any method to resolve these two practical problems: 1. Handle and apportion clients dynamically, so every server has a balanced load. 2. Some kind of frequency controller, which sets a threshold on the frequency of requests from a client, to prevent server resources from being exhausted by a few clients. -- With Regards! Ye, Qian -- With Regards! Ye, Qian
About load balance in Zookeeper server
Hi guys: As I know, when a client connects to Zookeeper servers, it chooses a server randomly (without zoo_deterministic_conn_order on), and then the client talks to that server until a failure happens. It seems that the zookeeper servers cannot handle client connections dynamically according to the load of each server. If some flaw in a client made it connect to the Zookeeper servers too frequently, it might prevent other normal clients from getting service from Zookeeper, right? So, is there any method to resolve these two practical problems: 1. Handle and apportion clients dynamically, so every server has a balanced load. 2. Some kind of frequency controller, which sets a threshold on the frequency of requests from a client, to prevent server resources from being exhausted by a few clients. -- With Regards! Ye, Qian
[jira] Commented: (ZOOKEEPER-662) Too many CLOSE_WAIT socket state on a server
[ https://issues.apache.org/jira/browse/ZOOKEEPER-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12837696#action_12837696 ] Qian Ye commented on ZOOKEEPER-662: --- This has not happened again yet. Is there anything we can do to find the reason? When this kind of thing occurs, it really puts our system at risk. Too many CLOSE_WAIT socket state on a server Key: ZOOKEEPER-662 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-662 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.2.1 Environment: Linux 2.6.9 Reporter: Qian Ye Fix For: 3.3.0 Attachments: zookeeper.log.2010020105, zookeeper.log.2010020106 I have a zookeeper cluster with 5 servers, zookeeper version 3.2.1; here is the content of the configuration file, zoo.cfg == # The number of milliseconds of each tick tickTime=2000 # The number of ticks that the initial # synchronization phase can take initLimit=5 # The number of ticks that can pass between # sending a request and getting an acknowledgement syncLimit=2 # the directory where the snapshot is stored. dataDir=./data/ # the port at which the clients will connect clientPort=8181 # zookeeper cluster list server.100=10.23.253.43:8887: server.101=10.23.150.29:8887: server.102=10.23.247.141:8887: server.200=10.65.20.68:8887: server.201=10.65.27.21:8887: = Before the problem happened, server.200 was the leader. Yesterday morning, I found that there were many sockets in the CLOSE_WAIT state on the clientPort (8181); the total was around 120. Because of these CLOSE_WAIT sockets, server.200 could not accept more connections from the clients. The only thing I could do in this situation was restart server.200, at about 2010-02-01 06:06:35. The related log is attached to the issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-662) Too many CLOSE_WAIT socket state on a server
[ https://issues.apache.org/jira/browse/ZOOKEEPER-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838157#action_12838157 ] Qian Ye commented on ZOOKEEPER-662: --- Thx Patrick, this situation might be a consequence of a network switch adjustment. The effect of the adjustment, as far as I know, was that two Zookeeper servers lost their connections to the other three Zookeeper servers. This connection loss lasted several minutes. I have tried to reproduce it, but haven't succeeded yet. I will keep an eye on this issue, and let you know if I get any more information about this. Thank you. Too many CLOSE_WAIT socket state on a server Key: ZOOKEEPER-662 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-662
[jira] Commented: (ZOOKEEPER-662) Too many CLOSE_WAIT socket state on a server
[ https://issues.apache.org/jira/browse/ZOOKEEPER-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829920#action_12829920 ] Qian Ye commented on ZOOKEEPER-662: --- Hi Patrick, the c clients all run in a Linux environment; the kernels are 2.6.9. Some of the servers are 32-bit machines and some are 64-bit. It seems that the client on the server 10.81.14.81 has some problem which causes it to fail frequently. Because there is a monitor app which restarts the c client when it fails, the client on 10.81.14.81 keeps restarting and connecting to the zookeeper servers frequently. You mentioned that some of the responses to the stat requests didn't reach the client; that looks like the behavior of a TCP connection with the SO_LINGER option on. In that situation, the server only puts the response on the wire and closes; the response packet may then be discarded, and the TCP/IP stack won't re-send it. Is that the scenario we are seeing here? Too many CLOSE_WAIT socket state on a server Key: ZOOKEEPER-662 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-662
[jira] Commented: (ZOOKEEPER-662) Too many CLOSE_WAIT socket state on a server
[ https://issues.apache.org/jira/browse/ZOOKEEPER-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12828916#action_12828916 ] Qian Ye commented on ZOOKEEPER-662: --- I'm using the c client, and there is also a monitor process using echo stat | nc zookeeper 8181 every 20 seconds to get the status of the servers. If the monitor process fails to get a valid reply, it sends an SMS alarm to my cell phone. When the problem happened, I received such an alarm; it said connection refused. I haven't found the backlog for the client port in the source code. If it uses the default value of 128, then so many CLOSE_WAIT states would prevent the kernel from accepting new connections, right? P.S. I cannot tell why the client keeps reconnecting with the same error; I will take a look at it and append more information if I find something. Too many CLOSE_WAIT socket state on a server Key: ZOOKEEPER-662 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-662
[jira] Created: (ZOOKEEPER-662) Too many CLOSE_WAIT socket state on a server
Too many CLOSE_WAIT socket state on a server Key: ZOOKEEPER-662 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-662 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.2.1 Environment: Linux 2.6.9 Reporter: Qian Ye Fix For: 3.3.0 I have a zookeeper cluster with 5 servers, zookeeper version 3.2.1; here is the content of the configuration file, zoo.cfg == # The number of milliseconds of each tick tickTime=2000 # The number of ticks that the initial # synchronization phase can take initLimit=5 # The number of ticks that can pass between # sending a request and getting an acknowledgement syncLimit=2 # the directory where the snapshot is stored. dataDir=./data/ # the port at which the clients will connect clientPort=8181 # zookeeper cluster list server.100=10.23.253.43:8887: server.101=10.23.150.29:8887: server.102=10.23.247.141:8887: server.200=10.65.20.68:8887: server.201=10.65.27.21:8887: = Before the problem happened, server.200 was the leader. Yesterday morning, I found that there were many sockets in the CLOSE_WAIT state on the clientPort (8181); the total was around 120. Because of these CLOSE_WAIT sockets, server.200 could not accept more connections from the clients. The only thing I could do in this situation was restart server.200, at about 2010-02-01 06:06:35. The related log is attached to the issue.
[jira] Updated: (ZOOKEEPER-662) Too many CLOSE_WAIT socket state on a server
[ https://issues.apache.org/jira/browse/ZOOKEEPER-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Ye updated ZOOKEEPER-662: -- Attachment: zookeeper.log.2010020106 zookeeper.log.2010020105 related log to this issue Too many CLOSE_WAIT socket state on a server Key: ZOOKEEPER-662 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-662
[jira] Updated: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Ye updated ZOOKEEPER-612: -- Release Note: (was: fix a semicolon mistake) Status: Patch Available (was: Open) Make Zookeeper C client can be compiled by gcc of early version --- Key: ZOOKEEPER-612 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-612 Project: Zookeeper Issue Type: Improvement Components: c client Affects Versions: 3.2.1 Environment: Linux Reporter: Qian Ye Assignee: Qian Ye Fix For: 3.3.0 Attachments: patch, patch, ZOOKEEPER-612.patch, ZOOKEEPER-612.patch, ZOOKEEPER-612.patch The original C client, version 3.2.1, cannot be compiled successfully by early versions of gcc, due to some declaration restrictions. To compile the source code on servers with an early gcc, I made some modifications to the original source. What's more, some extra code was added to make the client compatible with the hosts list format: ip1:port1, ip2:port2... There is often a space after this kind of comma.
[jira] Updated: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Ye updated ZOOKEEPER-612: -- Status: Open (was: Patch Available) update the patch Make Zookeeper C client can be compiled by gcc of early version --- Key: ZOOKEEPER-612 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-612
[jira] Updated: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Ye updated ZOOKEEPER-612: -- Attachment: ZOOKEEPER-612.patch Make Zookeeper C client can be compiled by gcc of early version --- Key: ZOOKEEPER-612 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-612
[jira] Updated: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Ye updated ZOOKEEPER-612: -- Release Note: update the patch, hope it works this time Status: Patch Available (was: Open) Make Zookeeper C client can be compiled by gcc of early version --- Key: ZOOKEEPER-612 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-612
[jira] Commented: (ZOOKEEPER-650) Servers cannot join in quorum
[ https://issues.apache.org/jira/browse/ZOOKEEPER-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12802819#action_12802819 ] Qian Ye commented on ZOOKEEPER-650: --- Hi all, some more information about this problem. I find that the status of the election ports of the three working servers is strange. For example, on the server 10.23.150.29: $ netstat -anp | grep (Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.) [netstat output, garbled and truncated in the archive: roughly forty TCP sockets on 10.23.150.29's election port with peers at 10.23.253.43, 10.23.247.141 and 10.65.27.21, almost all stuck in CLOSE_WAIT with unread bytes in the receive queue; only one connection, to 10.23.247.141:10791, is ESTABLISHED]
[jira] Updated: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Ye updated ZOOKEEPER-612: -- Attachment: ZOOKEEPER-612.patch New patch against trunk Make Zookeeper C client can be compiled by gcc of early version --- Key: ZOOKEEPER-612 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-612
[jira] Commented: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801347#action_12801347 ] Qian Ye commented on ZOOKEEPER-612: --- Is that because I didn't make the patch against the latest svn trunk version? Should I make a new patch based on it? Make Zookeeper C client can be compiled by gcc of early version --- Key: ZOOKEEPER-612 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-612
[jira] Commented: (ZOOKEEPER-628) the ephemeral node wouldn't disapper due to session close error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797870#action_12797870 ] Qian Ye commented on ZOOKEEPER-628: --- err...sorry, I wasn't aware of these comments. The log entries below the WARN level were not recorded due to a wrong log4j.properties, and the data directory and snapshots contain some sensitive information, so sorry, I cannot upload them. :-( the ephemeral node wouldn't disapper due to session close error --- Key: ZOOKEEPER-628 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-628 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.1 Environment: Linux 2.6.9 x86_64 Reporter: Qian Ye I found a very strange scenario today; I'm not sure how it happened, I just found it like this. Maybe you can give me some information about it. My Zookeeper Server is version 3.2.1. My Zookeeper cluster contains three servers, with IPs 10.81.12.144, 10.81.12.145, and 10.81.12.141. I wrote a client to create an ephemeral node under the znode se/diserver_tc. The client runs on the server with IP 10.81.13.173. The client creates an ephemeral node on the zookeeper server and writes the host IP (10.81.13.173) into the node as its data. Only one client process can be running at a time, because the client listens on a certain port. It is strange that I found two ephemeral nodes with the IP 10.81.13.173 under the znode se/diserver_tc. se/diserver_tc/diserver_tc67 STAT: czxid: 124554079820 mzxid: 124554079820 ctime: 1260609598547 mtime: 1260609598547 version: 0 cversion: 0 aversion: 0 ephemeralOwner: 226627854640480810 dataLength: 92 numChildren: 0 pzxid: 124554079820 se/diserver_tc/diserver_tc95 STAT: czxid: 128849019107 mzxid: 128849019107 ctime: 1260772197356 mtime: 1260772197356 version: 0 cversion: 0 aversion: 0 ephemeralOwner: 154673159808876591 dataLength: 92 numChildren: 0 pzxid: 128849019107 There are TWO nodes with different session ids!
And after I killed the client process on the server 10.81.13.173, the se/diserver_tc/diserver_tc95 node disappeared, but se/diserver_tc/diserver_tc67 stayed the same. That means it was not a coding mistake of mine that created the node twice. I checked several times and I'm sure that there is no other client instance running. And I used the 'stat' command to check the three zookeeper servers; there is no client from 10.81.13.173: $echo stat | nc 10.81.12.144 2181 Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT Clients: /10.81.13.173:35676[1](queued=0,recved=0,sent=0) # it is caused by the nc process Latency min/avg/max: 0/3/254 Received: 11081 Sent: 0 Outstanding: 0 Zxid: 0x1e01f5 Mode: follower Node count: 32 $ echo stat | nc 10.81.12.141 2181 Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT Clients: /10.81.12.152:58110[1](queued=0,recved=10374,sent=0) /10.81.13.173:35677[1](queued=0,recved=0,sent=0) # it is caused by the nc process Latency min/avg/max: 0/0/37 Received: 37128 Sent: 0 Outstanding: 0 Zxid: 0x1e01f5 Mode: follower Node count: 26 $ echo stat | nc 10.81.12.145 2181 Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT Clients: /10.81.12.153:19130[1](queued=0,recved=10624,sent=0) /10.81.13.173:35678[1](queued=0,recved=0,sent=0) # it is caused by the nc process Latency min/avg/max: 0/2/213 Received: 26700 Sent: 0 Outstanding: 0 Zxid: 0x1e01f5 Mode: leader Node count: 26 The three 'stat' commands show different Node counts!
[jira] Updated: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Ye updated ZOOKEEPER-612: -- Attachment: ZOOKEEPER-612.patch yes Patrick, reasonable tips, a patch for this is attached, thx Make Zookeeper C client can be compiled by gcc of early version --- Key: ZOOKEEPER-612 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-612
[jira] Updated: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Ye updated ZOOKEEPER-612: -- Release Note: fix a semicolon mistake Status: Patch Available (was: Open)
[jira] Updated: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Ye updated ZOOKEEPER-612: -- Status: Open (was: Patch Available)
[jira] Updated: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Ye updated ZOOKEEPER-612: -- Attachment: patch fix a semicolon mistake
A very strange scenario, may due to some bug on the server side
Hi guys: I found a very strange scenario today, and I'm not sure how it happened; I just found it like this. Maybe you can give me some information about it. My Zookeeper server is version 3.2.1. My Zookeeper cluster contains three servers, with ips: 10.81.12.144, 10.81.12.145, 10.81.12.141. I wrote a client to create ephemeral nodes under the znode se/diserver_tc. The client runs on the server with ip 10.81.13.173. The client creates an ephemeral node on the zookeeper server and writes the host ip (10.81.13.173) into the node as its data. Only one client process can be running at a time, because the client listens on a certain port. It is strange that I found there were two ephemeral nodes with the ip 10.81.13.173 under the znode se/diserver_tc. se/diserver_tc/diserver_tc67 STAT: czxid: 124554079820 mzxid: 124554079820 ctime: 1260609598547 mtime: 1260609598547 version: 0 cversion: 0 aversion: 0 ephemeralOwner: 226627854640480810 dataLength: 92 numChildren: 0 pzxid: 124554079820 se/diserver_tc/diserver_tc95 STAT: czxid: 128849019107 mzxid: 128849019107 ctime: 1260772197356 mtime: 1260772197356 version: 0 cversion: 0 aversion: 0 ephemeralOwner: 154673159808876591 dataLength: 92 numChildren: 0 pzxid: 128849019107 There are TWO, with different session ids! And after I killed the client process on the server 10.81.13.173, the se/diserver_tc/diserver_tc95 node disappeared, but se/diserver_tc/diserver_tc67 stayed the same. That means it is not a coding mistake of mine that created the node twice. I checked several times and I'm sure that there is no other client instance running.
And I used the 'stat' command to check the three zookeeper servers, and there is no client from 10.81.13.173: $ echo stat | nc 10.81.12.144 2181 Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT Clients: /10.81.13.173:35676[1](queued=0,recved=0,sent=0) # it is caused by the nc process Latency min/avg/max: 0/3/254 Received: 11081 Sent: 0 Outstanding: 0 Zxid: 0x1e01f5 Mode: follower Node count: 32 $ echo stat | nc 10.81.12.141 2181 Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT Clients: /10.81.12.152:58110[1](queued=0,recved=10374,sent=0) /10.81.13.173:35677[1](queued=0,recved=0,sent=0) # it is caused by the nc process Latency min/avg/max: 0/0/37 Received: 37128 Sent: 0 Outstanding: 0 Zxid: 0x1e01f5 Mode: follower Node count: 26 $ echo stat | nc 10.81.12.145 2181 Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT Clients: /10.81.12.153:19130[1](queued=0,recved=10624,sent=0) /10.81.13.173:35678[1](queued=0,recved=0,sent=0) # it is caused by the nc process Latency min/avg/max: 0/2/213 Received: 26700 Sent: 0 Outstanding: 0 Zxid: 0x1e01f5 Mode: leader Node count: 26 The three 'stat' commands show different Node counts! I just cannot understand how it happened; can anyone give me some explanation? -- With Regards! Ye, Qian Made in Zhejiang University
[jira] Commented: (ZOOKEEPER-628) the ephemeral node wouldn't disapper due to session close error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791121#action_12791121 ] Qian Ye commented on ZOOKEEPER-628: --- P.S. se/diserver_tc/diserver_tc67 only appears on the server 10.81.12.144, the one with the highest Node count the ephemeral node wouldn't disapper due to session close error --- Key: ZOOKEEPER-628 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-628 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.1 Environment: Linux 2.6.9 x86_64 Reporter: Qian Ye
Re: A very strange scenario, may due to some bug on the server side
I have opened a jira for the issue: https://issues.apache.org/jira/browse/ZOOKEEPER-628 and my zookeeper version is 3.2.1; the se/diserver_tc/diserver_tc67 node only appears on the server 10.81.12.144. I will attach the additional information to the jira. thx On Wed, Dec 16, 2009 at 1:57 AM, Patrick Hunt ph...@apache.org wrote: You might also try the dump command for all 3 servers (similar to the stat command - it's a 4letterword) and look at its output -- it includes information on ephemeral nodes. Patrick
[jira] Commented: (ZOOKEEPER-628) the ephemeral node wouldn't disapper due to session close error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791134#action_12791134 ] Qian Ye commented on ZOOKEEPER-628: --- the 'dump' information for the three servers: $ echo dump | nc 10.81.12.141 2181 SessionTracker dump: org.apache.zookeeper.server.quorum.followersessiontrac...@5d684e26 ephemeral nodes dump: Sessions with Ephemerals (3): 0x3258bdcc1c10068: /se/fras/fras79 /se/fras_tc/fras_tc77 0x225826b5ab2002f: /se/diserver/diserver000114 /se/diserver_tc/diserver_tc94 0x125826b5c1f0019: /se/diserver_tc/diserver_tc98 /se/diserver/diserver000118 $ echo dump | nc 10.81.12.144 2181 SessionTracker dump: org.apache.zookeeper.server.quorum.followersessiontrac...@62ebcdbb ephemeral nodes dump: Sessions with Ephemerals (7): 0x3258bdcc1c10068: /se/fras/fras79 /se/fras_tc/fras_tc77 0x3258bc635750001: /se/diserver_tc/diserver_tc76 0x32524d5440e022a: /se/diserver_tc/diserver_tc67 0x3258bc63575: /se/fras/fras49 /se/fras_tc/fras_tc49 0x225826b5ab2002f: /se/diserver/diserver000114 /se/diserver_tc/diserver_tc94 0x125826b5c1f0019: /se/diserver_tc/diserver_tc98 /se/diserver/diserver000118 0x225826b5ab20011: /se/diserver_tc/diserver_tc81 /se/diserver/diserver000107 $ echo dump | nc 10.81.12.145 2181 SessionTracker dump: Session Sets (9): 0 expire at Wed Dec 16 10:05:08 CST 2009: 0 expire at Wed Dec 16 10:05:10 CST 2009: 0 expire at Wed Dec 16 10:05:14 CST 2009: 0 expire at Wed Dec 16 10:05:18 CST 2009: 0 expire at Wed Dec 16 10:05:20 CST 2009: 0 expire at Wed Dec 16 10:05:24 CST 2009: 1 expire at Wed Dec 16 10:05:28 CST 2009: 82615565794869273 1 expire at Wed Dec 16 10:05:30 CST 2009: 226741136511795304 1 expire at Wed Dec 16 10:05:34 CST 2009: 154673159808876591 ephemeral nodes dump: Sessions with Ephemerals (3): 0x3258bdcc1c10068: /se/fras/fras79 /se/fras_tc/fras_tc77 0x225826b5ab2002f: /se/diserver/diserver000114 /se/diserver_tc/diserver_tc94 0x125826b5c1f0019:
/se/diserver_tc/diserver_tc98 /se/diserver/diserver000118 It seems that the server 10.81.12.144 still keeps lots of sessions which should have been expired the ephemeral node wouldn't disapper due to session close error --- Key: ZOOKEEPER-628 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-628 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.1 Environment: Linux 2.6.9 x86_64 Reporter: Qian Ye
Re: A very strange scenario, may due to some bug on the server side
Sorry, my friend wrote a wrong log4j.properties, which only records logs above the WARN level. Will it help if I correct the log4j.properties and restart the zookeeper server on 10.81.12.144? Will the information about session 0x32524d5440e022a be recorded in this way? On Wed, Dec 16, 2009 at 1:46 AM, Mahadev Konar maha...@yahoo-inc.com wrote: Hi Qian, This is quite weird. Are you sure the version is 3.2.1? If yes, please create a jira for this. Also, can you extract the server logs for the session ephemeralOwner: 226627854640480810 and post them on a jira? Ephemeral Owner is the session id. You can convert the above number to hex and look through the logs to see what happened to this session and post the logs on the jira. Looks like the session close for the session (226627854640480810) wasn't successful (a bug mostly). So we need to trace back on what happened on a close of this session and why it did not close. Grepping all the server logs for the session id (0x32524d5440e022a, this is the hex of the above decimal number) might give us some insight into this. Thanks mahadev On 12/15/09 7:44 AM, Benjamin Reed br...@yahoo-inc.com wrote: does se/diserver_tc/diserver_tc67 appear on all three servers? ben
The C Client cause core dump in some situation
Hi guys: I encountered a problem today: the Zookeeper C client (version 3.2.0) core dumped when it reconnected and did some operations on a zookeeper server which had just restarted. The gdb information is like:

(gdb) bt
#0  0x00302af71900 in memcpy () from /lib64/tls/libc.so.6
#1  0x0047bfe4 in ia_deserialize_string (ia=Variable ia is not available.) at src/recordio.c:270
#2  0x0047ed20 in deserialize_CreateResponse (in=0x9cd870, tag=0x50a74e reply, v=0x409ffe70) at generated/zookeeper.jute.c:679
#3  0x0047a1d0 in zookeeper_process (zh=0x9c8c70, events=Variable events is not available.) at src/zookeeper.c:1895
#4  0x004815e6 in do_io (v=Variable v is not available.) at src/mt_adaptor.c:310
#5  0x00302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#6  0x00302afc6003 in clone () from /lib64/tls/libc.so.6
#7  0x in ?? ()
(gdb) f 1
#1  0x0047bfe4 in ia_deserialize_string (ia=Variable ia is not available.) at src/recordio.c:270
270     in src/recordio.c
(gdb) info locals
priv = (struct buff_struct *) 0x9cd8d0
len = -1
rc = Variable rc is not available.

According to the source code:

int ia_deserialize_string(struct iarchive *ia, const char *name, char **s)
{
    struct buff_struct *priv = ia->priv;
    int32_t len;
    int rc = ia_deserialize_int(ia, "len", &len);
    if (rc < 0)
        return rc;
    if ((priv->len - priv->off) < len) {
        return -E2BIG;
    }
    *s = malloc(len+1);
    if (!*s) {
        return -ENOMEM;
    }
    memcpy(*s, priv->buffer+priv->off, len);
    (*s)[len] = '\0';
    priv->off += len;
    return 0;
}

The variable len is set by ia_deserialize_int, and the returned len is not checked, so the client segfaults when trying to memcpy -1 bytes of data. I'm not sure why the client got len = -1 when deserializing the response from the server, and I'm also not sure whether it is a known issue. Could anyone give me some information about this problem? -- With Regards! Ye, Qian Made in Zhejiang University
[jira] Created: (ZOOKEEPER-624) The C Client cause core dump when receive error data from Zookeeper Server
The C Client cause core dump when receive error data from Zookeeper Server -- Key: ZOOKEEPER-624 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-624 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.2.0 Environment: Linux 2.6.9 x86_64 Reporter: Qian Ye I encountered a problem today: the Zookeeper C client (version 3.2.0) core dumped when it reconnected and did some operations on a zookeeper server which had just restarted. The gdb trace and the ia_deserialize_string source are the same as in the mail above: the variable len is set by ia_deserialize_int, and the returned len is not checked, so the client segfaults when trying to memcpy -1 bytes of data. In the source file recordio.c, there are many functions which don't check the returned len. They all might cause segment faults in some kinds of situations.
Re: The C Client cause core dump in some situation
Hi Mahadev: I have created a jira for this issue: https://issues.apache.org/jira/browse/ZOOKEEPER-624. So far, I haven't found a way to reproduce the segment fault; I tried the same operations about 10 times and only produced the core dump once. I will attach the reproduction steps to the jira if I can find them. Thx On Tue, Dec 15, 2009 at 1:53 AM, Mahadev Konar maha...@yahoo-inc.com wrote: Hi Qian, The code that you mention still exists in the trunk and does not check the len before calling memcpy. Please open a jira on this. The interesting thing though is that the len is -1. Do you have any test case or a test scenario where it can be reproduced? It would be interesting to see why this is happening. We should not be getting a -1 len value from the server. Thanks mahadev
Re: The C Client cause core dump in some situation
Yes, I use valgrind, I will try. On Tue, Dec 15, 2009 at 1:02 PM, Patrick Hunt ph...@apache.org wrote: Did you try using valgrind? That might help reproduce.
[jira] Created: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version
Make Zookeeper C client can be compiled by gcc of early version --- Key: ZOOKEEPER-612 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-612 Project: Zookeeper Issue Type: Improvement Components: c client Affects Versions: 3.2.1 Environment: Linux Reporter: Qian Ye The original C client, version 3.2.1, cannot be compiled successfully by early versions of gcc, due to some declaration restrictions. To compile the source code on a server with an early gcc, I made some modifications to the original source. What's more, some extra code is added to make the client compatible with the hosts list format: ip1:port1, ip2:port2... There is often a space after this kind of comma.
[jira] Updated: (ZOOKEEPER-612) Make Zookeeper C client can be compiled by gcc of early version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Ye updated ZOOKEEPER-612: -- Attachment: patch can be compiled by gcc 2.96 Make Zookeeper C client can be compiled by gcc of early version --- Key: ZOOKEEPER-612 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-612 Project: Zookeeper Issue Type: Improvement Components: c client Affects Versions: 3.2.1 Environment: Linux Reporter: Qian Ye Attachments: patch The original C Client, version 3.2.1, cannot be compiled successfully by gcc of an early version, due to some declaration restrictions. To compile the source code on a server with an early gcc, I made some modifications to the original source. What's more, some extra code was added to make the client compatible with the hosts list format: ip1:port1, ip2:port2... There is often a space after this kind of comma. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (ZOOKEEPER-591) The C Client cannot exit properly in some situation
The C Client cannot exit properly in some situation --- Key: ZOOKEEPER-591 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-591 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.2.1 Environment: Linux db-passport-test05.vm 2.6.9_5-4-0-5 #1 SMP Tue Apr 14 15:56:24 CST 2009 x86_64 x86_64 x86_64 GNU/Linux Reporter: Qian Ye The following code produces a situation where the C Client cannot exit properly:

#include "include/zookeeper.h"

void default_zoo_watcher(zhandle_t *zzh, int type, int state, const char *path, void* context){
    int zrc = 0;
    struct String_vector str_vec = {0, NULL};
    printf("in the default_zoo_watcher\n");
    zrc = zoo_wget_children(zzh, "/mytest", default_zoo_watcher, NULL, &str_vec);
    printf("zoo_wget_children, error: %d\n", zrc);
    return;
}

int main()
{
    int zrc = 0;
    int buff_len = 10;
    char buff[10] = "hello";
    char path[512];
    struct Stat stat;
    struct String_vector str_vec = {0, NULL};
    zhandle_t *zh = zookeeper_init("10.81.20.62:2181", NULL, 3, 0, 0, 0);
    zrc = zoo_create(zh, "/mytest", buff, 10, &ZOO_OPEN_ACL_UNSAFE, 0, path, 512);
    printf("zoo_create, error: %d\n", zrc);
    zrc = zoo_wget_children(zh, "/mytest", default_zoo_watcher, NULL, &str_vec);
    printf("zoo_wget_children, error: %d\n", zrc);
    zrc = zoo_create(zh, "/mytest/test1", buff, 10, &ZOO_OPEN_ACL_UNSAFE, 0, path, 512);
    printf("zoo_create, error: %d\n", zrc);
    zrc = zoo_wget_children(zh, "/mytest", default_zoo_watcher, NULL, &str_vec);
    printf("zoo_wget_children, error: %d\n", zrc);
    zrc = zoo_delete(zh, "/mytest/test1", -1);
    printf("zoo_delete, error: %d\n", zrc);
    zookeeper_close(zh);
    return 0;
}

Running this code can cause the program to hang at zookeeper_close(zh); (line 38).
Using gdb to attach the process, I found that the main thread is waiting for the do_completion thread to finish:

(gdb) bt
#0 0x00302b806ffb in pthread_join () from /lib64/tls/libpthread.so.0
#1 0x0040de3b in adaptor_finish (zh=0x515b60) at src/mt_adaptor.c:219
#2 0x004060ba in zookeeper_close (zh=0x515b60) at src/zookeeper.c:2100
#3 0x0040220b in main ()

and the thread which handles the zoo_wget_children (in the default_zoo_watcher) is waiting for sc->cond:

(gdb) thread 2
[Switching to thread 2 (Thread 1094719840 (LWP 25093))]#0 0x00302b8089aa in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
(gdb) bt
#0 0x00302b8089aa in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
#1 0x0040d88b in wait_sync_completion (sc=0x5167f0) at src/mt_adaptor.c:82
#2 0x004082c9 in zoo_wget_children (zh=0x515b60, path=0x40ebc0 "/mytest", watcher=0x401fd8 <default_zoo_watcher>, watcherCtx=Variable watcherCtx is not available.) at src/zookeeper.c:2884
#3 0x00402037 in default_zoo_watcher ()
#4 0x0040d664 in deliverWatchers (zh=0x515b60, type=4, state=3, path=0x515100 "/mytest", list=0x5177d8) at src/zk_hashtable.c:274
#5 0x00403861 in process_completions (zh=0x515b60) at src/zookeeper.c:1631
#6 0x0040e1b5 in do_completion (v=Variable v is not available.) at src/mt_adaptor.c:333
#7 0x00302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#8 0x00302afc6003 in clone () from /lib64/tls/libc.so.6
#9 0x in ?? ()

Here, a deadlock occurs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (ZOOKEEPER-589) When create a znode, a NULL ACL parameter cannot be accepted
When create a znode, a NULL ACL parameter cannot be accepted Key: ZOOKEEPER-589 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-589 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.2.1 Environment: Linux db-passport-test05.vm 2.6.9_5-4-0-5 #1 SMP Tue Apr 14 15:56:24 CST 2009 x86_64 x86_64 x86_64 GNU/Linux Reporter: Qian Ye In the comments of the client C API associated with creating a znode, e.g. zoo_acreate, it is said that if the initial ACL of the node is null, the ACL of the parent will be used. However, it doesn't work. When this kind of request is executed on the server side, it raises an InvalidACLException. The source code shows that the function fixupACL returns false when it gets a null ACL. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
How to build Zookeeper Server in Eclipse?
Hi all: Recently, I built a system for resource discovery based on Zookeeper. It works well, which makes me want to study the internals of the Zookeeper Server. However, I'm a greenhorn at Java. After loading the source into Eclipse and reading the code for several days, I still don't know how to build the project (Eclipse shows lots of errors and warnings). Could anyone help me out? Is there a guide for building the zookeeper server? Thanks~ -- With Regards! Ye, Qian Made in Zhejiang University
[jira] Created: (ZOOKEEPER-515) Zookeeper quorum didn't provide service when restart after an Out of memory crash
Zookeeper quorum didn't provide service when restart after an Out of memory crash --- Key: ZOOKEEPER-515 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-515 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.0 Environment: Linux 2.6.9-52bs-4core #2 SMP Wed Jan 16 14:44:08 EST 2008 x86_64 x86_64 x86_64 GNU/Linux Jdk: 1.6.0_14 Reporter: Qian Ye The Zookeeper quorum, containing 5 servers, didn't provide service when restarted after an out-of-memory crash. It happened as follows:

1. We built a Zookeeper quorum which contained 5 servers, say 1, 3, 4, 5, 6 (no 2), and 6 was the leader.
2. We created 18 threads on 6 different servers to set and get data from a znode in the Zookeeper at the same time. The size of the data was 1MB. The test threads did their job as fast as possible, with no pause between two operations, and they repeated the setting and getting 4000 times.
3. The Zookeeper leader crashed about 10 mins after the test threads started. The leader printed out the log:

2009-08-25 12:00:12,301 - WARN [NIOServerCxn.Factory:2181:nioserverc...@497] - Exception causing close of session 0x5234223c2dc00b5 due to java.io.IOException: Read error
2009-08-25 12:00:12,318 - WARN [NIOServerCxn.Factory:2181:nioserverc...@497] - Exception causing close of session 0x5234223c2dc00b6 due to java.io.IOException: Read error
2009-08-25 12:03:44,086 - WARN [NIOServerCxn.Factory:2181:nioserverc...@497] - Exception causing close of session 0x5234223c2dc00b8 due to java.io.IOException: Read error
2009-08-25 12:04:53,757 - WARN [NIOServerCxn.Factory:2181:nioserverc...@497] - Exception causing close of session 0x5234223c2dc00b7 due to java.io.IOException: Read error
2009-08-25 12:15:45,151 - FATAL [SyncThread:0:syncrequestproces...@131] - Severe unrecoverable error, exiting
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:71)
    at java.io.DataOutputStream.writeInt(DataOutputStream.java:180)
    at org.apache.jute.BinaryOutputArchive.writeInt(BinaryOutputArchive.java:55)
    at org.apache.zookeeper.txn.SetDataTxn.serialize(SetDataTxn.java:42)
    at org.apache.zookeeper.server.persistence.Util.marshallTxnEntry(Util.java:262)
    at org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:154)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:268)
    at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:100)

It is clear that the leader ran out of memory. Then server 4 went down almost at the same time, and printed out the log:

2009-08-25 12:15:45,995 - ERROR [FollowerRequestProcessor:3:followerrequestproces...@91] - Unexpected exception causing exit
java.net.SocketException: Connection reset
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
    at org.apache.jute.BinaryOutputArchive.writeBuffer(BinaryOutputArchive.java:119)
    at org.apache.zookeeper.server.quorum.QuorumPacket.serialize(QuorumPacket.java:51)
    at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
    at org.apache.zookeeper.server.quorum.Follower.writePacket(Follower.java:97)
    at org.apache.zookeeper.server.quorum.Follower.request(Follower.java:399)
    at org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:86)
2009-08-25 12:15:45,996 - WARN [NIOServerCxn.Factory:2181:nioserverc...@497] - Exception causing close of session 0x4234ab894330075 due to java.net.SocketException: Broken pipe
2009-08-25 12:15:45,996 - FATAL [SyncThread:3:syncrequestproces...@131] - Severe unrecoverable error, exiting
java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
    at org.apache.zookeeper.server.quorum.Follower.writePacket(Follower.java:100)
    at org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:52)
    at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:147)
    at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:92)
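A note for anyone hitting the same OutOfMemoryError: 1 MB payloads written by 18 concurrent clients can exhaust a default-sized heap. A common mitigation, sketched here under the assumption of the stock zkServer.sh layout (which sources conf/java.env if present; verify the path and variable against your distribution), is to raise the JVM heap and capture a dump on failure:

```shell
# conf/java.env -- sourced by bin/zkServer.sh at startup (assumed layout)
export JVMFLAGS="-Xmx2g -XX:+HeapDumpOnOutOfMemoryError"
```

Independently, keeping znode payloads well below the ~1 MB jute.maxbuffer default reduces the pressure that triggered this crash in the first place; ZooKeeper is designed for small coordination data, not bulk storage.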