[jira] Updated: (ZOOKEEPER-732) Improper translation of error into Python exception
[ https://issues.apache.org/jira/browse/ZOOKEEPER-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Zhang updated ZOOKEEPER-732:
--------------------------------
    Attachment: (was: ZOOKEEPER-732.patch)

> Improper translation of error into Python exception
> ---------------------------------------------------
>
> Key: ZOOKEEPER-732
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-732
> Project: Zookeeper
> Issue Type: Bug
> Components: contrib-bindings
> Affects Versions: 3.2.2
> Reporter: Gustavo Niemeyer
> Assignee: Lei Zhang
> Priority: Minor
> Attachments: ZOOKEEPER-732.patch
>
> Apparently errors returned by the C library are not being correctly converted into a Python exception in some cases:
> >>> zookeeper.get_children(0, "/", None)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> SystemError: error return without exception set

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-732) Improper translation of error into Python exception
[ https://issues.apache.org/jira/browse/ZOOKEEPER-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Zhang updated ZOOKEEPER-732:
--------------------------------
    Attachment: ZOOKEEPER-732.patch

> Improper translation of error into Python exception
> ---------------------------------------------------
>
> Key: ZOOKEEPER-732
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-732
> Project: Zookeeper
> Issue Type: Bug
> Components: contrib-bindings
> Affects Versions: 3.2.2
> Reporter: Gustavo Niemeyer
> Assignee: Lei Zhang
> Priority: Minor
> Attachments: ZOOKEEPER-732.patch
>
> Apparently errors returned by the C library are not being correctly converted into a Python exception in some cases:
> >>> zookeeper.get_children(0, "/", None)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> SystemError: error return without exception set
[jira] Commented: (ZOOKEEPER-792) zkpython memory leak
[ https://issues.apache.org/jira/browse/ZOOKEEPER-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897986#action_12897986 ]

Lei Zhang commented on ZOOKEEPER-792:
-------------------------------------

We've been using this patch in production (16-node cluster) for over a month. I'd like to have it go into a newer release. Can somebody please code review?

> zkpython memory leak
> --------------------
>
> Key: ZOOKEEPER-792
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-792
> Project: Zookeeper
> Issue Type: Bug
> Components: contrib-bindings
> Affects Versions: 3.3.1
> Environment: vmware workstation - guest OS: Linux, python: 2.4.3
> Reporter: Lei Zhang
> Assignee: Lei Zhang
> Fix For: 3.3.2, 3.4.0
> Attachments: ZOOKEEPER-792.patch
>
> We recently upgraded zookeeper from 3.2.1 to 3.3.1, and we are now seeing fewer client deadlocks on session expiration, which is a definite plus!
> Unfortunately we are seeing a memory leak that requires our zk clients to be restarted every half-day. Valgrind result:
> ==8804== 25 (12 direct, 13 indirect) bytes in 1 blocks are definitely lost in loss record 255 of 670
> ==8804==    at 0x4021C42: calloc (vg_replace_malloc.c:418)
> ==8804==    by 0x5047B42: parse_acls (zookeeper.c:369)
> ==8804==    by 0x5047EF6: pyzoo_create (zookeeper.c:1009)
> ==8804==    by 0x40786CC: PyCFunction_Call (in /usr/lib/libpython2.4.so.1.0)
> ==8804==    by 0x40B31DC: PyEval_EvalFrame (in /usr/lib/libpython2.4.so.1.0)
> ==8804==    by 0x40B4485: PyEval_EvalCodeEx (in /usr/lib/libpython2.4.so.1.0)
[jira] Updated: (ZOOKEEPER-732) Improper translation of error into Python exception
[ https://issues.apache.org/jira/browse/ZOOKEEPER-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Zhang updated ZOOKEEPER-732:
--------------------------------
    Attachment: ZOOKEEPER-732.patch

Attached is a patch that fixes issue 732. Can somebody please review?

> Improper translation of error into Python exception
> ---------------------------------------------------
>
> Key: ZOOKEEPER-732
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-732
> Project: Zookeeper
> Issue Type: Bug
> Components: contrib-bindings
> Affects Versions: 3.2.2
> Reporter: Gustavo Niemeyer
> Assignee: Lei Zhang
> Priority: Minor
> Attachments: ZOOKEEPER-732.patch
>
> Apparently errors returned by the C library are not being correctly converted into a Python exception in some cases:
> >>> zookeeper.get_children(0, "/", None)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> SystemError: error return without exception set
[jira] Assigned: (ZOOKEEPER-732) Improper translation of error into Python exception
[ https://issues.apache.org/jira/browse/ZOOKEEPER-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Zhang reassigned ZOOKEEPER-732:
-----------------------------------
    Assignee: Lei Zhang

> Improper translation of error into Python exception
> ---------------------------------------------------
>
> Key: ZOOKEEPER-732
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-732
> Project: Zookeeper
> Issue Type: Bug
> Components: contrib-bindings
> Affects Versions: 3.2.2
> Reporter: Gustavo Niemeyer
> Assignee: Lei Zhang
> Priority: Minor
>
> Apparently errors returned by the C library are not being correctly converted into a Python exception in some cases:
> >>> zookeeper.get_children(0, "/", None)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> SystemError: error return without exception set
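Editor's note on ZOOKEEPER-732: a "SystemError: error return without exception set" generally means an extension function signalled failure without first setting a Python exception. The sketch below shows, in pure Python, one way to make the translation from a C-style return code to an exception total, so that an unknown code still raises something. It is not the attached patch; the exception classes, the table, and check_result are hypothetical names, and the numeric codes are placeholders for illustration.

```python
# Hypothetical sketch: translate integer return codes into exceptions.
# None of these names come from zkpython; the codes are placeholders.

OK = 0


class ZooKeeperError(Exception):
    """Base class for any non-zero return code."""


class BadArgumentsError(ZooKeeperError):
    pass


class InvalidStateError(ZooKeeperError):
    pass


class NoNodeError(ZooKeeperError):
    pass


# Known codes map to specific exception types.
_EXCEPTION_BY_CODE = {
    -8: BadArgumentsError,
    -9: InvalidStateError,
    -101: NoNodeError,
}


def check_result(code):
    """Return None on success, otherwise raise.

    Unknown codes fall back to the base class, so there is no path
    where an error is reported without an exception being raised.
    """
    if code == OK:
        return None
    exc_type = _EXCEPTION_BY_CODE.get(code, ZooKeeperError)
    raise exc_type("error code %d" % code)
```

The design point is the fallback in `_EXCEPTION_BY_CODE.get(...)`: a lookup that can miss needs a default, otherwise some error codes slip through untranslated.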
[jira] Resolved: (ZOOKEEPER-603) zkpython should do a better job of freeing memory under error conditions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Zhang resolved ZOOKEEPER-603. - Resolution: Duplicate > zkpython should do a better job of freeing memory under error conditions > > > Key: ZOOKEEPER-603 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-603 > Project: Zookeeper > Issue Type: Bug > Components: contrib-bindings >Affects Versions: 3.2.1 >Reporter: Henry Robinson >Assignee: Lei Zhang > Fix For: 3.4.0 > > > The general pattern is that the construction of a collection might fail, but > the module is not freeing the memory that it has already allocated. > Exceptions that are raised during this process aren't always propagated back > to the Python side either. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (ZOOKEEPER-603) zkpython should do a better job of freeing memory under error conditions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Zhang reassigned ZOOKEEPER-603: --- Assignee: Lei Zhang (was: Henry Robinson) > zkpython should do a better job of freeing memory under error conditions > > > Key: ZOOKEEPER-603 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-603 > Project: Zookeeper > Issue Type: Bug > Components: contrib-bindings >Affects Versions: 3.2.1 >Reporter: Henry Robinson >Assignee: Lei Zhang > Fix For: 3.4.0 > > > The general pattern is that the construction of a collection might fail, but > the module is not freeing the memory that it has already allocated. > Exceptions that are raised during this process aren't always propagated back > to the Python side either. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-559) valgrind warnings running zkpython bindings
[ https://issues.apache.org/jira/browse/ZOOKEEPER-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Zhang updated ZOOKEEPER-559: Assignee: Lei Zhang (was: Henry Robinson) Can we make this a duplicate of 792? > valgrind warnings running zkpython bindings > --- > > Key: ZOOKEEPER-559 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-559 > Project: Zookeeper > Issue Type: Bug > Components: contrib-bindings >Affects Versions: 3.3.0 >Reporter: Patrick Hunt >Assignee: Lei Zhang > Fix For: 3.4.0 > > Attachments: valgrind-zk.tar.gz > > > I'm seeing some weird behavior running zk-latencies.py > http://github.com/phunt/zk-smoketest > don't know if it's related to zkbindings itself, but I ran valgrind to see if > it noticed any issues. see attached. > afaict these issues are related to zkpython binding, however I'm not sure. I > did run valgrind against the > zookeeper c library tests and these issues were not highlighted. So I'm > thinking this is zkpython errors, however > I'm not 100% sure. > Henry can you take a look? > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-845) remove duplicate code from netty and nio ServerCnxn classes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897883#action_12897883 ] Patrick Hunt commented on ZOOKEEPER-845: last I looked there wasn't much processing logic - that was the thing. Most of the commands we call out to toString or similar. Only in a few cases was there common code other than the class/thread wrapper. (there is probably some embedded logic that could be extracted though... we should do that, agree, but just saying it's not huge.). > remove duplicate code from netty and nio ServerCnxn classes > --- > > Key: ZOOKEEPER-845 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-845 > Project: Zookeeper > Issue Type: Improvement > Components: server >Reporter: Benjamin Reed > Fix For: 3.4.0 > > > the code for handling the 4-letter words is duplicated between the nio and > netty versions of ServerCnxn. this makes maintenance problematic. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-845) remove duplicate code from netty and nio ServerCnxn classes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897880#action_12897880 ] Benjamin Reed commented on ZOOKEEPER-845: - perhaps we could extract the actual processing logic from the threading model. > remove duplicate code from netty and nio ServerCnxn classes > --- > > Key: ZOOKEEPER-845 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-845 > Project: Zookeeper > Issue Type: Improvement > Components: server >Reporter: Benjamin Reed > Fix For: 3.4.0 > > > the code for handling the 4-letter words is duplicated between the nio and > netty versions of ServerCnxn. this makes maintenance problematic. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-794:
-----------------------------------
    Status: Open (was: Patch Available)

cancelling patch - hudson failing.

> Callbacks are not invoked when the client is closed
> ---------------------------------------------------
>
> Key: ZOOKEEPER-794
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-794
> Project: Zookeeper
> Issue Type: Bug
> Components: java client
> Affects Versions: 3.3.1
> Reporter: Alexis Midon
> Assignee: Alexis Midon
> Fix For: 3.3.2, 3.4.0
> Attachments: ZOOKEEPER-794.patch.txt, ZOOKEEPER-794.txt, ZOOKEEPER-794_2.patch, ZOOKEEPER-794_3.patch
>
> I noticed that ZooKeeper has different behaviors when calling synchronous or asynchronous actions on a closed ZooKeeper client.
> Actually a synchronous call will throw a "session expired" exception while an asynchronous call will do nothing. No exception, no callback invocation.
> Actually, even if the EventThread receives the Packet with the session expired err code, the packet is never processed since the thread has been killed by the eventOfDeath. So the callback is not invoked.
[jira] Commented: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897857#action_12897857 ]

Alexis Midon commented on ZOOKEEPER-794:
----------------------------------------

Yes, with the first patches, the callback ordering might be different from the event ordering. And this is one of the ZK core guarantees, right? Although in our case this is happening during the shutdown procedure. I'll double check the patch.

> Callbacks are not invoked when the client is closed
> ---------------------------------------------------
>
> Key: ZOOKEEPER-794
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-794
> Project: Zookeeper
> Issue Type: Bug
> Components: java client
> Affects Versions: 3.3.1
> Reporter: Alexis Midon
> Assignee: Alexis Midon
> Fix For: 3.3.2, 3.4.0
> Attachments: ZOOKEEPER-794.patch.txt, ZOOKEEPER-794.txt, ZOOKEEPER-794_2.patch, ZOOKEEPER-794_3.patch
>
> I noticed that ZooKeeper has different behaviors when calling synchronous or asynchronous actions on a closed ZooKeeper client.
> Actually a synchronous call will throw a "session expired" exception while an asynchronous call will do nothing. No exception, no callback invocation.
> Actually, even if the EventThread receives the Packet with the session expired err code, the packet is never processed since the thread has been killed by the eventOfDeath. So the callback is not invoked.
[jira] Commented: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897855#action_12897855 ]

Patrick Hunt commented on ZOOKEEPER-794:
----------------------------------------

Would be great to get this into 3.3.2, any update?

> Callbacks are not invoked when the client is closed
> ---------------------------------------------------
>
> Key: ZOOKEEPER-794
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-794
> Project: Zookeeper
> Issue Type: Bug
> Components: java client
> Affects Versions: 3.3.1
> Reporter: Alexis Midon
> Assignee: Alexis Midon
> Fix For: 3.3.2, 3.4.0
> Attachments: ZOOKEEPER-794.patch.txt, ZOOKEEPER-794.txt, ZOOKEEPER-794_2.patch, ZOOKEEPER-794_3.patch
>
> I noticed that ZooKeeper has different behaviors when calling synchronous or asynchronous actions on a closed ZooKeeper client.
> Actually a synchronous call will throw a "session expired" exception while an asynchronous call will do nothing. No exception, no callback invocation.
> Actually, even if the EventThread receives the Packet with the session expired err code, the packet is never processed since the thread has been killed by the eventOfDeath. So the callback is not invoked.
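Editor's note on ZOOKEEPER-794: the asymmetry described in the issue (a synchronous call on a closed client throws, an asynchronous call does nothing) can be stated as a small contract. The sketch below is a Python illustration of that contract only, assuming a single-threaded toy client. It is not the Java client and not any of the attached patches; SketchClient, SessionExpiredError, and the SESSION_EXPIRED value are hypothetical, and the sketch deliberately ignores the callback-ordering concern raised in the comments.

```python
# Hypothetical sketch: both call styles report "session expired" on a
# closed client. All names and the error code value are placeholders.

OK = 0
SESSION_EXPIRED = -112  # placeholder value for illustration


class SessionExpiredError(Exception):
    pass


class SketchClient:
    def __init__(self):
        self._closed = False

    def close(self):
        self._closed = True

    def get_data(self, path):
        """Synchronous style: failure is reported by raising."""
        if self._closed:
            raise SessionExpiredError(path)
        return b""

    def get_data_async(self, path, callback):
        """Asynchronous style: failure is reported through the callback.

        The callback is always invoked exactly once, closed or not,
        so callers never wait forever for a result.
        """
        if self._closed:
            callback(SESSION_EXPIRED, path, None)
            return
        callback(OK, path, b"")
```

A caller can then treat the return code in the callback the same way it would treat the exception in the synchronous path.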
[jira] Commented: (ZOOKEEPER-795) eventThread isn't shutdown after a connection "session expired" event coming
[ https://issues.apache.org/jira/browse/ZOOKEEPER-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897853#action_12897853 ]

Patrick Hunt commented on ZOOKEEPER-795:
----------------------------------------

This is blocking a release candidate for 3.3.2; if we can get this in soon I'll start running through the release process.

> eventThread isn't shutdown after a connection "session expired" event coming
> -----------------------------------------------------------------------------
>
> Key: ZOOKEEPER-795
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-795
> Project: Zookeeper
> Issue Type: Bug
> Components: java client
> Affects Versions: 3.3.1
> Environment: ubuntu 10.04
> Reporter: mathieu barcikowski
> Assignee: Sergey Doroshenko
> Priority: Blocker
> Fix For: 3.3.2, 3.4.0
> Attachments: ExpiredSessionThreadLeak.java, ZOOKEEPER-795.patch, ZOOKEEPER-795.patch
>
> Hi,
> I noticed a problem with the eventThread located in the ClientCnxn.java file.
> The eventThread isn't shut down after a connection "session expired" event arrives (i.e. it never receives the EventOfDeath).
> When a session timeout occurs and the session is marked as expired, the connection is fully closed (socket, SendThread...) except for the eventThread.
> As a result, if I create a new zookeeper object and connect through it, I get a zombie thread which will never be killed (as for the previous zookeeper object, the state is already closed, so calling close again doesn't do anything).
> So every time I create a new zookeeper connection after an expired session, I will have one more zombie EventThread.
> How to reproduce:
> - Start a zookeeper client connection in debug mode
> - Pause the JVM long enough for the expired event to occur
> - Watch the list of threads, for example with jvisualvm: the sendThread is successfully killed, but the EventThread goes into a wait state indefinitely
> - If you reopen a new zookeeper connection and repeat the previous steps, another EventThread will be present in an infinite wait state
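Editor's note on ZOOKEEPER-795: the leak described above comes down to a consumer thread blocked on a queue that never receives its shutdown sentinel. The sketch below shows the sentinel pattern in Python, assuming a toy connection object. It is not the Java ClientCnxn code or the attached patch; EventThread, SketchConnection, and on_session_expired are hypothetical names chosen to echo the issue text.

```python
import queue
import threading


class EventThread(threading.Thread):
    """Hypothetical consumer thread that exits on a sentinel object."""

    _EVENT_OF_DEATH = object()

    def __init__(self):
        super().__init__(daemon=True)
        self._queue = queue.Queue()
        self.delivered = []

    def queue_event(self, event):
        self._queue.put(event)

    def queue_event_of_death(self):
        self._queue.put(self._EVENT_OF_DEATH)

    def run(self):
        while True:
            event = self._queue.get()  # blocks until something arrives
            if event is self._EVENT_OF_DEATH:
                return  # without this sentinel the thread waits forever
            self.delivered.append(event)


class SketchConnection:
    """Hypothetical connection that owns one event thread."""

    def __init__(self):
        self.event_thread = EventThread()
        self.event_thread.start()

    def on_session_expired(self):
        # Deliver the expiry notification first, then the sentinel, so
        # the thread drains pending work and then terminates.
        self.event_thread.queue_event("Expired")
        self.event_thread.queue_event_of_death()
```

Because the queue is first-in first-out, the expiry event is delivered before the thread sees the sentinel, and no thread is left parked once the session is gone.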
[jira] Updated: (ZOOKEEPER-845) remove duplicate code from netty and nio ServerCnxn classes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-845: --- Fix Version/s: 3.4.0 > remove duplicate code from netty and nio ServerCnxn classes > --- > > Key: ZOOKEEPER-845 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-845 > Project: Zookeeper > Issue Type: Improvement > Components: server >Reporter: Benjamin Reed > Fix For: 3.4.0 > > > the code for handling the 4-letter words is duplicated between the nio and > netty versions of ServerCnxn. this makes maintenance problematic. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-845) remove duplicate code from netty and nio ServerCnxn classes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897852#action_12897852 ] Patrick Hunt commented on ZOOKEEPER-845: Agree, this is ugly, however if you look the issue is related to the fact that nio uses threads while netty doesn't. One question in my mind, should netty be using a thread as well? The problem is that the channel could be closed before netty responds if the response is not handled in the worker request thread. In NIO we handle this by "handing off" ownership of the socket from the main run routine to the 4lw response thread. afaik we cannot do this in netty. So really it's not a simple answer afaict. That's why I didn't do it cleanly in the first place. > remove duplicate code from netty and nio ServerCnxn classes > --- > > Key: ZOOKEEPER-845 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-845 > Project: Zookeeper > Issue Type: Improvement > Components: server >Reporter: Benjamin Reed > > the code for handling the 4-letter words is duplicated between the nio and > netty versions of ServerCnxn. this makes maintenance problematic. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
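Editor's note on ZOOKEEPER-845: Benjamin Reed's suggestion to separate the processing logic from the threading model can be pictured as one shared command table with two thin callers. The Python sketch below assumes a toy server state (a dict) and a text writer. It is not the Java ServerCnxn code; run_four_letter_word, handle_inline, handle_in_thread, and the simplified "stat" output are hypothetical. It also does not address the concern raised above about a channel closing before a response is written.

```python
import io
import threading


def cmd_ruok(server, out):
    out.write("imok")


def cmd_stat(server, out):
    # Simplified, made-up output format for illustration only.
    out.write("connections: %d\n" % server["connections"])


# Shared command table: the only place that knows what each word does.
COMMANDS = {"ruok": cmd_ruok, "stat": cmd_stat}


def run_four_letter_word(word, server, out):
    """Run one command against `server`, writing the reply to `out`.

    Returns False for an unknown word. Knows nothing about threads.
    """
    handler = COMMANDS.get(word)
    if handler is None:
        return False
    handler(server, out)
    return True


def handle_inline(word, server):
    """Caller that runs the command on the current thread."""
    out = io.StringIO()
    run_four_letter_word(word, server, out)
    return out.getvalue()


def handle_in_thread(word, server):
    """Caller that hands the same command off to a worker thread."""
    out = io.StringIO()
    worker = threading.Thread(
        target=run_four_letter_word, args=(word, server, out)
    )
    worker.start()
    worker.join()
    return out.getvalue()
```

Both callers produce identical output because neither contains any command logic of its own, which is the maintenance benefit the issue is after.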
Re: zookeeper seems to hang
Please see: https://issues.apache.org/jira/browse/ZOOKEEPER-846 On Thu, Aug 12, 2010 at 10:00 AM, Patrick Hunt wrote: > Great bug report Ted, the stack trace in particular is very useful. > > It looks like a timing bug where the client is not shutting down cleanly on > the close call. I reviewed the code in question but nothing pops out at me. > Also the logs just show us shutting down, nothing else from zk in there. > > Create a jira and attach all the detail you have available. > > Patrick > > > On 08/11/2010 03:21 PM, Ted Yu wrote: > >> Hi, >> Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where >> Regionserver >> process was shutting down and seemed to hang. >> >> Here is the bottom of region server log: >> http://pastebin.com/YYawJ4jA >> >> zookeeper-3.2.2 is used. >> >> Your comment is welcome. >> >> Here is relevant portion from jstack - I attempted to attach jstack twice >> in >> my email to d...@hbase.apache.org but failed: >> >> "DestroyJavaVM" prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on >> condition [0x] >>java.lang.Thread.State: RUNNABLE >> >> "regionserver/10.32.42.245:60020" prio=10 tid=0x2aabb84ce000 >> nid=0x6c81 >> in Object.wait() [0x43755000] >>java.lang.Thread.State: WAITING (on object monitor) >> at java.lang.Object.wait(Native Method) >> - waiting on<0x2aaab76633c0> (a >> org.apache.zookeeper.ClientCnxn$Packet) >> at java.lang.Object.wait(Object.java:485) >> at >> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099) >> - locked<0x2aaab76633c0> (a >> org.apache.zookeeper.ClientCnxn$Packet) >> at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077) >> at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505) >> - locked<0x2aaabf5e0c30> (a org.apache.zookeeper.ZooKeeper) >> at >> >> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654) >> at java.lang.Thread.run(Thread.java:619) >> 
>> "main-EventThread" daemon prio=10 tid=0x43474000 nid=0x6c80 >> waiting >> on condition [0x413f3000] >>java.lang.Thread.State: WAITING (parking) >> at sun.misc.Unsafe.park(Native Method) >> - parking to wait for<0x2aaabf6e9150> (a >> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) >> at >> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) >> at >> >> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) >> at >> >> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) >> at >> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414) >> >> "RMI TCP Accept-0" daemon prio=10 tid=0x2aabb822c800 nid=0x6c7d >> runnable >> [0x40752000] >>java.lang.Thread.State: RUNNABLE >> at java.net.PlainSocketImpl.socketAccept(Native Method) >> at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390) >> - locked<0x2aaabf585578> (a java.net.SocksSocketImpl) >> at java.net.ServerSocket.implAccept(ServerSocket.java:453) >> at java.net.ServerSocket.accept(ServerSocket.java:421) >> at >> >> sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:34) >> at >> >> sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369) >> at >> sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341) >> at java.lang.Thread.run(Thread.java:619) >> >>
Re: High WTF count in ZooKeeper client code
Thomas, btw, if you'd like (anyone really) to do a patch extracting deleteRecursive from zk into some helper class I think that would be a good idea to get sooner rather than later.

Patrick

On 08/11/2010 11:36 PM, Thomas Koch wrote:

Patrick,

I saw your patch and was afraid you wouldn't like to wait for me and change it. :-) I'll continue to work on my issues and also put them into jira for review so that my team can start to work on the new API. After your patch is applied, I'll adapt my patches, which should not change anything in the user-facing API of ZK.

Thomas

Patrick Hunt:

Thomas, I see some patches already, which is great, however there's a big/complicated refactoring that's pending here:
https://issues.apache.org/jira/browse/ZOOKEEPER-823
and to some extent here:
https://issues.apache.org/jira/browse/ZOOKEEPER-733
and refactorings in this code prior to 733/823 going in are going to cause me much pain (esp. as I'm moving code around, creating new classes, etc.). Could you hold off a bit on changes in this area until these two are committed? Ben is working on the reviews now. Ben, please prioritize review/commit of these two.

Patrick

On 08/11/2010 08:23 AM, Thomas Koch wrote:

Hello Mahadev,

thank you for your nice answer. Yes, we'll of course preserve compatibility. Otherwise there is no chance to get accepted.

I assume the following things must keep their interfaces: ZooKeeper (it'll call the new interface in the background), AsyncCallback, Watcher.
We may want to change: ClientCnxn (factor out some things, remove the dependency on ZooKeeper).
I think other classes should not be involved at all in our issues. My colleague Patrick was so kind as to file the jira issues.

Best regards,
Thomas

Mahadev Konar:

Also, I am assuming you have backwards compatibility in mind when you suggest these changes, right?

The interfaces of the zookeeper client should not be changing as part of this, though the recursive delete hasn't been introduced yet (it's only available in 3.4, so we can move it out into a helper class).

Thanks
mahadev

On 8/11/10 7:40 AM, "Mahadev Konar" wrote:

Hi Thomas,

I read through the list of issues you posted, most of them seem reasonable to fix. The ones you have mentioned below might take quite a bit of time to fix and again a lot of testing! (just a warning :)). It would be great if you'd want to clean this up for 3.4. Please go ahead and file a jira. These improvements would be good to have in the zookeeper java client.

For deleteRecursive, I definitely agree that it should be a helper class. I don't believe it should be in the direct zookeeper api!

Thanks
mahadev

On 8/11/10 2:45 AM, "Thomas Koch" wrote:

Hi,

I started yesterday to work on my idea of an alternative ZooKeeper client interface.[1] Instead of methods on a ZooKeeper class, a user should instantiate an Operation (Create, Delete, ...) and forward it to an Executor which handles session loss errors and the like.

By doing that, I got shocked by the sheer number of WTF issues I found. I'm sorry for ranting now, but it gets quicker to the point.

- Hostlist as string
The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of not doing (too much) work in a ctor. Instead the ClientCnxn should receive an object of class "HostSet". HostSet could then be instantiated e.g. with a comma separated string.

- cyclic dependency ClientCnxn, ZooKeeper
ZooKeeper instantiates ClientCnxn in its ctor with this and therefore builds a cyclic dependency graph between both objects. This means you can't have the one without the other. So why did you bother to make them two separate classes in the first place? ClientCnxn accesses ZooKeeper.state. State should rather be a property of ClientCnxn. And ClientCnxn accesses zooKeeper.get???Watches() in its method primeConnection(). I've not yet checked how this dependency should be resolved better.

- Chroot is an attribute of ClientCnxn
I'd like to have one process that uses ZooKeeper for different things (managing a list of work, locking some unrelated locks elsewhere). So I have components that do this work inside the same process. These components should get the same zookeeper-client reference chroot'ed for their needs. So it'd be much better if the ClientCnxn did not care about the chroot.

- deleteRecursive does not belong to the other methods
DeleteRecursive has been committed to trunk already as a method of the zookeeper class. So in the API it has the same level as the atomic operations create, delete, getData, setData, etc. The user must get the false impression that deleteRecursive is also an atomic operation. It would be better to have deleteRecursive in some helper class but not that deep in zookeeper's core code. Maybe I'd like to have another policy on how to react if deleteRecursive fails in the middle of its work?

- massive code duplication in zookeeper class
Each operation calls validatePath, handles the chroot, calls ClientCnxn and checks the retu
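Editor's note on the deleteRecursive discussion in this thread: the point that a recursive delete is a composition of atomic operations, and therefore belongs in a helper rather than the core API, can be sketched as follows. This is Python, not the Java client. InMemoryClient is a hypothetical stand-in that exposes only atomic get_children, delete, and exists calls, and delete_recursive is a hypothetical helper built on top of them.

```python
class InMemoryClient:
    """Hypothetical stand-in for a client with only atomic operations."""

    def __init__(self, paths):
        self._paths = set(paths)

    def exists(self, path):
        return path in self._paths

    def get_children(self, path):
        # Direct children only: names below `path` with no further "/".
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self._paths
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def delete(self, path):
        # Like a single atomic delete, this refuses non-empty nodes.
        if self.get_children(path):
            raise RuntimeError("node not empty: " + path)
        self._paths.remove(path)


def delete_recursive(client, root):
    """Delete `root` and everything below it, children first.

    This is NOT atomic: it issues one delete per node, so a failure
    partway through leaves a partially deleted subtree. Keeping it in
    a helper makes that visible and lets callers pick their own
    failure policy.
    """
    for child in client.get_children(root):
        delete_recursive(client, root.rstrip("/") + "/" + child)
    client.delete(root)
```

Usage is a single call such as `delete_recursive(client, "/a")`. Because the helper depends only on the two atomic calls, a different failure policy (retry, stop, collect errors) can be swapped in without touching the client class.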
[jira] Updated: (ZOOKEEPER-846) zookeeper client doesn't shut down cleanly on the close call
[ https://issues.apache.org/jira/browse/ZOOKEEPER-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-846: - Description: Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where Regionserver process was shutting down and seemed to hang. Here is the bottom of region server log: http://pastebin.com/YYawJ4jA zookeeper-3.2.2 is used. Here is relevant portion from jstack - I attempted to attach jstack twice in my email to d...@hbase.apache.org but failed: "DestroyJavaVM" prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on condition [0x] java.lang.Thread.State: RUNNABLE "regionserver/10.32.42.245:60020" prio=10 tid=0x2aabb84ce000 nid=0x6c81 in Object.wait() [0x43755000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x2aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet) at java.lang.Object.wait(Object.java:485) at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099) - locked <0x2aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet) at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077) at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505) - locked <0x2aaabf5e0c30> (a org.apache.zookeeper.ZooKeeper) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654) at java.lang.Thread.run(Thread.java:619) "main-EventThread" daemon prio=10 tid=0x43474000 nid=0x6c80 waiting on condition [0x413f3000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x2aaabf6e9150> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414) was: Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where Regionserver process was shutting down and seemed to hang. Here is the bottom of region server log: http://pastebin.com/YYawJ4jA zookeeper-3.2.2 is used. Your comment is welcome. Here is relevant portion from jstack - I attempted to attach jstack twice in my email to d...@hbase.apache.org but failed: "DestroyJavaVM" prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on condition [0x] java.lang.Thread.State: RUNNABLE "regionserver/10.32.42.245:60020" prio=10 tid=0x2aabb84ce000 nid=0x6c81 in Object.wait() [0x43755000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x2aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet) at java.lang.Object.wait(Object.java:485) at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099) - locked <0x2aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet) at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077) at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505) - locked <0x2aaabf5e0c30> (a org.apache.zookeeper.ZooKeeper) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654) at java.lang.Thread.run(Thread.java:619) "main-EventThread" daemon prio=10 tid=0x43474000 nid=0x6c80 waiting on condition [0x413f3000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x2aaabf6e9150> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414) > zookeeper client doesn't shut down cleanly on the close call > > > Key: ZOOKEEPER-846 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-846 > Project: Zookeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.2.2 >Reporter: Ted Yu > Attachments: rs-13.stack > > > Using
[jira] Updated: (ZOOKEEPER-846) zookeeper client doesn't shut down cleanly on the close call
[ https://issues.apache.org/jira/browse/ZOOKEEPER-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated ZOOKEEPER-846:
-----------------------------

    Attachment: rs-13.stack

jstack for Region Server

> zookeeper client doesn't shut down cleanly on the close call
> ------------------------------------------------------------
>
>                 Key: ZOOKEEPER-846
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-846
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.2.2
>            Reporter: Ted Yu
>         Attachments: rs-13.stack
>
>
> Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where
> Regionserver process was shutting down and seemed to hang.
> Here is the bottom of region server log:
> http://pastebin.com/YYawJ4jA
> zookeeper-3.2.2 is used.
> Your comment is welcome.
> Here is relevant portion from jstack - I attempted to attach jstack twice in
> my email to d...@hbase.apache.org but failed:
>
> "DestroyJavaVM" prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on condition [0x]
>    java.lang.Thread.State: RUNNABLE
>
> "regionserver/10.32.42.245:60020" prio=10 tid=0x2aabb84ce000 nid=0x6c81 in Object.wait() [0x43755000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x2aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
>         at java.lang.Object.wait(Object.java:485)
>         at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099)
>         - locked <0x2aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
>         at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077)
>         at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505)
>         - locked <0x2aaabf5e0c30> (a org.apache.zookeeper.ZooKeeper)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654)
>         at java.lang.Thread.run(Thread.java:619)
>
> "main-EventThread" daemon prio=10 tid=0x43474000 nid=0x6c80 waiting on condition [0x413f3000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x2aaabf6e9150> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)
[jira] Created: (ZOOKEEPER-846) zookeeper client doesn't shut down cleanly on the close call
zookeeper client doesn't shut down cleanly on the close call
------------------------------------------------------------

                 Key: ZOOKEEPER-846
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-846
             Project: Zookeeper
          Issue Type: Bug
          Components: java client
    Affects Versions: 3.2.2
            Reporter: Ted Yu


Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where Regionserver process was shutting down and seemed to hang.
Here is the bottom of region server log:
http://pastebin.com/YYawJ4jA

zookeeper-3.2.2 is used.

Your comment is welcome.

Here is relevant portion from jstack - I attempted to attach jstack twice in my email to d...@hbase.apache.org but failed:

"DestroyJavaVM" prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

"regionserver/10.32.42.245:60020" prio=10 tid=0x2aabb84ce000 nid=0x6c81 in Object.wait() [0x43755000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x2aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099)
        - locked <0x2aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
        at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077)
        at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505)
        - locked <0x2aaabf5e0c30> (a org.apache.zookeeper.ZooKeeper)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654)
        at java.lang.Thread.run(Thread.java:619)

"main-EventThread" daemon prio=10 tid=0x43474000 nid=0x6c80 waiting on condition [0x413f3000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x2aaabf6e9150> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)
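
In the trace above, the region server thread is parked inside ZooKeeper.close(), waiting on a ClientCnxn$Packet. Until the root cause is known, one caller-side mitigation may be to bound how long shutdown waits for that call. The following is only a sketch under stated assumptions: BoundedClose and closeWithTimeout are hypothetical names, not part of ZooKeeper or HBase, and the Runnable stands in for whatever wraps the real close call.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (assumption): not part of ZooKeeper or HBase.
final class BoundedClose {
    private BoundedClose() {}

    /**
     * Runs closeAction on a daemon thread and waits at most timeoutMillis.
     * Returns true if the action returned (or threw) within the timeout,
     * false if it is still blocked or the waiting thread was interrupted.
     */
    static boolean closeWithTimeout(Runnable closeAction, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        Thread closer = new Thread(() -> {
            try {
                closeAction.run();
            } finally {
                done.countDown();
            }
        }, "bounded-close");
        // Daemon thread, so a close that never returns cannot keep the JVM alive.
        closer.setDaemon(true);
        closer.start();
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt for the caller and report "not finished".
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A caller would pass a Runnable that invokes close() and handles its InterruptedException. This only hides the symptom: a false return means the close is still stuck, which is worth logging together with a thread dump.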
Re: C client unit test failure
If you figure out what it is let us know, would be good to identify a "fix" if others run into the same problem.

Regards,

Patrick

On 08/12/2010 09:42 AM, Michi Mutsuzaki wrote:

Yeah, I tried installing libtool 2, but that caused some other issue. I'll play around a bit more, and let you know if I find anything.

--Michi

On 8/12/10 1:40 AM, "Patrick Hunt" wrote:

I've been running with v4 for a while and never noticed that issue...

You might try googling it, a quick search turned up:

"The meaning of "-static" changed between libtool 1.5 and libtool 2.x, and libtool 2.x introduced "-static-libtool-libs" to provide the old behavior." ...

Patrick

On 08/09/2010 11:28 AM, Michi Mutsuzaki wrote:

Running "ant jar" fixed the unit test failure.

I'm using g++ 3.4.6. Do I need later version to get rid of -static-libtool-libs error?

$ g++ --version
g++ (GCC) 3.4.6 20060404 (Red Hat 3.4.6-9)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Thanks!
--Michi

On 8/7/10 11:57 PM, "Patrick Hunt" wrote:

What version of g++ do you have? Capture the test output and attach to your response. However I suspect that the server is not running (it's necessary to test the c client), did you "ant jar" (or similar - ie build the server) before testing the client?

Patrick

On 08/07/2010 04:57 PM, Michi Mutsuzaki wrote:

Hello,

I'm having 2 issues while compiling/running c client unit test in branch-3.3.

1. I get this error from "make check":

g++: unrecognized option `-static-libtool-libs'

2. testAsyncWatcherAutoReset is not working for me.

Zookeeper_simpleSystem::testAsyncWatcherAutoReset
terminate called after throwing an instance of 'CppUnit::Exception'
  what(): equality assertion failed
- Expected: -101
- Actual : -4

Let me know if anybody has seen these errors.

Thanks!
--Michi
Re: zookeeper seems to hang
Great bug report Ted, the stack trace in particular is very useful.

It looks like a timing bug where the client is not shutting down cleanly on the close call. I reviewed the code in question but nothing pops out at me. Also the logs just show us shutting down, nothing else from zk in there.

Create a jira and attach all the detail you have available.

Patrick

On 08/11/2010 03:21 PM, Ted Yu wrote:

Hi,
Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where Regionserver process was shutting down and seemed to hang.
Here is the bottom of region server log:
http://pastebin.com/YYawJ4jA

zookeeper-3.2.2 is used.

Your comment is welcome.

Here is relevant portion from jstack - I attempted to attach jstack twice in my email to d...@hbase.apache.org but failed:

"DestroyJavaVM" prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

"regionserver/10.32.42.245:60020" prio=10 tid=0x2aabb84ce000 nid=0x6c81 in Object.wait() [0x43755000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x2aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099)
        - locked <0x2aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
        at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077)
        at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505)
        - locked <0x2aaabf5e0c30> (a org.apache.zookeeper.ZooKeeper)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654)
        at java.lang.Thread.run(Thread.java:619)

"main-EventThread" daemon prio=10 tid=0x43474000 nid=0x6c80 waiting on condition [0x413f3000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x2aaabf6e9150> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)

"RMI TCP Accept-0" daemon prio=10 tid=0x2aabb822c800 nid=0x6c7d runnable [0x40752000]
   java.lang.Thread.State: RUNNABLE
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
        - locked <0x2aaabf585578> (a java.net.SocksSocketImpl)
        at java.net.ServerSocket.implAccept(ServerSocket.java:453)
        at java.net.ServerSocket.accept(ServerSocket.java:421)
        at sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:34)
        at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
        at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
        at java.lang.Thread.run(Thread.java:619)
[jira] Created: (ZOOKEEPER-845) remove duplicate code from netty and nio ServerCnxn classes
remove duplicate code from netty and nio ServerCnxn classes
-----------------------------------------------------------

                 Key: ZOOKEEPER-845
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-845
             Project: Zookeeper
          Issue Type: Improvement
          Components: server
            Reporter: Benjamin Reed


The code for handling the 4-letter words is duplicated between the nio and netty versions of ServerCnxn. This makes maintenance problematic.
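
One way to picture the de-duplication asked for here is a single command table that both transports delegate to, so each connection class keeps only its own I/O. The sketch below is an illustration under assumptions: FourLetterWords, ResponseSink and TransportAdapter are hypothetical names, not the actual ZooKeeper server classes, and only the well-known "ruok" -> "imok" pair is filled in.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical shared handler (assumption): not the real ServerCnxn code.
final class FourLetterWords {
    private final Map<String, Supplier<String>> commands = new HashMap<>();

    FourLetterWords() {
        // "ruok" answering "imok" is the documented liveness check.
        commands.put("ruok", () -> "imok");
    }

    /** Registers another command; the word must be exactly four characters. */
    void register(String word, Supplier<String> response) {
        if (word.length() != 4) {
            throw new IllegalArgumentException("command must be 4 letters: " + word);
        }
        commands.put(word, response);
    }

    /** Returns the response for a known command, or null if the bytes are not one. */
    String handle(byte[] firstFourBytes) {
        if (firstFourBytes.length != 4) {
            return null;
        }
        String word = new String(firstFourBytes, StandardCharsets.US_ASCII);
        Supplier<String> response = commands.get(word);
        return response == null ? null : response.get();
    }
}

// Transport-specific part: how text is written back to the client.
interface ResponseSink {
    void send(String text);
}

// Both an NIO and a Netty connection could wrap their channel in a ResponseSink
// and share this adapter instead of duplicating the command logic.
final class TransportAdapter {
    private final FourLetterWords words;
    private final ResponseSink sink;

    TransportAdapter(FourLetterWords words, ResponseSink sink) {
        this.words = words;
        this.sink = sink;
    }

    /** Returns true if the bytes were a four-letter word and a reply was sent. */
    boolean tryHandle(byte[] firstFourBytes) {
        String reply = words.handle(firstFourBytes);
        if (reply == null) {
            return false;
        }
        sink.send(reply);
        return true;
    }
}
```

With this shape, adding a command means one register call rather than parallel edits in two connection classes.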
[jira] Updated: (ZOOKEEPER-733) use netty to handle client connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Reed updated ZOOKEEPER-733:
------------------------------------

    Hadoop Flags: [Reviewed]

+1 looks good to commit. Sergey raises a valid point, but I think it should be addressed in a separate jira given the size of this patch.

> use netty to handle client connections
> --------------------------------------
>
>                 Key: ZOOKEEPER-733
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-733
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Benjamin Reed
>            Assignee: Patrick Hunt
>             Fix For: 3.4.0
>
>         Attachments: accessive.jar, flowctl.zip, moved.zip,
> QuorumTestFailed_sessionmoved_TRACE_LOG.txt.gz, ZOOKEEPER-733.patch,
> ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch,
> ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch,
> ZOOKEEPER-733.patch, ZOOKEEPER-733.patch
>
>
> We currently have our own asynchronous NIO socket engine to be able to handle
> lots of clients with a single thread. Over time the engine has become more
> complicated. We would also like the engine to use multiple threads on
> machines with lots of cores. Plus, we would like to be able to support things
> like SSL. If we switch to netty, we can simplify our code and get the
> previously mentioned benefits.
Re: C client unit test failure
Yeah, I tried installing libtool 2, but that caused some other issue. I'll
play around a bit more, and let you know if I find anything.

--Michi

On 8/12/10 1:40 AM, "Patrick Hunt" wrote:

> I've been running with v4 for a while and never noticed that issue...
>
> You might try googling it, a quick search turned up:
>
> "The meaning of "-static" changed between libtool 1.5 and libtool 2.x,
> and libtool 2.x introduced "-static-libtool-libs" to provide the old
> behavior." ...
>
> Patrick
>
> On 08/09/2010 11:28 AM, Michi Mutsuzaki wrote:
>> Running "ant jar" fixed the unit test failure.
>>
>> I'm using g++ 3.4.6. Do I need later version to get rid of
>> -static-libtool-libs error?
>>
>> $ g++ --version
>> g++ (GCC) 3.4.6 20060404 (Red Hat 3.4.6-9)
>> Copyright (C) 2006 Free Software Foundation, Inc.
>> This is free software; see the source for copying conditions. There is NO
>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>
>> Thanks!
>> --Michi
>>
>>
>> On 8/7/10 11:57 PM, "Patrick Hunt" wrote:
>>
>>> What version of g++ do you have? Capture the test output and attach to
>>> your response. However I suspect that the server is not running (it's
>>> necessary to test the c client), did you "ant jar" (or similar - ie
>>> build the server) before testing the client?
>>>
>>> Patrick
>>>
>>> On 08/07/2010 04:57 PM, Michi Mutsuzaki wrote:
>>>> Hello,
>>>>
>>>> I'm having 2 issues while compiling/running c client unit test in
>>>> branch-3.3.
>>>>
>>>> 1. I get this error from "make check":
>>>>
>>>> g++: unrecognized option `-static-libtool-libs'
>>>>
>>>> 2. testAsyncWatcherAutoReset is not working for me.
>>>>
>>>> Zookeeper_simpleSystem::testAsyncWatcherAutoReset
>>>> terminate called after throwing an instance of 'CppUnit::Exception'
>>>>   what(): equality assertion failed
>>>> - Expected: -101
>>>> - Actual : -4
>>>>
>>>> Let me know if anybody has seen these errors.
>>>>
>>>> Thanks!
>>>> --Michi
>>>
>>
>
[jira] Created: (ZOOKEEPER-844) handle auth failure in java client
handle auth failure in java client
----------------------------------

                 Key: ZOOKEEPER-844
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-844
             Project: Zookeeper
          Issue Type: Improvement
          Components: java client
    Affects Versions: 3.3.1
            Reporter: Camille Fournier


ClientCnxn.java currently has the following code:

    if (replyHdr.getXid() == -4) {
        // -2 is the xid for AuthPacket
        // TODO: process AuthPacket here
        if (LOG.isDebugEnabled()) {
            LOG.debug("Got auth sessionid:0x"
                    + Long.toHexString(sessionId));
        }
        return;
    }

Auth failures appear to cause the server to disconnect but the client never gets a proper state change or notification that auth has failed, which makes handling this scenario very difficult as it causes the client to go into a loop of sending bad auth, getting disconnected, trying to reconnect, sending bad auth again, over and over.
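
To make the requested behaviour concrete, here is a small model of a reply path that surfaces a failed auth reply as a state change and gives the reconnect loop something to check. It is a sketch under assumptions, not the ClientCnxn implementation: AuthReplySketch, ClientState and StateListener are hypothetical, and treating any non-zero error code on the auth reply as a rejection is an assumption. Only the xid value -4 comes from the snippet quoted in the issue.

```java
// Hypothetical model of the reply path (assumption): not the real ClientCnxn types.
final class AuthReplySketch {
    // xid tested for the auth reply in the snippet quoted in the issue.
    static final int AUTH_XID = -4;

    enum ClientState { CONNECTED, AUTH_FAILED }

    interface StateListener {
        void onState(ClientState state);
    }

    private final StateListener listener;
    private ClientState state = ClientState.CONNECTED;

    AuthReplySketch(StateListener listener) {
        this.listener = listener;
    }

    /** Returns true if the reply was an auth reply and has been consumed. */
    boolean onReply(int xid, int err) {
        if (xid != AUTH_XID) {
            return false;
        }
        // Assumption: a non-zero error code on the auth reply means rejection.
        // The listener is notified once, on the first failure only.
        if (err != 0 && state != ClientState.AUTH_FAILED) {
            state = ClientState.AUTH_FAILED;
            listener.onState(state);
        }
        return true;
    }

    /** A reconnect loop would check this before resending the same credentials. */
    boolean shouldReconnect() {
        return state != ClientState.AUTH_FAILED;
    }
}
```

The point of shouldReconnect is to break the cycle described above: once the application has been told that auth failed, retrying with the same credentials is no longer automatic.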
[jira] Commented: (ZOOKEEPER-838) Chroot is an attribute of ClientCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897725#action_12897725 ]

Thomas Koch commented on ZOOKEEPER-838:
---------------------------------------

I need to change the internal logic of watch registration: The watches will now be registered under the full serverPath, since we want to have multiple ChangeRoots in the client. For each watcher I'll additionally save its ChRoot in the watchManager.

> Chroot is an attribute of ClientCnxn
> ------------------------------------
>
>                 Key: ZOOKEEPER-838
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-838
>             Project: Zookeeper
>          Issue Type: Sub-task
>            Reporter: Patrick Datko
>
> It would be better to have one process that uses ZooKeeper for different
> things (managing a list of work, locking some unrelated locks elsewhere).
> So there are components that do this work inside the same process. These
> components should get the same zookeeper-client reference chroot'ed for
> their needs.
> So it'd be much better, if the ClientCnxn would not care about the chroot.
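
The registration scheme described in the comment (watches keyed by full server path, each remembering its own chroot) can be sketched roughly as follows. This is an illustration, not the proposed patch: ChrootWatchRegistry and its methods are hypothetical names, watchers are reduced to a Consumer of the client-visible path, and the class is not thread-safe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical registry (assumption): not the actual watch manager code.
final class ChrootWatchRegistry {
    private static final class Entry {
        final String chroot;
        final Consumer<String> watcher;

        Entry(String chroot, Consumer<String> watcher) {
            this.chroot = chroot;
            this.watcher = watcher;
        }
    }

    // Keyed by full server path, so components with different chroots can share it.
    private final Map<String, List<Entry>> byServerPath = new HashMap<>();

    /** Maps a chroot-relative path to the server path; an empty chroot means none. */
    static String toServerPath(String chroot, String clientPath) {
        if (chroot.isEmpty()) {
            return clientPath;
        }
        return clientPath.equals("/") ? chroot : chroot + clientPath;
    }

    /** Inverse of toServerPath for a path known to lie under the chroot. */
    static String toClientPath(String chroot, String serverPath) {
        if (chroot.isEmpty()) {
            return serverPath;
        }
        return serverPath.equals(chroot) ? "/" : serverPath.substring(chroot.length());
    }

    void register(String chroot, String clientPath, Consumer<String> watcher) {
        byServerPath
                .computeIfAbsent(toServerPath(chroot, clientPath), k -> new ArrayList<>())
                .add(new Entry(chroot, watcher));
    }

    /** Delivers an event; each watcher sees the path relative to its own chroot. */
    void fire(String serverPath) {
        // Watches are one-shot, so the entries are removed when they fire.
        List<Entry> entries = byServerPath.remove(serverPath);
        if (entries == null) {
            return;
        }
        for (Entry e : entries) {
            e.watcher.accept(toClientPath(e.chroot, serverPath));
        }
    }
}
```

For example, a component chrooted at /app1 watching /locks and an unchrooted component watching /app1/locks end up under the same key, and each is called back with the path it originally asked for.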
Re: C client unit test failure
I've been running with v4 for a while and never noticed that issue...

You might try googling it, a quick search turned up:

"The meaning of "-static" changed between libtool 1.5 and libtool 2.x, and libtool 2.x introduced "-static-libtool-libs" to provide the old behavior." ...

Patrick

On 08/09/2010 11:28 AM, Michi Mutsuzaki wrote:

Running "ant jar" fixed the unit test failure.

I'm using g++ 3.4.6. Do I need later version to get rid of -static-libtool-libs error?

$ g++ --version
g++ (GCC) 3.4.6 20060404 (Red Hat 3.4.6-9)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Thanks!
--Michi

On 8/7/10 11:57 PM, "Patrick Hunt" wrote:

What version of g++ do you have? Capture the test output and attach to your response. However I suspect that the server is not running (it's necessary to test the c client), did you "ant jar" (or similar - ie build the server) before testing the client?

Patrick

On 08/07/2010 04:57 PM, Michi Mutsuzaki wrote:

Hello,

I'm having 2 issues while compiling/running c client unit test in branch-3.3.

1. I get this error from "make check":

g++: unrecognized option `-static-libtool-libs'

2. testAsyncWatcherAutoReset is not working for me.

Zookeeper_simpleSystem::testAsyncWatcherAutoReset
terminate called after throwing an instance of 'CppUnit::Exception'
  what(): equality assertion failed
- Expected: -101
- Actual : -4

Let me know if anybody has seen these errors.

Thanks!
--Michi
Re: High WTF count in ZooKeeper client code
On 08/11/2010 11:36 PM, Thomas Koch wrote:

I saw your patch and was afraid you wouldn't like to wait for me and change it. :-) I'll continue to work on my issues and also put them into jira for review so that my team can start to work on the new API. After your patch is applied, I'll adapt my patches, which should not change anything in the user-facing API of ZK.

No problem, just didn't want you to get frustrated in case you didn't notice. As long as you go in with eyes wide open it's ok w/me. :-)

Patrick

Patrick Hunt:

Thomas, I see some patches already, which is great, however there's a big/complicated refactoring that's pending here:
https://issues.apache.org/jira/browse/ZOOKEEPER-823
and to some extent here:
https://issues.apache.org/jira/browse/ZOOKEEPER-733
and refactorings in this code prior to 733/823 going in are going to cause me much pain. (esp as I'm moving code around, creating new classes, etc)

Could you hold off a bit on changes in this area until these two are committed? Ben is working on the reviews now. Ben please prioritize review/commit of these two.

Patrick

On 08/11/2010 08:23 AM, Thomas Koch wrote:

Hello Mahadev,

thank you for your nice answer. Yes, we'll of course preserve compatibility. Otherwise there is no chance to get accepted.

I assume the following things must keep their interfaces: ZooKeeper (it'll call the new interface in the background), ASyncCallback, Watcher.
We may want to change: ClientCnxn (factor out some things, remove dep on ZooKeeper).

I think other classes should not be involved at all in our issues. My colleague Patrick was so kind to fill the jira issues.

Best regards,

Thomas

Mahadev Konar:

Also, I am assuming you have backwards compatibility in mind when you suggest these changes, right?

The interfaces of the zookeeper client should not be changing as part of this, though the recursive delete hasn't been introduced yet (it's only available in 3.4, so we can move it out into a helper class).
Thanks
mahadev

On 8/11/10 7:40 AM, "Mahadev Konar" wrote:

Hi Thomas,

I read through the list of issues you posted, most of them seem reasonable to fix. The ones you have mentioned below might take quite a bit of time to fix and again a lot of testing! (just a warning :)). It would be great if you'd want to clean this up for 3.4. Please go ahead and file a jira. These improvements would be good to have in the zookeeper java client.

For deleteRecursive, I definitely agree that it should be a helper class. I don't believe it should be in the direct zookeeper api!

Thanks
mahadev

On 8/11/10 2:45 AM, "Thomas Koch" wrote:

Hi,

I started yesterday to work on my idea of an alternative ZooKeeper client interface.[1] Instead of methods on a ZooKeeper class, a user should instantiate an Operation (Create, Delete, ...) and forward it to an Executor which handles session loss errors and the like.

By doing that, I got shocked by the sheer number of WTF issues I found. I'm sorry for ranting now, but it gets quicker to the point.

- Hostlist as string

The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of not doing (too much) work in a ctor. Instead the ClientCnxn should receive an object of class "HostSet". HostSet could then be instantiated e.g. with a comma separated string.

- cyclic dependency ClientCnxn, ZooKeeper

ZooKeeper instantiates ClientCnxn in its ctor with this and therefore builds a cyclic dependency graph between both objects. This means you can't have the one without the other. So why did you bother to make them two separate classes in the first place?
ClientCnxn accesses ZooKeeper.state. State should rather be a property of ClientCnxn. And ClientCnxn accesses zooKeeper.get???Watches() in its method primeConnection(). I've not yet checked how this dependency should be resolved better.
- Chroot is an attribute of ClientCnxn

I'd like to have one process that uses ZooKeeper for different things (managing a list of work, locking some unrelated locks elsewhere). So I have components that do this work inside the same process. These components should get the same zookeeper-client reference chroot'ed for their needs. So it'd be much better if the ClientCnxn would not care about the chroot.

- deleteRecursive does not belong to the other methods

DeleteRecursive has been committed to trunk already as a method of the zookeeper class. So in the API it has the same level as the atomic operations create, delete, getData, setData, etc. The user must get the false impression that deleteRecursive is also an atomic operation.
It would be better to have deleteRecursive in some helper class but not that deep in zookeeper's core code. Maybe I'd like to have another policy on how to react if deleteRecursive fails in the middle of its work?

- massive code duplication in zookeeper class

Each operation calls validatePath, handles the chroot, calls ClientCnxn and checks the return header for error. I'd like to address this with the oper