[jira] Updated: (ZOOKEEPER-732) Improper translation of error into Python exception

2010-08-12 Thread Lei Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Zhang updated ZOOKEEPER-732:


Attachment: (was: ZOOKEEPER-732.patch)

> Improper translation of error into Python exception
> ---
>
> Key: ZOOKEEPER-732
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-732
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.2.2
>Reporter: Gustavo Niemeyer
>Assignee: Lei Zhang
>Priority: Minor
> Attachments: ZOOKEEPER-732.patch
>
>
> Apparently errors returned by the C library are not being correctly converted 
> into a Python exception in some cases: 
> >>> zookeeper.get_children(0, "/", None)
> Traceback (most recent call last):
>   File "", line 1, in 
> SystemError: error return without exception set
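
A minimal sketch of the behaviour the report asks for, written in Python rather than in the C of the binding: every non-zero return code ends up as some exception, with a generic fallback for codes that have no dedicated class. The class names, the `check` helper and the lookup table are hypothetical; the numeric codes are believed to match the C client's zookeeper.h but should be treated as illustrative.

```python
class ZooKeeperError(Exception):
    """Hypothetical base class; carries the raw return code."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


class BadArgumentsError(ZooKeeperError):
    pass


class NoNodeError(ZooKeeperError):
    pass


# Illustrative codes (assumed to mirror zookeeper.h).
ZOK = 0
ZBADARGUMENTS = -8
ZNONODE = -101

_BY_CODE = {ZBADARGUMENTS: BadArgumentsError, ZNONODE: NoNodeError}


def check(rc, what):
    """Return quietly on success; otherwise always raise something."""
    if rc == ZOK:
        return
    # Unknown codes fall back to the base class instead of being dropped.
    exc_type = _BY_CODE.get(rc, ZooKeeperError)
    raise exc_type(rc, "%s failed with code %d" % (what, rc))


try:
    check(-9999, "get_children")  # a code with no dedicated class
except ZooKeeperError as exc:
    fallback_code = exc.code
```

The point of the fallback is that an unmapped code can never produce an error return without an exception being set.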

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-732) Improper translation of error into Python exception

2010-08-12 Thread Lei Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Zhang updated ZOOKEEPER-732:


Attachment: ZOOKEEPER-732.patch




[jira] Commented: (ZOOKEEPER-792) zkpython memory leak

2010-08-12 Thread Lei Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897986#action_12897986
 ] 

Lei Zhang commented on ZOOKEEPER-792:
-

We've been using this patch in production (16-node cluster) for over a month. 
I'd like to have it go into a newer release. Can somebody please code review?

> zkpython memory leak
> 
>
> Key: ZOOKEEPER-792
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-792
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.3.1
> Environment: vmware workstation - guest OS:Linux python:2.4.3
>Reporter: Lei Zhang
>Assignee: Lei Zhang
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-792.patch
>
>
> We recently upgraded ZooKeeper from 3.2.1 to 3.3.1, and we are now seeing fewer
> client deadlocks on session expiration, which is a definite plus!
> Unfortunately we are also seeing a memory leak that requires our zk clients to be
> restarted every half-day. Valgrind result:
> ==8804== 25 (12 direct, 13 indirect) bytes in 1 blocks are definitely lost in loss record 255 of 670
> ==8804==    at 0x4021C42: calloc (vg_replace_malloc.c:418)
> ==8804==    by 0x5047B42: parse_acls (zookeeper.c:369)
> ==8804==    by 0x5047EF6: pyzoo_create (zookeeper.c:1009)
> ==8804==    by 0x40786CC: PyCFunction_Call (in /usr/lib/libpython2.4.so.1.0)
> ==8804==    by 0x40B31DC: PyEval_EvalFrame (in /usr/lib/libpython2.4.so.1.0)
> ==8804==    by 0x40B4485: PyEval_EvalCodeEx (in /usr/lib/libpython2.4.so.1.0)




[jira] Updated: (ZOOKEEPER-732) Improper translation of error into Python exception

2010-08-12 Thread Lei Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Zhang updated ZOOKEEPER-732:


Attachment: ZOOKEEPER-732.patch

Attached is a patch that fixes issue 732. Can somebody please review?




[jira] Assigned: (ZOOKEEPER-732) Improper translation of error into Python exception

2010-08-12 Thread Lei Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Zhang reassigned ZOOKEEPER-732:
---

Assignee: Lei Zhang




[jira] Resolved: (ZOOKEEPER-603) zkpython should do a better job of freeing memory under error conditions

2010-08-12 Thread Lei Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Zhang resolved ZOOKEEPER-603.
-

Resolution: Duplicate

> zkpython should do a better job of freeing memory under error conditions
> 
>
> Key: ZOOKEEPER-603
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-603
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.2.1
>Reporter: Henry Robinson
>Assignee: Lei Zhang
> Fix For: 3.4.0
>
>
> The general pattern is that the construction of a collection might fail, but 
> the module is not freeing the memory that it has already allocated. 
> Exceptions that are raised during this process aren't always propagated back 
> to the Python side either. 




[jira] Assigned: (ZOOKEEPER-603) zkpython should do a better job of freeing memory under error conditions

2010-08-12 Thread Lei Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Zhang reassigned ZOOKEEPER-603:
---

Assignee: Lei Zhang  (was: Henry Robinson)




[jira] Updated: (ZOOKEEPER-559) valgrind warnings running zkpython bindings

2010-08-12 Thread Lei Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Zhang updated ZOOKEEPER-559:


Assignee: Lei Zhang  (was: Henry Robinson)

Can we make this a duplicate of 792?

> valgrind warnings running zkpython bindings
> ---
>
> Key: ZOOKEEPER-559
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-559
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.3.0
>Reporter: Patrick Hunt
>Assignee: Lei Zhang
> Fix For: 3.4.0
>
> Attachments: valgrind-zk.tar.gz
>
>
> I'm seeing some weird behavior running zk-latencies.py
> http://github.com/phunt/zk-smoketest
> don't know if it's related to zkbindings itself, but I ran valgrind to see if 
> it noticed any issues. see attached.
> afaict these issues are related to zkpython binding, however I'm not sure. I 
> did run valgrind against the
> zookeeper c library tests and these issues were not highlighted. So I'm 
> thinking this is zkpython errors, however
> I'm not 100% sure. 
> Henry can you take a look?
>  




[jira] Commented: (ZOOKEEPER-845) remove duplicate code from netty and nio ServerCnxn classes

2010-08-12 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897883#action_12897883
 ] 

Patrick Hunt commented on ZOOKEEPER-845:


Last I looked there wasn't much processing logic; that was the thing. For most 
of the commands we just call out to toString or similar. Only in a few cases was 
there common code other than the class/thread wrapper. (There is probably some 
embedded logic that could be extracted, though. We should do that, agreed; I'm 
just saying it's not huge.)

> remove duplicate code from netty and nio ServerCnxn classes
> ---
>
> Key: ZOOKEEPER-845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-845
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Benjamin Reed
> Fix For: 3.4.0
>
>
> the code for handling the 4-letter words is duplicated between the nio and 
> netty versions of ServerCnxn. this makes maintenance problematic. 




[jira] Commented: (ZOOKEEPER-845) remove duplicate code from netty and nio ServerCnxn classes

2010-08-12 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897880#action_12897880
 ] 

Benjamin Reed commented on ZOOKEEPER-845:
-

perhaps we could extract the actual processing logic from the threading model.




[jira] Updated: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed

2010-08-12 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-794:
---

Status: Open  (was: Patch Available)

cancelling patch - hudson failing.

> Callbacks are not invoked when the client is closed
> ---
>
> Key: ZOOKEEPER-794
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-794
> Project: Zookeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.3.1
>Reporter: Alexis Midon
>Assignee: Alexis Midon
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-794.patch.txt, ZOOKEEPER-794.txt, 
> ZOOKEEPER-794_2.patch, ZOOKEEPER-794_3.patch
>
>
> I noticed that ZooKeeper behaves differently for synchronous and
> asynchronous calls on a closed ZooKeeper client.
> A synchronous call throws a "session expired" exception, while an
> asynchronous call does nothing: no exception, no callback invocation.
> Even if the EventThread receives the Packet with the session
> expired error code, the packet is never processed because the thread has
> already been killed by the eventOfDeath. So the callback is not invoked.
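
The failure mode described above can be sketched with a sentinel-terminated event queue. This is only an illustration in Python of one possible policy, not the Java client's code and not the attached patch: `EventThread`, `submit`, `close` and the `SESSION_EXPIRED` value are hypothetical names chosen for the sketch.

```python
import queue
import threading

EVENT_OF_DEATH = object()  # sentinel that tells the event thread to stop
SESSION_EXPIRED = -112     # illustrative error code


class EventThread(threading.Thread):
    def __init__(self):
        super().__init__(daemon=True)
        self._events = queue.Queue()
        self._lock = threading.Lock()
        self._closed = False

    def submit(self, callback, rc):
        with self._lock:
            if not self._closed:
                self._events.put((callback, rc))
                return
        # The thread is stopping or gone: report the failure to the caller
        # instead of silently dropping the callback.
        callback(SESSION_EXPIRED)

    def close(self):
        with self._lock:
            self._closed = True
            # Everything queued before this point is still delivered.
            self._events.put(EVENT_OF_DEATH)

    def run(self):
        while True:
            event = self._events.get()
            if event is EVENT_OF_DEATH:
                return
            callback, rc = event
            callback(rc)


results = []
thread = EventThread()
thread.start()
thread.submit(results.append, 0)  # delivered by the event thread
thread.close()
thread.join()
thread.submit(results.append, 0)  # after close: still fires, with the error code
```

Note that the late callback runs on the caller's thread, so a policy like this has to be checked against the callback-ordering guarantee.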




[jira] Commented: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed

2010-08-12 Thread Alexis Midon (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897857#action_12897857
 ] 

Alexis Midon commented on ZOOKEEPER-794:


Yes, with the first patches, the callback ordering might differ from the 
event ordering. And this is one of the core ZK guarantees, right? Although in 
our case this happens during the shutdown procedure.

I'll double check the patch.




[jira] Commented: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed

2010-08-12 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897855#action_12897855
 ] 

Patrick Hunt commented on ZOOKEEPER-794:


Would be great to get this into 3.3.2, any update?




[jira] Commented: (ZOOKEEPER-795) eventThread isn't shutdown after a connection "session expired" event coming

2010-08-12 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897853#action_12897853
 ] 

Patrick Hunt commented on ZOOKEEPER-795:


This is blocking a release candidate for 3.3.2, if we can get this in soon I'll 
start running through the release process.

> eventThread isn't shutdown after a connection "session expired" event coming
> 
>
> Key: ZOOKEEPER-795
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-795
> Project: Zookeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.3.1
> Environment: ubuntu 10.04
>Reporter: mathieu barcikowski
>Assignee: Sergey Doroshenko
>Priority: Blocker
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ExpiredSessionThreadLeak.java, ZOOKEEPER-795.patch, 
> ZOOKEEPER-795.patch
>
>
> Hi,
> I noticed a problem with the eventThread in ClientCnxn.java.
> The eventThread isn't shut down after a connection "session expired" event
> arrives (i.e. it never receives the EventOfDeath).
> When a session timeout occurs and the session is marked as expired, the
> connection is fully closed (socket, SendThread...) except for the eventThread.
> As a result, if I create a new ZooKeeper object and connect through it, I get
> a zombie thread which will never be killed (for the previous ZooKeeper
> object the state is already closed, so calling close again does nothing).
> So every time I create a new ZooKeeper connection after an expired
> session, I get one more zombie EventThread.
> How to reproduce:
> - Start a ZooKeeper client connection in debug mode
> - Pause the JVM long enough for the expired event to occur
> - Watch the list of threads, for example with jvisualvm: the SendThread is
> successfully killed, but the EventThread goes into a wait state
> indefinitely
> - If you open a new ZooKeeper connection and repeat the previous steps,
> another EventThread will be present in an infinite wait state




[jira] Updated: (ZOOKEEPER-845) remove duplicate code from netty and nio ServerCnxn classes

2010-08-12 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-845:
---

Fix Version/s: 3.4.0




[jira] Commented: (ZOOKEEPER-845) remove duplicate code from netty and nio ServerCnxn classes

2010-08-12 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897852#action_12897852
 ] 

Patrick Hunt commented on ZOOKEEPER-845:


Agree, this is ugly, however if you look the issue is related to the fact that 
nio uses threads while netty doesn't.

One question in my mind, should netty be using a thread as well? The problem is 
that the channel could be closed before netty responds if the response is not 
handled in the worker request thread. In NIO we handle this by "handing off" 
ownership of the socket from the main run routine to the 4lw response thread. 
afaik we cannot do this in netty.

So really it's not a simple answer afaict. That's why I didn't do it cleanly in 
the first place.






Re: zookeeper seems to hang

2010-08-12 Thread Ted Yu
Please see:
https://issues.apache.org/jira/browse/ZOOKEEPER-846

On Thu, Aug 12, 2010 at 10:00 AM, Patrick Hunt  wrote:

> Great bug report Ted, the stack trace in particular is very useful.
>
> It looks like a timing bug where the client is not shutting down cleanly on
> the close call. I reviewed the code in question but nothing pops out at me.
> Also the logs just show us shutting down, nothing else from zk in there.
>
> Create a jira and attach all the detail you have available.
>
> Patrick
>
>
> On 08/11/2010 03:21 PM, Ted Yu wrote:
>
>> Hi,
>> Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where
>> Regionserver
>> process was shutting down and seemed to hang.
>>
>> Here is the bottom of region server log:
>> http://pastebin.com/YYawJ4jA
>>
>> zookeeper-3.2.2 is used.
>>
>> Your comment is welcome.
>>
>> Here is relevant portion from jstack - I attempted to attach jstack twice
>> in
>> my email to d...@hbase.apache.org but failed:
>>
>> "DestroyJavaVM" prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on
>> condition [0x]
>>java.lang.Thread.State: RUNNABLE
>>
>> "regionserver/10.32.42.245:60020" prio=10 tid=0x2aabb84ce000
>> nid=0x6c81
>> in Object.wait() [0x43755000]
>>java.lang.Thread.State: WAITING (on object monitor)
>> at java.lang.Object.wait(Native Method)
>> - waiting on<0x2aaab76633c0>  (a
>> org.apache.zookeeper.ClientCnxn$Packet)
>> at java.lang.Object.wait(Object.java:485)
>> at
>> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099)
>> - locked<0x2aaab76633c0>  (a
>> org.apache.zookeeper.ClientCnxn$Packet)
>> at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077)
>> at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505)
>> - locked<0x2aaabf5e0c30>  (a org.apache.zookeeper.ZooKeeper)
>> at
>>
>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681)
>> at
>>
>> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654)
>> at java.lang.Thread.run(Thread.java:619)
>>
>> "main-EventThread" daemon prio=10 tid=0x43474000 nid=0x6c80
>> waiting
>> on condition [0x413f3000]
>>java.lang.Thread.State: WAITING (parking)
>> at sun.misc.Unsafe.park(Native Method)
>> - parking to wait for<0x2aaabf6e9150>  (a
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>> at
>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>> at
>>
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>> at
>>
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
>> at
>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)
>>
>> "RMI TCP Accept-0" daemon prio=10 tid=0x2aabb822c800 nid=0x6c7d
>> runnable
>> [0x40752000]
>>java.lang.Thread.State: RUNNABLE
>> at java.net.PlainSocketImpl.socketAccept(Native Method)
>> at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
>> - locked<0x2aaabf585578>  (a java.net.SocksSocketImpl)
>> at java.net.ServerSocket.implAccept(ServerSocket.java:453)
>> at java.net.ServerSocket.accept(ServerSocket.java:421)
>> at
>>
>> sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:34)
>> at
>>
>> sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
>> at
>> sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
>> at java.lang.Thread.run(Thread.java:619)
>>
>>


Re: High WTF count in ZooKeeper client code

2010-08-12 Thread Patrick Hunt
Thomas, btw, if you'd like (anyone really) to do a patch extracting 
deleteRecursive from zk into some helper class I think that would be a 
good idea to get sooner rather than later.


Patrick

On 08/11/2010 11:36 PM, Thomas Koch wrote:

Patrick,

I saw your patch and was afraid you wouldn't like to wait for me and change
it. :-) I'll continue to work on my issues and also put them into jira for
review so that my team can start to work on the new API.
After your patch is applied, I'll adapt my patches, which should not change
anything to the user facing API of ZK.

Thomas

Patrick Hunt:

Thomas,

I see some patches already, which is great, however there's a
big/complicated refactoring that's pending here:

https://issues.apache.org/jira/browse/ZOOKEEPER-823

and to some extent here:
https://issues.apache.org/jira/browse/ZOOKEEPER-733

and refactorings in this code prior to 733/823 going in are going to
cause me much pain. (esp as I'm moving code around, creating new
classes, etc)

Could you hold off a bit on changes in this area until these two are
committed? Ben is working on the reviews now. Ben please prioritize
review/commit of these two.

Patrick

On 08/11/2010 08:23 AM, Thomas Koch wrote:

Hello Mahadev,

thank you for your nice answer. Yes, we'll of course preserve
compatibility. Otherwise there is no chance of getting accepted.

I assume the following things must keep their interfaces:
ZooKeeper (it'll call the new interface in the background),
AsyncCallback, Watcher
We may want to change: ClientCnxn (factor out some things, remove the dep on
ZooKeeper)

I think other classes should not be involved at all in our issues. My
colleague Patrick was so kind as to file the jira issues.

Best regards,

Thomas

Mahadev Konar:

Also, I am assuming you have backwards compatibility in mind when you
suggest these changes, right?

The interfaces of the zookeeper client should not be changing as part of
this, though the recursive delete hasn't been released yet (it's only
available in 3.4, so we can move it out into a helper class).

Thanks
mahadev


On 8/11/10 7:40 AM, "Mahadev Konar"   wrote:

Hi Thomas,

I read through the list of issues you posted; most of them seem
reasonable to fix. The ones you have mentioned below might take quite a
bit of time to fix and again a lot of testing! (just a warning :)). It
would be great if you'd want to clean this up for 3.4. Please go ahead
and file a jira. These improvements would be good to have in the
zookeeper java client.

For deleteRecursive, I definitely agree that it should be a helper
class. I don't believe it should be in the direct zookeeper api!

Thanks
mahadev


On 8/11/10 2:45 AM, "Thomas Koch"   wrote:

Hi,

Yesterday I started to work on my idea of an alternative ZooKeeper
client interface.[1] Instead of calling methods on a ZooKeeper class, a user
should instantiate an Operation (Create, Delete, ...) and forward it to
an Executor which handles session loss errors and the like.
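
A rough sketch of the Operation/Executor split described above, in Python for brevity although the proposal targets the Java client. Only the names Operation, Create and Executor come from the mail; `run`, `execute`, `ConnectionLoss` and the in-memory `FlakyClient` are assumptions made up for the illustration.

```python
import time


class ConnectionLoss(Exception):
    """Stand-in for a recoverable connection error (hypothetical)."""


class Create:
    """Operation object: describes a create, does not perform it itself."""

    def __init__(self, path, data):
        self.path = path
        self.data = data

    def run(self, client):
        return client.create(self.path, self.data)


class Executor:
    """Runs operations against a client and retries on connection loss."""

    def __init__(self, client, retries=3, delay=0.0):
        self.client = client
        self.retries = retries
        self.delay = delay

    def execute(self, operation):
        for attempt in range(self.retries + 1):
            try:
                return operation.run(self.client)
            except ConnectionLoss:
                if attempt == self.retries:
                    raise  # give up after the last allowed retry
                time.sleep(self.delay)


class FlakyClient:
    """In-memory fake that fails its first call, to exercise the retry."""

    def __init__(self):
        self.nodes = {}
        self.calls = 0

    def create(self, path, data):
        self.calls += 1
        if self.calls == 1:
            raise ConnectionLoss(path)
        self.nodes[path] = data
        return path


client = FlakyClient()
result = Executor(client).execute(Create("/work", b"x"))
```

Blindly retrying is only safe for idempotent operations; a real executor would need a per-operation policy, since a create may have succeeded on the server before the connection was lost.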

While doing that, I was shocked by the sheer number of WTF issues I found.
I'm sorry for ranting now, but it gets to the point quicker.

- Hostlist as string

The hostlist is parsed in the ctor of ClientCnxn. This violates the rule
of not doing (too much) work in a ctor. Instead the ClientCnxn should
receive an object of class "HostSet". HostSet could then be
instantiated e.g. with a comma separated string.
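
A possible shape for such a HostSet, sketched in Python. Only the name HostSet is from the mail; `from_string`, `DEFAULT_PORT` and the (host, port) tuple representation are assumptions. IPv6 literals are not handled.

```python
class HostSet:
    """Value object holding (host, port) pairs; parsing lives here,
    not in the connection's constructor."""

    DEFAULT_PORT = 2181  # conventional ZooKeeper client port

    def __init__(self, hosts):
        self.hosts = list(hosts)

    @classmethod
    def from_string(cls, hostlist):
        """Build a HostSet from a comma separated "host[:port]" string."""
        hosts = []
        for entry in hostlist.split(","):
            entry = entry.strip()
            if not entry:
                continue  # tolerate stray commas
            host, sep, port = entry.rpartition(":")
            if sep:
                hosts.append((host, int(port)))
            else:
                hosts.append((entry, cls.DEFAULT_PORT))
        if not hosts:
            raise ValueError("empty host list")
        return cls(hosts)


host_set = HostSet.from_string("zk1:2181, zk2:2182,zk3")
```

The connection class would then accept a ready-made HostSet, and callers who already hold structured host data never have to go through a string.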

- cyclic dependency ClientCnxn, ZooKeeper

ZooKeeper instantiates ClientCnxn in its ctor with this and therefore
builds a cyclic dependency graph between both objects. This means you
can't have the one without the other. So why did you bother to make
them two separate classes in the first place?
ClientCnxn accesses ZooKeeper.state. State should rather be a property
of ClientCnxn. And ClientCnxn accesses zooKeeper.get???Watches() in its
method primeConnection(). I've not yet checked how this dependency
could be resolved better.

- Chroot is an attribute of ClientCnxn

I'd like to have one process that uses ZooKeeper for different things
(managing a list of work, locking some unrelated locks elsewhere). So
I have components that do this work inside the same process. These
components should get the same zookeeper-client reference chroot'ed for
their needs. So it'd be much better if the ClientCnxn did not care
about the chroot.
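
One way to picture this: the chroot becomes a thin wrapper around a shared client instead of a property of the connection. The sketch below is Python and entirely hypothetical (`ChrootedClient`, `DictClient` and their methods are invented for the illustration, and sequence nodes are ignored).

```python
class ChrootedClient:
    """Wrapper that prefixes every path with a fixed chroot."""

    def __init__(self, client, chroot):
        self._client = client
        self._chroot = chroot.rstrip("/")

    def _full(self, path):
        # "/" inside the chroot maps to the chroot node itself.
        if path == "/":
            return self._chroot or "/"
        return self._chroot + path

    def create(self, path, data):
        self._client.create(self._full(path), data)
        return path  # callers keep seeing chroot-relative paths

    def get(self, path):
        return self._client.get(self._full(path))


class DictClient:
    """In-memory fake standing in for the one shared connection."""

    def __init__(self):
        self.nodes = {}

    def create(self, path, data):
        self.nodes[path] = data
        return path

    def get(self, path):
        return self.nodes[path]


shared = DictClient()
work = ChrootedClient(shared, "/app/work")
locks = ChrootedClient(shared, "/app/locks")
work.create("/item-1", b"job")
locks.create("/item-1", b"held")
```

Both components use the same relative path but end up on different nodes, while sharing a single underlying connection.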

- deleteRecursive does not belong to the other methods

DeleteRecursive has already been committed to trunk as a method of the
ZooKeeper class. So in the API it sits at the same level as the atomic
operations create, delete, getData, setData, etc. The user is bound to get
the false impression that deleteRecursive is also an atomic operation.
It would be better to have deleteRecursive in some helper class but not
that deep in zookeeper's core code. Maybe I'd like to have another
policy on how to react if deleteRecursive fails in the middle of its
work?
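
As an illustration of the helper-class idea, here is a Python sketch of a recursive delete built only from non-recursive primitives, with a pluggable failure policy. The function name, the `on_error` hook and the `TreeClient` fake are hypothetical; this is not the code committed to trunk.

```python
def delete_recursive(client, path, on_error=None):
    """Delete path and all its descendants, children first. Not atomic.

    on_error(path, exc) is the caller's policy hook: return True to
    carry on after a failed delete, anything else re-raises.
    """
    for child in client.get_children(path):
        child_path = path.rstrip("/") + "/" + child
        delete_recursive(client, child_path, on_error)
    try:
        client.delete(path)
    except Exception as exc:
        if on_error is None or not on_error(path, exc):
            raise


class TreeClient:
    """In-memory fake offering only non-recursive primitives."""

    def __init__(self, paths):
        self.paths = set(paths)

    def get_children(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.paths
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def delete(self, path):
        if self.get_children(path):
            raise RuntimeError("not empty: " + path)
        self.paths.remove(path)


tree = TreeClient(["/a", "/a/b", "/a/b/c", "/a/d", "/x"])
delete_recursive(tree, "/a")
```

Because the helper lives outside the core client, the policy for a failure halfway through (abort, skip, retry) is visibly the caller's choice.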

- massive code duplication in zookeeper class

Each operation calls validatePath, handles the chroot, calls ClientCnxn
and checks the retu

[jira] Updated: (ZOOKEEPER-846) zookeeper client doesn't shut down cleanly on the close call

2010-08-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-846:
-

Description: 
Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where 
Regionserver
process was shutting down and seemed to hang.

Here is the bottom of region server log:
http://pastebin.com/YYawJ4jA

zookeeper-3.2.2 is used.

Here is relevant portion from jstack - I attempted to attach jstack twice in my 
email to d...@hbase.apache.org but failed:

"DestroyJavaVM" prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on condition 
[0x]
   java.lang.Thread.State: RUNNABLE

"regionserver/10.32.42.245:60020" prio=10 tid=0x2aabb84ce000 nid=0x6c81 in 
Object.wait() [0x43755000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x2aaab76633c0> (a 
org.apache.zookeeper.ClientCnxn$Packet)
at java.lang.Object.wait(Object.java:485)
at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099)
- locked <0x2aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077)
at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505)
- locked <0x2aaabf5e0c30> (a org.apache.zookeeper.ZooKeeper)
at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654)
at java.lang.Thread.run(Thread.java:619)

"main-EventThread" daemon prio=10 tid=0x43474000 nid=0x6c80 waiting on 
condition [0x413f3000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x2aaabf6e9150> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)


  was:
Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where 
Regionserver
process was shutting down and seemed to hang.

Here is the bottom of region server log:
http://pastebin.com/YYawJ4jA

zookeeper-3.2.2 is used.

Your comment is welcome.

Here is the relevant portion from jstack - I attempted to attach jstack twice in my 
email to d...@hbase.apache.org but failed:

"DestroyJavaVM" prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

"regionserver/10.32.42.245:60020" prio=10 tid=0x2aabb84ce000 nid=0x6c81 in Object.wait() [0x43755000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x2aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099)
    - locked <0x2aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
    at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077)
    at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505)
    - locked <0x2aaabf5e0c30> (a org.apache.zookeeper.ZooKeeper)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654)
    at java.lang.Thread.run(Thread.java:619)

"main-EventThread" daemon prio=10 tid=0x43474000 nid=0x6c80 waiting on condition [0x413f3000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x2aaabf6e9150> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)



> zookeeper client doesn't shut down cleanly on the close call
> 
>
> Key: ZOOKEEPER-846
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-846
> Project: Zookeeper
> Issue Type: Bug
> Components: java client
> Affects Versions: 3.2.2
> Reporter: Ted Yu
> Attachments: rs-13.stack
>
>
> Using 

[jira] Updated: (ZOOKEEPER-846) zookeeper client doesn't shut down cleanly on the close call

2010-08-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-846:
-

Attachment: rs-13.stack

jstack for Region Server

> zookeeper client doesn't shut down cleanly on the close call
> 
>
> Key: ZOOKEEPER-846
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-846
> Project: Zookeeper
> Issue Type: Bug
> Components: java client
> Affects Versions: 3.2.2
> Reporter: Ted Yu
> Attachments: rs-13.stack
>
>
> Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where 
> Regionserver
> process was shutting down and seemed to hang.
> Here is the bottom of region server log:
> http://pastebin.com/YYawJ4jA
> zookeeper-3.2.2 is used.
> Your comment is welcome.
> Here is the relevant portion from jstack - I attempted to attach jstack twice in 
> my email to d...@hbase.apache.org but failed:
>
> "DestroyJavaVM" prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on condition [0x]
>    java.lang.Thread.State: RUNNABLE
>
> "regionserver/10.32.42.245:60020" prio=10 tid=0x2aabb84ce000 nid=0x6c81 in Object.wait() [0x43755000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     - waiting on <0x2aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
>     at java.lang.Object.wait(Object.java:485)
>     at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099)
>     - locked <0x2aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
>     at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077)
>     at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505)
>     - locked <0x2aaabf5e0c30> (a org.apache.zookeeper.ZooKeeper)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654)
>     at java.lang.Thread.run(Thread.java:619)
>
> "main-EventThread" daemon prio=10 tid=0x43474000 nid=0x6c80 waiting on condition [0x413f3000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x2aaabf6e9150> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (ZOOKEEPER-846) zookeeper client doesn't shut down cleanly on the close call

2010-08-12 Thread Ted Yu (JIRA)
zookeeper client doesn't shut down cleanly on the close call


 Key: ZOOKEEPER-846
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-846
 Project: Zookeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.2.2
Reporter: Ted Yu


Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where 
Regionserver
process was shutting down and seemed to hang.

Here is the bottom of region server log:
http://pastebin.com/YYawJ4jA

zookeeper-3.2.2 is used.

Your comment is welcome.

Here is the relevant portion from jstack - I attempted to attach jstack twice in my 
email to d...@hbase.apache.org but failed:

"DestroyJavaVM" prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

"regionserver/10.32.42.245:60020" prio=10 tid=0x2aabb84ce000 nid=0x6c81 in Object.wait() [0x43755000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x2aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099)
    - locked <0x2aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
    at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077)
    at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505)
    - locked <0x2aaabf5e0c30> (a org.apache.zookeeper.ZooKeeper)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654)
    at java.lang.Thread.run(Thread.java:619)

"main-EventThread" daemon prio=10 tid=0x43474000 nid=0x6c80 waiting on condition [0x413f3000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x2aaabf6e9150> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)





Re: C client unit test failure

2010-08-12 Thread Patrick Hunt
If you figure out what it is let us know, would be good to identify a 
"fix" if others run into the same problem.


Regards,

Patrick

On 08/12/2010 09:42 AM, Michi Mutsuzaki wrote:

Yeah, I tried installing libtool 2, but that caused some other issue. I'll
play around a bit more, and let you know if I find anything.

--Michi

On 8/12/10 1:40 AM, "Patrick Hunt"  wrote:


I've been running with v4 for a while and never noticed that issue...

You might try googling it, a quick search turned up:

"The meaning of "-static" changed between libtool 1.5 and libtool 2.x,
and libtool 2.x introduced "-static-libtool-libs" to provide the old
behavior." ...

Patrick

On 08/09/2010 11:28 AM, Michi Mutsuzaki wrote:

Running "ant jar" fixed the unit test failure.

I'm using g++ 3.4.6. Do I need a later version to get rid of the
-static-libtool-libs error?

$ g++ --version
g++ (GCC) 3.4.6 20060404 (Red Hat 3.4.6-9)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Thanks!
--Michi


On 8/7/10 11:57 PM, "Patrick Hunt"   wrote:


What version of g++ do you have? Capture the test output and attach to
your response. However I suspect that the server is not running (it's
necessary to test the c client), did you "ant jar" (or similar - ie
build the server) before testing the client?

Patrick

On 08/07/2010 04:57 PM, Michi Mutsuzaki wrote:

Hello,

I'm having 2 issues while compiling/running c client unit test in
branch-3.3.

1. I get this error from "make check":

  g++: unrecognized option `-static-libtool-libs'

2. testAsyncWatcherAutoReset is not working for me.

Zookeeper_simpleSystem::testAsyncWatcherAutoReset
terminate called after throwing an instance of 'CppUnit::Exception'
  what():  equality assertion failed
- Expected: -101
- Actual  : -4

Let me know if anybody has seen these errors.

Thanks!
--Michi



Re: zookeeper seems to hang

2010-08-12 Thread Patrick Hunt

Great bug report Ted, the stack trace in particular is very useful.

It looks like a timing bug where the client is not shutting down cleanly 
on the close call. I reviewed the code in question but nothing pops out 
at me. Also the logs just show us shutting down, nothing else from zk in 
there.


Create a jira and attach all the detail you have available.

Patrick

On 08/11/2010 03:21 PM, Ted Yu wrote:

Hi,
Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where
Regionserver
process was shutting down and seemed to hang.

Here is the bottom of region server log:
http://pastebin.com/YYawJ4jA

zookeeper-3.2.2 is used.

Your comment is welcome.

Here is the relevant portion from jstack - I attempted to attach jstack twice in
my email to d...@hbase.apache.org but failed:

"DestroyJavaVM" prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

"regionserver/10.32.42.245:60020" prio=10 tid=0x2aabb84ce000 nid=0x6c81 in Object.wait() [0x43755000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x2aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099)
    - locked <0x2aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
    at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077)
    at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505)
    - locked <0x2aaabf5e0c30> (a org.apache.zookeeper.ZooKeeper)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654)
    at java.lang.Thread.run(Thread.java:619)

"main-EventThread" daemon prio=10 tid=0x43474000 nid=0x6c80 waiting on condition [0x413f3000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x2aaabf6e9150> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)

"RMI TCP Accept-0" daemon prio=10 tid=0x2aabb822c800 nid=0x6c7d runnable [0x40752000]
   java.lang.Thread.State: RUNNABLE
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
    - locked <0x2aaabf585578> (a java.net.SocksSocketImpl)
    at java.net.ServerSocket.implAccept(ServerSocket.java:453)
    at java.net.ServerSocket.accept(ServerSocket.java:421)
    at sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:34)
    at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
    at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
    at java.lang.Thread.run(Thread.java:619)



[jira] Created: (ZOOKEEPER-845) remove duplicate code from netty and nio ServerCnxn classes

2010-08-12 Thread Benjamin Reed (JIRA)
remove duplicate code from netty and nio ServerCnxn classes
---

 Key: ZOOKEEPER-845
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-845
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Benjamin Reed


the code for handling the 4-letter words is duplicated between the nio and 
netty versions of ServerCnxn. this makes maintenance problematic. 




[jira] Updated: (ZOOKEEPER-733) use netty to handle client connections

2010-08-12 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-733:


Hadoop Flags: [Reviewed]

+1 looks good to commit. Sergey raises a valid point, but i think it should be 
addressed in a separate jira given the size of this patch.

> use netty to handle client connections
> --
>
> Key: ZOOKEEPER-733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-733
> Project: Zookeeper
> Issue Type: Improvement
> Components: server
> Reporter: Benjamin Reed
> Assignee: Patrick Hunt
> Fix For: 3.4.0
>
> Attachments: accessive.jar, flowctl.zip, moved.zip, 
> QuorumTestFailed_sessionmoved_TRACE_LOG.txt.gz, ZOOKEEPER-733.patch, 
> ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, 
> ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, 
> ZOOKEEPER-733.patch, ZOOKEEPER-733.patch
>
>
> we currently have our own asynchronous NIO socket engine to be able to handle 
> lots of clients with a single thread. over time the engine has become more 
> complicated. we would also like the engine to use multiple threads on 
> machines with lots of cores. plus, we would like to be able to support things 
> like SSL. if we switch to netty, we can simplify our code and get the 
> previously mentioned benefits.




Re: C client unit test failure

2010-08-12 Thread Michi Mutsuzaki
Yeah, I tried installing libtool 2, but that caused some other issue. I'll
play around a bit more, and let you know if I find anything.

--Michi

On 8/12/10 1:40 AM, "Patrick Hunt"  wrote:

> I've been running with v4 for a while and never noticed that issue...
> 
> You might try googling it, a quick search turned up:
> 
> "The meaning of "-static" changed between libtool 1.5 and libtool 2.x,
> and libtool 2.x introduced "-static-libtool-libs" to provide the old
> behavior." ...
> 
> Patrick
> 
> On 08/09/2010 11:28 AM, Michi Mutsuzaki wrote:
>> Running "ant jar" fixed the unit test failure.
>> 
>> I'm using g++ 3.4.6. Do I need a later version to get rid of the
>> -static-libtool-libs error?
>> 
>> $ g++ --version
>> g++ (GCC) 3.4.6 20060404 (Red Hat 3.4.6-9)
>> Copyright (C) 2006 Free Software Foundation, Inc.
>> This is free software; see the source for copying conditions.  There is NO
>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>> 
>> Thanks!
>> --Michi
>> 
>> 
>> On 8/7/10 11:57 PM, "Patrick Hunt"  wrote:
>> 
>>> What version of g++ do you have? Capture the test output and attach to
>>> your response. However I suspect that the server is not running (it's
>>> necessary to test the c client), did you "ant jar" (or similar - ie
>>> build the server) before testing the client?
>>> 
>>> Patrick
>>> 
>>> On 08/07/2010 04:57 PM, Michi Mutsuzaki wrote:
>>>> Hello,
>>>> 
>>>> I'm having 2 issues while compiling/running c client unit test in
>>>> branch-3.3.
>>>> 
>>>> 1. I get this error from "make check":
>>>> 
>>>>  g++: unrecognized option `-static-libtool-libs'
>>>> 
>>>> 2. testAsyncWatcherAutoReset is not working for me.
>>>> 
>>>> Zookeeper_simpleSystem::testAsyncWatcherAutoReset
>>>> terminate called after throwing an instance of 'CppUnit::Exception'
>>>>   what():  equality assertion failed
>>>> - Expected: -101
>>>> - Actual  : -4
>>>> 
>>>> Let me know if anybody has seen these errors.
>>>> 
>>>> Thanks!
>>>> --Michi
>>>> 
>>> 
>> 
> 



[jira] Created: (ZOOKEEPER-844) handle auth failure in java client

2010-08-12 Thread Camille Fournier (JIRA)
handle auth failure in java client
--

 Key: ZOOKEEPER-844
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-844
 Project: Zookeeper
  Issue Type: Improvement
  Components: java client
Affects Versions: 3.3.1
Reporter: Camille Fournier


ClientCnxn.java currently has the following code:
  if (replyHdr.getXid() == -4) {
      // -4 is the xid for AuthPacket
      // TODO: process AuthPacket here
      if (LOG.isDebugEnabled()) {
          LOG.debug("Got auth sessionid:0x"
                  + Long.toHexString(sessionId));
      }
      return;
  }

Auth failures appear to cause the server to disconnect but the client never 
gets a proper state change or notification that auth has failed, which makes 
handling this scenario very difficult as it causes the client to go into a loop 
of sending bad auth, getting disconnected, trying to reconnect, sending bad 
auth again, over and over. 
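
One possible shape for the missing handling is sketched below. It is not the ZooKeeper implementation: the class AuthAwareConnection, its State enum, the StateListener callback, and the rule that a non-zero error code on the auth reply means failure are all assumptions made for illustration. Only the xid value -4 comes from the snippet above.

```java
/** Hypothetical sketch; names and states are assumptions, not ZooKeeper API. */
final class AuthAwareConnection {
    enum State { CONNECTED, AUTH_FAILED }

    /** Hypothetical callback for connection state changes. */
    interface StateListener {
        void stateChanged(State newState);
    }

    // -4 is the xid checked in the snippet quoted above.
    static final int AUTH_XID = -4;

    private final StateListener listener;
    private State state = State.CONNECTED;

    AuthAwareConnection(StateListener listener) {
        this.listener = listener;
    }

    /** Handles a reply header; err == 0 is assumed to mean success. */
    void onReply(int xid, int err) {
        if (xid == AUTH_XID && err != 0 && state != State.AUTH_FAILED) {
            state = State.AUTH_FAILED;
            listener.stateChanged(state); // notify the application exactly once
        }
    }

    /** A reconnect loop would consult this before retrying with the same auth. */
    boolean shouldReconnect() {
        return state != State.AUTH_FAILED;
    }

    State state() {
        return state;
    }
}
```

With something along these lines the application gets one explicit notification and the reconnect loop has a reason to stop, instead of resending the same bad auth.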





[jira] Commented: (ZOOKEEPER-838) Chroot is an attribute of ClientCnxn

2010-08-12 Thread Thomas Koch (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897725#action_12897725
 ] 

Thomas Koch commented on ZOOKEEPER-838:
---

I need to change the internal logic of watch registration: The watches will now 
be registered under the full serverPath, since we want to have multiple 
ChangeRoots in the client. For each watcher I'll additionally save its ChRoot 
in the watchManager.
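
A minimal sketch of that registration scheme, assuming hypothetical names throughout (WatchRegistry, PathWatcher and Registration are not existing ZooKeeper classes): watches are keyed by the full server path, each registration remembers its chroot, and the path is translated back per watcher when the event is delivered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical callback; receives the path as the registering component sees it. */
interface PathWatcher {
    void process(String clientPath);
}

/** Hypothetical registry keyed by full server path. */
final class WatchRegistry {
    private static final class Registration {
        final String chroot;
        final PathWatcher watcher;

        Registration(String chroot, PathWatcher watcher) {
            this.chroot = chroot;
            this.watcher = watcher;
        }
    }

    private final Map<String, List<Registration>> byServerPath =
            new HashMap<String, List<Registration>>();

    /** chroot is "" for none, otherwise like "/app" (no trailing slash). */
    void register(String chroot, String clientPath, PathWatcher watcher) {
        String serverPath = clientPath.equals("/") && !chroot.isEmpty()
                ? chroot : chroot + clientPath;
        List<Registration> regs = byServerPath.get(serverPath);
        if (regs == null) {
            regs = new ArrayList<Registration>();
            byServerPath.put(serverPath, regs);
        }
        regs.add(new Registration(chroot, watcher));
    }

    /** Delivers an event for serverPath; registrations are removed (one-shot). */
    void fire(String serverPath) {
        List<Registration> regs = byServerPath.remove(serverPath);
        if (regs == null) {
            return;
        }
        for (Registration r : regs) {
            // Strip this watcher's own chroot so it sees the path it registered.
            String clientPath = serverPath.substring(r.chroot.length());
            r.watcher.process(clientPath.isEmpty() ? "/" : clientPath);
        }
    }
}
```

Two components with chroots /a and /a/b can then watch the same znode /a/b/c and each receives the event under its own relative path.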

> Chroot is an attribute of ClientCnxn
> 
>
> Key: ZOOKEEPER-838
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-838
> Project: Zookeeper
> Issue Type: Sub-task
> Reporter: Patrick Datko
>
> It would be better to have one process that uses ZooKeeper for different 
> things 
> (managing a list of work, locking some unrelated locks elsewhere). So there 
> are
> components that do this work inside the same process. These components should 
> get the same zookeeper-client reference chroot'ed for their needs.
> So it'd be much better, if the ClientCnxn would not care about the chroot.




Re: C client unit test failure

2010-08-12 Thread Patrick Hunt

I've been running with v4 for a while and never noticed that issue...

You might try googling it, a quick search turned up:

"The meaning of "-static" changed between libtool 1.5 and libtool 2.x,
and libtool 2.x introduced "-static-libtool-libs" to provide the old
behavior." ...

Patrick

On 08/09/2010 11:28 AM, Michi Mutsuzaki wrote:

Running "ant jar" fixed the unit test failure.

I'm using g++ 3.4.6. Do I need a later version to get rid of the
-static-libtool-libs error?

$ g++ --version
g++ (GCC) 3.4.6 20060404 (Red Hat 3.4.6-9)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Thanks!
--Michi


On 8/7/10 11:57 PM, "Patrick Hunt"  wrote:


What version of g++ do you have? Capture the test output and attach to
your response. However I suspect that the server is not running (it's
necessary to test the c client), did you "ant jar" (or similar - ie
build the server) before testing the client?

Patrick

On 08/07/2010 04:57 PM, Michi Mutsuzaki wrote:

Hello,

I'm having 2 issues while compiling/running c client unit test in
branch-3.3.

1. I get this error from "make check":

 g++: unrecognized option `-static-libtool-libs'

2. testAsyncWatcherAutoReset is not working for me.

Zookeeper_simpleSystem::testAsyncWatcherAutoReset
terminate called after throwing an instance of 'CppUnit::Exception'
  what():  equality assertion failed
- Expected: -101
- Actual  : -4

Let me know if anybody has seen these errors.

Thanks!
--Michi



Re: High WTF count in ZooKeeper client code

2010-08-12 Thread Patrick Hunt


On 08/11/2010 11:36 PM, Thomas Koch wrote:

I saw your patch and was afraid you wouldn't like to wait for me and change
it. :-) I'll continue to work on my issues and also put them into jira for
review so that my team can start to work on the new API.
After your patch is applied, I'll adapt my patches, which should not change
anything in the user-facing API of ZK.


No problem, just didn't want you to get frustrated in case you didn't 
notice. As long as you go in with eyes wide open it's ok w/me. :-)


Patrick



Patrick Hunt:

Thomas,

I see some patches already, which is great, however there's a
big/complicated refactoring that's pending here:

https://issues.apache.org/jira/browse/ZOOKEEPER-823

and to some extent here:
https://issues.apache.org/jira/browse/ZOOKEEPER-733

and refactorings in this code prior to 733/823 going in are going to
cause me much pain. (esp as I'm moving code around, creating new
classes, etc)

Could you hold off a bit on changes in this area until these two are
committed? Ben is working on the reviews now. Ben please prioritize
review/commit of these two.

Patrick

On 08/11/2010 08:23 AM, Thomas Koch wrote:

Hello Mahadev,

thank you for your nice answer. Yes, we'll of course preserve
compatibility. Otherwise there is no chance of getting accepted.

I assume the following things must keep their interfaces:
ZooKeeper (It'll call the new interface in the background),
AsyncCallback, Watcher
We may want to change: ClientCnxn (factor out some things, remove dep on
ZooKeeper)

I think other classes should not be involved at all in our issues. My
colleague Patrick was so kind as to file the jira issues.

Best regards,

Thomas

Mahadev Konar:

Also, I am assuming you have backwards compatibility in mind when you
suggest these changes, right?

The interfaces of zookeeper client should not be changing as part of
this, though the recursive delete hasn't been introduced yet (it's only
available in 3.4, so we can move it out into a helper class).

Thanks
mahadev


On 8/11/10 7:40 AM, "Mahadev Konar"   wrote:

Hi Thomas,

I read through the list of issues you posted, most of them seem
reasonable to fix. The ones you have mentioned below might take quite a
bit of time to fix and again a lot of testing! (just a warning :)). It
would be great if you'd want to clean this up for 3.4. Please go ahead
and file a jira. These improvements would be good to have in the
zookeeper java client.

For deleteRecursive, I definitely agree that it should be a helper
class. I don't believe it should be in the direct zookeeper api!

Thanks
mahadev


On 8/11/10 2:45 AM, "Thomas Koch"   wrote:

Hi,

I started yesterday to work on my idea of an alternative ZooKeeper
client interface.[1] Instead of methods on a ZooKeeper class, a user
should instantiate an Operation (Create, Delete, ...) and forward it to
an Executor which handles session loss errors and the like.
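
To make the idea concrete, here is a rough sketch of what such an interface could look like. Nothing in it exists in the ZooKeeper client: Operation, SessionLostException and RetryingExecutor are hypothetical names, and simply retrying a fixed number of times is only one assumed policy for handling session loss.

```java
/** Hypothetical sketch; none of these types exist in the ZooKeeper client. */
interface Operation<T> {
    T execute();
}

/** Stand-in for whatever the real client would raise on session loss. */
class SessionLostException extends RuntimeException {
    SessionLostException(String message) {
        super(message);
    }
}

final class RetryingExecutor {
    private final int maxAttempts;

    RetryingExecutor(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
    }

    /** Runs the operation, retrying only when the session was lost. */
    <T> T run(Operation<T> op) {
        SessionLostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.execute();
            } catch (SessionLostException e) {
                last = e; // a real executor would re-establish the session here
            }
        }
        throw last; // every attempt failed with a session loss
    }
}
```

The point of the split is that the error-handling policy lives in the executor, so each operation only has to describe what it does.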

While doing that, I was shocked by the sheer number of WTF issues I found.
I'm sorry for ranting now, but it gets to the point quicker.

- Hostlist as string

The hostlist is parsed in the ctor of ClientCnxn. This violates the rule
of not doing (too much) work in a ctor. Instead the ClientCnxn should
receive an object of class "HostSet". HostSet could then be
instantiated e.g. with a comma separated string.
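
A small sketch of what such a HostSet could look like follows. The class is hypothetical, as are the parse factory and the fallback to port 2181 when an entry carries no port.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical value class; not part of the ZooKeeper API. */
final class HostSet {
    private static final int DEFAULT_PORT = 2181; // assumed default for this sketch

    private final List<InetSocketAddress> addresses;

    private HostSet(List<InetSocketAddress> addresses) {
        this.addresses = Collections.unmodifiableList(addresses);
    }

    /** Parses "host1:2181,host2:2181"; entries without a port get DEFAULT_PORT. */
    static HostSet parse(String hostList) {
        List<InetSocketAddress> result = new ArrayList<InetSocketAddress>();
        for (String entry : hostList.split(",")) {
            String host = entry.trim();
            if (host.isEmpty()) {
                continue;
            }
            int port = DEFAULT_PORT;
            int colon = host.lastIndexOf(':');
            if (colon >= 0) {
                port = Integer.parseInt(host.substring(colon + 1));
                host = host.substring(0, colon);
            }
            // createUnresolved keeps parsing free of DNS lookups.
            result.add(InetSocketAddress.createUnresolved(host, port));
        }
        if (result.isEmpty()) {
            throw new IllegalArgumentException("empty host list");
        }
        return new HostSet(result);
    }

    List<InetSocketAddress> addresses() {
        return addresses;
    }
}
```

ClientCnxn would then receive an already validated HostSet, and its ctor would no longer need to parse anything.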

- cyclic dependency ClientCnxn, ZooKeeper

ZooKeeper instantiates ClientCnxn in its ctor with this and therefore
builds a cyclic dependency graph between both objects. This means you
can't have the one without the other. So why did you bother to make
them separate classes in the first place?
ClientCnxn accesses ZooKeeper.state. State should rather be a property
of ClientCnxn. And ClientCnxn accesses zooKeeper.get???Watches() in its
method primeConnection(). I've not yet checked how this dependency
should be resolved better.

- Chroot is an attribute of ClientCnxn

I'd like to have one process that uses ZooKeeper for different things
(managing a list of work, locking some unrelated locks elsewhere). So
I have components that do this work inside the same process. These
components should get the same zookeeper-client reference chroot'ed for
their needs. So it'd be much better if ClientCnxn did not care
about the chroot.
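
As an illustration, the path translation could live in a small object handed to each component, so the shared connection only ever sees full server paths. ChrootPath and its method names are hypothetical.

```java
/** Hypothetical helper; maps client-visible paths to server paths and back. */
final class ChrootPath {
    private final String chroot; // "" means no chroot

    ChrootPath(String chroot) {
        if (chroot == null || chroot.isEmpty() || chroot.equals("/")) {
            this.chroot = "";
        } else if (!chroot.startsWith("/") || chroot.endsWith("/")) {
            throw new IllegalArgumentException("invalid chroot: " + chroot);
        } else {
            this.chroot = chroot;
        }
    }

    /** Path as the server sees it. */
    String toServerPath(String clientPath) {
        if (chroot.isEmpty()) {
            return clientPath;
        }
        return clientPath.equals("/") ? chroot : chroot + clientPath;
    }

    /** Path as the chroot'ed component sees it. */
    String toClientPath(String serverPath) {
        if (chroot.isEmpty()) {
            return serverPath;
        }
        if (serverPath.equals(chroot)) {
            return "/";
        }
        if (!serverPath.startsWith(chroot + "/")) {
            throw new IllegalArgumentException(serverPath + " is outside " + chroot);
        }
        return serverPath.substring(chroot.length());
    }
}
```

Each component would hold its own ChrootPath (say, one for /app/work and one for /app/locks) while sharing a single connection.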

- deleteRecursive does not belong to the other methods

DeleteRecursive has been committed to trunk already as a method of the
ZooKeeper class. So in the API it has the same level as the atomic
operations create, delete, getData, setData, etc. The user is bound to get
the false impression that deleteRecursive is also an atomic operation.
It would be better to have deleteRecursive in some helper class but not
that deep in zookeeper's core code. Maybe I'd like to have another
policy on how to react if deleteRecursive fails in the middle of its
work?
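
A helper along these lines might look as follows. This is a sketch under assumptions: NodeStore is a hypothetical two-method stand-in for the real client, TreeUtil is a made-up name, and InMemoryStore exists only to exercise the code. It is not the deleteRecursive that was committed to trunk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical minimal view of a znode tree; stands in for the real client. */
interface NodeStore {
    List<String> getChildren(String path);
    void delete(String path);
}

final class TreeUtil {
    private TreeUtil() {}

    /** Deletes children before parents. NOT atomic: an error leaves a partial tree. */
    static List<String> deleteRecursive(NodeStore store, String root) {
        List<String> deleted = new ArrayList<String>();
        deleteSubtree(store, root, deleted);
        return deleted;
    }

    private static void deleteSubtree(NodeStore store, String path, List<String> deleted) {
        for (String child : store.getChildren(path)) {
            String childPath = path.equals("/") ? "/" + child : path + "/" + child;
            deleteSubtree(store, childPath, deleted);
        }
        store.delete(path);
        deleted.add(path);
    }
}

/** In-memory store used only to exercise the sketch. */
final class InMemoryStore implements NodeStore {
    private final SortedSet<String> paths = new TreeSet<String>();

    void create(String path) {
        paths.add(path);
    }

    boolean exists(String path) {
        return paths.contains(path);
    }

    public List<String> getChildren(String path) {
        String prefix = path.equals("/") ? "/" : path + "/";
        List<String> children = new ArrayList<String>();
        for (String p : paths) {
            // Direct children only: nothing after the prefix may contain '/'.
            if (p.startsWith(prefix) && p.length() > prefix.length()
                    && p.indexOf('/', prefix.length()) < 0) {
                children.add(p.substring(prefix.length()));
            }
        }
        return children;
    }

    public void delete(String path) {
        if (!getChildren(path).isEmpty()) {
            throw new IllegalStateException("not empty: " + path);
        }
        paths.remove(path);
    }
}
```

Because the helper returns the paths it managed to delete, a caller could layer its own policy on top for the case where the operation fails halfway.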

- massive code duplication in zookeeper class

Each operation calls validatePath, handles the chroot, calls ClientCnxn
and checks the return header for error. I'd like to address this with
the oper