Re: Using zookeeper to assign a bunch of long-running tasks to nodes (without unhandled tasks and double-handled tasks)

2010-01-25 Thread Qing Yan
I agree, masterless is ideal, but it somewhat goes against KISS. About error handling: does ZK-22 mean that disconnection will be eliminated from the API and handled solely by the ZK implementation? I am not sure that is such a good idea, though. The application layer needs to be notified that communication

Re: Server exception when closing session

2010-01-25 Thread Patrick Hunt
GC and disk I/O (the transaction log in particular) will cause significant latency in some cases. See this for details on the types of things you should look at: http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting I've seen cases where the JVM will pause for 2+ minutes for GC, in some cases

Re: ZooKeeper Dashboard: Error: No module named zookeeper_dashboard.zkadmin

2010-01-25 Thread Patrick Hunt
Rename phunt-zookeeper_dashboard-43ce91a to zookeeper_dashboard (or change the Django files inside phunt-zookeeper_dashboard-43ce91a to use this as the module name). Patrick Eric Scheie wrote: I am having trouble getting the ZooKeeper Dashboard up and running. I installed Django-1.1.1 and

Re: ZooKeeper Dashboard: Error: No module named zookeeper_dashboard.zkadmin

2010-01-25 Thread Eric Scheie
Renaming the directory worked, thanks! On Mon, Jan 25, 2010 at 10:01 AM, Patrick Hunt ph...@apache.org wrote: Rename phunt-zookeeper_dashboard-43ce91a to zookeeper_dashboard (or change the Django files inside phunt-zookeeper_dashboard-43ce91a to use this as the module name). Patrick Eric

Re: Killing a zookeeper server

2010-01-25 Thread Jean-Daniel Cryans
Everything is here: http://people.apache.org/~jdcryans/zk_election_bug.tar.gz The server we are trying to start is sv4borg222 (myid is 2), and we started it around 10:03:21. Thx! J-D On Mon, Jan 25, 2010 at 10:49 AM, Patrick Hunt ph...@apache.org wrote: 1) Capture the logs from all 5 servers 2)

Re: Killing a zookeeper server

2010-01-25 Thread Patrick Hunt
According to the log for 222, it can't open a connection to the election port (3888) for any of the other servers. This seems very unusual. Can you verify that there's connectivity on that port between 222 and all the other servers? Also, can you re-run the netstat with the -a option? We can see the

Re: Killing a zookeeper server

2010-01-25 Thread Jean-Daniel Cryans
According to the log for 222, it can't open a connection to the election port (3888) for any of the other servers. This seems very unusual. Can you verify that there's connectivity on that port between 222 and all the other servers? jdcry...@sv4borg222:~$ telnet sv4borg224 3888 Trying
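
The manual telnet check above can also be scripted when several peers need to be tested. The following is only a minimal sketch, not part of the original thread: the helper names `can_connect` and `check_peers` are hypothetical, and it assumes a plain TCP connect is a good enough probe for the election port (3888).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_peers(hosts, port=3888, timeout=3.0):
    """Probe each host on the given port and return a {host: reachable} mapping."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Run from sv4borg222, a call such as `check_peers(["sv4borg224"])` would report whether the election port on that peer accepts connections; the host list is an assumption and should be taken from the actual server configuration.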

Re: Server exception when closing session

2010-01-25 Thread Ted Dunning
Be very cautious about misdirection here. It is easy to focus on the ZK server-side GCs. In my experience, I have had many more GC-related ZK problems caused by *client*-side GC. If the client checks out for a minute, you get disconnects and session expiration which is good for debugging that

Re: Killing a zookeeper server

2010-01-25 Thread Patrick Hunt
JD, there's something _very_ unusual in your setup. Are you running official released ZooKeeper code or something else? Either there is a misconfiguration on the other servers (the configs for the other servers are exactly the same as for 222, right?), or perhaps some patches to the ZK codebase that

Re: Server exception when closing session

2010-01-25 Thread Patrick Hunt
Good point, Ted. FYI, this is the case with the HBase-ZK integration. The HBase Java region server code, acting as a client of the ZK service, was seeing 2-minute GC pauses in some cases. This was causing the ZK client session to time out. Tuning has helped, however until the G1 compactor is