[jira] Created: (ZOOKEEPER-890) C client invokes watcher callbacks multiple times
C client invokes watcher callbacks multiple times - Key: ZOOKEEPER-890 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-890 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.3.1 Environment: Mac OS X 10.6.5 Reporter: Austin Shoemaker Priority: Critical Attachments: watcher_twice.c The collect_session_watchers function in zk_hashtable.c gathers watchers from active_node_watchers, active_exist_watchers, and active_child_watchers without removing the watchers from the table. Please see attached repro case and patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-890) C client invokes watcher callbacks multiple times
[ https://issues.apache.org/jira/browse/ZOOKEEPER-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Austin Shoemaker updated ZOOKEEPER-890: --- Attachment: watcher_twice.c C client invokes watcher callbacks multiple times - Key: ZOOKEEPER-890 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-890 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.3.1 Environment: Mac OS X 10.6.5 Reporter: Austin Shoemaker Priority: Critical Attachments: watcher_twice.c The collect_session_watchers function in zk_hashtable.c gathers watchers from active_node_watchers, active_exist_watchers, and active_child_watchers without removing the watchers from the table. Please see attached repro case and patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-890) C client invokes watcher callbacks multiple times
[ https://issues.apache.org/jira/browse/ZOOKEEPER-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Austin Shoemaker updated ZOOKEEPER-890: --- Attachment: ZOOKEEPER-890.patch Patch that clears active watcher sets when broadcasting a session event to all watchers. C client invokes watcher callbacks multiple times - Key: ZOOKEEPER-890 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-890 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.3.1 Environment: Mac OS X 10.6.5 Reporter: Austin Shoemaker Priority: Critical Attachments: watcher_twice.c, ZOOKEEPER-890.patch The collect_session_watchers function in zk_hashtable.c gathers watchers from active_node_watchers, active_exist_watchers, and active_child_watchers without removing the watchers from the table. Please see attached repro case and patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-890) C client invokes watcher callbacks multiple times
[ https://issues.apache.org/jira/browse/ZOOKEEPER-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Austin Shoemaker updated ZOOKEEPER-890: --- Description: Code using the C client assumes that watcher callbacks are called exactly once. If the watcher is called more than once, the process will likely overwrite freed memory and/or crash. collect_session_watchers (zk_hashtable.c) gathers watchers from active_node_watchers, active_exist_watchers, and active_child_watchers without removing them. This results in watchers being invoked more than once. Test code is attached that reproduces the bug, along with a proposed patch. was: The collect_session_watchers function in zk_hashtable.c gathers watchers from active_node_watchers, active_exist_watchers, and active_child_watchers without removing the watchers from the table. Please see attached repro case and patch. C client invokes watcher callbacks multiple times - Key: ZOOKEEPER-890 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-890 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.3.1 Environment: Mac OS X 10.6.5 Reporter: Austin Shoemaker Priority: Critical Attachments: watcher_twice.c, ZOOKEEPER-890.patch Code using the C client assumes that watcher callbacks are called exactly once. If the watcher is called more than once, the process will likely overwrite freed memory and/or crash. collect_session_watchers (zk_hashtable.c) gathers watchers from active_node_watchers, active_exist_watchers, and active_child_watchers without removing them. This results in watchers being invoked more than once. Test code is attached that reproduces the bug, along with a proposed patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918878#action_12918878 ] Hudson commented on ZOOKEEPER-822: -- Integrated in ZooKeeper-trunk #959 (See [https://hudson.apache.org/hudson/job/ZooKeeper-trunk/959/]) ZOOKEEPER-822. Leader election taking a long time to complete Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1. Fail the ZK leader. 2. Let leader election finish. Restart the leader and let it join the quorum. 3. Repeat. After a few rounds, leader election takes anywhere from 25 to 60 seconds to finish. Note: we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below:
{noformat}
#Mon Jul 19 12:15:10 UTC 2010
server.1=192.168.4.12\:2888\:3888
server.0=192.168.4.11\:2888\:3888
clientPort=2181
dataDir=/var/zookeeper
syncLimit=2
server.2=192.168.4.13\:2888\:3888
initLimit=5
tickTime=2000
{noformat}
I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyway, so logs from that node shouldn't matter. Look for START HERE; logs after that point are the ones of interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-844) handle auth failure in java client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918879#action_12918879 ] Hudson commented on ZOOKEEPER-844: -- Integrated in ZooKeeper-trunk #959 (See [https://hudson.apache.org/hudson/job/ZooKeeper-trunk/959/]) ZOOKEEPER-844. handle auth failure in java client handle auth failure in java client -- Key: ZOOKEEPER-844 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-844 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.3.1 Reporter: Camille Fournier Assignee: Camille Fournier Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-844.patch, ZOOKEEPER332-844 ClientCnxn.java currently has the following code:
{noformat}
if (replyHdr.getXid() == -4) {
    // -4 is the xid for AuthPacket
    // TODO: process AuthPacket here
    if (LOG.isDebugEnabled()) {
        LOG.debug("Got auth sessionid:0x" + Long.toHexString(sessionId));
    }
    return;
}
{noformat}
Auth failures appear to cause the server to disconnect, but the client never gets a proper state change or notification that auth has failed. This makes the scenario very difficult to handle, as the client goes into a loop of sending bad auth, getting disconnected, trying to reconnect, and sending bad auth again, over and over. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
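For context, what the fix enables is a client-visible auth-failure state. A minimal sketch of how application code could react to it, assuming the client delivers KeeperState.AuthFailed to the default watcher (which is what this issue's patch is meant to enable); the connect string and credentials below are illustrative:
{noformat}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooKeeper;

// Sketch: stop retrying once auth has failed, instead of looping on
// bad auth -> disconnect -> reconnect. Assumes the client surfaces
// KeeperState.AuthFailed, which is what this issue's patch enables.
public class AuthAwareClient {
    public static void main(String[] args) throws Exception {
        final CountDownLatch authFailed = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, new Watcher() {
            public void process(WatchedEvent event) {
                if (event.getState() == KeeperState.AuthFailed) {
                    authFailed.countDown(); // give up, don't re-send credentials
                }
            }
        });
        zk.addAuthInfo("digest", "baduser:badpass".getBytes());
        if (authFailed.await(30, TimeUnit.SECONDS)) {
            System.err.println("authentication failed, closing session");
        }
        zk.close();
    }
}
{noformat}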
[jira] Commented: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load
[ https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918940#action_12918940 ] Alexandre Hardy commented on ZOOKEEPER-885: --- {{maxClientCnxns}} is set to 30, so 45 clients spread across 3 servers should not be unreasonable, and I do have confirmation that a session is established for every client (all 45 of them) before beginning the disk load with {{dd}}. I'm aiming for 0 disconnects with this simple example. Zookeeper drops connections under moderate IO load -- Key: ZOOKEEPER-885 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.2 Environment: Debian (Lenny) 1Gb RAM swap disabled 100Mb heap for zookeeper Reporter: Alexandre Hardy Priority: Critical Attachments: WatcherTest.java A zookeeper server under minimal load, with a number of clients watching exactly one node, will fail to maintain its connections when the machine is subjected to moderate IO load. In a specific test example we had three zookeeper servers running on dedicated machines with 45 clients connected, watching exactly one node. The clients would disconnect after moderate load was added to each of the zookeeper servers with the command: {noformat} dd if=/dev/urandom of=/dev/mapper/nimbula-test {noformat} The {{dd}} command transferred data at a rate of about 4Mb/s. The same thing happens with {noformat} dd if=/dev/zero of=/dev/mapper/nimbula-test {noformat} It seems strange that such a moderate load should cause instability in the connection. Very few other processes were running; the machines were set up to test the connection instability we have experienced. Clients performed no other read or mutation operations. Although the documentation states that there should be minimal competing IO load on the zookeeper server, it seems reasonable that moderate IO should not cause problems in this case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
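For reference, each of the 45 clients amounts to something like the following sketch (the host zk1:2181 and path /watched-node are illustrative; the actual test is the attached WatcherTest.java):
{noformat}
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooKeeper;

// Minimal client of the kind described: connect, watch exactly one
// node, and log every disconnect. Host and path are illustrative;
// the real test is the attached WatcherTest.java.
public class WatchOneNode {
    public static void main(String[] args) throws Exception {
        final CountDownLatch expired = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("zk1:2181", 30000, new Watcher() {
            public void process(WatchedEvent event) {
                if (event.getState() == KeeperState.Disconnected) {
                    System.out.println("disconnected at " + System.currentTimeMillis());
                } else if (event.getState() == KeeperState.Expired) {
                    expired.countDown();
                }
            }
        });
        zk.exists("/watched-node", true); // default watcher, one node
        expired.await();                  // run until the session dies
        zk.close();
    }
}
{noformat}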
[jira] Updated: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load
[ https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexandre Hardy updated ZOOKEEPER-885: -- Attachment: zklogs.tar.gz Attached are logs from the two sessions with disconnects. I have not filtered the logs in any way. The logs for 3.3.1 are the clearest, and show exactly one failure roughly 3 minutes after the logs start. Unfortunately the logs don't offer much information (as far as I can make out). Should I enable more verbose logging? Zookeeper drops connections under moderate IO load -- Key: ZOOKEEPER-885 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.2 Environment: Debian (Lenny) 1Gb RAM swap disabled 100Mb heap for zookeeper Reporter: Alexandre Hardy Priority: Critical Attachments: WatcherTest.java, zklogs.tar.gz A zookeeper server under minimal load, with a number of clients watching exactly one node, will fail to maintain its connections when the machine is subjected to moderate IO load. In a specific test example we had three zookeeper servers running on dedicated machines with 45 clients connected, watching exactly one node. The clients would disconnect after moderate load was added to each of the zookeeper servers with the command: {noformat} dd if=/dev/urandom of=/dev/mapper/nimbula-test {noformat} The {{dd}} command transferred data at a rate of about 4Mb/s. The same thing happens with {noformat} dd if=/dev/zero of=/dev/mapper/nimbula-test {noformat} It seems strange that such a moderate load should cause instability in the connection. Very few other processes were running; the machines were set up to test the connection instability we have experienced. Clients performed no other read or mutation operations. Although the documentation states that there should be minimal competing IO load on the zookeeper server, it seems reasonable that moderate IO should not cause problems in this case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: znode inconsistencies across ZooKeeper servers
Vishal, this sounds like a bug in ZK to me. Can you create a JIRA with this description, your configuration files from all servers, and the log files from all servers during the time of the incident? If you could run the servers with DEBUG level logging during the time you reproduce the issue, that would probably help: https://issues.apache.org/jira/browse/ZOOKEEPER Thanks! Patrick

On Wed, Oct 6, 2010 at 2:57 PM, Vishal K vishalm...@gmail.com wrote:
Hi Patrick, You are correct, the test restarts both the ZooKeeper server and the client. The client opens a new connection after restarting, so we would expect the ephemeral znode (/foo) to expire after the session timeout. However, the client with the new session creates the ephemeral znode (/foo) again after it reboots (it sets a watch for /foo and recreates /foo if it is deleted or doesn't exist). The client is not reusing the session ID. What I expect to see is that the older /foo should expire, after which a new /foo should get created. Is my expectation correct? What confuses me is the following output of 3 successive getstat /foo requests on A (note the zxid, time, and owner fields). Notice that the older znode reappeared. At the same time, when I do getstat at B and C, I see the newer /foo.

log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
cZxid = 0x105ef
ctime = Tue Oct 05 15:00:50 UTC 2010
mZxid = 0x105ef
mtime = Tue Oct 05 15:00:50 UTC 2010
pZxid = 0x105ef
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2b7ce57ce4
dataLength = 54
numChildren = 0

log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
cZxid = 0x10607
ctime = Tue Oct 05 15:01:07 UTC 2010
mZxid = 0x10607
mtime = Tue Oct 05 15:01:07 UTC 2010
pZxid = 0x10607
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2b7ce5bda4
dataLength = 54
numChildren = 0

log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
cZxid = 0x105ef
ctime = Tue Oct 05 15:00:50 UTC 2010
mZxid = 0x105ef
mtime = Tue Oct 05 15:00:50 UTC 2010
pZxid = 0x105ef
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2b7ce57ce4
dataLength = 54
numChildren = 0

Thanks for your help. -Vishal

On Wed, Oct 6, 2010 at 4:45 PM, Patrick Hunt ph...@apache.org wrote:
Vishal, the attachment seems to be getting removed by the list daemon (I don't have it), can you create a JIRA and attach? Also, this is a good question for the ppl on zookeeper-user. (ccing) You are aware that ephemeral znodes are tied to the session, and that sessions only expire after the session timeout period? At that time, any znodes created during that session are then deleted. The fact that you are killing your client process leads me to believe that you are not closing the session cleanly (meaning that it will eventually expire after the session timeout period), in which case the ephemeral znodes _should_ reappear when A is restarted and successfully rejoins the cluster (at least until the session timeout is exceeded). Patrick

On Tue, Oct 5, 2010 at 11:04 AM, Vishal K vishalm...@gmail.com wrote:
Hi, I have a 3 node ZK cluster (A, B, C). On one of the nodes (node A), I have a ZK client running that connects to the local server and creates an ephemeral znode to indicate to clients on other nodes that it is online. I have a test script that reboots the ZooKeeper server as well as the client on A. The test does a getstat on the ephemeral znode created by the client on A. I am seeing that the view of znodes on A is different from the other 2 nodes. I can tell this from the session ID that the client gets after reconnecting to the local ZK server. So the test is simple:
- kill the zookeeper server and client process
- wait for a few seconds
- do zkCli.sh stat ... test.out
What I am seeing is that the ephemeral znode with the old zxid, time, and session ID is reappearing on node A. I have attached the output of 3 consecutive getstat requests of the test (see client_getstat.out). Notice that the third output is the same as the first one. That is, the old ephemeral znode reappeared at A. However, both B and C are showing the latest znode with the correct time, zxid, and session ID (output not attached). After this point, all following getstat requests on A show the old znode, whereas B and C show the correct znode every time the client on A comes online. This is something very perplexing. Earlier I thought this was a bug in my client implementation. But the test shows that the ZK server on A after reboot is out of sync with the rest of the servers.
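For reference, the presence pattern described above — create an ephemeral znode and re-create it whenever it disappears — looks roughly like this in the Java client (a sketch; the /foo path comes from the thread, while the node data is illustrative):
{noformat}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

// Sketch of the presence pattern described in the thread: keep an
// ephemeral /foo alive, re-creating it whenever it disappears. The
// node data ("online") is illustrative.
public class PresenceNode implements Watcher {
    private static final String PATH = "/foo";
    private final ZooKeeper zk;

    PresenceNode(ZooKeeper zk) { this.zk = zk; }

    void ensurePresent() throws KeeperException, InterruptedException {
        if (zk.exists(PATH, this) == null) { // leaves a watch on /foo
            try {
                zk.create(PATH, "online".getBytes(),
                        Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            } catch (KeeperException.NodeExistsException e) {
                // a previous session's node is still around; the watch
                // set above will fire when it finally goes away
            }
        }
    }

    public void process(WatchedEvent event) {
        if (PATH.equals(event.getPath())) {
            try {
                ensurePresent(); // re-arm the watch, re-create if gone
            } catch (Exception e) {
                e.printStackTrace(); // real code would retry/log
            }
        }
    }
}
{noformat}
Note the corner case Patrick points out: if the old session was never closed cleanly, the old ephemeral /foo legitimately survives until the session timeout, so hitting NodeExistsException here is expected for a while after a hard kill.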
Re: znode inconsistencies across ZooKeeper servers
Sure, I will reproduce it with debug enabled and create a JIRA. Thanks.

On Thu, Oct 7, 2010 at 12:23 PM, Patrick Hunt ph...@apache.org wrote:
Vishal, this sounds like a bug in ZK to me. Can you create a JIRA with this description, your configuration files from all servers, and the log files from all servers during the time of the incident? If you could run the servers with DEBUG level logging during the time you reproduce the issue, that would probably help: https://issues.apache.org/jira/browse/ZOOKEEPER Thanks! Patrick

On Wed, Oct 6, 2010 at 2:57 PM, Vishal K vishalm...@gmail.com wrote:
Hi Patrick, You are correct, the test restarts both the ZooKeeper server and the client. The client opens a new connection after restarting, so we would expect the ephemeral znode (/foo) to expire after the session timeout. However, the client with the new session creates the ephemeral znode (/foo) again after it reboots (it sets a watch for /foo and recreates /foo if it is deleted or doesn't exist). The client is not reusing the session ID. What I expect to see is that the older /foo should expire, after which a new /foo should get created. Is my expectation correct? What confuses me is the following output of 3 successive getstat /foo requests on A (note the zxid, time, and owner fields). Notice that the older znode reappeared. At the same time, when I do getstat at B and C, I see the newer /foo.

log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
cZxid = 0x105ef
ctime = Tue Oct 05 15:00:50 UTC 2010
mZxid = 0x105ef
mtime = Tue Oct 05 15:00:50 UTC 2010
pZxid = 0x105ef
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2b7ce57ce4
dataLength = 54
numChildren = 0

log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
cZxid = 0x10607
ctime = Tue Oct 05 15:01:07 UTC 2010
mZxid = 0x10607
mtime = Tue Oct 05 15:01:07 UTC 2010
pZxid = 0x10607
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2b7ce5bda4
dataLength = 54
numChildren = 0

log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
cZxid = 0x105ef
ctime = Tue Oct 05 15:00:50 UTC 2010
mZxid = 0x105ef
mtime = Tue Oct 05 15:00:50 UTC 2010
pZxid = 0x105ef
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2b7ce57ce4
dataLength = 54
numChildren = 0

Thanks for your help. -Vishal

On Wed, Oct 6, 2010 at 4:45 PM, Patrick Hunt ph...@apache.org wrote:
Vishal, the attachment seems to be getting removed by the list daemon (I don't have it), can you create a JIRA and attach? Also, this is a good question for the ppl on zookeeper-user. (ccing) You are aware that ephemeral znodes are tied to the session, and that sessions only expire after the session timeout period? At that time, any znodes created during that session are then deleted. The fact that you are killing your client process leads me to believe that you are not closing the session cleanly (meaning that it will eventually expire after the session timeout period), in which case the ephemeral znodes _should_ reappear when A is restarted and successfully rejoins the cluster (at least until the session timeout is exceeded). Patrick

On Tue, Oct 5, 2010 at 11:04 AM, Vishal K vishalm...@gmail.com wrote:
Hi, I have a 3 node ZK cluster (A, B, C). On one of the nodes (node A), I have a ZK client running that connects to the local server and creates an ephemeral znode to indicate to clients on other nodes that it is online. I have a test script that reboots the ZooKeeper server as well as the client on A. The test does a getstat on the ephemeral znode created by the client on A. I am seeing that the view of znodes on A is different from the other 2 nodes. I can tell this from the session ID that the client gets after reconnecting to the local ZK server. So the test is simple:
- kill the zookeeper server and client process
- wait for a few seconds
- do zkCli.sh stat ... test.out
What I am seeing is that the ephemeral znode with the old zxid, time, and session ID is reappearing on node A. I have attached the output of 3 consecutive getstat requests of the test (see client_getstat.out). Notice that the third output is the same as the first one. That is, the old ephemeral znode reappeared at A. However, both B and C are showing the latest znode with the correct time, zxid, and session ID (output not attached). After this point, all following getstat requests on A show the old znode, whereas B and C show the correct znode every time the client on A comes online. This is something very perplexing. Earlier I thought this was a bug in my client implementation.
[jira] Commented: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load
[ https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918984#action_12918984 ] Patrick Hunt commented on ZOOKEEPER-885: bq. I do have confirmation that a session is established for every client (all 45 of them) before beginning the disk load with dd.
I see, I was just trying to reduce variables. That should be fine then. I see this in the logs:
2010-10-07 14:49:13,956 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server environment:java.version=1.6.0_0
2010-10-07 14:49:13,960 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server environment:java.vendor=Sun Microsystems Inc.
2010-10-07 14:49:13,960 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server environment:java.home=/usr/lib/jvm/java-6-openjdk/jre
I'm not sure many users are running openjdk; also, 1.6.0_0 is very old (I have 1.6.0_18 openjdk on my system). You should upgrade to a recent version of openjdk at the least, although I'd highly suggest running with the official (and recent) sun jdk. (again, this is to reduce variables) Also, I noticed this in the server log for 1 server; it seems to be misconfigured, perhaps you can fix that? (normal_3.3.1/192.168.131.12.log)
2010-10-07 14:49:13,979 - FATAL [main:quorumpeerm...@83] - Invalid config, exiting abnormally
bq. Should I enable more verbose logging?
Yes, give that a try, perhaps run with TRACE logging turned on. If you can upload one of those logs I'll take a look. Right now we have this in the server log:
2010-10-07 14:51:32,961 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@633] - EndOfStreamException: Unable to read additional data from client sessionid 0x22b872ad9ff000c, likely client has closed socket
2010-10-07 14:51:32,962 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@1434] - Closed socket connection for client /10.23.4.95:59738 which had sessionid 0x22b872ad9ff000c
This indicates that the client is closing the connection (EOS). Please capture the logs on your client and upload one of them; perhaps run that at DEBUG level as well. That will give us more insight into why the client is closing its side of the connection (at least from the server's perspective). Thanks for the help on this! Zookeeper drops connections under moderate IO load -- Key: ZOOKEEPER-885 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.2 Environment: Debian (Lenny) 1Gb RAM swap disabled 100Mb heap for zookeeper Reporter: Alexandre Hardy Priority: Critical Attachments: WatcherTest.java, zklogs.tar.gz A zookeeper server under minimal load, with a number of clients watching exactly one node, will fail to maintain its connections when the machine is subjected to moderate IO load. In a specific test example we had three zookeeper servers running on dedicated machines with 45 clients connected, watching exactly one node. The clients would disconnect after moderate load was added to each of the zookeeper servers with the command: {noformat} dd if=/dev/urandom of=/dev/mapper/nimbula-test {noformat} The {{dd}} command transferred data at a rate of about 4Mb/s. The same thing happens with {noformat} dd if=/dev/zero of=/dev/mapper/nimbula-test {noformat} It seems strange that such a moderate load should cause instability in the connection. Very few other processes were running; the machines were set up to test the connection instability we have experienced. Clients performed no other read or mutation operations. Although the documentation states that there should be minimal competing IO load on the zookeeper server, it seems reasonable that moderate IO should not cause problems in this case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load
[ https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918984#action_12918984 ] Patrick Hunt edited comment on ZOOKEEPER-885 at 10/7/10 1:34 PM: - bq. I do have confirmation that a session is established for every client (all 45 of them) before beginning the disk load with dd.
I see, I was just trying to reduce variables. That should be fine then. I see this in the logs:
2010-10-07 14:49:13,956 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server environment:java.version=1.6.0_0
2010-10-07 14:49:13,960 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server environment:java.vendor=Sun Microsystems Inc.
2010-10-07 14:49:13,960 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server environment:java.home=/usr/lib/jvm/java-6-openjdk/jre
I'm not sure many users are running openjdk; also, 1.6.0_0 is very old (I have 1.6.0_18 openjdk on my system). You should upgrade to a recent version of openjdk at the least, although I'd highly suggest running with the official (and recent) sun jdk. (again, this is to reduce variables) Also, I noticed this in the server log for 1 server; it seems to be misconfigured, perhaps you can fix that? (normal_3.3.1/192.168.131.12.log)
2010-10-07 14:49:13,979 - FATAL [main:quorumpeerm...@83] - Invalid config, exiting abnormally
bq. Should I enable more verbose logging?
Yes, give that a try, perhaps run with TRACE logging turned on. If you can upload one of those logs I'll take a look. Right now we have this in the server log:
2010-10-07 14:51:32,961 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@633] - EndOfStreamException: Unable to read additional data from client sessionid 0x22b872ad9ff000c, likely client has closed socket
2010-10-07 14:51:32,962 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@1434] - Closed socket connection for client /10.23.4.95:59738 which had sessionid 0x22b872ad9ff000c
This indicates that the client is closing the connection (EOS). Please capture the logs on your client and upload one of them; perhaps run that at TRACE level as well. That will give us more insight into why the client is closing its side of the connection (at least from the server's perspective). Thanks for the help on this!

was (Author: phunt): bq. I do have confirmation that a session is established for every client (all 45 of them) before beginning the disk load with dd.
I see, I was just trying to reduce variables. That should be fine then. I see this in the logs:
2010-10-07 14:49:13,956 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server environment:java.version=1.6.0_0
2010-10-07 14:49:13,960 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server environment:java.vendor=Sun Microsystems Inc.
2010-10-07 14:49:13,960 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:environm...@97] - Server environment:java.home=/usr/lib/jvm/java-6-openjdk/jre
I'm not sure many users are running openjdk; also, 1.6.0_0 is very old (I have 1.6.0_18 openjdk on my system). You should upgrade to a recent version of openjdk at the least, although I'd highly suggest running with the official (and recent) sun jdk. (again, this is to reduce variables) Also, I noticed this in the server log for 1 server; it seems to be misconfigured, perhaps you can fix that? (normal_3.3.1/192.168.131.12.log)
2010-10-07 14:49:13,979 - FATAL [main:quorumpeerm...@83] - Invalid config, exiting abnormally
bq. Should I enable more verbose logging?
Yes, give that a try, perhaps run with TRACE logging turned on. If you can upload one of those logs I'll take a look. Right now we have this in the server log:
2010-10-07 14:51:32,961 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@633] - EndOfStreamException: Unable to read additional data from client sessionid 0x22b872ad9ff000c, likely client has closed socket
2010-10-07 14:51:32,962 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@1434] - Closed socket connection for client /10.23.4.95:59738 which had sessionid 0x22b872ad9ff000c
This indicates that the client is closing the connection (EOS). Please capture the logs on your client and upload one of them; perhaps run that at DEBUG level as well. That will give us more insight into why the client is closing its side of the connection (at least from the server's perspective). Thanks for the help on this! Zookeeper drops connections under moderate IO load -- Key: ZOOKEEPER-885 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.2 Environment: Debian (Lenny) 1Gb RAM swap disabled
[jira] Updated: (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Koch updated ZOOKEEPER-823: -- Attachment: ZOOKEEPER-823.patch I may have fixed another issue: I wrapped sendThread.readResponse(incomingBuffer) in a synchronization on the outgoingQueue, because otherwise it might happen that a packet is sent over netty and processed by the server, but not yet added to the pendingQueue. This fix solved all the Heisenbugs I saw. However, there's still a bug with AsyncHammer where the wait to join threads times out. I added more debugging information. The thread that times out hangs in ClientCnxnSocketNetty.wakeupCnxn, where it waits on synchronized(outgoingQueue). It seems that the outgoingQueue lock is already held and blocked in the doWrites method, hanging on write.awaitUninterruptibly(). doWrites is called by doTransport, where the synchronized(outgoingQueue) happens. update ZooKeeper java client to optionally use Netty for connections Key: ZOOKEEPER-823 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-823 Project: Zookeeper Issue Type: New Feature Components: java client Reporter: Patrick Hunt Assignee: Patrick Hunt Fix For: 3.4.0 Attachments: NettyNettySuiteTest.rtf, TEST-org.apache.zookeeper.test.NettyNettySuiteTest.txt.gz, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch This jira will port the client side connection code to use netty rather than direct nio. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
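The hang described is the classic blocking-call-while-holding-a-lock deadlock. A minimal sketch of the pattern under assumed structure (this is not the actual ClientCnxnSocketNetty code; names simply mirror the comment):
{noformat}
import org.jboss.netty.channel.ChannelFuture;

// Illustrative deadlock pattern, not the actual ClientCnxnSocketNetty
// code: one thread blocks on a Netty write future while holding the
// outgoingQueue lock, so the thread calling wakeupCnxn blocks forever
// waiting for the same lock.
class DeadlockSketch {
    private final Object outgoingQueue = new Object();

    void doWrites(ChannelFuture write) {
        synchronized (outgoingQueue) {
            write.awaitUninterruptibly(); // blocks while holding the lock
        }
    }

    void wakeupCnxn() {
        synchronized (outgoingQueue) {    // never entered while doWrites blocks
            outgoingQueue.notifyAll();
        }
    }
}
{noformat}
Shrinking the synchronized blocks so the blocking await happens outside the lock, as the later version of the patch does, breaks this cycle.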
[jira] Updated: (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Koch updated ZOOKEEPER-823: -- Status: Patch Available (was: Open) update ZooKeeper java client to optionally use Netty for connections Key: ZOOKEEPER-823 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-823 Project: Zookeeper Issue Type: New Feature Components: java client Reporter: Patrick Hunt Assignee: Patrick Hunt Fix For: 3.4.0 Attachments: NettyNettySuiteTest.rtf, TEST-org.apache.zookeeper.test.NettyNettySuiteTest.txt.gz, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch This jira will port the client side connection code to use netty rather than direct nio. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Koch updated ZOOKEEPER-823: -- Status: Open (was: Patch Available) update ZooKeeper java client to optionally use Netty for connections Key: ZOOKEEPER-823 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-823 Project: Zookeeper Issue Type: New Feature Components: java client Reporter: Patrick Hunt Assignee: Patrick Hunt Fix For: 3.4.0 Attachments: NettyNettySuiteTest.rtf, TEST-org.apache.zookeeper.test.NettyNettySuiteTest.txt.gz, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch This jira will port the client side connection code to use netty rather than direct nio. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Koch updated ZOOKEEPER-823: -- Attachment: ZOOKEEPER-823.patch I did another version of the patch with an example of how I'd solve the deadlock mentioned in my last comment: I made the synchronized blocks in doTransport and doWrites smaller. update ZooKeeper java client to optionally use Netty for connections Key: ZOOKEEPER-823 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-823 Project: Zookeeper Issue Type: New Feature Components: java client Reporter: Patrick Hunt Assignee: Patrick Hunt Fix For: 3.4.0 Attachments: NettyNettySuiteTest.rtf, TEST-org.apache.zookeeper.test.NettyNettySuiteTest.txt.gz, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch This jira will port the client side connection code to use netty rather than direct nio. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Koch updated ZOOKEEPER-823: -- Attachment: (was: ZOOKEEPER-823.patch) update ZooKeeper java client to optionally use Netty for connections Key: ZOOKEEPER-823 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-823 Project: Zookeeper Issue Type: New Feature Components: java client Reporter: Patrick Hunt Assignee: Patrick Hunt Fix For: 3.4.0 Attachments: NettyNettySuiteTest.rtf, TEST-org.apache.zookeeper.test.NettyNettySuiteTest.txt.gz, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch This jira will port the client side connection code to use netty rather than direct nio. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Koch updated ZOOKEEPER-823: -- Attachment: ZOOKEEPER-823.patch update ZooKeeper java client to optionally use Netty for connections Key: ZOOKEEPER-823 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-823 Project: Zookeeper Issue Type: New Feature Components: java client Reporter: Patrick Hunt Assignee: Patrick Hunt Fix For: 3.4.0 Attachments: NettyNettySuiteTest.rtf, TEST-org.apache.zookeeper.test.NettyNettySuiteTest.txt.gz, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch This jira will port the client side connection code to use netty rather than direct nio. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-890) C client invokes watcher callbacks multiple times
[ https://issues.apache.org/jira/browse/ZOOKEEPER-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12919087#action_12919087 ] Jared Cantwell commented on ZOOKEEPER-890: -- I don't believe the C-client makes the guarantee that watcher callbacks are called exactly once. Callbacks are called for different reasons, including:
- connection lost event
- connection reestablished event
- session lost event
- data changed event
Only the last two events make the guarantee about being called exactly once, but the first two connection events can be called numerous times until either one of the last two events happens. I may be missing some events, but that's the general idea. Bottom line is the callback can receive events of type ZOO_SESSION_EVENT multiple times. I believe this was by design. C client invokes watcher callbacks multiple times - Key: ZOOKEEPER-890 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-890 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.3.1 Environment: Mac OS X 10.6.5 Reporter: Austin Shoemaker Priority: Critical Attachments: watcher_twice.c, ZOOKEEPER-890.patch Code using the C client assumes that watcher callbacks are called exactly once. If the watcher is called more than once, the process will likely overwrite freed memory and/or crash. collect_session_watchers (zk_hashtable.c) gathers watchers from active_node_watchers, active_exist_watchers, and active_child_watchers without removing them. This results in watchers being invoked more than once. Test code is attached that reproduces the bug, along with a proposed patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
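The same contract holds for the Java client, where session and connection events arrive with event type None. A sketch (not code from this issue) of a watcher that respects it:
{noformat}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.EventType;

// Sketch of the semantics described above, in Java-client terms
// (session/connection events carry type None there): session events
// may be delivered many times per handle, so per-watch cleanup must
// only happen on node events, which fire once per registration.
public class OnceOnlyAwareWatcher implements Watcher {
    public void process(WatchedEvent event) {
        if (event.getType() == EventType.None) {
            // Connection/session state change: can repeat. Do NOT
            // free or unregister one-shot watch state here.
            System.out.println("session state: " + event.getState());
        } else {
            // Node event: delivered once per registered watch, so
            // it is safe to release per-watch resources now.
            System.out.println(event.getType() + " on " + event.getPath());
        }
    }
}
{noformat}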
[jira] Commented: (ZOOKEEPER-890) C client invokes watcher callbacks multiple times
[ https://issues.apache.org/jira/browse/ZOOKEEPER-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12919094#action_12919094 ] Austin Shoemaker commented on ZOOKEEPER-890: That sounds like a good design. Perhaps it could be clarified in the documentation? http://hadoop.apache.org/zookeeper/docs/r3.3.1/zookeeperProgrammers.html#ch_zkWatches If this is correct behavior then the Python client needs to be fixed to not delete the watcher on session events. Will file a separate bug on that. C client invokes watcher callbacks multiple times - Key: ZOOKEEPER-890 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-890 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.3.1 Environment: Mac OS X 10.6.5 Reporter: Austin Shoemaker Priority: Critical Attachments: watcher_twice.c, ZOOKEEPER-890.patch Code using the C client assumes that watcher callbacks are called exactly once. If the watcher is called more than once, the process will likely overwrite freed memory and/or crash. collect_session_watchers (zk_hashtable.c) gathers watchers from active_node_watchers, active_exist_watchers, and active_child_watchers without removing them. This results in watchers being invoked more than once. Test code is attached that reproduces the bug, along with a proposed patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher
[ https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Austin Shoemaker updated ZOOKEEPER-888: --- Attachment: ZOOKEEPER-888.patch Patch that prevents freeing a watcher in response to a session event, per the feedback in ZOOKEEPER-890. c-client / zkpython: Double free corruption on node watcher --- Key: ZOOKEEPER-888 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-888 Project: Zookeeper Issue Type: Bug Components: c client, contrib-bindings Affects Versions: 3.3.1 Reporter: Lukas Priority: Critical Fix For: 3.3.2, 3.4.0 Attachments: resume-segfault.py, ZOOKEEPER-888.patch The c-client / zkpython wrapper invokes an already-freed watcher callback. Steps to reproduce:
0. start a zookeeper server on your machine
1. run the attached python script
2. suspend the zookeeper server process (e.g. using `pkill -STOP -f org.apache.zookeeper.server.quorum.QuorumPeerMain`)
3. wait until the connection observer and the node observer have fired with a session event
4. resume the zookeeper server process (e.g. using `pkill -CONT -f org.apache.zookeeper.server.quorum.QuorumPeerMain`)
The client then tries to dispatch the node observer function again, but it was already freed: double free corruption. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (ZOOKEEPER-891) Allow non-numeric version strings
Allow non-numeric version strings - Key: ZOOKEEPER-891 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-891 Project: Zookeeper Issue Type: Improvement Components: build Reporter: Eli Collins Priority: Minor Fix For: 3.4.0, 4.0.0 Non-numeric version strings (e.g. a -dev or +1 suffix) are not currently accepted: you either get an error (Invalid version number format, must be x.y.z) or, if you pass x.y.z-dev or x.y.z+1, a NumberFormatException. It would be useful to allow non-numeric versions.
{noformat}
version-info:
[java] All version-related parameters must be valid integers!
[java] Exception in thread "main" java.lang.NumberFormatException: For input string: "3-dev"
[java] at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
[java] at java.lang.Integer.parseInt(Integer.java:458)
[java] at java.lang.Integer.parseInt(Integer.java:499)
[java] at org.apache.zookeeper.version.util.VerGen.main(VerGen.java:131)
[java] Java Result: 1
{noformat}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
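An illustrative sketch (not the actual VerGen code) of the lenient parsing being requested: split off a non-numeric qualifier such as -dev before parsing the numeric components:
{noformat}
// Illustrative sketch, not the actual VerGen implementation: accept a
// version like "3.4.3-dev" by separating the qualifier before parsing
// the numeric x.y.z components.
public class LenientVersion {
    public static void main(String[] args) {
        String version = args.length > 0 ? args[0] : "3.4.3-dev";
        String qualifier = null;
        int dash = version.indexOf('-');
        if (dash >= 0) {
            qualifier = version.substring(dash + 1); // e.g. "dev"
            version = version.substring(0, dash);    // e.g. "3.4.3"
        }
        String[] parts = version.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        int patch = Integer.parseInt(parts[2]);
        System.out.printf("%d.%d.%d qualifier=%s%n",
                major, minor, patch, qualifier);
    }
}
{noformat}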