[jira] Commented: (ZOOKEEPER-937) test -e not available on solaris /bin/sh
[ https://issues.apache.org/jira/browse/ZOOKEEPER-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934066#action_12934066 ]

Hadoop QA commented on ZOOKEEPER-937:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12460076/zk_solaris_zkEnv.patch
against trunk revision 1036967.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 8 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/41//testReport/
Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/41//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/41//console

This message is automatically generated.

> test -e not available on solaris /bin/sh
>
> Key: ZOOKEEPER-937
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
> Project: Zookeeper
> Issue Type: Bug
> Components: scripts
> Affects Versions: 3.3.0, 3.3.1, 3.3.2
> Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
> Reporter: Erik Hetzner
> Assignee: Erik Hetzner
> Fix For: 3.4.0
> Attachments: zk_solaris_zkEnv.patch, zk_solaris_zkEnv.patch
>
> test -e FILENAME is not supported by /bin/sh on Solaris. This is used in bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-937) test -e not available on solaris /bin/sh
[ https://issues.apache.org/jira/browse/ZOOKEEPER-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-937:
Status: Patch Available (was: Open)

> test -e not available on solaris /bin/sh
>
> Key: ZOOKEEPER-937
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
> Project: Zookeeper
> Issue Type: Bug
> Components: scripts
> Affects Versions: 3.3.2, 3.3.1, 3.3.0
> Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
> Reporter: Erik Hetzner
> Assignee: Erik Hetzner
> Fix For: 3.4.0
> Attachments: zk_solaris_zkEnv.patch, zk_solaris_zkEnv.patch
>
> test -e FILENAME is not supported by /bin/sh on Solaris. This is used in bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-937) test -e not available on solaris /bin/sh
[ https://issues.apache.org/jira/browse/ZOOKEEPER-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-937:
Status: Open (was: Patch Available)

> test -e not available on solaris /bin/sh
>
> Key: ZOOKEEPER-937
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
> Project: Zookeeper
> Issue Type: Bug
> Components: scripts
> Affects Versions: 3.3.2, 3.3.1, 3.3.0
> Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
> Reporter: Erik Hetzner
> Assignee: Erik Hetzner
> Fix For: 3.4.0
> Attachments: zk_solaris_zkEnv.patch, zk_solaris_zkEnv.patch
>
> test -e FILENAME is not supported by /bin/sh on Solaris. This is used in bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-937) test -e not available on solaris /bin/sh
[ https://issues.apache.org/jira/browse/ZOOKEEPER-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hetzner updated ZOOKEEPER-937:
Attachment: zk_solaris_zkEnv.patch

Patch with the proper path.

> test -e not available on solaris /bin/sh
>
> Key: ZOOKEEPER-937
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
> Project: Zookeeper
> Issue Type: Bug
> Components: scripts
> Affects Versions: 3.3.0, 3.3.1, 3.3.2
> Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
> Reporter: Erik Hetzner
> Assignee: Erik Hetzner
> Fix For: 3.4.0
> Attachments: zk_solaris_zkEnv.patch, zk_solaris_zkEnv.patch
>
> test -e FILENAME is not supported by /bin/sh on Solaris. This is used in bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-937) test -e not available on solaris /bin/sh
[ https://issues.apache.org/jira/browse/ZOOKEEPER-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934057#action_12934057 ]

Hadoop QA commented on ZOOKEEPER-937:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12460061/zk_solaris_zkEnv.patch
against trunk revision 1036967.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
-1 patch. The patch command could not apply the patch.

Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/40//console

This message is automatically generated.

> test -e not available on solaris /bin/sh
>
> Key: ZOOKEEPER-937
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
> Project: Zookeeper
> Issue Type: Bug
> Components: scripts
> Affects Versions: 3.3.0, 3.3.1, 3.3.2
> Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
> Reporter: Erik Hetzner
> Assignee: Erik Hetzner
> Fix For: 3.4.0
> Attachments: zk_solaris_zkEnv.patch
>
> test -e FILENAME is not supported by /bin/sh on Solaris. This is used in bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-937) test -e not available on solaris /bin/sh
[ https://issues.apache.org/jira/browse/ZOOKEEPER-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934052#action_12934052 ]

Mahadev konar commented on ZOOKEEPER-937:

+1, the patch looks good. Marking it PA (Patch Available).

> test -e not available on solaris /bin/sh
>
> Key: ZOOKEEPER-937
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
> Project: Zookeeper
> Issue Type: Bug
> Components: scripts
> Affects Versions: 3.3.0, 3.3.1, 3.3.2
> Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
> Reporter: Erik Hetzner
> Assignee: Erik Hetzner
> Fix For: 3.4.0
> Attachments: zk_solaris_zkEnv.patch
>
> test -e FILENAME is not supported by /bin/sh on Solaris. This is used in bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-937) test -e not available on solaris /bin/sh
[ https://issues.apache.org/jira/browse/ZOOKEEPER-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-937:
Status: Patch Available (was: Open)

> test -e not available on solaris /bin/sh
>
> Key: ZOOKEEPER-937
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
> Project: Zookeeper
> Issue Type: Bug
> Components: scripts
> Affects Versions: 3.3.2, 3.3.1, 3.3.0
> Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
> Reporter: Erik Hetzner
> Assignee: Erik Hetzner
> Fix For: 3.4.0
> Attachments: zk_solaris_zkEnv.patch
>
> test -e FILENAME is not supported by /bin/sh on Solaris. This is used in bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (ZOOKEEPER-937) test -e not available on solaris /bin/sh
[ https://issues.apache.org/jira/browse/ZOOKEEPER-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar reassigned ZOOKEEPER-937:
Assignee: Erik Hetzner

> test -e not available on solaris /bin/sh
>
> Key: ZOOKEEPER-937
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
> Project: Zookeeper
> Issue Type: Bug
> Components: scripts
> Affects Versions: 3.3.0, 3.3.1, 3.3.2
> Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
> Reporter: Erik Hetzner
> Assignee: Erik Hetzner
> Fix For: 3.4.0
> Attachments: zk_solaris_zkEnv.patch
>
> test -e FILENAME is not supported by /bin/sh on Solaris. This is used in bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-937) test -e not available on solaris /bin/sh
[ https://issues.apache.org/jira/browse/ZOOKEEPER-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-937:
Fix Version/s: 3.4.0

> test -e not available on solaris /bin/sh
>
> Key: ZOOKEEPER-937
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
> Project: Zookeeper
> Issue Type: Bug
> Components: scripts
> Affects Versions: 3.3.0, 3.3.1, 3.3.2
> Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
> Reporter: Erik Hetzner
> Fix For: 3.4.0
> Attachments: zk_solaris_zkEnv.patch
>
> test -e FILENAME is not supported by /bin/sh on Solaris. This is used in bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Concurrent primitives library - shared lock
Thanks Lin.

Thanks
mahadev

On 11/18/10 2:00 AM, "Lin Chia-Hung" wrote:

> Hi,
>
> Following the mail (http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/201005.mbox/%3c4bfd646c.2040...@apache.org%3e), I created a JIRA issue (ZOOKEEPER-935) with a patch attached that contains the shared lock feature. Please let me know if there are any issues with it.
>
> Thanks,
> Chiahung
[jira] Created: (ZOOKEEPER-937) test -e not available on solaris /bin/sh
test -e not available on solaris /bin/sh

Key: ZOOKEEPER-937
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
Project: Zookeeper
Issue Type: Bug
Components: scripts
Affects Versions: 3.3.2, 3.3.1, 3.3.0
Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
Reporter: Erik Hetzner
Attachments: zk_solaris_zkEnv.patch

test -e FILENAME is not supported by /bin/sh on Solaris. This is used in bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-937) test -e not available on solaris /bin/sh
[ https://issues.apache.org/jira/browse/ZOOKEEPER-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hetzner updated ZOOKEEPER-937:
Attachment: zk_solaris_zkEnv.patch

> test -e not available on solaris /bin/sh
>
> Key: ZOOKEEPER-937
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
> Project: Zookeeper
> Issue Type: Bug
> Components: scripts
> Affects Versions: 3.3.0, 3.3.1, 3.3.2
> Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
> Reporter: Erik Hetzner
> Attachments: zk_solaris_zkEnv.patch
>
> test -e FILENAME is not supported by /bin/sh on Solaris. This is used in bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933967#action_12933967 ]

Hadoop QA commented on ZOOKEEPER-880:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12460041/ZOOKEEPER-880.patch
against trunk revision 1036967.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 8 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/39//testReport/
Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/39//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/39//console

This message is automatically generated.

> QuorumCnxManager$SendWorker grows without bounds
>
> Key: ZOOKEEPER-880
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
> Project: Zookeeper
> Issue Type: Bug
> Affects Versions: 3.2.2
> Reporter: Jean-Daniel Cryans
> Priority: Critical
> Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz, ZOOKEEPER-880.patch
>
> We're seeing an issue where one server in the ensemble has a steadily growing number of QuorumCnxManager$SendWorker threads up to a point where the OS runs out of native threads, and at the same time we see a lot of exceptions in the logs. This is on 3.2.2 and our config looks like:
> {noformat}
> tickTime=3000
> dataDir=/somewhere_thats_not_tmp
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=sv4borg9:2888:3888
> server.1=sv4borg10:2888:3888
> server.2=sv4borg11:2888:3888
> server.3=sv4borg12:2888:3888
> server.4=sv4borg13:2888:3888
> {noformat}
> The issue is on the first server. I'm going to attach thread dumps and logs in a moment.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Please commit ZOOKEEPER-836
Hi,

Hudson complains about 8 findbugs issues in ZOOKEEPER-836, but I did not introduce those. They are just in a Java class I happened to modify. Could you be so kind as to review and commit? Then I can continue tomorrow with ZOOKEEPER-849.

Best regards,

Thomas Koch, http://www.koch.ro
[jira] Updated: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal K updated ZOOKEEPER-880:
Status: Patch Available (was: Open)

Patch for trunk.

> QuorumCnxManager$SendWorker grows without bounds
>
> Key: ZOOKEEPER-880
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
> Project: Zookeeper
> Issue Type: Bug
> Affects Versions: 3.2.2
> Reporter: Jean-Daniel Cryans
> Priority: Critical
> Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz, ZOOKEEPER-880.patch
>
> We're seeing an issue where one server in the ensemble has a steadily growing number of QuorumCnxManager$SendWorker threads up to a point where the OS runs out of native threads, and at the same time we see a lot of exceptions in the logs. This is on 3.2.2 and our config looks like:
> {noformat}
> tickTime=3000
> dataDir=/somewhere_thats_not_tmp
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=sv4borg9:2888:3888
> server.1=sv4borg10:2888:3888
> server.2=sv4borg11:2888:3888
> server.3=sv4borg12:2888:3888
> server.4=sv4borg13:2888:3888
> {noformat}
> The issue is on the first server. I'm going to attach thread dumps and logs in a moment.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal K updated ZOOKEEPER-880:
Attachment: ZOOKEEPER-880.patch

The root cause of the frequent disconnects still needs to be resolved. In the meantime, I have fixed the problem that was causing every other SendWorker thread to leak. I tested the patch by connecting to port 3888 on one of the servers using telnet.

2010-11-19 14:51:10,081 - INFO [/10.17.119.101:3888:quorumcnxmanager$liste...@477] - Received connection request /10.16.251.39:2074
2010-11-19 14:51:14,364 - DEBUG [/10.17.119.101:3888:quorumcnxmanager$sendwor...@553] - Address of remote peer: 8103510703875099187
2010-11-19 14:51:19,440 - WARN [Thread-7:quorumcnxmanager$recvwor...@726] - Connection broken for id 8103510703875099187, my id = 1, error = java.io.IOException: Received packet with invalid packet: 218824692
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:711)
2010-11-19 14:51:19,441 - WARN [Thread-7:quorumcnxmanager$recvwor...@730] - Interrupting SendWorker <= SendWorker is getting killed.
2010-11-19 14:51:19,442 - DEBUG [Thread-7:quorumcnxmanager$sendwor...@571] - Calling finish for 8103510703875099187
2010-11-19 14:51:19,443 - DEBUG [Thread-7:quorumcnxmanager$sendwor...@591] - Removing entry from senderWorkerMap sid=8103510703875099187
2010-11-19 14:51:19,443 - WARN [Thread-6:quorumcnxmanager$sendwor...@643] - Interrupted while waiting for message on queue
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
    at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:631)
2010-11-19 14:51:19,456 - DEBUG [Thread-6:quorumcnxmanager$sendwor...@571] - Calling finish for 8103510703875099187
2010-11-19 14:51:19,457 - WARN [Thread-6:quorumcnxmanager$sendwor...@652] - Send worker leaving thread

Can you see if this fixes the problem?

> QuorumCnxManager$SendWorker grows without bounds
>
> Key: ZOOKEEPER-880
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
> Project: Zookeeper
> Issue Type: Bug
> Affects Versions: 3.2.2
> Reporter: Jean-Daniel Cryans
> Priority: Critical
> Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz, ZOOKEEPER-880.patch
>
> We're seeing an issue where one server in the ensemble has a steadily growing number of QuorumCnxManager$SendWorker threads up to a point where the OS runs out of native threads, and at the same time we see a lot of exceptions in the logs. This is on 3.2.2 and our config looks like:
> {noformat}
> tickTime=3000
> dataDir=/somewhere_thats_not_tmp
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=sv4borg9:2888:3888
> server.1=sv4borg10:2888:3888
> server.2=sv4borg11:2888:3888
> server.3=sv4borg12:2888:3888
> server.4=sv4borg13:2888:3888
> {noformat}
> The issue is on the first server. I'm going to attach thread dumps and logs in a moment.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
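The sequence visible in the log above (the receive side hits an IOException, interrupts its SendWorker, the entry is removed from senderWorkerMap, and the sender thread leaves) can be illustrated with a small Java sketch. This is not the ZOOKEEPER-880 patch and not the real QuorumCnxManager code: the class names SendWorkerSketch and WorkerRegistry, the launch and awaitExit helpers, and the queue capacity are all assumptions made for illustration. The idea it shows is only that a sender is always reachable through the map, and that finishing it both wakes the thread and drops its own map entry, so no thread is left running without an owner.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the sid -> sender map (senderWorkerMap in the log). */
final class WorkerRegistry {
    final ConcurrentHashMap<Long, SendWorkerSketch> senders = new ConcurrentHashMap<>();
}

/** Hypothetical sender thread; NOT the real QuorumCnxManager.SendWorker. */
final class SendWorkerSketch extends Thread {
    private final long sid;
    private final WorkerRegistry registry;
    private final ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(16); // assumed size
    private volatile boolean running = true;

    private SendWorkerSketch(long sid, WorkerRegistry registry) {
        this.sid = sid;
        this.registry = registry;
        setDaemon(true);
    }

    /** Registers and starts a worker; a previous worker for the same sid is finished, not orphaned. */
    static SendWorkerSketch launch(long sid, WorkerRegistry registry) {
        SendWorkerSketch worker = new SendWorkerSketch(sid, registry);
        SendWorkerSketch previous = registry.senders.put(sid, worker);
        if (previous != null) {
            previous.finish();
        }
        worker.start();
        return worker;
    }

    /** Safe to call more than once and from any thread. */
    void finish() {
        running = false;
        interrupt();                        // wakes the thread if it is blocked in poll()
        registry.senders.remove(sid, this); // removes the entry only if it still maps to this worker
    }

    /** Waits up to millis for the thread to exit; returns true once it has stopped. */
    boolean awaitExit(long millis) {
        try {
            join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !isAlive();
    }

    @Override
    public void run() {
        try {
            while (running) {
                // A real worker would write the polled message to the peer's socket.
                queue.poll(1, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            // Interrupted by the receive side: fall through and clean up.
        } finally {
            finish();
        }
    }

    public static void main(String[] args) {
        WorkerRegistry registry = new WorkerRegistry();
        SendWorkerSketch worker = launch(1L, registry);
        worker.finish(); // what the receive side would do after an IOException
        System.out.println("stopped=" + worker.awaitExit(2000)
                + " registered=" + registry.senders.containsKey(1L));
    }
}
```

Using the two-argument remove(key, value) means a late finish() from an old worker cannot evict a newer worker that has since been registered under the same sid.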
[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933928#action_12933928 ]

Flavio Junqueira commented on ZOOKEEPER-880:

I think we agree that monitoring alone was not causing the issue. However, your logs indicate that the monitoring did leave some orphan threads, as excerpts like the one I posted above show. Without the monitoring, the same problem is still being triggered, apparently in a different way, and it is not clear why. You can see this from all the "Channel eof" messages in the log.

To solve this issue, we need to understand the following:

# What's causing those IOExceptions?
# Why are we even starting a new connection if there is no leader election going on?

Do you folks have any idea whether anything in your environment could be causing those TCP connections to break?

> QuorumCnxManager$SendWorker grows without bounds
>
> Key: ZOOKEEPER-880
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
> Project: Zookeeper
> Issue Type: Bug
> Affects Versions: 3.2.2
> Reporter: Jean-Daniel Cryans
> Priority: Critical
> Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz
>
> We're seeing an issue where one server in the ensemble has a steadily growing number of QuorumCnxManager$SendWorker threads up to a point where the OS runs out of native threads, and at the same time we see a lot of exceptions in the logs. This is on 3.2.2 and our config looks like:
> {noformat}
> tickTime=3000
> dataDir=/somewhere_thats_not_tmp
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=sv4borg9:2888:3888
> server.1=sv4borg10:2888:3888
> server.2=sv4borg11:2888:3888
> server.3=sv4borg12:2888:3888
> server.4=sv4borg13:2888:3888
> {noformat}
> The issue is on the first server. I'm going to attach thread dumps and logs in a moment.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-836) hostlist as string
[ https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933918#action_12933918 ]

Hadoop QA commented on ZOOKEEPER-836:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12460034/ZOOKEEPER-836.patch
against trunk revision 1036967.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 2 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 8 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/38//testReport/
Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/38//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/38//console

This message is automatically generated.

> hostlist as string
>
> Key: ZOOKEEPER-836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836
> Project: Zookeeper
> Issue Type: Sub-task
> Components: java client
> Affects Versions: 3.3.1
> Reporter: Patrick Datko
> Assignee: Thomas Koch
> Attachments: ZOOKEEPER-836.patch
>
> The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of not doing (too much) work in a ctor. Instead the ClientCnxn should receive an object of class "HostSet". HostSet could then be instantiated e.g. with a comma separated string.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
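The issue description quoted above proposes moving host-list parsing out of the ClientCnxn constructor and into a "HostSet" object built from a comma separated string. A minimal Java sketch of that idea follows. It is not the contents of ZOOKEEPER-836.patch: only the class name HostSet comes from the issue text, while the fromString factory, the addresses accessor, and the fallback to port 2181 when an entry has no port are assumptions made for illustration.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical value object; only the name "HostSet" comes from the issue text. */
final class HostSet {
    // Assumed fallback port for entries without ":port" (2181 is the clientPort seen in configs on this list).
    private static final int DEFAULT_PORT = 2181;

    private final List<InetSocketAddress> addresses;

    // The constructor only stores a finished list; all parsing happens in the factory below.
    private HostSet(List<InetSocketAddress> addresses) {
        this.addresses = Collections.unmodifiableList(addresses);
    }

    /** Parses a comma separated list such as "zk1:2181,zk2:2181". */
    static HostSet fromString(String hosts) {
        List<InetSocketAddress> parsed = new ArrayList<>();
        for (String entry : hosts.split(",")) {
            String hostPort = entry.trim();
            if (hostPort.isEmpty()) {
                continue; // tolerate stray commas
            }
            int colon = hostPort.lastIndexOf(':');
            String host = colon >= 0 ? hostPort.substring(0, colon) : hostPort;
            int port = colon >= 0 ? Integer.parseInt(hostPort.substring(colon + 1)) : DEFAULT_PORT;
            // Unresolved on purpose: no DNS lookup is done while parsing.
            parsed.add(InetSocketAddress.createUnresolved(host, port));
        }
        if (parsed.isEmpty()) {
            throw new IllegalArgumentException("no hosts in: " + hosts);
        }
        return new HostSet(parsed);
    }

    List<InetSocketAddress> addresses() {
        return addresses;
    }

    public static void main(String[] args) {
        HostSet hostSet = HostSet.fromString("zk1:2181, zk2:2182,zk3");
        for (InetSocketAddress address : hostSet.addresses()) {
            System.out.println(address.getHostString() + " -> " + address.getPort());
        }
    }
}
```

With a class of this shape, a connection class could take a ready-made HostSet as a constructor argument, which keeps parsing errors out of the constructor and makes the host list easy to supply in tests.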
[jira] Updated: (ZOOKEEPER-836) hostlist as string
[ https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Koch updated ZOOKEEPER-836:
Status: Open (was: Patch Available)

> hostlist as string
>
> Key: ZOOKEEPER-836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836
> Project: Zookeeper
> Issue Type: Sub-task
> Components: java client
> Affects Versions: 3.3.1
> Reporter: Patrick Datko
> Assignee: Thomas Koch
> Attachments: ZOOKEEPER-836.patch
>
> The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of not doing (too much) work in a ctor. Instead the ClientCnxn should receive an object of class "HostSet". HostSet could then be instantiated e.g. with a comma separated string.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-836) hostlist as string
[ https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Koch updated ZOOKEEPER-836:
Status: Patch Available (was: Open)

> hostlist as string
>
> Key: ZOOKEEPER-836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836
> Project: Zookeeper
> Issue Type: Sub-task
> Components: java client
> Affects Versions: 3.3.1
> Reporter: Patrick Datko
> Assignee: Thomas Koch
> Attachments: ZOOKEEPER-836.patch
>
> The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of not doing (too much) work in a ctor. Instead the ClientCnxn should receive an object of class "HostSet". HostSet could then be instantiated e.g. with a comma separated string.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-836) hostlist as string
[ https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Koch updated ZOOKEEPER-836:
Attachment: (was: ZOOKEEPER-836.patch)

> hostlist as string
>
> Key: ZOOKEEPER-836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836
> Project: Zookeeper
> Issue Type: Sub-task
> Components: java client
> Affects Versions: 3.3.1
> Reporter: Patrick Datko
> Assignee: Thomas Koch
> Attachments: ZOOKEEPER-836.patch
>
> The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of not doing (too much) work in a ctor. Instead the ClientCnxn should receive an object of class "HostSet". HostSet could then be instantiated e.g. with a comma separated string.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-836) hostlist as string
[ https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Koch updated ZOOKEEPER-836:
Attachment: ZOOKEEPER-836.patch

Now also added Apache license headers ...

> hostlist as string
>
> Key: ZOOKEEPER-836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836
> Project: Zookeeper
> Issue Type: Sub-task
> Components: java client
> Affects Versions: 3.3.1
> Reporter: Patrick Datko
> Assignee: Thomas Koch
> Attachments: ZOOKEEPER-836.patch
>
> The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of not doing (too much) work in a ctor. Instead the ClientCnxn should receive an object of class "HostSet". HostSet could then be instantiated e.g. with a comma separated string.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-836) hostlist as string
[ https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933903#action_12933903 ]

Hadoop QA commented on ZOOKEEPER-836:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12460027/ZOOKEEPER-836.patch
against trunk revision 1036071.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 2 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 8 new Findbugs warnings.
-1 release audit. The applied patch generated 26 release audit warnings (more than the trunk's current 24 warnings).
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/37//testReport/
Release audit warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/37//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/37//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/37//console

This message is automatically generated.

> hostlist as string
>
> Key: ZOOKEEPER-836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836
> Project: Zookeeper
> Issue Type: Sub-task
> Components: java client
> Affects Versions: 3.3.1
> Reporter: Patrick Datko
> Assignee: Thomas Koch
> Attachments: ZOOKEEPER-836.patch
>
> The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of not doing (too much) work in a ctor. Instead the ClientCnxn should receive an object of class "HostSet". HostSet could then be instantiated e.g. with a comma separated string.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933897#action_12933897 ]

Benoit Sigoure commented on ZOOKEEPER-880:

Do we agree that monitoring wasn't causing the issue? As JD said, even after we stopped it, the problem re-occurred.

> QuorumCnxManager$SendWorker grows without bounds
>
> Key: ZOOKEEPER-880
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
> Project: Zookeeper
> Issue Type: Bug
> Affects Versions: 3.2.2
> Reporter: Jean-Daniel Cryans
> Priority: Critical
> Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz
>
> We're seeing an issue where one server in the ensemble has a steadily growing number of QuorumCnxManager$SendWorker threads up to a point where the OS runs out of native threads, and at the same time we see a lot of exceptions in the logs. This is on 3.2.2 and our config looks like:
> {noformat}
> tickTime=3000
> dataDir=/somewhere_thats_not_tmp
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=sv4borg9:2888:3888
> server.1=sv4borg10:2888:3888
> server.2=sv4borg11:2888:3888
> server.3=sv4borg12:2888:3888
> server.4=sv4borg13:2888:3888
> {noformat}
> The issue is on the first server. I'm going to attach thread dumps and logs in a moment.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933891#action_12933891 ] Patrick Hunt commented on ZOOKEEPER-880: Flavio (and others), we should update the docs to include details on which ports can/should be monitored and which ports should NOT be monitored (or, if monitoring is supported, under which conditions). Can we update the docs as part of any patch/fix? Thanks. > QuorumCnxManager$SendWorker grows without bounds > > > Key: ZOOKEEPER-880 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880 > Project: Zookeeper > Issue Type: Bug >Affects Versions: 3.2.2 >Reporter: Jean-Daniel Cryans >Priority: Critical > Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, > hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, > TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz > > > We're seeing an issue where one server in the ensemble has a steadily growing > number of QuorumCnxManager$SendWorker threads up to a point where the OS runs > out of native threads, and at the same time we see a lot of exceptions in the > logs. This is on 3.2.2 and our config looks like: > {noformat} > tickTime=3000 > dataDir=/somewhere_thats_not_tmp > clientPort=2181 > initLimit=10 > syncLimit=5 > server.0=sv4borg9:2888:3888 > server.1=sv4borg10:2888:3888 > server.2=sv4borg11:2888:3888 > server.3=sv4borg12:2888:3888 > server.4=sv4borg13:2888:3888 > {noformat} > The issue is on the first server. I'm going to attach thread dumps and logs > in a moment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-836) hostlist as string
[ https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Koch updated ZOOKEEPER-836: -- Status: Patch Available (was: Open) > hostlist as string > -- > > Key: ZOOKEEPER-836 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836 > Project: Zookeeper > Issue Type: Sub-task > Components: java client >Affects Versions: 3.3.1 >Reporter: Patrick Datko >Assignee: Thomas Koch > Attachments: ZOOKEEPER-836.patch > > > The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of > not doing (too much) work in a ctor. Instead the ClientCnxn should receive an > object of class "HostSet". HostSet could then be instantiated e.g. with a > comma separated string. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-836) hostlist as string
[ https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Koch updated ZOOKEEPER-836: -- Attachment: ZOOKEEPER-836.patch forgot to add a file in the last patch > hostlist as string > -- > > Key: ZOOKEEPER-836 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836 > Project: Zookeeper > Issue Type: Sub-task > Components: java client >Affects Versions: 3.3.1 >Reporter: Patrick Datko >Assignee: Thomas Koch > Attachments: ZOOKEEPER-836.patch > > > The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of > not doing (too much) work in a ctor. Instead the ClientCnxn should receive an > object of class "HostSet". HostSet could then be instantiated e.g. with a > comma separated string. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-836) hostlist as string
[ https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Koch updated ZOOKEEPER-836: -- Status: Open (was: Patch Available) > hostlist as string > -- > > Key: ZOOKEEPER-836 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836 > Project: Zookeeper > Issue Type: Sub-task > Components: java client >Affects Versions: 3.3.1 >Reporter: Patrick Datko >Assignee: Thomas Koch > > The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of > not doing (too much) work in a ctor. Instead the ClientCnxn should receive an > object of class "HostSet". HostSet could then be instantiated e.g. with a > comma separated string. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-836) hostlist as string
[ https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Koch updated ZOOKEEPER-836: -- Attachment: (was: ZOOKEEPER-836.patch) > hostlist as string > -- > > Key: ZOOKEEPER-836 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836 > Project: Zookeeper > Issue Type: Sub-task > Components: java client >Affects Versions: 3.3.1 >Reporter: Patrick Datko >Assignee: Thomas Koch > > The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of > not doing (too much) work in a ctor. Instead the ClientCnxn should receive an > object of class "HostSet". HostSet could then be instantiated e.g. with a > comma separated string. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-836) hostlist as string
[ https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933882#action_12933882 ] Hadoop QA commented on ZOOKEEPER-836: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12460025/ZOOKEEPER-836.patch against trunk revision 1036071. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. -1 findbugs. The patch appears to cause Findbugs to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/36//testReport/ Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/36//console This message is automatically generated. > hostlist as string > -- > > Key: ZOOKEEPER-836 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836 > Project: Zookeeper > Issue Type: Sub-task > Components: java client >Affects Versions: 3.3.1 >Reporter: Patrick Datko >Assignee: Thomas Koch > Attachments: ZOOKEEPER-836.patch > > > The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of > not doing (too much) work in a ctor. Instead the ClientCnxn should receive an > object of class "HostSet". HostSet could then be instantiated e.g. with a > comma separated string. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (ZOOKEEPER-895) ClientCnxn.authInfo must be thread safe
[ https://issues.apache.org/jira/browse/ZOOKEEPER-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Koch resolved ZOOKEEPER-895. --- Resolution: Fixed Has been solved as part of ZOOKEEPER-909 by using CopyOnWriteArraySet. > ClientCnxn.authInfo must be thread safe > --- > > Key: ZOOKEEPER-895 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-895 > Project: Zookeeper > Issue Type: Bug >Reporter: Thomas Koch > > authInfo can be accessed concurrently by different threads, as exercised in > org.apache.zookeeper.test.ACLTest > The two concurrent access points in this case were (presumably): > org.apache.zookeeper.ClientCnxn$SendThread.primeConnection(ClientCnxn.java:805) > and > org.apache.zookeeper.ClientCnxn.addAuthInfo(ClientCnxn.java:1121) > The line numbers refer to the latest patch in ZOOKEEPER-823. > The exception that pointed to this issue: > [junit] 2010-10-13 09:35:55,113 [myid:] - WARN > [main-SendThread(localhost:11221):clientcnxn$sendthr...@713] - Session 0x0 > for server localhost/127.0.0.1:11221, unexpected error, closing socket > connection and attempting reconnect > [junit] java.util.ConcurrentModificationException > [junit] at > java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) > [junit] at java.util.AbstractList$Itr.next(AbstractList.java:343) > [junit] at > org.apache.zookeeper.ClientCnxn$SendThread.primeConnection(ClientCnxn.java:805) > [junit] at > org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:247) > [junit] at > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:694) > Proposed solution: Use a thread-safe list for authInfo -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
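To illustrate why `CopyOnWriteArraySet` avoids the `ConcurrentModificationException` in the trace above, here is a small self-contained sketch. `AuthInfoHolder` and its `String` entries are placeholders invented for this example (the real client stores richer auth data); only `java.util.concurrent.CopyOnWriteArraySet` is the actual JDK class named in the resolution.

```java
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

// Hypothetical stand-in for the client's auth bookkeeping.
final class AuthInfoHolder {
    // Iterators over a CopyOnWriteArraySet work on a snapshot of the backing
    // array, so a concurrent add() never invalidates an iteration in progress.
    private final Set<String> authInfo = new CopyOnWriteArraySet<String>();

    void addAuthInfo(String scheme) {
        authInfo.add(scheme);
    }

    // Mimics "iterate while someone else adds": the add happens mid-loop.
    // With a plain ArrayList this pattern throws ConcurrentModificationException.
    int primeConnection() {
        int sent = 0;
        for (String scheme : authInfo) {
            authInfo.add(scheme + "-late"); // modification during iteration
            sent++;
        }
        return sent;
    }

    int size() {
        return authInfo.size();
    }
}
```

The trade-off is that every write copies the backing array, which is usually acceptable for a small, rarely modified collection such as a list of auth entries.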
[jira] Updated: (ZOOKEEPER-836) hostlist as string
[ https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Koch updated ZOOKEEPER-836: -- Status: Patch Available (was: Open) > hostlist as string > -- > > Key: ZOOKEEPER-836 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836 > Project: Zookeeper > Issue Type: Sub-task > Components: java client >Affects Versions: 3.3.1 >Reporter: Patrick Datko >Assignee: Thomas Koch > Attachments: ZOOKEEPER-836.patch > > > The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of > not doing (too much) work in a ctor. Instead the ClientCnxn should receive an > object of class "HostSet". HostSet could then be instantiated e.g. with a > comma separated string. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-836) hostlist as string
[ https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Koch updated ZOOKEEPER-836: -- Attachment: ZOOKEEPER-836.patch > hostlist as string > -- > > Key: ZOOKEEPER-836 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836 > Project: Zookeeper > Issue Type: Sub-task > Components: java client >Affects Versions: 3.3.1 >Reporter: Patrick Datko > Attachments: ZOOKEEPER-836.patch > > > The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of > not doing (too much) work in a ctor. Instead the ClientCnxn should receive an > object of class "HostSet". HostSet could then be instantiated e.g. with a > comma separated string. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (ZOOKEEPER-836) hostlist as string
[ https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Koch reassigned ZOOKEEPER-836: - Assignee: Thomas Koch > hostlist as string > -- > > Key: ZOOKEEPER-836 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836 > Project: Zookeeper > Issue Type: Sub-task > Components: java client >Affects Versions: 3.3.1 >Reporter: Patrick Datko >Assignee: Thomas Koch > Attachments: ZOOKEEPER-836.patch > > > The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of > not doing (too much) work in a ctor. Instead the ClientCnxn should receive an > object of class "HostSet". HostSet could then be instantiated e.g. with a > comma separated string. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933871#action_12933871 ] Jean-Daniel Cryans commented on ZOOKEEPER-880: -- Nagios is monitoring all 5 ensemble members the exact same way (checking connectivity on all 3 ports), although only 1 machine shows the issue. We tried stopping the monitoring on the problematic machine, but still got a growing number of threads. > QuorumCnxManager$SendWorker grows without bounds > > > Key: ZOOKEEPER-880 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880 > Project: Zookeeper > Issue Type: Bug >Affects Versions: 3.2.2 >Reporter: Jean-Daniel Cryans >Priority: Critical > Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, > hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, > TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz > > > We're seeing an issue where one server in the ensemble has a steadily growing > number of QuorumCnxManager$SendWorker threads up to a point where the OS runs > out of native threads, and at the same time we see a lot of exceptions in the > logs. This is on 3.2.2 and our config looks like: > {noformat} > tickTime=3000 > dataDir=/somewhere_thats_not_tmp > clientPort=2181 > initLimit=10 > syncLimit=5 > server.0=sv4borg9:2888:3888 > server.1=sv4borg10:2888:3888 > server.2=sv4borg11:2888:3888 > server.3=sv4borg12:2888:3888 > server.4=sv4borg13:2888:3888 > {noformat} > The issue is on the first server. I'm going to attach thread dumps and logs > in a moment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933842#action_12933842 ] Vishal K commented on ZOOKEEPER-880: The leader has the same problem as well. LearnerHandler expects a QuorumPacket to be received as the first packet after connection. However, if Nagios were monitoring the server port as well, then one would expect to see a lot of messages like this: LOG.error("First packet " + qp.toString() + " is not FOLLOWERINFO or OBSERVERINFO!"); Is Nagios not monitoring the server port? > QuorumCnxManager$SendWorker grows without bounds > > > Key: ZOOKEEPER-880 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880 > Project: Zookeeper > Issue Type: Bug >Affects Versions: 3.2.2 >Reporter: Jean-Daniel Cryans >Priority: Critical > Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, > hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, > TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz > > > We're seeing an issue where one server in the ensemble has a steadily growing > number of QuorumCnxManager$SendWorker threads up to a point where the OS runs > out of native threads, and at the same time we see a lot of exceptions in the > logs. This is on 3.2.2 and our config looks like: > {noformat} > tickTime=3000 > dataDir=/somewhere_thats_not_tmp > clientPort=2181 > initLimit=10 > syncLimit=5 > server.0=sv4borg9:2888:3888 > server.1=sv4borg10:2888:3888 > server.2=sv4borg11:2888:3888 > server.3=sv4borg12:2888:3888 > server.4=sv4borg13:2888:3888 > {noformat} > The issue is on the first server. I'm going to attach thread dumps and logs > in a moment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-934) Add sanity check for server ID
[ https://issues.apache.org/jira/browse/ZOOKEEPER-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933838#action_12933838 ] Vishal K commented on ZOOKEEPER-934: Nagios might just be sending some 8 bytes of information to QCM, and QCM will accept that as an ID and start threads for that connection. If we have the above check, we will run into this scenario only if Nagios sends OBSERVER_ID or a valid server ID. As a first step it might be a good solution to: 1. Reject if (sid != OBSERVER_ID && !self.viewContains(sid)). 2. Interrupt SendWorker when RecvWorker exits. 3. Incorporate a solution for ZOOKEEPER-933. Note that with this solution in place, Nagios will also have to generate the correct role/peertype string in addition to the ID. 4. Kill SendWorker and RecvWorker iff leader election has been completed and we have no notifications to send. In general, this cannot be solved without some form of authentication. Essentially, these are forms of DoS attacks. Another quick solution could be to introduce a "cluster password" (or a cluster identifier string). We can store this password in the zoo.cfg file. A peer can include a hash of this password in outgoing messages or use f(password, serverid) as a key to HMAC outgoing packets. This of course is not secure. However, it is good enough to prevent QCM from considering port scanners as ZK servers. > Add sanity check for server ID > -- > > Key: ZOOKEEPER-934 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-934 > Project: Zookeeper > Issue Type: Sub-task >Reporter: Vishal K > Fix For: 3.4.0 > > > 2. Should I add a check to reject connections from peers that are not > listed in the configuration file? Currently, we are not doing any > sanity check for server IDs. I think this might fix ZOOKEEPER-851. > The fix is simple. However, I am not sure if anyone in the community > is relying on this ability. -- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
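The first step and the "cluster password" idea from the comment above can be sketched roughly as follows. Everything here is a hypothetical illustration: `PeerAdmission`, the `OBSERVER_ID` placeholder value, and the choice to HMAC the server ID with the shared password are assumptions made for this example, not code from QuorumCnxManager or from any patch. As the comment itself notes, this is not real security; it only filters out port scanners and monitoring probes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HashSet;
import java.util.Set;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical admission check for incoming election connections.
final class PeerAdmission {
    // Placeholder value for this sketch; stands in for the observer marker ID.
    static final long OBSERVER_ID = Long.MAX_VALUE;

    private final Set<Long> configuredIds; // server IDs listed in the config
    private final byte[] clusterSecret;    // the shared "cluster password"

    PeerAdmission(Set<Long> configuredIds, String clusterPassword) {
        this.configuredIds = new HashSet<Long>(configuredIds);
        this.clusterSecret = clusterPassword.getBytes(StandardCharsets.UTF_8);
    }

    // Step 1: reject if (sid != OBSERVER_ID && !view.contains(sid)).
    boolean isKnownSid(long sid) {
        return sid == OBSERVER_ID || configuredIds.contains(sid);
    }

    // HMAC-SHA256 over the 8-byte sid, keyed with the cluster password.
    byte[] tagFor(long sid) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(clusterSecret, "HmacSHA256"));
            return mac.doFinal(ByteBuffer.allocate(8).putLong(sid).array());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Accept only peers that present a known sid and a matching tag.
    boolean accept(long sid, byte[] presentedTag) {
        return isKnownSid(sid) && MessageDigest.isEqual(tagFor(sid), presentedTag);
    }
}
```

A probe that merely sends 8 arbitrary bytes would fail the tag comparison even if those bytes happened to decode to a configured server ID.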
[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933829#action_12933829 ] Vishal K commented on ZOOKEEPER-880: Hi Flavio, You are right. We can see RecvWorker leaving but no messages from SendWorker. 2010-09-27 16:02:59,111 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Connection broken: java.io.IOException: Channel eof at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:595) 2010-09-27 16:02:59,162 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Connection broken: java.io.IOException: Channel eof at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:595) 2010-09-27 16:03:14,269 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Connection broken: java.io.IOException: Channel eof at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:595) I thought that RecvWorker in 3.3.1 called sw.finish() before exiting. Adding this call in RecvWorker should fix this problem. -Vishal > QuorumCnxManager$SendWorker grows without bounds > > > Key: ZOOKEEPER-880 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880 > Project: Zookeeper > Issue Type: Bug >Affects Versions: 3.2.2 >Reporter: Jean-Daniel Cryans >Priority: Critical > Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, > hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, > TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz > > > We're seeing an issue where one server in the ensemble has a steadily growing > number of QuorumCnxManager$SendWorker threads up to a point where the OS runs > out of native threads, and at the same time we see a lot of exceptions in the > logs.
This is on 3.2.2 and our config looks like: > {noformat} > tickTime=3000 > dataDir=/somewhere_thats_not_tmp > clientPort=2181 > initLimit=10 > syncLimit=5 > server.0=sv4borg9:2888:3888 > server.1=sv4borg10:2888:3888 > server.2=sv4borg11:2888:3888 > server.3=sv4borg12:2888:3888 > server.4=sv4borg13:2888:3888 > {noformat} > The issue is on the first server. I'm going to attach thread dumps and logs > in a moment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
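The fix suggested in the comment above (have the receiver stop its paired sender before exiting) can be sketched as below. This is a simplified model with hypothetical classes: the names `SendWorker`, `RecvWorker`, and `finish()` echo the discussion, but the bodies are assumptions written for this example and do not reproduce QuorumCnxManager.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class WorkerPair {
    // Sender blocks on its queue; without finish() it would live forever.
    static final class SendWorker extends Thread {
        private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<byte[]>();
        private volatile boolean running = true;

        void finish() {
            running = false;
            interrupt(); // wake the thread if it is blocked in take()
        }

        @Override
        public void run() {
            try {
                while (running) {
                    queue.take(); // a real sender would write this to the socket
                }
            } catch (InterruptedException e) {
                // finish() was called: fall through and let the thread end
            }
        }
    }

    // Receiver reads until EOF or error, then always stops its sender.
    static final class RecvWorker extends Thread {
        private final InputStream in;
        private final SendWorker sw;

        RecvWorker(InputStream in, SendWorker sw) {
            this.in = in;
            this.sw = sw;
        }

        @Override
        public void run() {
            try {
                while (in.read() != -1) {
                    // a real receiver would decode and dispatch messages here
                }
            } catch (IOException e) {
                // broken connection: same cleanup path as EOF
            } finally {
                sw.finish(); // the call the comment says is missing
            }
        }
    }

    // Demo: an immediately closed "connection" must take the sender down too.
    static boolean senderStopsWithReceiver() {
        SendWorker sw = new SendWorker();
        RecvWorker rw = new RecvWorker(new ByteArrayInputStream(new byte[0]), sw);
        sw.start();
        rw.start();
        try {
            rw.join(2000);
            sw.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !rw.isAlive() && !sw.isAlive();
    }
}
```

Putting the call in a `finally` block means the sender is stopped on both the EOF path and the exception path, which is what keeps short-lived probe connections from leaving a sender thread behind.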
[jira] Updated: (ZOOKEEPER-896) Improve C client to support dynamic authentication schemes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botond Hejj updated ZOOKEEPER-896: -- Attachment: ZOOKEEPER-896.patch Updated the patch as requested. It is now based on trunk. > Improve C client to support dynamic authentication schemes > -- > > Key: ZOOKEEPER-896 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-896 > Project: Zookeeper > Issue Type: Improvement > Components: c client >Affects Versions: 3.3.1 >Reporter: Botond Hejj >Assignee: Botond Hejj > Fix For: 3.4.0 > > Attachments: ZOOKEEPER-896.patch, ZOOKEEPER-896.patch > > > When we started exploring zookeeper for our requirements we found the > authentication mechanism is not flexible enough. > We want to use kerberos for authentication but using the current API we ran > into a few problems. The idea is that we get a kerberos token on the client > side and then send that token to the server with a kerberos scheme. A server > side authentication plugin can use that token to authenticate the client and > also use the token for authorization. > We ran into two problems with this approach: > 1. A different kerberos token is needed for each different server that the client > can connect to since kerberos uses mutual authentication. That means when the > client acquires this kerberos token it has to know which server it connects > to and generate the token according to that. The client currently can't > generate a token for a specific server. The token stored in the auth_info is > used for all the servers. > 2. The kerberos token might have an expiry time so if the client loses the > connection to the server and then tries to reconnect it should acquire a > new token. That is not possible currently since the token is stored in > auth_info and reused for every connection. > The problem can be solved if we allow the client to register a callback for > authentication instead of a static token.
This can be a callback with an > argument which passes the current host string. The zookeeper client code > could call this callback before it sends the authentication info to the > server to get a fresh server specific token. > This would solve our problem with the kerberos authentication and also could > be used for other more dynamic authentication schemes. > The solution could also be generalized to the java client. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
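Since the description notes that the idea could carry over to the Java client, the callback approach can be sketched in Java as follows. `AuthTokenProvider` and `ReconnectingClient` are hypothetical names invented for this illustration; they are not part of the ZooKeeper C or Java API and do not reflect the attached patch. The sketch only shows the shape of the proposal: the client asks the callback for a token on every (re)connect and passes it the current host string.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback: returns a fresh token for the given host string.
interface AuthTokenProvider {
    byte[] tokenFor(String host);
}

// Hypothetical client that asks for a token on every connection attempt
// instead of reusing one static token for all servers.
final class ReconnectingClient {
    private final AuthTokenProvider provider;
    private final List<String> sent = new ArrayList<String>();

    ReconnectingClient(AuthTokenProvider provider) {
        this.provider = provider;
    }

    // Called for the initial connect and for each reconnect.
    void connect(String host) {
        byte[] token = provider.tokenFor(host); // server-specific and fresh
        // A real client would send the token in its auth packet; here we
        // just record it so the behavior can be observed.
        sent.add(new String(token, StandardCharsets.UTF_8));
    }

    List<String> sentTokens() {
        return sent;
    }
}
```

Because the provider runs at connect time, it addresses both problems listed in the description: the token can depend on the target server, and an expired token is never reused after a reconnect.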
[jira] Commented: (ZOOKEEPER-934) Add sanity check for server ID
[ https://issues.apache.org/jira/browse/ZOOKEEPER-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933719#action_12933719 ] Flavio Junqueira commented on ZOOKEEPER-934: One more comment. Looking at the logs for ZOOKEEPER-880, I remembered that in their case the RecvWorker thread was able to read a valid id from the connection with a Nagios server. I'm not exactly sure how that happened, but it essentially tells us that the simple check you proposed might not be enough. We don't want a Nagios box impersonating a ZooKeeper server! :-) > Add sanity check for server ID > -- > > Key: ZOOKEEPER-934 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-934 > Project: Zookeeper > Issue Type: Sub-task >Reporter: Vishal K > Fix For: 3.4.0 > > > 2. Should I add a check to reject connections from peers that are not > listed in the configuration file? Currently, we are not doing any > sanity check for server IDs. I think this might fix ZOOKEEPER-851. > The fix is simple. However, I am not sure if anyone in the community > is relying on this ability. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933718#action_12933718 ] Flavio Junqueira commented on ZOOKEEPER-880: One problem here is that we had some discussions over IRC and the information is not reflected here. If you have a look at the logs, you'll observe this: {noformat} 2010-09-28 10:31:22,227 DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager: Connection request /10.10.20.5:41861 2010-09-28 10:31:22,227 DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager: Connection request: 0 2010-09-28 10:31:22,227 DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager: Address of remote peer: 0 2010-09-28 10:31:22,229 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Connection broken: java.io.IOException: Channel eof at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:595) {noformat} If I remember the discussion with J-D correctly, the node trying to connect is running Nagios. My conjecture at the time was that the IOException was killing the receiver thread, but not the sender thread (RecvWorker.finish() does not close its SendWorker counterpart). Your point is good, but it sounds like the race you mention would have to be triggered continuously to cause the number of SendWorker threads to grow steadily. That seems unlikely to me.
> QuorumCnxManager$SendWorker grows without bounds > > > Key: ZOOKEEPER-880 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880 > Project: Zookeeper > Issue Type: Bug >Affects Versions: 3.2.2 >Reporter: Jean-Daniel Cryans >Priority: Critical > Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, > hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, > TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz > > > We're seeing an issue where one server in the ensemble has a steadily growing > number of QuorumCnxManager$SendWorker threads up to a point where the OS runs > out of native threads, and at the same time we see a lot of exceptions in the > logs. This is on 3.2.2 and our config looks like: > {noformat} > tickTime=3000 > dataDir=/somewhere_thats_not_tmp > clientPort=2181 > initLimit=10 > syncLimit=5 > server.0=sv4borg9:2888:3888 > server.1=sv4borg10:2888:3888 > server.2=sv4borg11:2888:3888 > server.3=sv4borg12:2888:3888 > server.4=sv4borg13:2888:3888 > {noformat} > The issue is on the first server. I'm going to attach thread dumps and logs > in a moment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.