[jira] Commented: (ZOOKEEPER-937) test -e not available on solaris /bin/sh

2010-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934066#action_12934066
 ] 

Hadoop QA commented on ZOOKEEPER-937:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12460076/zk_solaris_zkEnv.patch
  against trunk revision 1036967.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 8 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/41//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/41//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/41//console

This message is automatically generated.

> test -e not available on solaris /bin/sh
> 
>
> Key: ZOOKEEPER-937
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
> Project: Zookeeper
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
> Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
>Reporter: Erik Hetzner
>Assignee: Erik Hetzner
> Fix For: 3.4.0
>
> Attachments: zk_solaris_zkEnv.patch, zk_solaris_zkEnv.patch
>
>
> test -e FILENAME is not supported by /bin/sh on Solaris. This is used in 
> bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-937) test -e not available on solaris /bin/sh

2010-11-19 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-937:


Status: Patch Available  (was: Open)

> test -e not available on solaris /bin/sh
> 
>
> Key: ZOOKEEPER-937
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
> Project: Zookeeper
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 3.3.2, 3.3.1, 3.3.0
> Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
>Reporter: Erik Hetzner
>Assignee: Erik Hetzner
> Fix For: 3.4.0
>
> Attachments: zk_solaris_zkEnv.patch, zk_solaris_zkEnv.patch
>
>
> test -e FILENAME is not supported by /bin/sh on Solaris. This is used in 
> bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.




[jira] Updated: (ZOOKEEPER-937) test -e not available on solaris /bin/sh

2010-11-19 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-937:


Status: Open  (was: Patch Available)

> test -e not available on solaris /bin/sh
> 
>
> Key: ZOOKEEPER-937
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
> Project: Zookeeper
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 3.3.2, 3.3.1, 3.3.0
> Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
>Reporter: Erik Hetzner
>Assignee: Erik Hetzner
> Fix For: 3.4.0
>
> Attachments: zk_solaris_zkEnv.patch, zk_solaris_zkEnv.patch
>
>
> test -e FILENAME is not supported by /bin/sh on Solaris. This is used in 
> bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.




[jira] Updated: (ZOOKEEPER-937) test -e not available on solaris /bin/sh

2010-11-19 Thread Erik Hetzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hetzner updated ZOOKEEPER-937:
---

Attachment: zk_solaris_zkEnv.patch

Patch with the proper path.

> test -e not available on solaris /bin/sh
> 
>
> Key: ZOOKEEPER-937
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
> Project: Zookeeper
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
> Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
>Reporter: Erik Hetzner
>Assignee: Erik Hetzner
> Fix For: 3.4.0
>
> Attachments: zk_solaris_zkEnv.patch, zk_solaris_zkEnv.patch
>
>
> test -e FILENAME is not supported by /bin/sh on Solaris. This is used in 
> bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.




[jira] Commented: (ZOOKEEPER-937) test -e not available on solaris /bin/sh

2010-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934057#action_12934057
 ] 

Hadoop QA commented on ZOOKEEPER-937:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12460061/zk_solaris_zkEnv.patch
  against trunk revision 1036967.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/40//console


> test -e not available on solaris /bin/sh
> 
>
> Key: ZOOKEEPER-937
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
> Project: Zookeeper
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
> Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
>Reporter: Erik Hetzner
>Assignee: Erik Hetzner
> Fix For: 3.4.0
>
> Attachments: zk_solaris_zkEnv.patch
>
>
> test -e FILENAME is not supported by /bin/sh on Solaris. This is used in 
> bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.




[jira] Commented: (ZOOKEEPER-937) test -e not available on solaris /bin/sh

2010-11-19 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934052#action_12934052
 ] 

Mahadev konar commented on ZOOKEEPER-937:
-

+1, the patch looks good. Marking it Patch Available (PA).

> test -e not available on solaris /bin/sh
> 
>
> Key: ZOOKEEPER-937
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
> Project: Zookeeper
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
> Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
>Reporter: Erik Hetzner
>Assignee: Erik Hetzner
> Fix For: 3.4.0
>
> Attachments: zk_solaris_zkEnv.patch
>
>
> test -e FILENAME is not supported by /bin/sh on Solaris. This is used in 
> bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.




[jira] Updated: (ZOOKEEPER-937) test -e not available on solaris /bin/sh

2010-11-19 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-937:


Status: Patch Available  (was: Open)

> test -e not available on solaris /bin/sh
> 
>
> Key: ZOOKEEPER-937
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
> Project: Zookeeper
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 3.3.2, 3.3.1, 3.3.0
> Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
>Reporter: Erik Hetzner
>Assignee: Erik Hetzner
> Fix For: 3.4.0
>
> Attachments: zk_solaris_zkEnv.patch
>
>
> test -e FILENAME is not supported by /bin/sh on Solaris. This is used in 
> bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.




[jira] Assigned: (ZOOKEEPER-937) test -e not available on solaris /bin/sh

2010-11-19 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar reassigned ZOOKEEPER-937:
---

Assignee: Erik Hetzner

> test -e not available on solaris /bin/sh
> 
>
> Key: ZOOKEEPER-937
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
> Project: Zookeeper
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
> Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
>Reporter: Erik Hetzner
>Assignee: Erik Hetzner
> Fix For: 3.4.0
>
> Attachments: zk_solaris_zkEnv.patch
>
>
> test -e FILENAME is not supported by /bin/sh on Solaris. This is used in 
> bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.




[jira] Updated: (ZOOKEEPER-937) test -e not available on solaris /bin/sh

2010-11-19 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-937:


Fix Version/s: 3.4.0

> test -e not available on solaris /bin/sh
> 
>
> Key: ZOOKEEPER-937
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
> Project: Zookeeper
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
> Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
>Reporter: Erik Hetzner
> Fix For: 3.4.0
>
> Attachments: zk_solaris_zkEnv.patch
>
>
> test -e FILENAME is not supported by /bin/sh on Solaris. This is used in 
> bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.




Re: Concurrent primitives library - shared lock

2010-11-19 Thread Mahadev Konar
Thanks Lin.

Thanks
mahadev


On 11/18/10 2:00 AM, "Lin Chia-Hung"  wrote:

Hi

Following up on this mail
(http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/201005.mbox/%3c4bfd646c.2040...@apache.org%3e),
I created a JIRA (ZOOKEEPER-935) with a patch attached that contains the
shared lock feature.

Please let me know if there are any issues with it.

Thanks,
Chiahung



[jira] Created: (ZOOKEEPER-937) test -e not available on solaris /bin/sh

2010-11-19 Thread Erik Hetzner (JIRA)
test -e not available on solaris /bin/sh


 Key: ZOOKEEPER-937
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
 Project: Zookeeper
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.3.2, 3.3.1, 3.3.0
 Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris

Reporter: Erik Hetzner
 Attachments: zk_solaris_zkEnv.patch

test -e FILENAME is not supported by /bin/sh on Solaris. This is used in 
bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.
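The substitution is not a pure rename. `test -e` is true for any existing path, but `test -f` is true only for regular files (after following symlinks). The swap is therefore safe only where zkEnv.sh checks paths that are expected to be regular files. The patch itself is not shown here, so that is an assumption. As an analogy only, here is a small Java sketch using `java.nio.file.Files` that makes the difference concrete. The class and method names are hypothetical and are not part of zkEnv.sh or the patch.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical Java analogy for the two shell tests; not part of zkEnv.sh or the patch. */
public class ExistsVsRegularFile {
    /** Like `test -e PATH`: true for any existing path, directories included. */
    static boolean testE(Path p) {
        return Files.exists(p);
    }

    /** Like `test -f PATH`: true only for regular files (symlinks are followed). */
    static boolean testF(Path p) {
        return Files.isRegularFile(p);
    }
}
```

For a directory, `testE` returns true and `testF` returns false. For a missing path, both return false.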




[jira] Updated: (ZOOKEEPER-937) test -e not available on solaris /bin/sh

2010-11-19 Thread Erik Hetzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hetzner updated ZOOKEEPER-937:
---

Attachment: zk_solaris_zkEnv.patch

> test -e not available on solaris /bin/sh
> 
>
> Key: ZOOKEEPER-937
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-937
> Project: Zookeeper
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
> Environment: SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
>Reporter: Erik Hetzner
> Attachments: zk_solaris_zkEnv.patch
>
>
> test -e FILENAME is not supported by /bin/sh on Solaris. This is used in 
> bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch.




[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds

2010-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933967#action_12933967
 ] 

Hadoop QA commented on ZOOKEEPER-880:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12460041/ZOOKEEPER-880.patch
  against trunk revision 1036967.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 8 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/39//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/39//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/39//console


> QuorumCnxManager$SendWorker grows without bounds
> 
>
> Key: ZOOKEEPER-880
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.2
>Reporter: Jean-Daniel Cryans
>Priority: Critical
> Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, 
> hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, 
> TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz, ZOOKEEPER-880.patch
>
>
> We're seeing an issue where one server in the ensemble has a steadily growing 
> number of QuorumCnxManager$SendWorker threads up to a point where the OS runs 
> out of native threads, and at the same time we see a lot of exceptions in the 
> logs.  This is on 3.2.2 and our config looks like:
> {noformat}
> tickTime=3000
> dataDir=/somewhere_thats_not_tmp
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=sv4borg9:2888:3888
> server.1=sv4borg10:2888:3888
> server.2=sv4borg11:2888:3888
> server.3=sv4borg12:2888:3888
> server.4=sv4borg13:2888:3888
> {noformat}
> The issue is on the first server. I'm going to attach thread dumps and logs 
> in a moment.
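For readers tracing the logs, here is how each `server.N=host:port:port` line breaks down:

- N is the server id (sid).
- The first port (2888 here) is the one followers use to connect to the leader.
- The second port (3888 here) is the leader-election port, where QuorumCnxManager accepts connections.

Below is a minimal Java sketch that parses only the three-part form used in this config. `ServerLineSketch` is a hypothetical helper, not ZooKeeper's own config parser.

```java
/** Hypothetical parser for lines such as "server.0=sv4borg9:2888:3888". */
public class ServerLineSketch {
    static final class ServerEntry {
        final long sid;          // N in "server.N"
        final String host;
        final int quorumPort;    // followers connect to the leader here (2888 above)
        final int electionPort;  // leader election, QuorumCnxManager (3888 above)

        ServerEntry(long sid, String host, int quorumPort, int electionPort) {
            this.sid = sid;
            this.host = host;
            this.quorumPort = quorumPort;
            this.electionPort = electionPort;
        }
    }

    /** Parses only the host:port:port form; throws on anything else. */
    static ServerEntry parse(String line) {
        String prefix = "server.";
        int eq = line.indexOf('=');
        if (!line.startsWith(prefix) || eq < 0) {
            throw new IllegalArgumentException("not a server line: " + line);
        }
        long sid = Long.parseLong(line.substring(prefix.length(), eq).trim());
        String[] parts = line.substring(eq + 1).trim().split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected host:port:port in: " + line);
        }
        return new ServerEntry(sid, parts[0], Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
    }
}
```

Under this reading, the affected "first server" is sid 0 (sv4borg9), with its election listener on port 3888.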




Please commit ZOOKEEPER-836

2010-11-19 Thread Thomas Koch
Hi,

Hudson reports 8 new Findbugs warnings for ZOOKEEPER-836, but I did not 
introduce them; they are in a Java class I happened to modify.
Could you kindly review and commit it? Then I can continue with ZOOKEEPER-849 
tomorrow.

Best regards,

Thomas Koch, http://www.koch.ro


[jira] Updated: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds

2010-11-19 Thread Vishal K (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal K updated ZOOKEEPER-880:
---

Status: Patch Available  (was: Open)

patch for trunk

> QuorumCnxManager$SendWorker grows without bounds
> 
>
> Key: ZOOKEEPER-880
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.2
>Reporter: Jean-Daniel Cryans
>Priority: Critical
> Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, 
> hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, 
> TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz, ZOOKEEPER-880.patch
>
>
> We're seeing an issue where one server in the ensemble has a steadily growing 
> number of QuorumCnxManager$SendWorker threads up to a point where the OS runs 
> out of native threads, and at the same time we see a lot of exceptions in the 
> logs.  This is on 3.2.2 and our config looks like:
> {noformat}
> tickTime=3000
> dataDir=/somewhere_thats_not_tmp
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=sv4borg9:2888:3888
> server.1=sv4borg10:2888:3888
> server.2=sv4borg11:2888:3888
> server.3=sv4borg12:2888:3888
> server.4=sv4borg13:2888:3888
> {noformat}
> The issue is on the first server. I'm going to attach thread dumps and logs 
> in a moment.




[jira] Updated: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds

2010-11-19 Thread Vishal K (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal K updated ZOOKEEPER-880:
---

Attachment: ZOOKEEPER-880.patch



The root cause of the frequent disconnects still needs to be resolved. In the 
meantime, I have fixed the problem that was causing every other SendWorker 
thread to leak.

I tested the patch by connecting to 3888 on one of the servers using telnet.

2010-11-19 14:51:10,081 - INFO  
[/10.17.119.101:3888:quorumcnxmanager$liste...@477] - Received connection 
request /10.16.251.39:2074
2010-11-19 14:51:14,364 - DEBUG 
[/10.17.119.101:3888:quorumcnxmanager$sendwor...@553] - Address of remote peer: 
8103510703875099187
2010-11-19 14:51:19,440 - WARN  [Thread-7:quorumcnxmanager$recvwor...@726] - 
Connection broken for id 8103510703875099187, my id = 1, error = 
java.io.IOException: Received packet with invalid packet: 218824692
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:711)
2010-11-19 14:51:19,441 - WARN  [Thread-7:quorumcnxmanager$recvwor...@730] - 
Interrupting SendWorker   <= SendWorker is getting killed.
2010-11-19 14:51:19,442 - DEBUG [Thread-7:quorumcnxmanager$sendwor...@571] - 
Calling finish for 8103510703875099187
2010-11-19 14:51:19,443 - DEBUG [Thread-7:quorumcnxmanager$sendwor...@591] - 
Removing entry from senderWorkerMap sid=8103510703875099187
2010-11-19 14:51:19,443 - WARN  [Thread-6:quorumcnxmanager$sendwor...@643] - 
Interrupted while waiting for message on queue
java.lang.InterruptedException
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
at 
java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:631)
2010-11-19 14:51:19,456 - DEBUG [Thread-6:quorumcnxmanager$sendwor...@571] - 
Calling finish for 8103510703875099187
2010-11-19 14:51:19,457 - WARN  [Thread-6:quorumcnxmanager$sendwor...@652] - 
Send worker leaving thread
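The two numbers in this log can be reproduced by hand. Assume, as the log suggests, that the election port first reads an 8-byte big-endian server id and then a 4-byte big-endian message length, and rejects lengths outside some maximum. Under that assumption:

- The reported sid 8103510703875099187 is 0x7075740D0A0D0A33, the ASCII bytes `put\r\n\r\n3`.
- The rejected length 218824692 is 0x0D0AFFF4, i.e. CR LF followed by 0xFF 0xF4, which is the telnet IAC IP (Interrupt Process) command sequence.

That is consistent with lines typed into telnet followed by an interrupt. The Java sketch below decodes such bytes with `java.nio.ByteBuffer`, whose default byte order is big-endian like `DataInputStream`. The class name and the 512 KB limit are assumptions for illustration, not ZooKeeper's actual code or constants.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical decoder for the first bytes a peer sends to the election port,
 * assuming an 8-byte server id followed by a 4-byte message length, both big-endian.
 */
public class ElectionFrameSketch {
    // Assumed maximum message length for this sketch (512 KB).
    static final int PACKET_MAX_SIZE = 512 * 1024;

    static final class Header {
        final long sid;
        final int length;

        Header(long sid, int length) {
            this.sid = sid;
            this.length = length;
        }
    }

    /** Decodes sid and length; ByteBuffer defaults to big-endian, matching DataInputStream. */
    static Header decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long sid = buf.getLong();   // bytes 0..7
        int length = buf.getInt();  // bytes 8..11
        return new Header(sid, length);
    }

    /** Whether a receiver should accept this length before allocating a buffer for it. */
    static boolean isValidLength(int length) {
        return length > 0 && length <= PACKET_MAX_SIZE;
    }
}
```

Feeding `decode` the 12 bytes `put\r\n\r\n3\r\n` followed by 0xFF 0xF4 yields sid 8103510703875099187 and length 218824692. `isValidLength` then rejects that length, matching the "invalid packet" error.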

Can you see if this fixes the problem?
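The log shows the intended shutdown path:

1. The RecvWorker (Thread-7) interrupts its SendWorker, calls finish, and removes the senderWorkerMap entry.
2. The SendWorker (Thread-6) leaves `ArrayBlockingQueue.poll` with an InterruptedException, calls finish again, and exits.

The Java sketch below illustrates that general pattern under stated assumptions. It is not the code in ZOOKEEPER-880.patch, and `SenderRegistrySketch`, `register` and `Sender` are invented names. Three details guard against leaked threads:

- `finish` is idempotent, since it runs twice.
- A sender replaced for the same sid is always finished.
- `remove(sid, this)` drops the map entry only if it still points to the finishing sender, so a newer sender is never evicted.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical per-peer sender registry; not ZooKeeper's QuorumCnxManager. */
public class SenderRegistrySketch {
    // Stand-in for a sid -> sender map such as senderWorkerMap.
    final ConcurrentHashMap<Long, Sender> senders = new ConcurrentHashMap<>();

    final class Sender extends Thread {
        final long sid;
        final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(16);
        private final AtomicBoolean running = new AtomicBoolean(true);

        Sender(long sid) {
            this.sid = sid;
            setDaemon(true);
        }

        /** Idempotent: stops the thread once and removes only this instance from the map. */
        void finish() {
            if (running.compareAndSet(true, false)) {
                interrupt();
                senders.remove(sid, this); // never evicts a newer sender for the same sid
            }
        }

        @Override
        public void run() {
            try {
                while (running.get()) {
                    byte[] msg = queue.poll(1, TimeUnit.SECONDS);
                    if (msg != null) {
                        // A real sender would write msg to its socket here.
                    }
                }
            } catch (InterruptedException e) {
                // Interrupted by finish(): fall through and exit.
            } finally {
                finish(); // no-op if finish() already ran
            }
        }
    }

    /** Starts a sender for sid, finishing any sender it replaces so that thread cannot leak. */
    Sender register(long sid) {
        Sender fresh = new Sender(sid);
        Sender old = senders.put(sid, fresh);
        if (old != null) {
            old.finish();
        }
        fresh.start();
        return fresh;
    }
}
```

Registering twice for the same sid leaves exactly one live sender, and finishing that one leaves the map empty.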

> QuorumCnxManager$SendWorker grows without bounds
> 
>
> Key: ZOOKEEPER-880
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.2
>Reporter: Jean-Daniel Cryans
>Priority: Critical
> Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, 
> hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, 
> TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz, ZOOKEEPER-880.patch
>
>
> We're seeing an issue where one server in the ensemble has a steadily growing 
> number of QuorumCnxManager$SendWorker threads up to a point where the OS runs 
> out of native threads, and at the same time we see a lot of exceptions in the 
> logs.  This is on 3.2.2 and our config looks like:
> {noformat}
> tickTime=3000
> dataDir=/somewhere_thats_not_tmp
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=sv4borg9:2888:3888
> server.1=sv4borg10:2888:3888
> server.2=sv4borg11:2888:3888
> server.3=sv4borg12:2888:3888
> server.4=sv4borg13:2888:3888
> {noformat}
> The issue is on the first server. I'm going to attach thread dumps and logs 
> in a moment.




[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds

2010-11-19 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933928#action_12933928
 ] 

Flavio Junqueira commented on ZOOKEEPER-880:


I think we agree that monitoring alone was not causing the issue. However, your 
logs indicate that the monitoring did leave some orphan threads, as excerpts 
like the one I posted above show. Without the monitoring, the same problem is 
still being triggered, but apparently in a different way, and it is not clear 
why. You can see this from all the "Channel eof" messages in the log. 

To solve this issue, we need to understand the following:

# What's causing those IOExceptions?
# Why are we even starting a new connection if there is no leader election 
going on? 

Do you folks have any idea if there is anything in your environment that could 
be causing those TCP connections to break? 

> QuorumCnxManager$SendWorker grows without bounds
> 
>
> Key: ZOOKEEPER-880
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.2
>Reporter: Jean-Daniel Cryans
>Priority: Critical
> Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, 
> hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, 
> TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz
>
>
> We're seeing an issue where one server in the ensemble has a steadily growing 
> number of QuorumCnxManager$SendWorker threads up to a point where the OS runs 
> out of native threads, and at the same time we see a lot of exceptions in the 
> logs.  This is on 3.2.2 and our config looks like:
> {noformat}
> tickTime=3000
> dataDir=/somewhere_thats_not_tmp
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=sv4borg9:2888:3888
> server.1=sv4borg10:2888:3888
> server.2=sv4borg11:2888:3888
> server.3=sv4borg12:2888:3888
> server.4=sv4borg13:2888:3888
> {noformat}
> The issue is on the first server. I'm going to attach thread dumps and logs 
> in a moment.




[jira] Commented: (ZOOKEEPER-836) hostlist as string

2010-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933918#action_12933918
 ] 

Hadoop QA commented on ZOOKEEPER-836:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12460034/ZOOKEEPER-836.patch
  against trunk revision 1036967.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 8 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/38//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/38//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/38//console


> hostlist as string
> --
>
> Key: ZOOKEEPER-836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836
> Project: Zookeeper
>  Issue Type: Sub-task
>  Components: java client
>Affects Versions: 3.3.1
>Reporter: Patrick Datko
>Assignee: Thomas Koch
> Attachments: ZOOKEEPER-836.patch
>
>
> The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of 
> not doing (too much) work in a ctor. Instead, ClientCnxn should receive an 
> object of class "HostSet". HostSet could then be instantiated, e.g., with a 
> comma-separated string.
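Here is a minimal sketch of the proposed split. It assumes a connect string of the form `host:port,host:port`, with the port defaulting to 2181 when omitted. Parsing moves into a static factory on a hypothetical `HostSet`, so a connection class such as ClientCnxn could simply receive a ready-made object in its constructor. This is not the API in ZOOKEEPER-836.patch, and it has some deliberate gaps:

- Chroot suffixes and IPv6 literals are not handled.
- `InetSocketAddress.createUnresolved` is used so that building the set performs no DNS lookups.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical HostSet: parsing happens in a factory, not in the connection's constructor. */
public final class HostSet {
    private static final int DEFAULT_PORT = 2181; // assumption for this sketch

    private final List<InetSocketAddress> addresses;

    private HostSet(List<InetSocketAddress> addresses) {
        this.addresses = Collections.unmodifiableList(new ArrayList<>(addresses));
    }

    /** Parses "host1:2181,host2:2182,host3"; blank entries are skipped. */
    public static HostSet fromConnectString(String connectString) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String token : connectString.split(",")) {
            String hostPort = token.trim();
            if (hostPort.isEmpty()) {
                continue;
            }
            int colon = hostPort.lastIndexOf(':');
            String host = colon >= 0 ? hostPort.substring(0, colon) : hostPort;
            int port = colon >= 0 ? Integer.parseInt(hostPort.substring(colon + 1)) : DEFAULT_PORT;
            // No DNS lookup here: construction stays cheap and side-effect free.
            result.add(InetSocketAddress.createUnresolved(host, port));
        }
        if (result.isEmpty()) {
            throw new IllegalArgumentException("no hosts in: " + connectString);
        }
        return new HostSet(result);
    }

    public List<InetSocketAddress> addresses() {
        return addresses;
    }
}
```

With this split, a connection's constructor only stores the HostSet it is given. Tests can build one directly without touching the network.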




[jira] Updated: (ZOOKEEPER-836) hostlist as string

2010-11-19 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch updated ZOOKEEPER-836:
--

Status: Open  (was: Patch Available)

> hostlist as string
> --
>
> Key: ZOOKEEPER-836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836
> Project: Zookeeper
>  Issue Type: Sub-task
>  Components: java client
>Affects Versions: 3.3.1
>Reporter: Patrick Datko
>Assignee: Thomas Koch
> Attachments: ZOOKEEPER-836.patch
>
>
> The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of 
> not doing (too much) work in a ctor. Instead, ClientCnxn should receive an 
> object of class "HostSet". HostSet could then be instantiated, e.g., with a 
> comma-separated string.




[jira] Updated: (ZOOKEEPER-836) hostlist as string

2010-11-19 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch updated ZOOKEEPER-836:
--

Status: Patch Available  (was: Open)

> hostlist as string
> --
>
> Key: ZOOKEEPER-836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836
> Project: Zookeeper
>  Issue Type: Sub-task
>  Components: java client
>Affects Versions: 3.3.1
>Reporter: Patrick Datko
>Assignee: Thomas Koch
> Attachments: ZOOKEEPER-836.patch
>
>
> The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of 
> not doing (too much) work in a ctor. Instead, ClientCnxn should receive an 
> object of class "HostSet". HostSet could then be instantiated, e.g., with a 
> comma-separated string.




[jira] Updated: (ZOOKEEPER-836) hostlist as string

2010-11-19 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch updated ZOOKEEPER-836:
--

Attachment: (was: ZOOKEEPER-836.patch)

> hostlist as string
> --
>
> Key: ZOOKEEPER-836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836
> Project: Zookeeper
>  Issue Type: Sub-task
>  Components: java client
>Affects Versions: 3.3.1
>Reporter: Patrick Datko
>Assignee: Thomas Koch
> Attachments: ZOOKEEPER-836.patch
>
>
> The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of 
> not doing (too much) work in a ctor. Instead, ClientCnxn should receive an 
> object of class "HostSet". HostSet could then be instantiated, e.g., with a 
> comma-separated string.




[jira] Updated: (ZOOKEEPER-836) hostlist as string

2010-11-19 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch updated ZOOKEEPER-836:
--

Attachment: ZOOKEEPER-836.patch

Now also added Apache license headers.

> hostlist as string
> --
>
> Key: ZOOKEEPER-836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836
> Project: Zookeeper
>  Issue Type: Sub-task
>  Components: java client
>Affects Versions: 3.3.1
>Reporter: Patrick Datko
>Assignee: Thomas Koch
> Attachments: ZOOKEEPER-836.patch
>
>
> The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of 
> not doing (too much) work in a ctor. Instead, ClientCnxn should receive an 
> object of class "HostSet". HostSet could then be instantiated, e.g., with a 
> comma-separated string.




[jira] Commented: (ZOOKEEPER-836) hostlist as string

2010-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933903#action_12933903
 ] 

Hadoop QA commented on ZOOKEEPER-836:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12460027/ZOOKEEPER-836.patch
  against trunk revision 1036071.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 8 new Findbugs warnings.

-1 release audit.  The applied patch generated 26 release audit warnings 
(more than the trunk's current 24 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/37//testReport/
Release audit warnings: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/37//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/37//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/37//console


> hostlist as string
> --
>
> Key: ZOOKEEPER-836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836
> Project: Zookeeper
>  Issue Type: Sub-task
>  Components: java client
>Affects Versions: 3.3.1
>Reporter: Patrick Datko
>Assignee: Thomas Koch
> Attachments: ZOOKEEPER-836.patch
>
>
> The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of 
> not doing (too much) work in a ctor. Instead the ClientCnxn should receive an 
> object of class "HostSet". HostSet could then be instantiated e.g. with a 
> comma separated string.
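A minimal sketch of the proposed split. The class name HostSet comes from the issue, but its API here is hypothetical: a static factory does the string parsing, so a connection class's constructor would only receive ready-made addresses. Falling back to port 2181 for entries without a port, and not handling a chroot suffix, are simplifying assumptions.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical HostSet: parses "host:port,host:port" once, outside any ClientCnxn constructor.
final class HostSet {
    private static final int DEFAULT_PORT = 2181; // assumption: default client port when none is given

    private final List<InetSocketAddress> addresses;

    private HostSet(List<InetSocketAddress> addresses) {
        this.addresses = Collections.unmodifiableList(addresses);
    }

    // All string parsing lives in this factory method instead of a constructor.
    // Note: a chroot suffix such as "/app" is not handled in this sketch.
    static HostSet fromConnectString(String connectString) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String entry : connectString.split(",")) {
            String hostPort = entry.trim();
            if (hostPort.isEmpty()) {
                continue;
            }
            int colon = hostPort.lastIndexOf(':');
            String host = colon >= 0 ? hostPort.substring(0, colon) : hostPort;
            int port = colon >= 0 ? Integer.parseInt(hostPort.substring(colon + 1)) : DEFAULT_PORT;
            // createUnresolved avoids doing DNS lookups at parse time.
            result.add(InetSocketAddress.createUnresolved(host, port));
        }
        if (result.isEmpty()) {
            throw new IllegalArgumentException("empty host list: " + connectString);
        }
        return new HostSet(result);
    }

    List<InetSocketAddress> addresses() {
        return addresses;
    }
}
```

Taking a HostSet in the constructor also lets tests hand the connection a fixed address list without going through string parsing.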

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds

2010-11-19 Thread Benoit Sigoure (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933897#action_12933897
 ] 

Benoit Sigoure commented on ZOOKEEPER-880:
--

Do we agree that monitoring wasn't causing the issue? As JD said, even after 
we stopped it, the problem recurred.

> QuorumCnxManager$SendWorker grows without bounds
> 
>
> Key: ZOOKEEPER-880
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.2
>Reporter: Jean-Daniel Cryans
>Priority: Critical
> Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, 
> hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, 
> TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz
>
>
> We're seeing an issue where one server in the ensemble has a steadily growing 
> number of QuorumCnxManager$SendWorker threads up to a point where the OS runs 
> out of native threads, and at the same time we see a lot of exceptions in the 
> logs.  This is on 3.2.2 and our config looks like:
> {noformat}
> tickTime=3000
> dataDir=/somewhere_thats_not_tmp
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=sv4borg9:2888:3888
> server.1=sv4borg10:2888:3888
> server.2=sv4borg11:2888:3888
> server.3=sv4borg12:2888:3888
> server.4=sv4borg13:2888:3888
> {noformat}
> The issue is on the first server. I'm going to attach thread dumps and logs 
> in a moment.




[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds

2010-11-19 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933891#action_12933891
 ] 

Patrick Hunt commented on ZOOKEEPER-880:


Flavio (and others), we should update the docs to say which ports can or 
should be monitored and which ports should NOT be monitored (or, if 
monitoring is supported, under what conditions).

Can we update the docs as part of any patch/fix? Thanks.
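As a starting point for such docs, here is a sketch that lists the ports a server uses, based on the zoo.cfg quoted in this issue: clientPort for clients and, in each `server.N=host:port1:port2` entry, a first port that followers use to connect to the leader and a second port used for leader election. `EnsemblePorts` is a hypothetical helper; it reads the config with `java.util.Properties`, which accepts the `key=value` lines shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper: maps each port a server uses to its purpose, from zoo.cfg text.
final class EnsemblePorts {
    static Map<Integer, String> portsFor(String zooCfg, long myId) {
        Properties cfg = new Properties();
        try {
            cfg.load(new StringReader(zooCfg)); // zoo.cfg lines are plain key=value pairs
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<Integer, String> roles = new TreeMap<>();
        roles.put(Integer.parseInt(cfg.getProperty("clientPort").trim()), "client connections");

        String entry = cfg.getProperty("server." + myId);
        if (entry == null) {
            throw new IllegalArgumentException("no server." + myId + " entry in config");
        }
        // Expected form: host:quorumPort:electionPort
        String[] parts = entry.trim().split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("unexpected server entry: " + entry);
        }
        roles.put(Integer.parseInt(parts[1]), "quorum: followers connect to the leader here");
        roles.put(Integer.parseInt(parts[2]), "leader election (QuorumCnxManager)");
        return roles;
    }
}
```

In ZOOKEEPER-880, Nagios was checking all three of these ports; whether probing each one is safe is what this comment asks the docs to cover.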

> QuorumCnxManager$SendWorker grows without bounds
> 
>
> Key: ZOOKEEPER-880
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.2
>Reporter: Jean-Daniel Cryans
>Priority: Critical
> Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, 
> hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, 
> TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz
>
>
> We're seeing an issue where one server in the ensemble has a steadily growing 
> number of QuorumCnxManager$SendWorker threads up to a point where the OS runs 
> out of native threads, and at the same time we see a lot of exceptions in the 
> logs.  This is on 3.2.2 and our config looks like:
> {noformat}
> tickTime=3000
> dataDir=/somewhere_thats_not_tmp
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=sv4borg9:2888:3888
> server.1=sv4borg10:2888:3888
> server.2=sv4borg11:2888:3888
> server.3=sv4borg12:2888:3888
> server.4=sv4borg13:2888:3888
> {noformat}
> The issue is on the first server. I'm going to attach thread dumps and logs 
> in a moment.




[jira] Updated: (ZOOKEEPER-836) hostlist as string

2010-11-19 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch updated ZOOKEEPER-836:
--

Status: Patch Available  (was: Open)

> hostlist as string
> --
>
> Key: ZOOKEEPER-836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836
> Project: Zookeeper
>  Issue Type: Sub-task
>  Components: java client
>Affects Versions: 3.3.1
>Reporter: Patrick Datko
>Assignee: Thomas Koch
> Attachments: ZOOKEEPER-836.patch
>
>
> The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of 
> not doing (too much) work in a ctor. Instead the ClientCnxn should receive an 
> object of class "HostSet". HostSet could then be instantiated e.g. with a 
> comma separated string.




[jira] Updated: (ZOOKEEPER-836) hostlist as string

2010-11-19 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch updated ZOOKEEPER-836:
--

Attachment: ZOOKEEPER-836.patch

Forgot to add a file in the last patch.

> hostlist as string
> --
>
> Key: ZOOKEEPER-836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836
> Project: Zookeeper
>  Issue Type: Sub-task
>  Components: java client
>Affects Versions: 3.3.1
>Reporter: Patrick Datko
>Assignee: Thomas Koch
> Attachments: ZOOKEEPER-836.patch
>
>
> The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of 
> not doing (too much) work in a ctor. Instead the ClientCnxn should receive an 
> object of class "HostSet". HostSet could then be instantiated e.g. with a 
> comma separated string.




[jira] Updated: (ZOOKEEPER-836) hostlist as string

2010-11-19 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch updated ZOOKEEPER-836:
--

Status: Open  (was: Patch Available)

> hostlist as string
> --
>
> Key: ZOOKEEPER-836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836
> Project: Zookeeper
>  Issue Type: Sub-task
>  Components: java client
>Affects Versions: 3.3.1
>Reporter: Patrick Datko
>Assignee: Thomas Koch
>
> The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of 
> not doing (too much) work in a ctor. Instead the ClientCnxn should receive an 
> object of class "HostSet". HostSet could then be instantiated e.g. with a 
> comma separated string.




[jira] Updated: (ZOOKEEPER-836) hostlist as string

2010-11-19 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch updated ZOOKEEPER-836:
--

Attachment: (was: ZOOKEEPER-836.patch)

> hostlist as string
> --
>
> Key: ZOOKEEPER-836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836
> Project: Zookeeper
>  Issue Type: Sub-task
>  Components: java client
>Affects Versions: 3.3.1
>Reporter: Patrick Datko
>Assignee: Thomas Koch
>
> The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of 
> not doing (too much) work in a ctor. Instead the ClientCnxn should receive an 
> object of class "HostSet". HostSet could then be instantiated e.g. with a 
> comma separated string.




[jira] Commented: (ZOOKEEPER-836) hostlist as string

2010-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933882#action_12933882
 ] 

Hadoop QA commented on ZOOKEEPER-836:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12460025/ZOOKEEPER-836.patch
  against trunk revision 1036071.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause the tar ant target to fail.

-1 findbugs.  The patch appears to cause Findbugs to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/36//testReport/
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/36//console

This message is automatically generated.

> hostlist as string
> --
>
> Key: ZOOKEEPER-836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836
> Project: Zookeeper
>  Issue Type: Sub-task
>  Components: java client
>Affects Versions: 3.3.1
>Reporter: Patrick Datko
>Assignee: Thomas Koch
> Attachments: ZOOKEEPER-836.patch
>
>
> The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of 
> not doing (too much) work in a ctor. Instead the ClientCnxn should receive an 
> object of class "HostSet". HostSet could then be instantiated e.g. with a 
> comma separated string.




[jira] Resolved: (ZOOKEEPER-895) ClientCnxn.authInfo must be thread safe

2010-11-19 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch resolved ZOOKEEPER-895.
---

Resolution: Fixed

This has been fixed as part of ZOOKEEPER-909 by using a CopyOnWriteArraySet.
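The ConcurrentModificationException in the issue below comes from iterating a plain list while another call adds to it. Here is a minimal single-threaded sketch of the same mechanism, assuming authInfo entries can be modelled as simple strings (the real element type differs). Adding during iteration breaks an ArrayList iterator, while a CopyOnWriteArraySet iterator works on a snapshot and is unaffected. `AuthInfoDemo` is hypothetical.

```java
import java.util.Collection;
import java.util.ConcurrentModificationException;

// Hypothetical demo: authInfo entries are modelled as plain strings.
final class AuthInfoDemo {
    // Returns true if adding an entry while iterating authInfo throws ConcurrentModificationException.
    static boolean addDuringIterationFails(Collection<String> authInfo) {
        authInfo.add("digest:alice");
        try {
            for (String entry : authInfo) {
                // Plays the role of addAuthInfo() running while primeConnection() iterates.
                authInfo.add("digest:bob-" + entry.length());
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```

Copy-on-write fits this case because adding auth info is rare while every reconnect iterates the collection; the cost is a full array copy on each write. A Set only deduplicates entries if the element type defines equals/hashCode.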

> ClientCnxn.authInfo must be thread safe
> ---
>
> Key: ZOOKEEPER-895
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-895
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Thomas Koch
>
> authInfo can be accessed concurrently by different threads, as exercised in 
> org.apache.zookeeper.test.ACLTest
> The two concurrent access points in this case were (presumably):
> org.apache.zookeeper.ClientCnxn$SendThread.primeConnection(ClientCnxn.java:805)
>  and
> org.apache.zookeeper.ClientCnxn.addAuthInfo(ClientCnxn.java:1121)
> The line numbers refer to the latest patch in ZOOKEEPER-823.
> The exception that pointed to this issue:
> [junit] 2010-10-13 09:35:55,113 [myid:] - WARN  
> [main-SendThread(localhost:11221):clientcnxn$sendthr...@713] - Session 0x0 
> for server localhost/127.0.0.1:11221, unexpected error, closing socket 
> connection and attempting reconnect
> [junit] java.util.ConcurrentModificationException
> [junit]   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
> [junit]   at java.util.AbstractList$Itr.next(AbstractList.java:343)
> [junit]   at 
> org.apache.zookeeper.ClientCnxn$SendThread.primeConnection(ClientCnxn.java:805)
> [junit]   at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:247)
> [junit]   at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:694)
> Proposed solution: Use a thread-safe list for authInfo




[jira] Updated: (ZOOKEEPER-836) hostlist as string

2010-11-19 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch updated ZOOKEEPER-836:
--

Status: Patch Available  (was: Open)

> hostlist as string
> --
>
> Key: ZOOKEEPER-836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836
> Project: Zookeeper
>  Issue Type: Sub-task
>  Components: java client
>Affects Versions: 3.3.1
>Reporter: Patrick Datko
>Assignee: Thomas Koch
> Attachments: ZOOKEEPER-836.patch
>
>
> The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of 
> not doing (too much) work in a ctor. Instead the ClientCnxn should receive an 
> object of class "HostSet". HostSet could then be instantiated e.g. with a 
> comma separated string.




[jira] Updated: (ZOOKEEPER-836) hostlist as string

2010-11-19 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch updated ZOOKEEPER-836:
--

Attachment: ZOOKEEPER-836.patch

> hostlist as string
> --
>
> Key: ZOOKEEPER-836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836
> Project: Zookeeper
>  Issue Type: Sub-task
>  Components: java client
>Affects Versions: 3.3.1
>Reporter: Patrick Datko
> Attachments: ZOOKEEPER-836.patch
>
>
> The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of 
> not doing (too much) work in a ctor. Instead the ClientCnxn should receive an 
> object of class "HostSet". HostSet could then be instantiated e.g. with a 
> comma separated string.




[jira] Assigned: (ZOOKEEPER-836) hostlist as string

2010-11-19 Thread Thomas Koch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Koch reassigned ZOOKEEPER-836:
-

Assignee: Thomas Koch

> hostlist as string
> --
>
> Key: ZOOKEEPER-836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-836
> Project: Zookeeper
>  Issue Type: Sub-task
>  Components: java client
>Affects Versions: 3.3.1
>Reporter: Patrick Datko
>Assignee: Thomas Koch
> Attachments: ZOOKEEPER-836.patch
>
>
> The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of 
> not doing (too much) work in a ctor. Instead the ClientCnxn should receive an 
> object of class "HostSet". HostSet could then be instantiated e.g. with a 
> comma separated string.




[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds

2010-11-19 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933871#action_12933871
 ] 

Jean-Daniel Cryans commented on ZOOKEEPER-880:
--

Nagios is monitoring all 5 ensemble members the exact same way (checking 
connectivity on all 3 ports), although only 1 machine shows the issue. We tried 
stopping the monitoring on the problematic machine, but still got a growing 
number of threads.

> QuorumCnxManager$SendWorker grows without bounds
> 
>
> Key: ZOOKEEPER-880
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.2
>Reporter: Jean-Daniel Cryans
>Priority: Critical
> Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, 
> hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, 
> TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz
>
>
> We're seeing an issue where one server in the ensemble has a steadily growing 
> number of QuorumCnxManager$SendWorker threads up to a point where the OS runs 
> out of native threads, and at the same time we see a lot of exceptions in the 
> logs.  This is on 3.2.2 and our config looks like:
> {noformat}
> tickTime=3000
> dataDir=/somewhere_thats_not_tmp
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=sv4borg9:2888:3888
> server.1=sv4borg10:2888:3888
> server.2=sv4borg11:2888:3888
> server.3=sv4borg12:2888:3888
> server.4=sv4borg13:2888:3888
> {noformat}
> The issue is on the first server. I'm going to attach thread dumps and logs 
> in a moment.




[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds

2010-11-19 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933842#action_12933842
 ] 

Vishal K commented on ZOOKEEPER-880:


The leader has the same problem. LearnerHandler expects a QuorumPacket as the 
first packet after the connection is established. However, if Nagios were also 
monitoring the server port, one would expect to see a lot of messages like 
this:
LOG.error("First packet " + qp.toString()
+ " is not FOLLOWERINFO or OBSERVERINFO!");

Is Nagios not monitoring the server port?
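To illustrate why a probe on the server port should show up in the logs, here is a hypothetical sketch of the check described above: decode the leading packet type and accept only a learner handshake. It assumes the packet begins with a 4-byte big-endian type field, and the two constant values are placeholders rather than ZooKeeper's actual ones.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical sketch of a first-packet check; not LearnerHandler's actual code.
final class FirstPacketCheck {
    static final int FOLLOWERINFO = 11; // placeholder value for illustration
    static final int OBSERVERINFO = 16; // placeholder value for illustration

    // Returns true if the first 4 bytes decode to an acceptable learner packet type.
    static boolean isLearnerHandshake(byte[] firstBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(firstBytes))) {
            int type = in.readInt(); // assumption: packet type is the leading big-endian int
            return type == FOLLOWERINFO || type == OBSERVERINFO;
        } catch (IOException e) { // fewer than 4 bytes, e.g. a connect-and-close probe
            return false;
        }
    }
}
```

Any monitoring payload, or none at all, fails this check, so a probe hitting the server port would be expected to leave a trace in the leader's log.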


> QuorumCnxManager$SendWorker grows without bounds
> 
>
> Key: ZOOKEEPER-880
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.2
>Reporter: Jean-Daniel Cryans
>Priority: Critical
> Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, 
> hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, 
> TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz
>
>
> We're seeing an issue where one server in the ensemble has a steadily growing 
> number of QuorumCnxManager$SendWorker threads up to a point where the OS runs 
> out of native threads, and at the same time we see a lot of exceptions in the 
> logs.  This is on 3.2.2 and our config looks like:
> {noformat}
> tickTime=3000
> dataDir=/somewhere_thats_not_tmp
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=sv4borg9:2888:3888
> server.1=sv4borg10:2888:3888
> server.2=sv4borg11:2888:3888
> server.3=sv4borg12:2888:3888
> server.4=sv4borg13:2888:3888
> {noformat}
> The issue is on the first server. I'm going to attach thread dumps and logs 
> in a moment.




[jira] Commented: (ZOOKEEPER-934) Add sanity check for server ID

2010-11-19 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933838#action_12933838
 ] 

Vishal K commented on ZOOKEEPER-934:



Nagios might just be sending some 8 bytes of data to QCM, which QCM accepts as 
an ID before starting threads for that connection. With the above check in 
place, we would run into this scenario only if Nagios sends OBSERVER_ID or a 
valid server ID.

As a first step, a good solution might be to:
1. Reject the connection if (sid != OBSERVER_ID && !self.viewContains(sid))
2. Interrupt SendWorker when RecvWorker exits
3. Incorporate a solution for ZOOKEEPER-933. Note that with this solution in 
place, Nagios would also have to generate the correct role/peertype string in 
addition to the ID.
4. Kill SendWorker and RecvWorker iff leader election has been completed and 
we have no notifications to send.
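Item 1 could look roughly like the following sketch. It assumes, as the comment above implies, that a connecting peer first sends its server id as an 8-byte long, and it assumes OBSERVER_ID is Long.MAX_VALUE. `SidCheck` and its method are hypothetical, and `view` stands in for `self.viewContains(...)`.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Set;

// Hypothetical sketch of the server-id sanity check; not QuorumCnxManager's actual code.
final class SidCheck {
    // Assumption: the observer sentinel id is Long.MAX_VALUE.
    static final long OBSERVER_ID = Long.MAX_VALUE;

    // Reads the 8-byte id a connecting peer is assumed to send first, then applies item 1.
    static boolean acceptConnection(byte[] firstBytes, Set<Long> view) {
        long sid;
        try {
            sid = new DataInputStream(new ByteArrayInputStream(firstBytes)).readLong();
        } catch (IOException e) { // fewer than 8 bytes arrived
            return false;
        }
        return sid == OBSERVER_ID || view.contains(sid);
    }
}
```

Eight zero bytes decode to sid 0, which is a configured id (`server.0`) in the ZOOKEEPER-880 config, so this check narrows the problem without closing it.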

In general, this cannot be solved without some form of authentication. 
Essentially, these are forms of DoS attacks.

Another quick solution could be to introduce a "cluster password" (or a cluster 
identifier string), which we can store in the zoo.cfg file. A peer can include 
a hash of this password in outgoing messages, or use f(password, serverid) as a 
key to HMAC outgoing packets. This is of course not secure, but it is good 
enough to prevent QCM from treating port scanners as ZK servers.
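A sketch of the HMAC variant using `javax.crypto.Mac` with HmacSHA256. How f(password, serverid) is computed is not specified above; hashing the password together with the 8-byte id is an assumption, as is the `ClusterHmac` name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of the "cluster password" idea: key = f(password, serverId), HMAC per packet.
final class ClusterHmac {
    private static final String ALG = "HmacSHA256";

    // Assumed f(password, serverId): SHA-256 over the password bytes followed by the 8-byte id.
    static byte[] deriveKey(String clusterPassword, long serverId) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            sha.update(clusterPassword.getBytes(StandardCharsets.UTF_8));
            sha.update(ByteBuffer.allocate(Long.BYTES).putLong(serverId).array());
            return sha.digest();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Sender side: compute the tag to append to an outgoing packet.
    static byte[] tag(byte[] key, byte[] packet) {
        try {
            Mac mac = Mac.getInstance(ALG);
            mac.init(new SecretKeySpec(key, ALG));
            return mac.doFinal(packet);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Receiver side: recompute with the sender's claimed id and compare in constant time.
    static boolean verify(String clusterPassword, long claimedSid, byte[] packet, byte[] receivedTag) {
        byte[] expected = tag(deriveKey(clusterPassword, claimedSid), packet);
        return MessageDigest.isEqual(expected, receivedTag);
    }
}
```

As the comment says, this is not real security: the password sits in zoo.cfg and nothing here prevents replay. It only keeps arbitrary connections from being accepted as peers.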

> Add sanity check for server ID
> --
>
> Key: ZOOKEEPER-934
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-934
> Project: Zookeeper
>  Issue Type: Sub-task
>Reporter: Vishal K
> Fix For: 3.4.0
>
>
> 2. Should I add a check to reject connections from peers that are not
> listed in the configuration file? Currently, we are not doing any
> sanity check for server IDs. I think this might fix ZOOKEEPER-851.
> The fix is simple. However, I am not sure if anyone in the community
> is relying on this ability.




[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds

2010-11-19 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933829#action_12933829
 ] 

Vishal K commented on ZOOKEEPER-880:


Hi Flavio,

You are right.  We can see RecvWorker leaving but no messages from SendWorker.

2010-09-27 16:02:59,111 WARN 
org.apache.zookeeper.server.quorum.QuorumCnxManager: Connection broken:
java.io.IOException: Channel eof
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:595)
2010-09-27 16:02:59,162 WARN 
org.apache.zookeeper.server.quorum.QuorumCnxManager: Connection broken:
java.io.IOException: Channel eof
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:595)
2010-09-27 16:03:14,269 WARN 
org.apache.zookeeper.server.quorum.QuorumCnxManager: Connection broken:
java.io.IOException: Channel eof
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:595)

I thought that RecvWorker in 3.3.1 called sw.finish() before exiting. Adding 
this call to RecvWorker should fix this problem.
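A minimal sketch of the fix under discussion, using simplified, hypothetical SendWorker/RecvWorker classes rather than the real QuorumCnxManager ones: whenever the receiver exits, its finally block calls finish() on its sender, which stops the sender's loop and interrupts it out of its blocking poll.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified worker pair; not ZooKeeper's QuorumCnxManager classes.
final class WorkerPairDemo {
    static final class SendWorker extends Thread {
        final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(16);
        private volatile boolean running = true;

        void finish() {
            running = false;
            interrupt(); // wakes the thread if it is blocked in poll()
        }

        @Override public void run() {
            try {
                while (running) {
                    byte[] msg = queue.poll(1, TimeUnit.SECONDS);
                    if (msg != null) {
                        // a real worker would write msg to the socket here
                    }
                }
            } catch (InterruptedException e) {
                // finish() was called; exit
            }
        }
    }

    static final class RecvWorker extends Thread {
        private final SendWorker sw;

        RecvWorker(SendWorker sw) { this.sw = sw; }

        // Stand-in for the socket read loop; the peer has already closed the connection.
        private void readLoop() throws IOException {
            throw new EOFException("Channel eof");
        }

        @Override public void run() {
            try {
                readLoop();
            } catch (IOException e) {
                // "Connection broken" would be logged here
            } finally {
                sw.finish(); // the proposed fix: never leave the SendWorker behind
            }
        }
    }

    // Starts both workers and reports whether the SendWorker has exited within the timeout.
    static boolean senderExitsAfterReceiverFails(long timeoutMillis) {
        SendWorker sw = new SendWorker();
        RecvWorker rw = new RecvWorker(sw);
        sw.start();
        rw.start();
        try {
            rw.join(timeoutMillis);
            sw.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !sw.isAlive();
    }
}
```

Without the finally block, each connection that the peer closes immediately would leave one sender looping in poll() indefinitely, which is consistent with the steady thread growth reported in this issue.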


-Vishal

> QuorumCnxManager$SendWorker grows without bounds
> 
>
> Key: ZOOKEEPER-880
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.2
>Reporter: Jean-Daniel Cryans
>Priority: Critical
> Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, 
> hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, 
> TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz
>
>
> We're seeing an issue where one server in the ensemble has a steadily growing 
> number of QuorumCnxManager$SendWorker threads up to a point where the OS runs 
> out of native threads, and at the same time we see a lot of exceptions in the 
> logs.  This is on 3.2.2 and our config looks like:
> {noformat}
> tickTime=3000
> dataDir=/somewhere_thats_not_tmp
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=sv4borg9:2888:3888
> server.1=sv4borg10:2888:3888
> server.2=sv4borg11:2888:3888
> server.3=sv4borg12:2888:3888
> server.4=sv4borg13:2888:3888
> {noformat}
> The issue is on the first server. I'm going to attach thread dumps and logs 
> in a moment.




[jira] Updated: (ZOOKEEPER-896) Improve C client to support dynamic authentication schemes

2010-11-19 Thread Botond Hejj (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botond Hejj updated ZOOKEEPER-896:
--

Attachment: ZOOKEEPER-896.patch

Updated the patch as requested. It is now against trunk.

> Improve C client to support dynamic authentication schemes
> --
>
> Key: ZOOKEEPER-896
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-896
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: c client
>Affects Versions: 3.3.1
>Reporter: Botond Hejj
>Assignee: Botond Hejj
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-896.patch, ZOOKEEPER-896.patch
>
>
> When we started exploring ZooKeeper for our requirements, we found that the 
> authentication mechanism is not flexible enough.
> We want to use Kerberos for authentication, but using the current API we ran 
> into a few problems. The idea is that we get a Kerberos token on the client 
> side and then send that token to the server with a Kerberos scheme. A 
> server-side authentication plugin can use that token to authenticate the 
> client and also use the token for authorization.
> We ran into two problems with this approach:
> 1. A different Kerberos token is needed for each server the client can 
> connect to, since Kerberos uses mutual authentication. That means when the 
> client acquires this Kerberos token, it has to know which server it connects 
> to and generate the token accordingly. The client currently can't 
> generate a token for a specific server. The token stored in auth_info is 
> used for all the servers.
> 2. The Kerberos token might have an expiry time, so if the client loses the 
> connection to the server and then tries to reconnect, it should acquire a 
> new token. That is not currently possible, since the token is stored in 
> auth_info and reused for every connection.
> The problem can be solved if we allow the client to register a callback for 
> authentication instead of a static token. This can be a callback with an 
> argument that passes the current host string. The ZooKeeper client code 
> could call this callback before it sends the authentication info to the 
> server to get a fresh, server-specific token.
> This would solve our problem with Kerberos authentication and could also 
> be used for other, more dynamic authentication schemes.
> The solution could also be generalized to the Java client.
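The issue targets the C client, but its last line suggests generalizing the idea to the Java client. A hypothetical Java sketch of the proposed callback: the client asks a provider for a token bound to the host it is about to connect to, on every (re)connect, instead of replaying a stored token. `AuthTokenProvider` and `ReconnectingClient` are invented names, not ZooKeeper APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback: asked for fresh credentials for a specific server on every (re)connect.
interface AuthTokenProvider {
    String scheme();

    byte[] tokenFor(String host);
}

// Hypothetical connect path showing where the callback would be invoked.
final class ReconnectingClient {
    private final AuthTokenProvider provider;
    final List<String> sentAuth = new ArrayList<>(); // records what would go over the wire, for illustration

    ReconnectingClient(AuthTokenProvider provider) {
        this.provider = provider;
    }

    void connect(String host) {
        // Instead of replaying a token cached in auth_info, ask for one bound to this host, now.
        byte[] token = provider.tokenFor(host);
        sentAuth.add(provider.scheme() + "@" + host + ":" + new String(token, StandardCharsets.UTF_8));
    }
}
```

Because the provider is invoked per connection, it can both target a specific server (problem 1) and refresh an expired ticket (problem 2).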




[jira] Commented: (ZOOKEEPER-934) Add sanity check for server ID

2010-11-19 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933719#action_12933719
 ] 

Flavio Junqueira commented on ZOOKEEPER-934:


One more comment. Looking at the logs for ZOOKEEPER-880, I remembered that in 
their case the RecvWorker thread was able to read a valid ID from the 
connection with a Nagios server. I'm not exactly sure how that happened, but 
it suggests that the simple check you proposed might not be enough. We 
don't want a Nagios box impersonating a ZooKeeper server! :-)

> Add sanity check for server ID
> --
>
> Key: ZOOKEEPER-934
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-934
> Project: Zookeeper
>  Issue Type: Sub-task
>Reporter: Vishal K
> Fix For: 3.4.0
>
>
> 2. Should I add a check to reject connections from peers that are not
> listed in the configuration file? Currently, we are not doing any
> sanity check for server IDs. I think this might fix ZOOKEEPER-851.
> The fix is simple. However, I am not sure if anyone in the community
> is relying on this ability.




[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds

2010-11-19 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933718#action_12933718
 ] 

Flavio Junqueira commented on ZOOKEEPER-880:


One problem here is that we had some discussions over IRC and the information 
is not reflected here. 

If you have a look at the logs, you'll observe this:

{noformat}

2010-09-28 10:31:22,227 DEBUG 
org.apache.zookeeper.server.quorum.QuorumCnxManager: Connection request 
/10.10.20.5:41861
2010-09-28 10:31:22,227 DEBUG 
org.apache.zookeeper.server.quorum.QuorumCnxManager: Connection request: 0
2010-09-28 10:31:22,227 DEBUG 
org.apache.zookeeper.server.quorum.QuorumCnxManager: Address of remote peer: 0
2010-09-28 10:31:22,229 WARN 
org.apache.zookeeper.server.quorum.QuorumCnxManager: Connection broken:
java.io.IOException: Channel eof
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:595)
{noformat}

If I remember the discussion with J-D correctly, the node trying to connect is 
running Nagios. My conjecture at the time was that the IOException was killing 
the receiver thread but not the sender thread (RecvWorker.finish() does not 
close its SendWorker counterpart).

Your point is good, but it seems that the race you mention would have to be 
triggered continuously to make the number of SendWorker threads grow 
steadily. That seems unlikely to me.

> QuorumCnxManager$SendWorker grows without bounds
> 
>
> Key: ZOOKEEPER-880
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.2
>Reporter: Jean-Daniel Cryans
>Priority: Critical
> Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, 
> hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, 
> TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz
>
>
> We're seeing an issue where one server in the ensemble has a steadily growing 
> number of QuorumCnxManager$SendWorker threads up to a point where the OS runs 
> out of native threads, and at the same time we see a lot of exceptions in the 
> logs.  This is on 3.2.2 and our config looks like:
> {noformat}
> tickTime=3000
> dataDir=/somewhere_thats_not_tmp
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=sv4borg9:2888:3888
> server.1=sv4borg10:2888:3888
> server.2=sv4borg11:2888:3888
> server.3=sv4borg12:2888:3888
> server.4=sv4borg13:2888:3888
> {noformat}
> The issue is on the first server. I'm going to attach thread dumps and logs 
> in a moment.
