Build failed in Hudson: ZooKeeper-trunk #808

2010-04-29 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/808/changes

Changes:

[henry] ZOOKEEPER-749 (OSGi metadata not included in binary only jar) and 
ZOOKEEPER-750 (move maven artifacts into dist-maven subdir of the release 
(package target))

--
[...truncated 129393 lines...]
[junit] 2010-04-29 10:44:09,085 - INFO  [main:quorumb...@286] - Shutting 
down quorum peer QuorumPeer:/0:0:0:0:0:0:0:0:11225
[junit] 2010-04-29 10:44:09,085 - INFO  [main:follo...@166] - shutdown 
called
[junit] java.lang.Exception: shutdown Follower
[junit] at 
org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
[junit] at 
org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:679)
[junit] at 
org.apache.zookeeper.test.QuorumBase.shutdown(QuorumBase.java:287)
[junit] at 
org.apache.zookeeper.test.ZkDatabaseCorruptionTest.testCorruption(ZkDatabaseCorruptionTest.java:127)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[junit] at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
[junit] at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
[junit] at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
[junit] at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:51)
[junit] at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
[junit] at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
[junit] at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
[junit] at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
[junit] at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
[junit] at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
[junit] at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
[junit] at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
[junit] at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
[junit] at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
[junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
[junit] at 
junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
[junit] 2010-04-29 10:44:09,085 - INFO  [main:finalrequestproces...@378] - 
shutdown of request processor complete
[junit] 2010-04-29 10:44:09,085 - INFO  
[FollowerRequestProcessor:3:followerrequestproces...@93] - 
FollowerRequestProcessor exited loop!
[junit] 2010-04-29 10:44:09,085 - INFO  
[CommitProcessor:3:commitproces...@148] - CommitProcessor exited loop!
[junit] 2010-04-29 10:44:09,086 - INFO  
[SyncThread:3:syncrequestproces...@151] - SyncRequestProcessor exited!
[junit] 2010-04-29 10:44:09,087 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11225:nioservercnxn$fact...@278] - 
NIOServerCnxn factory exited run method
[junit] 2010-04-29 10:44:09,088 - WARN  
[Thread-95:quorumcnxmanager$sendwor...@581] - Interrupted while waiting for 
message on queue
[junit] java.lang.InterruptedException
[junit] at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
[junit] at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
[junit] at 
java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
[junit] at 
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:570)
[junit] 2010-04-29 10:44:09,088 - WARN  
[Thread-101:quorumcnxmanager$sendwor...@581] - Interrupted while waiting for 
message on queue
[junit] java.lang.InterruptedException
[junit] at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
[junit] at 

FYI Apache Avro considering .NET support.

2010-04-29 Thread Patrick Hunt
For Windows-based ZooKeeper developers: Avro is looking at .NET support 
(http://bit.ly/9nThxD). It seems like it would be a great project, and 
we are planning to add Avro marshaling support at some point.


Patrick


[jira] Created: (ZOOKEEPER-759) Stop accepting connections when close to file descriptor limit

2010-04-29 Thread Travis Crawford (JIRA)
Stop accepting connections when close to file descriptor limit
--

 Key: ZOOKEEPER-759
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-759
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Travis Crawford


Zookeeper always tries to accept new connections, throwing an exception if out 
of file descriptors. An improvement would be denying new client connections 
when close to the limit.

Additionally, file-descriptor limits+usage should be exported to the monitoring 
four-letter word, should that get implemented (see ZOOKEEPER-744).


DETAILS

A Zookeeper ensemble I administer recently suffered an outage when one node was 
restarted with the low system-default ulimit of 1024 file descriptors and later 
ran out. File descriptor usage+max are already being monitored by the following 
MBeans:

- java.lang.OperatingSystem.MaxFileDescriptorCount
- java.lang.OperatingSystem.OpenFileDescriptorCount

They're described (rather tersely) at:

http://java.sun.com/javase/6/docs/jre/api/management/extension/com/sun/management/UnixOperatingSystemMXBean.html

This feature request is for the following:

(a) Stop accepting new connections when OpenFileDescriptorCount is close to 
MaxFileDescriptorCount, defaulting to 95% FD usage. New connections should be 
denied, logged to disk at debug level, and increment a 
``ConnectionDeniedCount`` MBean counter.

(b) Begin accepting new connections when usage drops below some configurable 
threshold, defaulting to 90% of FD usage, basically the high/low watermark 
model.

(c) Update the administrator's guide with a comment about using an appropriate 
FD limit.

(d) Extra credit: if ZOOKEEPER-744 is implemented export statistics for:

zookeeper_open_file_descriptor_count
zookeeper_max_file_descriptor_count
zookeeper_max_file_descriptor_mismatch - boolean, exported by leader, if not 
all zk's have the same max FD value
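
The high/low watermark behaviour in (a) and (b) could be sketched roughly as 
follows. This is a minimal illustration, not ZooKeeper code: the class name 
`FdWatermarkGate`, its methods, and the in-memory denied counter (standing in 
for the proposed ``ConnectionDeniedCount`` MBean) are all hypothetical. The 
0.95 and 0.90 values used in the usage note are the defaults proposed above.

```java
// Hypothetical helper, not part of ZooKeeper: a high/low watermark gate
// driven by file descriptor usage.
final class FdWatermarkGate {
    private final double highWatermark; // stop accepting at or above this fraction
    private final double lowWatermark;  // resume accepting below this fraction
    private boolean accepting = true;
    private long deniedCount = 0;       // stands in for the proposed ConnectionDeniedCount

    FdWatermarkGate(double highWatermark, double lowWatermark) {
        if (!(lowWatermark > 0.0 && lowWatermark < highWatermark && highWatermark <= 1.0)) {
            throw new IllegalArgumentException("need 0 < low < high <= 1");
        }
        this.highWatermark = highWatermark;
        this.lowWatermark = lowWatermark;
    }

    /** Updates the gate from current usage; returns true if a new connection may be accepted. */
    synchronized boolean tryAccept(long openFds, long maxFds) {
        if (maxFds <= 0) {
            return true; // limit unknown: fall back to always accepting
        }
        double usage = (double) openFds / (double) maxFds;
        if (accepting && usage >= highWatermark) {
            accepting = false;          // crossed the high watermark
        } else if (!accepting && usage < lowWatermark) {
            accepting = true;           // dropped back below the low watermark
        }
        if (!accepting) {
            deniedCount++;
        }
        return accepting;
    }

    synchronized long getDeniedCount() {
        return deniedCount;
    }
}
```

With `new FdWatermarkGate(0.95, 0.90)` and a limit of 1024, the gate closes 
at 973 open descriptors and stays closed until usage falls below 90%, so 
connections are not flapped on and off around a single threshold.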

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-759) Stop accepting connections when close to file descriptor limit

2010-04-29 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862296#action_12862296
 ] 

Patrick Hunt commented on ZOOKEEPER-759:


You allude to it, but to be specific: we should make the thresholds 
configurable (in zoo.cfg) with sensible defaults.





[jira] Commented: (ZOOKEEPER-759) Stop accepting connections when close to file descriptor limit

2010-04-29 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862301#action_12862301
 ] 

Patrick Hunt commented on ZOOKEEPER-759:


Allowing the user to override entirely seems like a good feature (config) as 
well. I always like to support this with new features that have potentially 
significant downside, just in case we overlooked something during 
concept/design/dev.
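
The configurable thresholds and the override switch suggested in these 
comments might look something like the sketch below. It assumes the settings 
arrive as `java.util.Properties` entries; the property names 
(`fdLimit.enabled`, `fdLimit.highWatermarkPercent`, 
`fdLimit.lowWatermarkPercent`) and the class `FdThresholdConfig` are invented 
for illustration and are not existing ZooKeeper options. The 95/90 defaults 
come from the issue description.

```java
import java.util.Properties;

// Hypothetical holder for the proposed settings; none of these keys exist in ZooKeeper.
final class FdThresholdConfig {
    static final String ENABLED_KEY = "fdLimit.enabled";            // assumed name
    static final String HIGH_KEY = "fdLimit.highWatermarkPercent";  // assumed name
    static final String LOW_KEY = "fdLimit.lowWatermarkPercent";    // assumed name

    final boolean enabled;   // false lets the operator switch the feature off entirely
    final int highPercent;   // stop accepting at or above this usage
    final int lowPercent;    // resume accepting below this usage

    private FdThresholdConfig(boolean enabled, int highPercent, int lowPercent) {
        this.enabled = enabled;
        this.highPercent = highPercent;
        this.lowPercent = lowPercent;
    }

    /** Reads the settings, falling back to the defaults proposed in the issue (95 and 90). */
    static FdThresholdConfig fromProperties(Properties props) {
        boolean enabled = Boolean.parseBoolean(props.getProperty(ENABLED_KEY, "true").trim());
        int high = Integer.parseInt(props.getProperty(HIGH_KEY, "95").trim());
        int low = Integer.parseInt(props.getProperty(LOW_KEY, "90").trim());
        if (low <= 0 || low >= high || high > 100) {
            throw new IllegalArgumentException(
                "expected 0 < " + LOW_KEY + " < " + HIGH_KEY + " <= 100");
        }
        return new FdThresholdConfig(enabled, high, low);
    }
}
```

Rejecting an inverted or out-of-range pair at load time keeps a bad 
configuration from silently disabling the protection.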




[jira] Commented: (ZOOKEEPER-759) Stop accepting connections when close to file descriptor limit

2010-04-29 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862303#action_12862303
 ] 

Ted Dunning commented on ZOOKEEPER-759:
---


This is a Unix-specific bean, so don't forget to defang the test if the bean 
isn't available.
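
One way to guard against the bean being absent is to read the attributes 
through the platform MBean server and treat any failure as "unknown". The 
sketch below assumes that approach; the class `FdStats` and the -1 sentinel 
are hypothetical choices, while the attribute names are the ones listed in 
the issue description.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper: reads FD statistics without a compile-time
// dependency on the Unix-specific MXBean interface.
final class FdStats {
    /** Returns the named numeric attribute of the OperatingSystem MBean, or -1 if unavailable. */
    static long readOsAttribute(String attribute) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName os = new ObjectName(ManagementFactory.OPERATING_SYSTEM_MXBEAN_NAME);
            Object value = server.getAttribute(os, attribute);
            return (value instanceof Number) ? ((Number) value).longValue() : -1L;
        } catch (Exception e) {
            // For example AttributeNotFoundException where the JVM does not expose the attribute.
            return -1L;
        }
    }

    static long openFileDescriptors() {
        return readOsAttribute("OpenFileDescriptorCount");
    }

    static long maxFileDescriptors() {
        return readOsAttribute("MaxFileDescriptorCount");
    }
}
```

A caller (or a test) can then skip the FD check whenever either value is -1 
instead of failing on platforms that do not provide the bean.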






[jira] Commented: (ZOOKEEPER-759) Stop accepting connections when close to file descriptor limit

2010-04-29 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862307#action_12862307
 ] 

Mahadev konar commented on ZOOKEEPER-759:
-

I was just going to comment along similar lines, Ted! Yes, we should make 
sure this case is taken care of.




[jira] Commented: (ZOOKEEPER-759) Stop accepting connections when close to file descriptor limit

2010-04-29 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862320#action_12862320
 ] 

Patrick Hunt commented on ZOOKEEPER-759:


Is there anything comparable on the Windows JVM, or no options at all on that 
platform?

What about alternative (non-Sun) JVMs? I guess we treat those similarly to 
Windows... we should think about it when implementing, though.





[jira] Updated: (ZOOKEEPER-759) Stop accepting connections when close to file descriptor limit

2010-04-29 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-759:
---

Fix Version/s: 3.4.0

Would be great to see this in 3.4.0. It's a very good project for someone 
interested in getting their feet wet with ZK development and the Apache process.





[jira] Created: (ZOOKEEPER-760) Improved string encoding and decoding performance

2010-04-29 Thread Patrick Hunt (JIRA)
Improved string encoding and decoding performance
-

 Key: ZOOKEEPER-760
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-760
 Project: Zookeeper
  Issue Type: Improvement
  Components: java client, server
Reporter: Patrick Hunt
 Fix For: 3.4.0


Our marshaling code converts strings to UTF-8 bytes; this can be optimized. See:

https://issues.apache.org/jira/browse/AVRO-532
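
For orientation, the conversion in question is the String to UTF-8 bytes 
round trip performed whenever a path is serialized. The sketch below shows 
only that baseline round trip using standard JDK calls; it does not 
reproduce the optimization discussed in AVRO-532. The class `Utf8PathCodec` 
and the length-prefixed layout are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical codec: a 4-byte length prefix followed by the UTF-8 bytes of the string.
final class Utf8PathCodec {
    static byte[] encode(String path) {
        byte[] utf8 = path.getBytes(StandardCharsets.UTF_8); // the conversion being discussed
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(4 + utf8.length);
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(utf8.length);
            out.write(utf8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static String decode(byte[] encoded) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
            int length = in.readInt();
            byte[] utf8 = new byte[length];
            in.readFully(utf8);
            return new String(utf8, StandardCharsets.UTF_8); // and the reverse conversion
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Any faster encoder would need to produce exactly the same bytes as 
`encode`, which makes a round-trip check like this a natural regression test.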





Interesting Avro/Thrift JIRA re utf8 performance

2010-04-29 Thread Patrick Hunt
We also suffer from Java's poor string conversion performance 
(serialization and deserialization of znode paths in the marshalling 
code). We should take advantage of this as well:


https://issues.apache.org/jira/browse/AVRO-532

I did perform a micro-benchmark of string encoding and decoding, which 
showed the new methods take about half as long


https://issues.apache.org/jira/browse/ZOOKEEPER-760

Patrick


[jira] Updated: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-690:
-

Attachment: ZOOKEEPER-690.patch

I have found what I hope is the problem.

Because QuorumPeers duplicate their 'LearnerType' in two places, there's the 
possibility that the two copies may get out of sync. This is what was 
happening here - it was a test bug. Although the Observers knew that they 
were Observers, the other nodes did not. This affected the leader election 
protocol, as the other nodes did not know to reject an Observer.

I feel like we should refactor the QuorumPeer.QuorumServer code so as not to 
duplicate information, but for the time being I think this patch will work. 
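
The refactoring idea, keeping the learner type in exactly one place, could 
look roughly like the sketch below. The classes `ServerEntry` and `PeerView` 
are simplified, hypothetical stand-ins and do not mirror the real 
QuorumPeer.QuorumServer code; `mayVote` is an invented helper name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

enum LearnerType { PARTICIPANT, OBSERVER }

// Hypothetical stand-in for a per-server configuration entry.
final class ServerEntry {
    final long id;
    final LearnerType type;

    ServerEntry(long id, LearnerType type) {
        this.id = id;
        this.type = type;
    }
}

// Hypothetical stand-in for a peer: it holds no learner type field of its own
// and derives the answer from the shared server table.
final class PeerView {
    private final long myId;
    private final Map<Long, ServerEntry> servers;

    PeerView(long myId, Map<Long, ServerEntry> servers) {
        if (!servers.containsKey(myId)) {
            throw new IllegalArgumentException("own id " + myId + " missing from server table");
        }
        this.myId = myId;
        this.servers = Collections.unmodifiableMap(new HashMap<>(servers));
    }

    /** The peer's own type is looked up, never stored a second time. */
    LearnerType getLearnerType() {
        return servers.get(myId).type;
    }

    /** Used by other nodes, e.g. to decide whether a server may take part in an election. */
    boolean mayVote(long serverId) {
        ServerEntry entry = servers.get(serverId);
        return entry != null && entry.type == LearnerType.PARTICIPANT;
    }
}
```

Because both questions are answered from the same table, a peer's view of 
itself and its neighbours' view of it cannot drift apart.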

I have also taken the opportunity to standardise the naming of 'learnertype' 
throughout the code (in some places it was called 'peertype' adding to the 
confusion).

Tests pass on my machine, but I can't guarantee that the problem is fixed as I 
could never recreate the error.

Thanks to Flavio for catching the broken invariant!

 AsyncTestHammer test fails on hudson.
 -

 Key: ZOOKEEPER-690
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
 Project: Zookeeper
  Issue Type: Bug
Reporter: Mahadev konar
Assignee: Henry Robinson
Priority: Blocker
 Fix For: 3.3.1, 3.4.0

 Attachments: jstack-201004201053.txt, nohup-201004201053.txt, 
 TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, 
 ZOOKEEPER-690.patch


 The Hudson test failed on 
 http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
 There is a huge set of CancelledKeyExceptions in the logs. Still going 
 through the logs to find out the reason for the failure.




[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862351#action_12862351
 ] 

Henry Robinson commented on ZOOKEEPER-690:
--

Alan - can you try this patch to see if it fixes things? 

Thanks, 

Henry





[jira] Updated: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-690:
-

Status: Patch Available  (was: Open)




[jira] Commented: (ZOOKEEPER-737) some 4 letter words may fail with netcat (nc)

2010-04-29 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862393#action_12862393
 ] 

Benjamin Reed commented on ZOOKEEPER-737:
-

-1. We have a problem in the cases where a thread isn't spawned to handle the 
command: in those cases the send could block, which would block the NIO thread.

 some 4 letter words may fail with netcat (nc)
 -

 Key: ZOOKEEPER-737
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-737
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.0
Reporter: Patrick Hunt
Assignee: Mahadev konar
Priority: Blocker
 Fix For: 3.3.1, 3.4.0

 Attachments: ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, 
 ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, 
 ZOOKEEPER-737.patch


 nc closes the write channel as soon as it has sent its information, for 
 example: echo stat|nc localhost 2181
 In general this is fine; however, the server code will close the socket as 
 soon as it receives notice that nc has closed its write channel. If not all 
 of the 4 letter word result has been written back to the client yet, this 
 will cause some or all of the result to be lost, i.e. the client will not 
 see the full result. This was introduced in 3.3.0 as part of a change to 
 reduce blocking of the selector by long-running 4 letter words.
 Here's an example of the logs from the server during this:
 echo -n stat | nc localhost 2181
 2010-04-09 21:55:36,124 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioservercnxn$fact...@251] - 
 Accepted socket connection from /127.0.0.1:42179
 2010-04-09 21:55:36,124 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@968] - Processing 
 stat command from /127.0.0.1:42179
 2010-04-09 21:55:36,125 - WARN  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@606] - 
 EndOfStreamException: Unable to read additional data from client sessionid 
 0x0, likely client has closed socket
 2010-04-09 21:55:36,125 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@1286] - Closed 
 socket connection for client /127.0.0.1:42179 (no session established for 
 client)
 [ph...@gsbl90850 zookeeper-3.3.0]$ 2010-04-09 21:55:36,126 - ERROR 
 [Thread-15:nioserverc...@422] - Unexpected Exception: 
 java.nio.channels.CancelledKeyException
   at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
   at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
   at 
 org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:395)
   at 
 org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:907)
   at 
 org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.flush(NIOServerCnxn.java:945)
   at java.io.BufferedWriter.flush(BufferedWriter.java:236)
   at java.io.PrintWriter.flush(PrintWriter.java:276)
   at 
 org.apache.zookeeper.server.NIOServerCnxn$2.run(NIOServerCnxn.java:1089)
 2010-04-09 21:55:36,126 - ERROR [Thread-15:nioservercnxn$factor...@82] - 
 Thread Thread[Thread-15,5,main] died
 java.nio.channels.CancelledKeyException
   at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
   at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:64)
   at 
 org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.wakeup(NIOServerCnxn.java:927)
   at 
 org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:909)
   at 
 org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.flush(NIOServerCnxn.java:945)
   at java.io.BufferedWriter.flush(BufferedWriter.java:236)
   at java.io.PrintWriter.flush(PrintWriter.java:276)
   at 
 org.apache.zookeeper.server.NIOServerCnxn$2.run(NIOServerCnxn.java:1089)
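
The half-close behaviour described in this issue can be reproduced with 
plain blocking sockets. The sketch below is an illustration only: it is not 
the NIOServerCnxn code and not the attached patch, and the class 
`HalfCloseDemo` is hypothetical. The client shuts down its output after 
sending, as nc does, and the server still writes the full reply after seeing 
end-of-stream on its input.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of a client half-close followed by a complete server reply.
final class HalfCloseDemo {
    private static byte[] readToEof(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    static String roundTrip(String command) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    // EOF here only means the client closed its write side.
                    String received = new String(readToEof(peer.getInputStream()),
                                                 StandardCharsets.UTF_8);
                    // The write side of the connection is still usable, so reply in full.
                    peer.getOutputStream().write(
                        ("reply to " + received).getBytes(StandardCharsets.UTF_8));
                    peer.getOutputStream().flush();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            serverThread.start();

            String reply;
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.getOutputStream().write(command.getBytes(StandardCharsets.UTF_8));
                client.shutdownOutput(); // half-close, like nc after sending its input
                reply = new String(readToEof(client.getInputStream()), StandardCharsets.UTF_8);
            }
            serverThread.join();
            return reply;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The point of the demo is that end-of-stream on read does not by itself mean 
the client is gone; closing the socket at that moment is what drops the 
pending reply.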




[jira] Updated: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Alan Cabrera (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Cabrera updated ZOOKEEPER-690:
---

Attachment: TEST-org.apache.zookeeper.test.AsyncHammerTest.txt

 AsyncTestHammer test fails on hudson.
 -

 Key: ZOOKEEPER-690
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
 Project: Zookeeper
  Issue Type: Bug
Reporter: Mahadev konar
Assignee: Henry Robinson
Priority: Blocker
 Fix For: 3.3.1, 3.4.0

 Attachments: jstack-201004201053.txt, nohup-201004201053.txt, 
 TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, 
 TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, 
 ZOOKEEPER-690.patch


 The Hudson test failed on 
 http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
  There is a huge set of CancelledKeyExceptions in the logs. Still going 
 through the logs to find the reason for the failure.




[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Alan Cabrera (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862398#action_12862398
 ] 

Alan Cabrera commented on ZOOKEEPER-690:


Test does not lock up but it does not pass.




[jira] Commented: (ZOOKEEPER-737) some 4 letter words may fail with netcat (nc)

2010-04-29 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862401#action_12862401
 ] 

Mahadev konar commented on ZOOKEEPER-737:
-

very good point ben.. 

 some 4 letter words may fail with netcat (nc)
 -

 Key: ZOOKEEPER-737
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-737
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.0
Reporter: Patrick Hunt
Assignee: Mahadev konar
Priority: Blocker
 Fix For: 3.3.1, 3.4.0

 Attachments: ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, 
 ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, 
 ZOOKEEPER-737.patch


 nc closes the write channel as soon as it has sent its information, for 
 example echo stat|nc localhost 2181
 In general this is fine; however, the server code will close the socket as 
 soon as it receives notice that nc has
 closed its write channel. If not all of the 4 letter word result has been 
 written back to the client yet, this will cause
 some or all of the result to be lost, i.e. the client will not see the full 
 result. This was introduced in 3.3.0 as part
 of a change to reduce blocking of the selector by long running 4 letter words.
 Here's an example of the logs from the server during this:
 echo -n stat | nc localhost 2181
 2010-04-09 21:55:36,124 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioservercnxn$fact...@251] - 
 Accepted socket connection from /127.0.0.1:42179
 2010-04-09 21:55:36,124 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@968] - Processing 
 stat command from /127.0.0.1:42179
 2010-04-09 21:55:36,125 - WARN  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@606] - 
 EndOfStreamException: Unable to read additional data from client sessionid 
 0x0, likely client has closed socket
 2010-04-09 21:55:36,125 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@1286] - Closed 
 socket connection for client /127.0.0.1:42179 (no session established for 
 client)
 [ph...@gsbl90850 zookeeper-3.3.0]$ 2010-04-09 21:55:36,126 - ERROR 
 [Thread-15:nioserverc...@422] - Unexpected Exception: 
 java.nio.channels.CancelledKeyException
   at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
   at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
   at 
 org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:395)
   at 
 org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:907)
   at 
 org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.flush(NIOServerCnxn.java:945)
   at java.io.BufferedWriter.flush(BufferedWriter.java:236)
   at java.io.PrintWriter.flush(PrintWriter.java:276)
   at 
 org.apache.zookeeper.server.NIOServerCnxn$2.run(NIOServerCnxn.java:1089)
 2010-04-09 21:55:36,126 - ERROR [Thread-15:nioservercnxn$factor...@82] - 
 Thread Thread[Thread-15,5,main] died
 java.nio.channels.CancelledKeyException
   at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
   at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:64)
   at 
 org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.wakeup(NIOServerCnxn.java:927)
   at 
 org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:909)
   at 
 org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.flush(NIOServerCnxn.java:945)
   at java.io.BufferedWriter.flush(BufferedWriter.java:236)
   at java.io.PrintWriter.flush(PrintWriter.java:276)
   at 
 org.apache.zookeeper.server.NIOServerCnxn$2.run(NIOServerCnxn.java:1089)
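
 The half-close behaviour described above can be illustrated with plain 
 blocking sockets. This is only a sketch, not the NIOServerCnxn code: the class 
 HalfCloseDemo and its methods are hypothetical, and the "reply to" text is 
 made up for the illustration. The client sends a command and then closes only 
 its write half, as nc does; the server treats end-of-stream on read as "the 
 peer has finished writing" and still writes the whole reply before closing.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of ZooKeeper.
class HalfCloseDemo {
    // Reads the stream until end-of-stream and returns the bytes as a string.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.US_ASCII);
    }

    // Sends a command, half-closes the connection like nc, and returns the reply.
    static String roundTrip(String command) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                try (Socket s = server.accept()) {
                    // End-of-stream here only means the client stopped writing;
                    // the socket can still carry the reply back.
                    String cmd = readAll(s.getInputStream());
                    OutputStream out = s.getOutputStream();
                    out.write(("reply to " + cmd).getBytes(StandardCharsets.US_ASCII));
                    out.flush();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            serverThread.start();
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.getOutputStream().write(command.getBytes(StandardCharsets.US_ASCII));
                client.shutdownOutput(); // close only the write half
                String reply = readAll(client.getInputStream());
                serverThread.join();
                return reply;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("stat"));
    }
}
```

 If the server instead closed the socket as soon as it saw end-of-stream, the 
 client could receive a truncated or empty reply, which is the symptom 
 reported here.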




[jira] Updated: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Alan Cabrera (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Cabrera updated ZOOKEEPER-690:
---

Attachment: (was: TEST-org.apache.zookeeper.test.AsyncHammerTest.txt)

 AsyncTestHammer test fails on hudson.
 -

 Key: ZOOKEEPER-690
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
 Project: Zookeeper
  Issue Type: Bug
Reporter: Mahadev konar
Assignee: Henry Robinson
Priority: Blocker
 Fix For: 3.3.1, 3.4.0

 Attachments: jstack-201004201053.txt, nohup-201004201053.txt, 
 TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, 
 ZOOKEEPER-690.patch


 The Hudson test failed on 
 http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
  There is a huge set of CancelledKeyExceptions in the logs. Still going 
 through the logs to find the reason for the failure.




[jira] Updated: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Alan Cabrera (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Cabrera updated ZOOKEEPER-690:
---

Attachment: nohup-201004291409.txt
jstack-201004291409.txt

Disregard the last comment.

I have now cleaned and rebuilt, and it still locks up.

 AsyncTestHammer test fails on hudson.
 -

 Key: ZOOKEEPER-690
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
 Project: Zookeeper
  Issue Type: Bug
Reporter: Mahadev konar
Assignee: Henry Robinson
Priority: Blocker
 Fix For: 3.3.1, 3.4.0

 Attachments: jstack-201004201053.txt, jstack-201004291409.txt, 
 nohup-201004201053.txt, nohup-201004291409.txt, 
 TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, 
 ZOOKEEPER-690.patch


 The Hudson test failed on 
 http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
  There is a huge set of CancelledKeyExceptions in the logs. Still going 
 through the logs to find the reason for the failure.




[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862414#action_12862414
 ] 

Benjamin Reed commented on ZOOKEEPER-690:
-

i think the key fix is here:

{quote}
public void setLearnerType(LearnerType p) {
    learnerType = p;
    if (quorumPeers.containsValue(this.myid)) {
        this.quorumPeers.get(myid).type = p;
    } else {
        LOG.error("Setting LearnerType to " + p + " but " + myid
                + " not in QuorumPeers. ");
    }
    ...
}
{quote}

right?

the problem i see is that we are only updating the quorumPeers for the one 
peer.  the other peers are going to be thinking it is a participant.




[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862424#action_12862424
 ] 

Henry Robinson commented on ZOOKEEPER-690:
--

This map is, I think, shared between the quorumpeers for the purposes of the 
test (and in general there aren't two quorumpeers sharing this datastructure 
when running normally). 

But! The error here is that I'm dumb (and that Java's type-checking leaves a 
little to be desired). I've written quorumPeers.containsValue up there, but 
actually it should be quorumPeers.containsKey. New patch on the way, let's see 
if that fixes it.
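
A minimal sketch of the containsValue/containsKey mix-up, assuming a simplified 
map from server id to a QuorumServer-like object. ContainsKeyDemo and its 
nested types are hypothetical stand-ins, not the actual ZooKeeper classes. 
Both methods take an Object argument, so passing the id to containsValue 
compiles but compares a Long against the map's values and never matches.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; the nested types only mimic the names used above.
class ContainsKeyDemo {
    enum LearnerType { PARTICIPANT, OBSERVER }

    static class QuorumServer {
        LearnerType type = LearnerType.PARTICIPANT;
    }

    // Updates the entry for myid and returns true, or returns false if absent.
    static boolean setLearnerType(Map<Long, QuorumServer> quorumPeers,
                                  long myid, LearnerType p) {
        // containsKey looks the id up among the keys, which is what is wanted.
        if (quorumPeers.containsKey(myid)) {
            quorumPeers.get(myid).type = p;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<Long, QuorumServer> peers = new HashMap<>();
        peers.put(1L, new QuorumServer());

        // The id is a key, not a value, so this check is false.
        System.out.println("containsValue(1L): " + peers.containsValue(1L));
        System.out.println("containsKey(1L):   " + peers.containsKey(1L));
        System.out.println("updated: " + setLearnerType(peers, 1L, LearnerType.OBSERVER));
    }
}
```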




[jira] Updated: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-690:
-

Attachment: ZOOKEEPER-690.patch

Alan - would you mind trying this new patch? Thanks for your patience. I 
suspect that something might still be a bit flaky with these tests (not the 
code, but the tests), but I hope this will fix this particular problem. 

 AsyncTestHammer test fails on hudson.
 -

 Key: ZOOKEEPER-690
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
 Project: Zookeeper
  Issue Type: Bug
Reporter: Mahadev konar
Assignee: Henry Robinson
Priority: Blocker
 Fix For: 3.3.1, 3.4.0

 Attachments: jstack-201004201053.txt, jstack-201004291409.txt, 
 nohup-201004201053.txt, nohup-201004291409.txt, 
 TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, 
 ZOOKEEPER-690.patch, ZOOKEEPER-690.patch


 The Hudson test failed on 
 http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
  There is a huge set of CancelledKeyExceptions in the logs. Still going 
 through the logs to find the reason for the failure.




[jira] Created: (ZOOKEEPER-761) Remove *synchronous* calls from the *single-threaded* C client API, since they are documented not to work

2010-04-29 Thread Jozef Hatala (JIRA)
Remove *synchronous* calls from the *single-threaded* C client API, since they 
are documented not to work
--

 Key: ZOOKEEPER-761
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-761
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.2.2, 3.1.1
 Environment: RHEL 4u8 (Linux).  The issue is not OS-specific though.
Reporter: Jozef Hatala
Priority: Minor


Since the synchronous calls are 
[http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#Using+the+C+Client|known]
 to be unimplemented in the single threaded version of the client library 
libzookeeper_st.so, I believe that it would be helpful towards users of the 
library if that information was also obvious from the header file.

Anecdotally more than one of us here made the mistake of starting by using the 
synchronous calls with the single-threaded library, and we found ourselves 
debugging it.  An early warning would have been greatly appreciated.

1. Could you please add warnings to the doxygen blocks of all synchronous calls 
saying that they are not available in the single-threaded API.  This cannot be 
safely done with {{#ifdef THREADED}}, obviously, because the same header file 
is included whichever client library implementation one is compiling for.

2. Could you please bracket the implementation of all synchronous calls in 
zookeeper.c with {{#ifdef THREADED}} and {{#endif}}, so that those symbols are 
not present in libzookeeper_st.so?




[jira] Updated: (ZOOKEEPER-761) Remove *synchronous* calls from the *single-threaded* C client API, since they are documented not to work

2010-04-29 Thread Jozef Hatala (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jozef Hatala updated ZOOKEEPER-761:
---

Description: 
Since the synchronous calls are 
[known|http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#Using+the+C+Client]
 to be unimplemented in the single threaded version of the client library 
libzookeeper_st.so, I believe that it would be helpful towards users of the 
library if that information was also obvious from the header file.

Anecdotally more than one of us here made the mistake of starting by using the 
synchronous calls with the single-threaded library, and we found ourselves 
debugging it.  An early warning would have been greatly appreciated.

1. Could you please add warnings to the doxygen blocks of all synchronous calls 
saying that they are not available in the single-threaded API.  This cannot be 
safely done with {{#ifdef THREADED}}, obviously, because the same header file 
is included whichever client library implementation one is compiling for.

2. Could you please bracket the implementation of all synchronous calls in 
zookeeper.c with {{#ifdef THREADED}} and {{#endif}}, so that those symbols are 
not present in libzookeeper_st.so?

  was:
Since the synchronous calls are 
[http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#Using+the+C+Client|known]
 to be unimplemented in the single threaded version of the client library 
libzookeeper_st.so, I believe that it would be helpful towards users of the 
library if that information was also obvious from the header file.

Anecdotally more than one of us here made the mistake of starting by using the 
synchronous calls with the single-threaded library, and we found ourselves 
debugging it.  An early warning would have been greatly appreciated.

1. Could you please add warnings to the doxygen blocks of all synchronous calls 
saying that they are not available in the single-threaded API.  This cannot be 
safely done with {{#ifdef THREADED}}, obviously, because the same header file 
is included whichever client library implementation one is compiling for.

2. Could you please bracket the implementation of all synchronous calls in 
zookeeper.c with {{#ifdef THREADED}} and {{#endif}}, so that those symbols are 
not present in libzookeeper_st.so?


 Remove *synchronous* calls from the *single-threaded* C client API, since 
 they are documented not to work
 --

 Key: ZOOKEEPER-761
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-761
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.1.1, 3.2.2
 Environment: RHEL 4u8 (Linux).  The issue is not OS-specific though.
Reporter: Jozef Hatala
Priority: Minor

 Since the synchronous calls are 
 [known|http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#Using+the+C+Client]
  to be unimplemented in the single threaded version of the client library 
 libzookeeper_st.so, I believe that it would be helpful towards users of the 
 library if that information was also obvious from the header file.
 Anecdotally more than one of us here made the mistake of starting by using 
 the synchronous calls with the single-threaded library, and we found 
 ourselves debugging it.  An early warning would have been greatly appreciated.
 1. Could you please add warnings to the doxygen blocks of all synchronous 
 calls saying that they are not available in the single-threaded API.  This 
 cannot be safely done with {{#ifdef THREADED}}, obviously, because the same 
 header file is included whichever client library implementation one is 
 compiling for.
 2. Could you please bracket the implementation of all synchronous calls in 
 zookeeper.c with {{#ifdef THREADED}} and {{#endif}}, so that those symbols 
 are not present in libzookeeper_st.so?




[jira] Assigned: (ZOOKEEPER-701) GSoC 2010: Web-based Administrative Interface

2010-04-29 Thread Savu Andrei (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Savu Andrei reassigned ZOOKEEPER-701:
-

Assignee: Savu Andrei

 GSoC 2010: Web-based Administrative Interface
 -

 Key: ZOOKEEPER-701
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-701
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Savu Andrei

 Web-based Administrative Interface
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Modern web platform - e.g. Django. Some design or UI skills would help. Java 
 for adding methods to ZooKeeper.
 Description
 ZooKeeper is a complex distributed system. Understanding how well it is 
 running is tremendously important. Patrick Hunt has created a Django-based 
 dashboard (see http://github.com/phunt/zookeeper_dashboard#readme) that 
 allows some insight into how ZooKeeper is running. This is a great foundation 
 on which to build; however there are improvements that could be made! This 
 project would capture much more information from ZooKeeper, adding hooks to 
 retrieve it where necessary and visualise it in an appealing and useful way. 
 Integration with Ganglia would be a definite plus.




[jira] Updated: (ZOOKEEPER-701) GSoC 2010: Monitoring Recipes and Web-based Administrative Interface

2010-04-29 Thread Savu Andrei (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Savu Andrei updated ZOOKEEPER-701:
--

Attachment: milestones.txt

In the first phase of the project I will focus my attention on solving 
monitoring related issues and identifying important health signals. In the 
second phase I will focus exclusively on the web application.  

 GSoC 2010: Monitoring Recipes and Web-based Administrative Interface
 

 Key: ZOOKEEPER-701
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-701
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Savu Andrei
 Attachments: milestones.txt


 Monitoring Recipes And Web-based Administrative Interface
 Mentor: Patrick Hunt (ph...@apache.org)
 Requirements:
 Modern web platform - e.g. Django. Some design or UI skills would help. Java 
 for adding methods to ZooKeeper.
 Description:
 ZooKeeper is a complex distributed system. Understanding how well it is 
 running is tremendously important. Patrick Hunt has created a Django-based 
 dashboard (see http://github.com/phunt/zookeeper_dashboard) that allows some 
 insight into how ZooKeeper is running. This is a great foundation on which to 
 build; however there are improvements that could be made! This project would 
 capture much more information from ZooKeeper, adding hooks to retrieve it 
 where necessary and visualise it in an appealing and useful way. Integration 
 with Ganglia would be a definite plus.




[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862480#action_12862480
 ] 

Benjamin Reed commented on ZOOKEEPER-690:
-

henry, i think this may show that we can't really have a setLearnerType() 
method. In the real distributed setting, each peer will have its own list, so 
we should really think of the peers list as immutable.

 AsyncTestHammer test fails on hudson.
 -

 Key: ZOOKEEPER-690
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
 Project: Zookeeper
  Issue Type: Bug
Reporter: Mahadev konar
Assignee: Henry Robinson
Priority: Blocker
 Fix For: 3.3.1, 3.4.0

 Attachments: jstack-201004201053.txt, jstack-201004291409.txt, 
 jstack-201004291527.txt, nohup-201004201053.txt, nohup-201004291409.txt, 
 nohup-201004291527.txt, TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, 
 zoo.log, ZOOKEEPER-690.patch, ZOOKEEPER-690.patch, ZOOKEEPER-690.patch


 The Hudson test failed on 
 http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
  There is a huge set of CancelledKeyExceptions in the logs. Still going 
 through the logs to find the reason for the failure.




[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862482#action_12862482
 ] 

Henry Robinson commented on ZOOKEEPER-690:
--

Ben - 

Agreed. I see this as the same as setMyid(...) - it sets an immutable value and 
should only be called once. I'd prefer if these parameters were 'final' in 
QuorumPeer and set in the constructor, but that's not the way that 
runFromConfig (the only place outside of tests that these methods are called) 
is written. Then we could get rid of setLearnerType, for sure. 

The real error here, I think, is duplicating the learner type between QuorumPeer 
and QuorumServer. If we are going to have the list of QuorumServers, then 
getLearnerType should look up the learner type in the peer map. Same for the 
server id, perhaps, and we should just save a reference to the QuorumServer that 
represents our QuorumPeer. 
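
A rough sketch of that idea, under simplifying assumptions: the nested 
QuorumServer and QuorumPeer classes below are hypothetical, stripped-down 
stand-ins rather than the real ones, and the peer is constructed directly 
instead of via runFromConfig. The peer map is copied and wrapped as 
unmodifiable, the id and learner type are read from the peer's own entry, and 
there is no setLearnerType at all.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not the actual ZooKeeper implementation.
class DerivedLearnerTypeDemo {
    enum LearnerType { PARTICIPANT, OBSERVER }

    // Simplified stand-in for QuorumServer with immutable fields.
    static final class QuorumServer {
        final long id;
        final LearnerType type;

        QuorumServer(long id, LearnerType type) {
            this.id = id;
            this.type = type;
        }
    }

    // Simplified stand-in for QuorumPeer: everything is fixed at construction.
    static final class QuorumPeer {
        private final Map<Long, QuorumServer> quorumPeers;
        private final QuorumServer self;

        QuorumPeer(long myid, Map<Long, QuorumServer> peers) {
            // Defensive copy, so later changes to the caller's map are not seen.
            this.quorumPeers = Collections.unmodifiableMap(new HashMap<>(peers));
            QuorumServer me = this.quorumPeers.get(myid);
            if (me == null) {
                throw new IllegalArgumentException(myid + " not in quorumPeers");
            }
            this.self = me;
        }

        // Both values come from the single entry in the peer map.
        long getId() { return self.id; }
        LearnerType getLearnerType() { return self.type; }
        Map<Long, QuorumServer> getView() { return quorumPeers; }
    }

    public static void main(String[] args) {
        Map<Long, QuorumServer> peers = new HashMap<>();
        peers.put(1L, new QuorumServer(1L, LearnerType.PARTICIPANT));
        peers.put(2L, new QuorumServer(2L, LearnerType.OBSERVER));

        QuorumPeer observer = new QuorumPeer(2L, peers);
        System.out.println(observer.getId() + " is " + observer.getLearnerType());
    }
}
```

With this shape there is only one place where the learner type lives, so the 
peer's own field and its map entry cannot drift apart.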

