[jira] Commented: (ZOOKEEPER-864) Hedwig C++ client improvements

2010-09-13 Thread Ivan Kelly (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908666#action_12908666
 ] 

Ivan Kelly commented on ZOOKEEPER-864:
--

Yes, you are absolutely right. Unless we don't install protocol.h at all and 
try to remove it from the hedwig public headers. I remember the protocol.h 
stuff was causing you problems recently so this could resolve that issue too.

 Hedwig C++ client improvements
 --

 Key: ZOOKEEPER-864
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-864
 Project: Zookeeper
  Issue Type: Improvement
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Fix For: 3.4.0

 Attachments: warnings.txt, ZOOKEEPER-864.diff, ZOOKEEPER-864.diff, 
 ZOOKEEPER-864.diff


 I changed the socket code to use boost asio. Now the client only creates one 
 thread, and all operations are non-blocking. 
 Tests are now automated, just run make check.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: ZooKeeper-trunk #934

2010-09-13 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/ZooKeeper-trunk/934/

--
[...truncated 168605 lines...]
[junit] 2010-09-13 10:52:44,182 [myid:] - INFO  [main:quorumb...@195] - 
127.0.0.1:11237 is accepting client connections
[junit] 2010-09-13 10:52:44,182 [myid:] - INFO  [main:clientb...@225] - 
connecting to 127.0.0.1 11238
[junit] 2010-09-13 10:52:44,183 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11238:nioservercnxnfact...@196] - 
Accepted socket connection from /127.0.0.1:44910
[junit] 2010-09-13 10:52:44,183 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11238:nioserverc...@791] - Processing 
stat command from /127.0.0.1:44910
[junit] 2010-09-13 10:52:44,183 [myid:] - INFO  
[Thread-359:nioservercnxn$statcomm...@645] - Stat command output
[junit] 2010-09-13 10:52:44,184 [myid:] - INFO  
[Thread-359:nioserverc...@967] - Closed socket connection for client 
/127.0.0.1:44910 (no session established for client)
[junit] 2010-09-13 10:52:44,184 [myid:] - INFO  [main:quorumb...@195] - 
127.0.0.1:11238 is accepting client connections
[junit] 2010-09-13 10:52:44,184 [myid:] - INFO  [main:clientb...@225] - 
connecting to 127.0.0.1 11239
[junit] 2010-09-13 10:52:44,185 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioservercnxnfact...@196] - 
Accepted socket connection from /127.0.0.1:46017
[junit] 2010-09-13 10:52:44,185 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioserverc...@791] - Processing 
stat command from /127.0.0.1:46017
[junit] 2010-09-13 10:52:44,201 [myid:] - INFO  
[Thread-360:nioserverc...@967] - Closed socket connection for client 
/127.0.0.1:46017 (no session established for client)
[junit] 2010-09-13 10:52:44,452 [myid:] - INFO  [main:clientb...@225] - 
connecting to 127.0.0.1 11239
[junit] 2010-09-13 10:52:44,452 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioservercnxnfact...@196] - 
Accepted socket connection from /127.0.0.1:46018
[junit] 2010-09-13 10:52:44,452 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioserverc...@791] - Processing 
stat command from /127.0.0.1:46018
[junit] 2010-09-13 10:52:44,453 [myid:] - INFO  
[Thread-361:nioserverc...@967] - Closed socket connection for client 
/127.0.0.1:46018 (no session established for client)
[junit] 2010-09-13 10:52:44,703 [myid:] - INFO  [main:clientb...@225] - 
connecting to 127.0.0.1 11239
[junit] 2010-09-13 10:52:44,703 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioservercnxnfact...@196] - 
Accepted socket connection from /127.0.0.1:46019
[junit] 2010-09-13 10:52:44,704 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioserverc...@791] - Processing 
stat command from /127.0.0.1:46019
[junit] 2010-09-13 10:52:44,704 [myid:] - INFO  
[Thread-362:nioserverc...@967] - Closed socket connection for client 
/127.0.0.1:46019 (no session established for client)
[junit] 2010-09-13 10:52:44,954 [myid:] - INFO  [main:clientb...@225] - 
connecting to 127.0.0.1 11239
[junit] 2010-09-13 10:52:44,955 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioservercnxnfact...@196] - 
Accepted socket connection from /127.0.0.1:46020
[junit] 2010-09-13 10:52:44,955 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioserverc...@791] - Processing 
stat command from /127.0.0.1:46020
[junit] 2010-09-13 10:52:44,955 [myid:] - INFO  
[Thread-363:nioserverc...@967] - Closed socket connection for client 
/127.0.0.1:46020 (no session established for client)
[junit] 2010-09-13 10:52:45,206 [myid:] - INFO  [main:clientb...@225] - 
connecting to 127.0.0.1 11239
[junit] 2010-09-13 10:52:45,206 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioservercnxnfact...@196] - 
Accepted socket connection from /127.0.0.1:46021
[junit] 2010-09-13 10:52:45,206 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioserverc...@791] - Processing 
stat command from /127.0.0.1:46021
[junit] 2010-09-13 10:52:45,207 [myid:] - INFO  
[Thread-364:nioserverc...@967] - Closed socket connection for client 
/127.0.0.1:46021 (no session established for client)
[junit] 2010-09-13 10:52:45,457 [myid:] - INFO  [main:clientb...@225] - 
connecting to 127.0.0.1 11239
[junit] 2010-09-13 10:52:45,457 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioservercnxnfact...@196] - 
Accepted socket connection from /127.0.0.1:46022
[junit] 2010-09-13 10:52:45,457 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11239:nioserverc...@791] - Processing 
stat command from /127.0.0.1:46022
[junit] 2010-09-13 10:52:45,458 [myid:] - INFO  
[Thread-365:nioservercnxn$statcomm...@645] - Stat command output
[junit] 2010-09-13 10:52:45,458 [myid:] - INFO  
[Thread-365:nioserverc...@967] - Closed socket connection for client 
/127.0.0.1:46022 (no session established for client)
[junit] 

[jira] Commented: (ZOOKEEPER-831) BookKeeper: Throttling improved for reads

2010-09-13 Thread Ivan Kelly (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908724#action_12908724
 ] 

Ivan Kelly commented on ZOOKEEPER-831:
--

submitOrdered can throw RejectedExecutionException (I'm guessing this is rare, 
on the order of "if this happens, the machine will die soon") or 
NullPointerException (unlikely). However, there's no harm in putting a 
try { } catch (Exception e) { opCounterSem.release(); } around it. The handler 
will only run if the job is never submitted.

Also, you need a release inside the if (metadata.isClosed). Otherwise I think 
it should be fine, as PendingAddOp should be able to handle it no matter what 
error occurs.
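
A minimal sketch of the release-on-failed-submit pattern described above. It is not the BookKeeper code: ThrottledSubmit and its methods are hypothetical names, a plain java.util.concurrent ExecutorService stands in for the executor behind submitOrdered, and the catch is narrowed to RejectedExecutionException so the handler provably runs only when the job was never submitted.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;

// Hypothetical illustration; the real patch works on BookKeeper's own classes.
public class ThrottledSubmit {
    private final Semaphore opCounterSem;
    private final ExecutorService executor;

    public ThrottledSubmit(int maxOutstanding, ExecutorService executor) {
        this.opCounterSem = new Semaphore(maxOutstanding);
        this.executor = executor;
    }

    /** Returns true if the job was handed to the executor, false if it was rejected. */
    public boolean submit(Runnable job) {
        opCounterSem.acquireUninterruptibly();
        try {
            executor.execute(() -> {
                try {
                    job.run();
                } finally {
                    // Normal path: the permit is returned when the job finishes.
                    opCounterSem.release();
                }
            });
            return true;
        } catch (RejectedExecutionException e) {
            // Reached only when the job was never submitted, so the wrapper
            // above will never run and the permit must be returned here.
            opCounterSem.release();
            return false;
        }
    }

    public int availablePermits() {
        return opCounterSem.availablePermits();
    }
}
```

Without the catch block, every rejected submission would leak one permit, and the throttle would eventually block all further operations.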

 BookKeeper: Throttling improved for reads
 -

 Key: ZOOKEEPER-831
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-831
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bookkeeper
Affects Versions: 3.3.1
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-831.patch, ZOOKEEPER-831.patch, 
 ZOOKEEPER-831.patch


 Reads and writes in BookKeeper are asymmetric: a write request writes one 
 entry, whereas a read request may read multiple entries. The current 
 implementation of throttling only counts the number of read requests instead 
 of counting the number of entries being read. Consequently, a few read 
 requests reading a large number of entries each will spawn a large number of 
 read-entry requests. 
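
 To make the distinction concrete, here is a rough sketch of throttling on entries rather than on requests. EntryThrottle and its method names are assumptions made for illustration, not the classes touched by the attached patch; the only idea taken from the description is that a read of N entries should consume N permits.

```java
import java.util.concurrent.Semaphore;

// Hypothetical illustration of per-entry throttling.
public class EntryThrottle {
    private final Semaphore permits;

    public EntryThrottle(int maxOutstandingEntries) {
        this.permits = new Semaphore(maxOutstandingEntries);
    }

    /**
     * Reserves one permit per entry in [firstEntry, lastEntry] and returns
     * the number of entries reserved. Blocks until all of them fit; a range
     * larger than the configured limit would block forever in this sketch.
     */
    public int beginRead(long firstEntry, long lastEntry) {
        int entries = Math.toIntExact(lastEntry - firstEntry + 1);
        permits.acquireUninterruptibly(entries);
        return entries;
    }

    /** Called once for each entry whose read has completed or failed. */
    public void entryCompleted() {
        permits.release();
    }

    public int available() {
        return permits.availablePermits();
    }
}
```

 Counting requests instead would charge a single permit for the whole range, which is the behaviour the issue describes as the problem.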




[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-09-13 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch

In this patch:
* Added support for declaring new failure detectors, as Ivan suggested.
* Updated documentation to reflect this.
* Replaced all 'heartbeat' mentions in the API and tests with 'ping'.

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detect the failure of other servers and clients by 
 counting the number of 'ticks' for which they don't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.




[jira] Commented: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-09-13 Thread Ivan Kelly (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908817#action_12908817
 ] 

Ivan Kelly commented on ZOOKEEPER-702:
--

Code looks good Abmar, good job :)

I think the factory code could be improved slightly though. Currently if you 
want a custom FD you pass in a FDname and an FDClass.
I would merge these two, so that you only pass in FDname to the factory, and 
the factory can decide whether this is a builtin or a custom. So as an example, 
FDname can be fixed, chen, phiaccrual or com.blah.foobar.MyFailureDetector. 
When the factory sees this it goes...

if (fdName.equals("fixed")) {
    fdClass = FixedPingFailureDetector.class;
} else if (fdName.equals("chen")) {
    fdClass = ChenFailureDetector.class;
} else if (fdName.equals("bertier")) {
    fdClass = BertierFailureDetector.class;
} else if (fdName.equals("phiaccrual")) {
    fdClass = PhiAccrualFailureDetector.class;
} else if (fdName.equals("sliced")) {
    fdClass = SlicedPingFailureDetector.class;
} else {
    // Not a builtin name: treat it as a fully qualified class name.
    fdClass = Class.forName(fdName);
}
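
A self-contained sketch of the single-name factory suggested above, for readers who want to try the idea. The FailureDetector interface and the two stub detector classes are hypothetical stand-ins declared only so the example compiles; the real patch has its own types, and only two of the builtin names are shown.

```java
// Hypothetical illustration; none of these types come from the actual patch.
public class FailureDetectorFactory {
    /** Stand-in for the patch's failure detector interface. */
    public interface FailureDetector {}

    /** Stub builtin detectors. */
    public static class FixedPingFailureDetector implements FailureDetector {}
    public static class PhiAccrualFailureDetector implements FailureDetector {}

    /** Resolves fdName as a builtin alias first, then as a class name. */
    public static FailureDetector create(String fdName) {
        try {
            Class<?> fdClass;
            if ("fixed".equals(fdName)) {
                fdClass = FixedPingFailureDetector.class;
            } else if ("phiaccrual".equals(fdName)) {
                fdClass = PhiAccrualFailureDetector.class;
            } else {
                // Anything else is assumed to be a fully qualified class name.
                fdClass = Class.forName(fdName);
            }
            return fdClass.asSubclass(FailureDetector.class)
                          .getDeclaredConstructor()
                          .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Unknown failure detector: " + fdName, e);
        }
    }
}
```

The asSubclass check rejects a class that loads successfully but does not implement the detector interface, which a bare Class.forName would not catch.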






[jira] Commented: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-09-13 Thread Abmar Barros (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908876#action_12908876
 ] 

Abmar Barros commented on ZOOKEEPER-702:


Hi Ivan, thanks for the feedback.

I thought the idea was to declare an alias to a custom failure detector, so 
setting its options would be more user friendly. e.g.:

sessionsFD = myfd
sessionsFD.myfd = com.softwaresoft.zk.myFailureDetector
sessionsFD.myfd.blahblah = foobar
sessionsFD.myfd.myoption = barfoo

In this case, I need both className and fdName parameters in the factory code 
(and this is how it is implemented in the patch).
What do you think?
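
For illustration, the alias scheme above could be read from a java.util.Properties object roughly as follows. FdConfig and its fields are hypothetical names invented for this sketch; the patch's actual parsing code is not shown in this thread.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical illustration of resolving an alias, its class and its options.
public class FdConfig {
    public final String alias;      // e.g. "myfd"
    public final String className;  // null when no class is declared for the alias
    public final Map<String, String> options = new TreeMap<>();

    public FdConfig(Properties props, String key) {
        alias = props.getProperty(key);
        if (alias == null) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        String prefix = key + "." + alias;
        // "<key>.<alias>" holds the class name of a custom detector.
        className = props.getProperty(prefix);
        // "<key>.<alias>.<option>" entries are options for that detector.
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(prefix + ".")) {
                options.put(name.substring(prefix.length() + 1), props.getProperty(name));
            }
        }
    }
}
```

With the four example lines above, this yields alias myfd, class name com.softwaresoft.zk.myFailureDetector, and the two options blahblah and myoption.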




[jira] Updated: (ZOOKEEPER-804) c unit tests failing due to assertion cptr failed

2010-09-13 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-804:
--

Attachment: ZOOKEEPER-804.patch

This patch modifies zookeeper_process() function. After dequeue_completion() is 
called, it checks if zookeeper_close has been called. If it has been called, it 
returns ZINVALIDSTATE.

--Michi

 c unit tests failing due to assertion cptr failed
 ---

 Key: ZOOKEEPER-804
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-804
 Project: Zookeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.4.0
 Environment: gcc 4.4.3, ubuntu lucid lynx, dual core laptop (intel)
Reporter: Patrick Hunt
Assignee: Michi Mutsuzaki
Priority: Critical
 Fix For: 3.3.2, 3.4.0

 Attachments: ZOOKEEPER-804.patch


 I'm seeing this frequently:
  [exec] Zookeeper_simpleSystem::testPing : elapsed 18006 : OK
  [exec] Zookeeper_simpleSystem::testAcl : elapsed 1022 : OK
  [exec] Zookeeper_simpleSystem::testChroot : elapsed 3145 : OK
  [exec] Zookeeper_simpleSystem::testAuth ZooKeeper server started : 
 elapsed 25687 : OK
  [exec] zktest-mt: 
 /home/phunt/dev/workspace/gitzk/src/c/src/zookeeper.c:1952: 
 zookeeper_process: Assertion `cptr' failed.
  [exec] make: *** [run-check] Aborted
  [exec] Zookeeper_simpleSystem::testHangingClient
 Mahadev can you take a look?




[jira] Updated: (ZOOKEEPER-804) c unit tests failing due to assertion cptr failed

2010-09-13 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-804:
--

Status: Patch Available  (was: Open)




[jira] Commented: (ZOOKEEPER-864) Hedwig C++ client improvements

2010-09-13 Thread Erwin Tam (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909002#action_12909002
 ] 

Erwin Tam commented on ZOOKEEPER-864:
-

+1
Latest patch looks good to me.  We can just approve it and push it in first.  
Thanks for the great work Ivan!




[jira] Updated: (ZOOKEEPER-846) zookeeper client doesn't shut down cleanly on the close call

2010-09-13 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-846:
---

Attachment: ZOOKEEPER-846.patch

This patch fixes the problem by adding a check for close in queuepacket.

Note that this also moves the close=true into queuepacket (otherwise the 
closeSession would never be queued during a close).

I spent 1 day trying to craft a test for this; unfortunately, due to the small 
size of the timing window, I wasn't able to get a test that failed with this 
case. However, by inspection it seems to address a clear bug. All tests are 
passing.
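
The shape of the fix can be sketched as below. This is an assumption-laden illustration rather than the contents of ZOOKEEPER-846.patch: PacketQueue, the Type enum and the boolean return value are all hypothetical, and the real ClientCnxn deals with packets, headers and connection-loss callbacks that are omitted here.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical illustration of checking and setting the closing flag
// in the same critical section that enqueues the packet.
public class PacketQueue {
    public enum Type { REQUEST, CLOSE_SESSION }

    private final Queue<Type> outgoing = new ArrayDeque<>();
    private boolean closing = false;

    /** Returns true if the packet was queued, false if close had already begun. */
    public synchronized boolean queuePacket(Type type) {
        if (closing) {
            // A close is in progress: refuse the packet instead of leaving
            // the caller waiting on a reply that will never arrive.
            return false;
        }
        if (type == Type.CLOSE_SESSION) {
            // Set the flag here, not before the call; otherwise the
            // closeSession packet itself would be rejected by the check above.
            closing = true;
        }
        outgoing.add(type);
        return true;
    }

    public synchronized int size() {
        return outgoing.size();
    }
}
```

Because the check and the flag update happen under one lock, no request can slip into the queue behind the closeSession packet.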

 zookeeper client doesn't shut down cleanly on the close call
 

 Key: ZOOKEEPER-846
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-846
 Project: Zookeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.2.2
Reporter: Ted Yu
Assignee: Patrick Hunt
Priority: Blocker
 Fix For: 3.3.2, 3.4.0

 Attachments: rs-13.stack, ZOOKEEPER-846.patch


 Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where 
 Regionserver
 process was shutting down and seemed to hang.
 Here is the bottom of region server log:
 http://pastebin.com/YYawJ4jA
 zookeeper-3.2.2 is used.
 Here is relevant portion from jstack - I attempted to attach jstack twice in 
 my email to d...@hbase.apache.org but failed:
 DestroyJavaVM prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on 
 condition [0x]
java.lang.Thread.State: RUNNABLE
 regionserver/10.32.42.245:60020 prio=10 tid=0x2aabb84ce000 nid=0x6c81 
 in Object.wait() [0x43755000]
java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on 0x2aaab76633c0 (a 
 org.apache.zookeeper.ClientCnxn$Packet)
 at java.lang.Object.wait(Object.java:485)
 at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099)
 - locked 0x2aaab76633c0 (a 
 org.apache.zookeeper.ClientCnxn$Packet)
 at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077)
 at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505)
 - locked 0x2aaabf5e0c30 (a org.apache.zookeeper.ZooKeeper)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654)
 at java.lang.Thread.run(Thread.java:619)
 main-EventThread daemon prio=10 tid=0x43474000 nid=0x6c80 waiting 
 on condition [0x413f3000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x2aaabf6e9150 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)




[jira] Updated: (ZOOKEEPER-846) zookeeper client doesn't shut down cleanly on the close call

2010-09-13 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-846:
---

Status: Patch Available  (was: Open)




[jira] Commented: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-09-13 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909119#action_12909119
 ] 

Flavio Junqueira commented on ZOOKEEPER-702:


Hi Abmar, I believe my comments from Aug 18 have not been addressed. In 
particular, I still see the same javadoc issues and no test exercising the 
failure detectors with a running ensemble. I believe the default failure 
detector is naturally exercised in various tests, but it would be good to have 
tests that also exercise the other failure detectors in a running ensemble. 
Does that sound right?

There are still several references in javadocs and in the documentation to 
heartbeat. Should we replace them with ping for consistency?

I'm also seeing some tests failing, like ObserverTest and 
NioNettySuiteHammerTest, but I'm not sure if this is related to this patch. I 
will explore a little further.
