[jira] Updated: (ZOOKEEPER-512) FLE election fails to elect leader

2009-08-26 Thread Flavio Paiva Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Paiva Junqueira updated ZOOKEEPER-512:
-

Attachment: ZOOKEEPER-512.patch

I've found a corner case that I was not expecting, but it was an easy fix. Basically, 
receiveConnection was passing an invalid server identifier to connectOne, which 
threw an NPE that propagated to the Listener. The listener would 
consequently die and stop participating in leader election. The fix is 
simply not to proceed with the connection when that happens.
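
To make the shape of that fix concrete, here is a minimal sketch, not the attached patch. The class, the peers map and connectIfKnown are hypothetical names for illustration; the only idea taken from the comment above is "do not proceed when the server identifier is invalid".

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the connection manager; all names are illustrative.
final class ConnectionGuardSketch {
    // Known peers keyed by server identifier (assumed to come from the quorum config).
    private final Map<Long, InetSocketAddress> peers = new ConcurrentHashMap<>();

    void addPeer(long sid, InetSocketAddress addr) {
        peers.put(sid, addr);
    }

    /**
     * Returns true if a connection attempt would be made, false if the
     * identifier is unknown and the attempt is skipped instead of throwing.
     */
    boolean connectIfKnown(long sid) {
        InetSocketAddress addr = peers.get(sid);
        if (addr == null) {
            // Invalid identifier: bail out so the calling thread survives.
            return false;
        }
        // A real implementation would open the socket to addr here.
        return true;
    }
}
```

With a guard like this, a bad identifier costs one skipped connection rather than the thread that accepts connections.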

This patch is working fine for my test. It is important to note, though, that 
if we are overwhelming servers so much (clients are hammering the system and 
connections are failing), then there will be periods in which there will be no 
leader. The important invariant to satisfy is that the system converges to a 
live state once it stabilizes. In my tests, I observe periods with no leader 
when clients are hammering the servers with requests, but they converge to a 
leader soon after the clients stop. Of course, if we have no injected faults, 
the clients' requests are executed just fine (there is always a leader). This is 
the behavior I expect to see.

At the same time, although I think it was a good idea to test such an extreme 
case, I'm still not convinced that this test is realistic. It would be great if 
we could model the cases this fault injection is trying to emulate to make sure 
they are really expected cases. 

Also, I don't see a good way of introducing a unit test for such extreme cases. 
In fact, I'm not even sure it would make sense to test only leader election 
under such extreme conditions. 

 FLE election fails to elect leader
 --

 Key: ZOOKEEPER-512
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-512
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum, server
Affects Versions: 3.2.0
Reporter: Patrick Hunt
Assignee: Flavio Paiva Junqueira
Priority: Blocker
 Fix For: 3.2.1, 3.3.0

 Attachments: jst.txt, log3_debug.tar.gz, logs.tar.gz, logs2.tar.gz, 
 t5_aj.tar.gz, ZOOKEEPER-512.patch, ZOOKEEPER-512.patch, ZOOKEEPER-512.patch, 
 ZOOKEEPER-512.patch


 I was doing some fault injection testing of 3.2.1 with ZOOKEEPER-508 patch 
 applied and noticed that after some time the ensemble failed to re-elect a 
 leader.
 See the attached log files - 5 member ensemble. typically 5 is the leader
 Notice that after 16:23:50,525 no quorum is formed, even after 20 minutes 
 elapses w/no quorum
 environment:
 I was doing fault injection testing using aspectj. The faults are injected 
 into socketchannel read/write; I throw exceptions randomly at a 1/200 ratio 
 (rand.nextFloat() <= .005 => throw IOException)
 You can see when a fault is injected in the log via:
 2009-08-19 16:57:09,568 - INFO  [Thread-74:readrequestfailsintermitten...@38] 
 - READPACKET FORCED FAIL
 vs a read/write that didn't force fail:
 2009-08-19 16:57:09,568 - INFO  [Thread-74:readrequestfailsintermitten...@41] 
 - READPACKET OK
 otw standard code/config (straight fle quorum with 5 members)
 also see the attached jstack trace. this is for one of the servers. Notice in 
 particular that the number of sendworkers != the number of recv workers.
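
 For readers who want to reproduce the injection without AspectJ, a plain-Java sketch of the same idea follows. It is an assumption-laden stand-in, not the aspect used for this report: the class and method names are hypothetical, and the comparison against the failure rate is my reading of the 1/200 ratio described above.

```java
import java.io.IOException;
import java.util.Random;

// Plain-Java stand-in for the AspectJ advice; names are hypothetical.
final class RandomFaultSketch {
    private final Random rand;
    private final float failureRate;

    RandomFaultSketch(long seed, float failureRate) {
        this.rand = new Random(seed);
        this.failureRate = failureRate;
    }

    /** Throws IOException on roughly failureRate of calls; otherwise returns normally. */
    void maybeFail(String op) throws IOException {
        if (rand.nextFloat() <= failureRate) {
            throw new IOException(op + " FORCED FAIL");
        }
    }

    /** Counts how many of n simulated reads were forced to fail. */
    static int countFailures(RandomFaultSketch f, int n) {
        int failures = 0;
        for (int i = 0; i < n; i++) {
            try {
                f.maybeFail("READPACKET");
            } catch (IOException e) {
                failures++;
            }
        }
        return failures;
    }
}
```

 Constructing it as new RandomFaultSketch(seed, 0.005f) gives the 1/200 ratio; a fixed seed makes a failing run repeatable.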

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



zookeeper trunk build

2009-08-26 Thread Giridharan Kesavan
Zookeeper trunk build is now moved from vesta.apache.org to a 
h8.grid.sp2.yahoo.net machine

Tnx
Giri






[jira] Commented: (ZOOKEEPER-512) FLE election fails to elect leader

2009-08-26 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748018#action_12748018
 ] 

Benjamin Reed commented on ZOOKEEPER-512:
-

agreed. i think the problem is that under high load we don't have a period of 
error free operation. i think it is ok to generate errors randomly as we are 
doing, but we should have periods of error free operation so that things can 
settle down.
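
One way to get those error-free periods, sketched under my own assumptions (the class name and the fixed faulty/quiet cycle are hypothetical, nothing here comes from the test harness on the issue): gate the injector on elapsed time so that faults are only allowed during part of each cycle.

```java
// Hypothetical gate for a fault injector: faults are allowed for faultyMillis,
// then suppressed for quietMillis, and the cycle repeats.
final class QuietPeriodSketch {
    private final long faultyMillis;
    private final long quietMillis;

    QuietPeriodSketch(long faultyMillis, long quietMillis) {
        if (faultyMillis <= 0 || quietMillis <= 0) {
            throw new IllegalArgumentException("both periods must be positive");
        }
        this.faultyMillis = faultyMillis;
        this.quietMillis = quietMillis;
    }

    /** True when faults may be injected at elapsedMillis (>= 0) since the test started. */
    boolean faultsAllowed(long elapsedMillis) {
        long cycle = faultyMillis + quietMillis;
        return (elapsedMillis % cycle) < faultyMillis;
    }
}
```

The injector would check faultsAllowed(...) before rolling the dice, so the ensemble gets a window in which to settle.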




[jira] Updated: (ZOOKEEPER-518) DEBUG message for outstanding proposals in leader should be moved to trace.

2009-08-26 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-518:
---

  Component/s: server
Affects Version/s: 3.1.1
   3.2.0

 DEBUG message for outstanding proposals in leader should be moved to trace.
 ---

 Key: ZOOKEEPER-518
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-518
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.1.1, 3.2.0
Reporter: Mahadev konar
Assignee: Patrick Hunt
 Fix For: 3.2.1, 3.3.0


 this is the code in Leader.java 
 {code}
 if (LOG.isDebugEnabled()) {
     LOG.debug("Ack zxid: 0x" + Long.toHexString(zxid));
     for (Proposal p : outstandingProposals.values()) {
         long packetZxid = p.packet.getZxid();
         LOG.debug("outstanding proposal: 0x"
                 + Long.toHexString(packetZxid));
     }
     LOG.debug("outstanding proposals all");
 }
 {code}
 We should move this debug to trace since it will cause really high latencies 
 in response times from zookeeper servers in case folks want to use DEBUG 
 logging for servers.
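
 A rough sketch of what the change amounts to, written against java.util.logging so it runs without the project's logging library on the classpath. That substitution is mine: FINEST stands in for trace, and the class name and the list parameter are hypothetical, not the code in Leader.java.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Stand-in for the guarded logging block; FINEST plays the role of trace.
final class TraceLoggingSketch {
    private static final Logger LOG = Logger.getLogger("TraceLoggingSketch");

    /** Logs the ack and every outstanding zxid at FINEST; returns the number of lines logged. */
    static int logOutstanding(long zxid, List<Long> outstandingZxids) {
        if (!LOG.isLoggable(Level.FINEST)) {
            return 0; // the loop is skipped entirely unless the finest level is on
        }
        int lines = 0;
        LOG.finest("Ack zxid: 0x" + Long.toHexString(zxid));
        lines++;
        for (long packetZxid : outstandingZxids) {
            LOG.finest("outstanding proposal: 0x" + Long.toHexString(packetZxid));
            lines++;
        }
        LOG.finest("outstanding proposals all");
        lines++;
        return lines;
    }
}
```

 The point is only that the per-proposal loop sits behind the most verbose level, so servers running at DEBUG no longer pay for it on every ack.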




[jira] Updated: (ZOOKEEPER-518) DEBUG message for outstanding proposals in leader should be moved to trace.

2009-08-26 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-518:
---

Fix Version/s: 3.2.1
 Assignee: Patrick Hunt  (was: Mahadev konar)




[jira] Commented: (ZOOKEEPER-512) FLE election fails to elect leader

2009-08-26 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748020#action_12748020
 ] 

Patrick Hunt commented on ZOOKEEPER-512:


I'm seeing 2 cases:

1) the entire quorum is unstable because clients are driving and causing many 
(simulated) network failures; in this case I agree

2) but I also see the case where the quorum is stable, but there's one server 
that's been orphaned from the group. It is never able to reconnect, even though 
the clients are stopped and the quorum in general is stable.

Eventually 3 servers become orphaned (out of 5), in which case, regardless of 
whether clients are running or not, the quorum will never re-form. I don't agree 
that this is ok.





[jira] Updated: (ZOOKEEPER-518) DEBUG message for outstanding proposals in leader should be moved to trace.

2009-08-26 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-518:
---

Status: Patch Available  (was: Open)

 DEBUG message for outstanding proposals in leader should be moved to trace.
 ---

 Key: ZOOKEEPER-518
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-518
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.2.0, 3.1.1
Reporter: Mahadev konar
Assignee: Patrick Hunt
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-518.patch





[jira] Updated: (ZOOKEEPER-518) DEBUG message for outstanding proposals in leader should be moved to trace.

2009-08-26 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-518:
---

Attachment: ZOOKEEPER-518.patch

this patch changes from debug to trace, that is all. I expect the qabot to fail 
(no test changed)

 DEBUG message for outstanding proposals in leader should be moved to trace.
 ---

 Key: ZOOKEEPER-518
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-518
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.1.1, 3.2.0
Reporter: Mahadev konar
Assignee: Patrick Hunt
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-518.patch





[jira] Updated: (ZOOKEEPER-512) FLE election fails to elect leader

2009-08-26 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-512:
---

Fix Version/s: (was: 3.2.1)

 FLE election fails to elect leader
 --

 Key: ZOOKEEPER-512
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-512
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum, server
Affects Versions: 3.2.0
Reporter: Patrick Hunt
Assignee: Flavio Paiva Junqueira
Priority: Blocker
 Fix For: 3.3.0

 Attachments: jst.txt, log3_debug.tar.gz, logs.tar.gz, logs2.tar.gz, 
 t5_aj.tar.gz, ZOOKEEPER-512.patch, ZOOKEEPER-512.patch, ZOOKEEPER-512.patch, 
 ZOOKEEPER-512.patch





[jira] Commented: (ZOOKEEPER-518) DEBUG message for outstanding proposals in leader should be moved to trace.

2009-08-26 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748048#action_12748048
 ] 

Mahadev konar commented on ZOOKEEPER-518:
-

+1 the patch looks good... 




[jira] Commented: (ZOOKEEPER-518) DEBUG message for outstanding proposals in leader should be moved to trace.

2009-08-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748205#action_12748205
 ] 

Hadoop QA commented on ZOOKEEPER-518:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12417755/ZOOKEEPER-518.patch
  against trunk revision 807484.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/191/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/191/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/191/console

This message is automatically generated.




[jira] Updated: (ZOOKEEPER-505) testAsyncCreateClose is badly broken

2009-08-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-505:


Fix Version/s: 3.3.0

 testAsyncCreateClose is badly broken
 

 Key: ZOOKEEPER-505
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-505
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bookkeeper
Reporter: Utkarsh Srivastava
Priority: Critical
 Fix For: 3.3.0

 Attachments: ZOOKEEPER-505.1.patch


 The test case testAsyncCreateClose is badly broken. I was wondering why all 
 the unit tests are passing in spite of having found so many different problems 
 with LedgerManagementProcessor. 
 There is a big try-catch block sitting in the test case that catches all 
 exceptions, prints their stack traces, and exits, thereby allowing the test to 
 pass. In general, unit tests shouldn't catch exceptions unless they are ones 
 you expect to happen.
 Another problem is that the same ControlObject is used for synchronization 
 throughout. Since we already have the problem of callbacks being called 
 multiple times (ZOOKEEPER-502), notify() on the control object is called too 
 many times, resulting in the unit test not waiting for certain callbacks.
 Thus the test never waits for the asyncOpenLedger() to finish, and hence 
 still succeeds. I believe asyncOpenLedger() has never worked right. 
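
 As an illustration of the alternative (a sketch with hypothetical names, not the BookKeeper test itself): give each asynchronous step its own latch, so a duplicate callback from an earlier step cannot release the wait for a later one, and let failures surface instead of being printed and swallowed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical test helper; asyncOp stands in for calls such as an async create or open.
final class PerCallbackLatchSketch {
    /** Stand-in for an async call: invokes the callback on another thread. */
    static void asyncOp(Runnable callback) {
        new Thread(callback).start();
    }

    /** Returns true only if both callbacks arrive within the timeout. */
    static boolean createThenOpen(long timeoutMillis) {
        try {
            // One latch per step: extra countDown() calls on 'created' are harmless
            // and cannot satisfy the wait on 'opened'.
            CountDownLatch created = new CountDownLatch(1);
            asyncOp(created::countDown);
            if (!created.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false;
            }
            CountDownLatch opened = new CountDownLatch(1);
            asyncOp(opened::countDown);
            return opened.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Fail loudly rather than letting the test pass by accident.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for callback", e);
        }
    }
}
```

 A test would then assert that createThenOpen(...) returns true, so a missing callback shows up as a failure.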




[jira] Updated: (ZOOKEEPER-126) zookeeper client close operation may block indefinitely

2009-08-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-126:


Fix Version/s: 3.3.0

 zookeeper client close operation may block indefinitely
 ---

 Key: ZOOKEEPER-126
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-126
 Project: Zookeeper
  Issue Type: Bug
  Components: java client
Reporter: Patrick Hunt
 Fix For: 3.3.0

 Attachments: ZOOKEEPER-126.patch


 Moving the hang issue from ZOOKEEPER-63 to here. See 63 for background and 
 potential patch (patch_ZOOKEEPER-63.patch).
 specifically (from James): 
 I'm thinking the close() method should not wait() forever on the disconnect 
 packet, just a closeTimeout length - say a few seconds. After all, blocking and 
 forcing a reconnect just to redeliver the disconnect packet seems a bit silly 
 - when the server has to deal with clients which just have their sockets fail 
 anyway
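
 A minimal sketch of a bounded wait of that kind, under stated assumptions: the class, the disconnectAcked flag and onDisconnectAck are hypothetical names, and closeTimeoutMillis is simply the closeTimeout idea from the quote expressed as a parameter. It is not the attached patch.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical client-side close that gives up after a timeout instead of waiting forever.
final class TimedCloseSketch {
    private final Object lock = new Object();
    private boolean disconnectAcked = false;

    /** Called when the reply to the disconnect packet arrives. */
    void onDisconnectAck() {
        synchronized (lock) {
            disconnectAcked = true;
            lock.notifyAll();
        }
    }

    /** Waits up to closeTimeoutMillis; returns true if acked, false on timeout or interrupt. */
    boolean close(long closeTimeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(closeTimeoutMillis);
        synchronized (lock) {
            while (!disconnectAcked) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false; // timed out: stop waiting and let the caller tear down
                }
                try {
                    // wait(0) means "forever", so round up to at least 1 ms.
                    lock.wait(Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }
}
```

 The loop around wait() guards against spurious wakeups while still honouring the overall deadline.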




[jira] Updated: (ZOOKEEPER-283) Add more javadocs to BookKeeper

2009-08-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-283:


Fix Version/s: 3.3.0

 Add more javadocs to BookKeeper
 ---

 Key: ZOOKEEPER-283
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-283
 Project: Zookeeper
  Issue Type: Improvement
  Components: contrib-bookkeeper
Reporter: Flavio Paiva Junqueira
 Fix For: 3.3.0


 Add more javadoc descriptors to BookKeeper code.




[jira] Updated: (ZOOKEEPER-121) SyncRequestProcessor is not closing log stream during shutdown

2009-08-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-121:


Fix Version/s: 3.3.0

 SyncRequestProcessor is not closing log stream during shutdown
 --

 Key: ZOOKEEPER-121
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-121
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Reporter: Patrick Hunt
Assignee: Mahadev konar
 Fix For: 3.3.0


 The SyncRequestProcessor is not closing log stream during shutdown. 
 See FIXMEs with this ID in ClientBase.java -- I've commented out the 
 assertion for the time being (checking for logs being deleted), as part of 
 this fix re-enable these asserts and also verify tests on a Windows system.




[jira] Updated: (ZOOKEEPER-238) HostAuthenicationProvider should be removed

2009-08-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-238:


Fix Version/s: 3.2.0
   3.2.1
 Assignee: Benjamin Reed

We fixed this in ZOOKEEPER-446.

 HostAuthenicationProvider should be removed
 ---

 Key: ZOOKEEPER-238
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-238
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Reporter: Benjamin Reed
Assignee: Benjamin Reed
 Fix For: 3.2.0, 3.2.1


 I think the way the HostAuthenticationProvider is implemented could cause 
 serious performance problems if DNS is slow or broken. The problem is that we 
 need to do a reverse hostname resolution during connection establishment. I 
 suggest it be removed.




[jira] Resolved: (ZOOKEEPER-238) HostAuthenicationProvider should be removed

2009-08-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar resolved ZOOKEEPER-238.
-

Resolution: Fixed

 HostAuthenicationProvider should be removed
 ---

 Key: ZOOKEEPER-238
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-238
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Reporter: Benjamin Reed
Assignee: Benjamin Reed
 Fix For: 3.2.1, 3.2.0





[jira] Updated: (ZOOKEEPER-86) intermittent test failure of org.apache.zookeeper.test.AsyncTest

2009-08-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-86?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-86:
---

Fix Version/s: 3.3.0

 intermittent test failure of org.apache.zookeeper.test.AsyncTest
 

 Key: ZOOKEEPER-86
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-86
 Project: Zookeeper
  Issue Type: Bug
  Components: tests
 Environment: OS X and linux. It sometimes passes; but mostly seems to 
 fail on OS X each time
Reporter: james strachan
Assignee: james strachan
 Fix For: 3.3.0

 Attachments: patch_for_ZOOKEEPER-86.patch, 
 TEST-org.apache.zookeeper.test.AsyncTest.txt


 Will attach the test output in an attachment...




[jira] Updated: (ZOOKEEPER-418) Need nifty zookeeper browser

2009-08-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-418:


Fix Version/s: 3.3.0

 Need nifty zookeeper browser
 

 Key: ZOOKEEPER-418
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-418
 Project: Zookeeper
  Issue Type: Bug
Reporter: Ted Dunning
Assignee: Ted Dunning
 Fix For: 3.3.0

 Attachments: pom.xml, screenshot-1.jpg, zk-view-0.1.tgz


 It would be very nice to have a browser that would allow the state of a Zoo 
 to be examined.  Even nicer would be such a utility that showed changes in 
 real time.




[jira] Updated: (ZOOKEEPER-51) Review error handling in PrepRequestProcessor.fixupACL

2009-08-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-51:
---

Fix Version/s: 3.3.0

 Review error handling in PrepRequestProcessor.fixupACL
 --

 Key: ZOOKEEPER-51
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-51
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Patrick Hunt
Assignee: Benjamin Reed
Priority: Minor
 Fix For: 3.3.0


 Line 409 (fixupACL method) logs an error for a missing authentication 
 provider... Is this really an error? (No exception is thrown as a result...) 
 Should we be notifying the client in this case? (It might help with client-side 
 debugging.)




[jira] Updated: (ZOOKEEPER-494) zookeeper should install include headers in /usr/local/include/zookeeper

2009-08-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-494:


Fix Version/s: 3.3.0

 zookeeper should install include headers in /usr/local/include/zookeeper
 

 Key: ZOOKEEPER-494
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-494
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.2.0
Reporter: Michi Mutsuzaki
Priority: Trivial
 Fix For: 3.3.0


 Hello,
 Currently all the c client header files get installed under 
 /usr/local/include/c-client-src . Ideally they should get installed in 
 /usr/local/include/zookeeper .




[jira] Updated: (ZOOKEEPER-10) Bad error message

2009-08-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-10:
---

Fix Version/s: 3.3.0

 Bad error message
 -

 Key: ZOOKEEPER-10
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-10
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Minor
 Fix For: 3.3.0


 Moved from SourceForge to Apache.
 http://sourceforge.net/tracker/index.php?func=detail&aid=1941108&group_id=209147&atid=1008544




[jira] Updated: (ZOOKEEPER-158) Leader and followers increase cpu utilization upon loss of a follower

2009-08-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-158:


Fix Version/s: 3.3.0

 Leader and followers increase cpu utilization upon loss of a follower
 -

 Key: ZOOKEEPER-158
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-158
 Project: Zookeeper
  Issue Type: Improvement
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Mahadev konar
Priority: Minor
 Fix For: 3.3.0

 Attachments: dead-follower.tar.gz


 In a set of ZooKeeper servers, when there is a leader in operation and supported 
 by a quorum of servers, we have observed that cpu utilization increases 
 substantially once a follower fails or disconnects. Stu Hood provided logs 
 showing this behavior.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-328) We need to test the code that uses DNS lookups that return multiple results to enumerate the zookeeper servers

2009-08-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-328:


Fix Version/s: 3.3.0

 We need to test the code that uses DNS lookups that return multiple results 
 to enumerate the zookeeper servers
 --

 Key: ZOOKEEPER-328
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-328
 Project: Zookeeper
  Issue Type: Test
  Components: c client, java client
Reporter: Benjamin Reed
Priority: Minor
 Fix For: 3.3.0


 Our client code uses the list of addresses returned from DNS as separate 
 servers, so a ZooKeeper installation can have a single host name resolve to 
 the addresses of all the ZooKeeper servers. This allows the servers to 
 change without changing the clients' configuration. The code is there, but we 
 do not have tests for it.
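
One way such a test could be set up is sketched below. This is only an illustration, not the client's actual code: the `Resolver` interface, the `expand` method, and the host name `zk.example.com` are all hypothetical names introduced here. The idea is to put the DNS lookup behind a small seam so that a test can return several canned addresses without touching a real name server.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class DnsServerList {
    /** Hypothetical seam so a test can substitute canned DNS answers. */
    interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    /** Treats every address returned for the host name as a separate server. */
    static List<InetSocketAddress> expand(String host, int port, Resolver resolver)
            throws UnknownHostException {
        List<InetSocketAddress> servers = new ArrayList<>();
        for (InetAddress addr : resolver.resolve(host)) {
            servers.add(new InetSocketAddress(addr, port));
        }
        return servers;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Fake resolver: three fixed addresses, no real DNS query is made.
        Resolver fake = host -> new InetAddress[] {
            InetAddress.getByAddress(host, new byte[] {10, 0, 0, 1}),
            InetAddress.getByAddress(host, new byte[] {10, 0, 0, 2}),
            InetAddress.getByAddress(host, new byte[] {10, 0, 0, 3}),
        };
        List<InetSocketAddress> servers = expand("zk.example.com", 2181, fake);
        System.out.println(servers.size() + " servers: " + servers);
    }
}
```

 Outside of tests, the same method could be given `InetAddress::getAllByName` as the resolver, which performs the real lookup and may return multiple addresses for one name.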




[jira] Updated: (ZOOKEEPER-463) C++ tests can't be built on Mac OS using XCode command line tools

2009-08-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-463:


Fix Version/s: 3.3.0

 C++ tests can't be built on Mac OS using XCode command line tools
 -

 Key: ZOOKEEPER-463
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-463
 Project: Zookeeper
  Issue Type: Bug
  Components: tests
Affects Versions: 3.2.0
 Environment: Using latest XCode 3.1.3.
 [apache-zookeeper/bin]$ ld -v
 @(#)PROGRAM:ld  PROJECT:ld64-85.2.1
Reporter: Henry Robinson
Priority: Minor
 Fix For: 3.3.0


 --wrap is an unsupported command line flag for ld on Mac OS. The cppunit 
 tests therefore won't build.




[jira] Updated: (ZOOKEEPER-3) syncLimit has slightly different comments in the class header, and inline with the variable.

2009-08-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-3:
--

Fix Version/s: 3.3.0
  Summary:  syncLimit has slightly different comments in the class 
header, and  inline with the variable.  (was:  syncLimit has slightly 
different comments in the class header, and
 inline with the variable.)

  syncLimit has slightly different comments in the class header, and  inline 
 with the variable.
 ---

 Key: ZOOKEEPER-3
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum
Reporter: Benjamin Reed
Priority: Trivial
 Fix For: 3.3.0


 syncLimit is documented twice in QuorumPeer, and each instance describes a 
 different aspect of it. The documentation should be improved and unified 
 (probably by removing the second instance).




[jira] Updated: (ZOOKEEPER-152) Improve unit tests for leader election

2009-08-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-152:


Fix Version/s: 3.3.0

 Improve unit tests for leader election
 --

 Key: ZOOKEEPER-152
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-152
 Project: Zookeeper
  Issue Type: Improvement
  Components: quorum
Reporter: Flavio Paiva Junqueira
Priority: Minor
 Fix For: 3.3.0


 There are two possible tasks here:
 1- Change the algorithm tested in QuorumTest.java;
 2- Add tests for the other supported algorithms.




[jira] Updated: (ZOOKEEPER-495) c client logs an invalid error when zookeeper_init is called with chroot

2009-08-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-495:


Fix Version/s: 3.3.0

 c client logs an invalid error when zookeeper_init is called with chroot
 

 Key: ZOOKEEPER-495
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-495
 Project: Zookeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.2.0
Reporter: Michi Mutsuzaki
Priority: Minor
 Fix For: 3.3.0

 Attachments: chroot.cc, chroot.log


 The C client logs this error message when zookeeper_init is called with 
 chroot. 
 2009-08-03 18:14:29,130:6624(0x5e66e950):zoo_er...@sub_string@730: server 
 path  does not include chroot path /chroot
 I'll attach a simple program to reproduce this.
 Thanks!
 --Michi




[jira] Updated: (ZOOKEEPER-277) Define PATH_SEPARATOR

2009-08-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-277:


Fix Version/s: 3.3.0

 Define PATH_SEPARATOR
 -

 Key: ZOOKEEPER-277
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-277
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client, documentation, java client, server, tests
Reporter: Nitay Joffe
Priority: Trivial
 Fix For: 3.3.0


 We should define a constant for PATH_SEPARATOR = "/" and use that throughout 
 the code rather than the hardcoded "/". Users can be told to use this 
 constant to be safe in case of future changes.
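
A minimal sketch of what this could look like on the Java side is given below. The class name `ZkPaths` and the `join` helper are hypothetical and only illustrate the proposal; the issue itself asks only for the constant.

```java
public class ZkPaths {
    /** Hypothetical constant along the lines the issue proposes. */
    public static final String PATH_SEPARATOR = "/";

    /** Joins a parent znode path and a child name using the constant. */
    static String join(String parent, String child) {
        if (parent.endsWith(PATH_SEPARATOR)) {
            return parent + child; // covers the root path, which is the separator itself
        }
        return parent + PATH_SEPARATOR + child;
    }

    public static void main(String[] args) {
        System.out.println(join(PATH_SEPARATOR, "app"));
        System.out.println(join("/app", "config"));
    }
}
```

 Callers that build paths through a helper like this never spell out the separator themselves, so a future change would be confined to one place.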




[jira] Updated: (ZOOKEEPER-445) Potential bug in leader code

2009-08-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-445:


Fix Version/s: 3.3.0

 Potential bug in leader code
 

 Key: ZOOKEEPER-445
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-445
 Project: Zookeeper
  Issue Type: Bug
  Components: server
 Environment: Linux fortiz-desktop 2.6.27-7-generic #1 SMP Fri Oct 24 
 06:42:44 UTC 2008 i686 GNU/Linux
 java version 1.6.0_10
 Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
 Java HotSpot(TM) Client VM (build 11.0-b15, mixed mode, sharing)
Reporter: Manos Kapritsos
Priority: Minor
 Fix For: 3.3.0

   Original Estimate: 0.33h
  Remaining Estimate: 0.33h

 There is a suspicious line in server/quorum/Leader.java:226. It reads
 if (stop) {
     LOG.info("exception while shutting down acceptor: " + e);
     stop = true;
 }
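
One possible reading of why the line looks suspicious, offered as an assumption since the report does not elaborate: the assignment is reached only when stop is already true, so it cannot change the flag. The sketch below isolates that branch in a hypothetical class (`AcceptorStopFlag` and `onAcceptException` are names made up for illustration, and `System.out` stands in for the logger) to show that the flag is the same before and after in both cases.

```java
import java.io.IOException;

public class AcceptorStopFlag {
    private volatile boolean stop;

    AcceptorStopFlag(boolean stop) {
        this.stop = stop;
    }

    /** Mirrors the quoted branch and returns the flag value afterwards. */
    boolean onAcceptException(Exception e) {
        if (stop) {
            System.out.println("exception while shutting down acceptor: " + e);
            stop = true; // no effect: this line runs only when stop is already true
        }
        return stop;
    }

    public static void main(String[] args) {
        Exception e = new IOException("socket closed");
        System.out.println(new AcceptorStopFlag(false).onAcceptException(e));
        System.out.println(new AcceptorStopFlag(true).onAcceptException(e));
    }
}
```

 Whether the original condition was meant to be different, or the assignment is simply redundant, cannot be determined from the report alone.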
