Question on quorum behavior

2010-05-06 Thread Sergey Doroshenko
In short: it seems the leader can treat observers as quorum members.

Steps to repro:

1. I have the following ensemble configuration:
# servers list
server.1=localhost:2881:3881
server.2=localhost:2882:3882
server.3=localhost:2883:3883:observer
server.4=localhost:2884:3884
server.5=localhost:2885:3885:observer

2. I bring up servers 1, 2, and 3, which is enough for a quorum (voters 1 and 2).
3. I shut down the voter that is currently the follower.

As I understand it, the expected result is that the leader starts a new
election round in order to regain quorum.
In reality, it just says goodbye to that follower and remains operable.
(When I shut down the third server, the observer, the leader starts
trying to regain a quorum.)

Is this a bug, or a feature?
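
For reference, here is a minimal sketch of the expected rule, assuming a plain majority quorum in which only non-observer servers count. The helper names (parse_voters, has_quorum) are hypothetical and are not ZooKeeper code; only the server.N lines come from the configuration above.

```python
# Sketch only: majority quorum over voters, observers excluded.
# parse_voters and has_quorum are hypothetical helpers, not ZooKeeper APIs.
CONFIG = """\
server.1=localhost:2881:3881
server.2=localhost:2882:3882
server.3=localhost:2883:3883:observer
server.4=localhost:2884:3884
server.5=localhost:2885:3885:observer
"""


def parse_voters(config_text):
    """Return the ids of servers that are voting members (not observers)."""
    voters = set()
    for line in config_text.splitlines():
        line = line.strip()
        if not line.startswith("server."):
            continue
        key, value = line.split("=", 1)
        server_id = int(key[len("server."):])
        if value.split(":")[-1] != "observer":
            voters.add(server_id)
    return voters


def has_quorum(voters, live_servers):
    """A majority quorum needs more than half of the voters to be alive."""
    live_voters = voters & set(live_servers)
    return len(live_voters) > len(voters) // 2


voters = parse_voters(CONFIG)
print(sorted(voters))                  # voters are 1, 2 and 4
print(has_quorum(voters, {1, 2, 3}))   # step 2: servers 1, 2, 3 are up
print(has_quorum(voters, {1, 3}))      # step 3: follower 2 is shut down
```

Under this rule step 2 has a quorum and step 3 does not, which is why I expected the leader to give up leadership after step 3.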


-- 
Regards, Sergey


Re: Question on quorum behavior

2010-05-06 Thread Henry Robinson
Sergey -

Sounds like a bug. Can you open a new JIRA and attach your log files to it?

Thanks,
Henry

-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679


[jira] Created: (ZOOKEEPER-768) zkpython segfault (assertion error in io thread)

2010-05-06 Thread Kapil Thangavelu (JIRA)
zkpython segfault (assertion error in io thread)


 Key: ZOOKEEPER-768
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-768
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.4.0
 Environment: ubuntu lucid (10.04), zookeeper trunk (java/c/zkpython)
Reporter: Kapil Thangavelu
 Attachments: zkpython-segfault-stack-traces.txt, zkpython-segfault.py

While trying to create a test case showing slow average add_auth, I stumbled 
upon a test case that reliably segfaults for me, albeit with a variable number of 
iterations (anywhere from 2 to 20). fwiw, I've got about 220 processes in my 
test environment (ubuntu lucid 10.04). The test case opens a connection, adds 
authentication to it, and closes the connection, in a loop. I'm including the 
sample program and the gdb stack traces from the core file. I can upload the 
core file if that's helpful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-768) zkpython segfault (assertion error in io thread)

2010-05-06 Thread Kapil Thangavelu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Thangavelu updated ZOOKEEPER-768:
---

Attachment: zkpython-segfault-client-log.txt

client log with debug logging on.




[jira] Commented: (ZOOKEEPER-768) zkpython segfault on close (assertion error in io thread)

2010-05-06 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864849#action_12864849
 ] 

Henry Robinson commented on ZOOKEEPER-768:
--

Thanks Kapil - I'll take a look. From the stack trace it looks as though a 
pending completion callback is null and therefore something weird is going on 
with a completion dispatcher being freed before it is finished being used. As 
per usual I can't reproduce on my machine, but this is enough information to 
dig into it. 

 zkpython segfault on close (assertion error in io thread)
 -

 Key: ZOOKEEPER-768
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-768
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.4.0
 Environment: ubuntu lucid (10.04), zookeeper trunk (java/c/zkpython)
Reporter: Kapil Thangavelu
 Attachments: zkpython-segfault-client-log.txt, 
 zkpython-segfault-stack-traces.txt, zkpython-segfault.py


 While trying to create a test case showing slow average add_auth, I stumbled 
 upon a test case that reliably segfaults for me, albeit with a variable number 
 of iterations (anywhere from 0 to 20 typically). fwiw, I've got about 220 
 processes in my test environment (ubuntu lucid 10.04). The test case opens a 
 connection, adds authentication to it, and closes the connection, in a loop. 
 I'm including the sample program and the gdb stack traces from the core file. 
 I can upload the core file if that's helpful.




Re: [GSoC 2010] Zookeeper Read-Only Mode

2010-05-06 Thread Sergey Doroshenko
I created a wiki page that outlines how I plan to implement read-only
mode:
http://wiki.apache.org/hadoop/ZooKeeper/GSoCReadOnlyMode

This feature may change both the client library and the server side, so before
implementing it, it would be great to hear some feedback from the community.

I'd like to ask ZK developers what they think about the current approach, and
ask ZK users to share their thoughts as well: use cases where read-only mode
would be beneficial, how well the current approach fits them, and so on.

P.S.
For zookeeper-user@ subscribers who didn't see my earlier email: I'm
Sergey Doroshenko, an accepted GSoC 2010 applicant, and I'll be implementing
read-only mode for Zookeeper.
Any feedback is greatly appreciated and would be really helpful.

On Mon, Apr 19, 2010 at 5:51 PM, Vishal K vishalm...@gmail.com wrote:

 Hi Sergey,

 This is a very useful feature. We would be happy to have it on our cluster.

 One small suggestion: it would be nice if you could document things along
 the way while going through the ZK code or testing ZK behavior.
 Since this covers part of the core logic of ZK, it will be helpful to other
 contributors as well.

 Thanks.

 Regards,
 -Vishal

 On Fri, Apr 16, 2010 at 6:43 PM, Sergey Doroshenko dors...@gmail.com
 wrote:

  Hi,
  I'm Sergey Doroshenko, GSoC applicant, and I've submitted application for
  enabling read-only mode in Zookeeper.
 
  I worked with Zookeeper during my internship this winter, and liked it
  much.
  Now I'm very eager to contribute to it.
 
  Task for enabling read-only mode in ZK (
  https://issues.apache.org/jira/browse/ZOOKEEPER-704) looks interesting
  and,
  as we discussed with Henry, has quite practical importance.
  Please check my proposal here:
  http://docs.google.com/View?id=dghqvqdd_51ffvhcsdb ,
  and let me know if you have some thoughts or suggestions about it.
 
  Thanks!
 
  --
  Regards, Sergey
 




-- 
Regards, Sergey


[jira] Created: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-06 Thread Sergey Doroshenko (JIRA)
Leader can treat observers as quorum members


 Key: ZOOKEEPER-769
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.0
 Environment: Ubuntu Karmic x64
Reporter: Sergey Doroshenko
 Fix For: 3.3.0


In short: it seems leader can treat observers as quorum members.

Steps to repro:

1. Server configuration: 3 voters, 2 observers (attached).
2. Bring up 2 voters and one observer. It's enough for quorum.
3. Shut down the voter that is currently the follower.

As I understand it, the expected result is that the leader starts a new election 
round in order to regain quorum.
In reality, it just says goodbye to that follower and remains operable. 
(When I shut down the third server, the observer, the leader starts 
trying to regain a quorum.)

(As expected, if in step 3 we shut down the leader rather than the follower, the 
remaining follower starts a new leader election, as it should.)




[jira] Updated: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-06 Thread Sergey Doroshenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Doroshenko updated ZOOKEEPER-769:


Attachment: zoo1.cfg




[jira] Updated: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client

2010-05-06 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-763:
---

Attachment: (was: ZOOKEEPER-763_3_3_1.patch)

 Deadlock on close w/ zkpython / c client
 

 Key: ZOOKEEPER-763
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Affects Versions: 3.3.0
 Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
Reporter: Kapil Thangavelu
Assignee: Henry Robinson
 Fix For: 3.3.1, 3.4.0

 Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt, 
 ZOOKEEPER-763.patch, ZOOKEEPER-763.patch


 Deadlocks occur if we attempt to close a handle while there are any 
 outstanding async requests (aget, acreate, etc.). Normally on close both the 
 io thread and the completion thread are terminated and joined; 
 however, with outstanding async requests the completion thread won't be in a 
 joinable state, and we effectively hang when the main thread does the join.
 afaics the ideal behavior on close of a handle would be to clear out 
 any remaining callbacks and let the completion thread terminate.
 I've tried adding some bookkeeping within a Python client to guard against 
 closing while there is an outstanding async completion request, but it's an 
 imperfect solution, since even after the Python callback is executed there is 
 still a window for deadlock before the completion thread finishes the 
 callback.
 A simple example to reproduce the deadlock is attached.
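
 The client-side bookkeeping mentioned above might look roughly like the following sketch. It is an illustration only, not the code from the attachments: OutstandingRequests and fake_async_call are hypothetical names, and a plain Python thread stands in for the C client's completion thread. As the description notes, a guard like this cannot close the window between the Python callback returning and the completion thread finishing its own work.

```python
import threading


class OutstandingRequests:
    """Counts in-flight async requests so a caller can wait before closing."""

    def __init__(self):
        self._cond = threading.Condition()
        self._count = 0

    def started(self):
        with self._cond:
            self._count += 1

    def finished(self):
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def wait_idle(self, timeout=None):
        """Block until no requests are outstanding; False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout)


tracker = OutstandingRequests()


def fake_async_call(callback):
    """Hypothetical stand-in for an async request such as aget."""
    tracker.started()

    def run():
        try:
            callback()
        finally:
            tracker.finished()

    threading.Thread(target=run).start()


results = []
fake_async_call(lambda: results.append("done"))
idle = tracker.wait_idle(timeout=5)   # only close the handle once this is True
print(idle, results)
```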




[jira] Commented: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client

2010-05-06 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864871#action_12864871
 ] 

Patrick Hunt commented on ZOOKEEPER-763:


For some reason I got confused on the 3.3 branch (it may not have been up to 
date); the main patch applies to both just fine. Fixed this in svn.




[jira] Commented: (ZOOKEEPER-768) zkpython segfault on close (assertion error in io thread)

2010-05-06 Thread Kapil Thangavelu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864873#action_12864873
 ] 

Kapil Thangavelu commented on ZOOKEEPER-768:


I've uploaded the core file here: 
http://kapilt.com/files/zkpython-segfault-on-close-core.bz2

A little more poking around in gdb shows the packet to be a ping.

(gdb) print hdr
$8 = {xid = -2, zxid = 181, err = 0}
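
For readers decoding the header by hand, here is a small lookup sketch. The reserved xid values are quoted from memory of the C client source (zookeeper.c) and should be treated as assumptions to verify against the tree being debugged. describe_xid is a hypothetical helper.

```python
# Reserved xid values as recalled from the C client; verify before relying on them.
SPECIAL_XIDS = {
    -1: "watcher event",
    -2: "ping",
    -4: "auth",
    -8: "set watches",
}


def describe_xid(xid):
    """Name a reply header's xid; non-reserved values are ordinary replies."""
    return SPECIAL_XIDS.get(xid, "regular request/reply")


hdr = {"xid": -2, "zxid": 181, "err": 0}   # values from the gdb output above
print(describe_xid(hdr["xid"]))
```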





[jira] Updated: (ZOOKEEPER-768) zkpython segfault on close (assertion error in io thread)

2010-05-06 Thread Kapil Thangavelu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Thangavelu updated ZOOKEEPER-768:
---

Attachment: zkpython-segfault-on-close-core.bz2

 Compressed, the core file is small enough to just attach to the ticket. 




[jira] Commented: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-06 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864878#action_12864878
 ] 

Henry Robinson commented on ZOOKEEPER-769:
--

Hi Sergey - 

Can you attach the logs from (at least) the leader node to this ticket? I'd 
like to figure this one out asap.

cheers,
Henry




[jira] Updated: (ZOOKEEPER-767) Submitting Demo/Recipe Shared / Exclusive Lock Code

2010-05-06 Thread Sam Baskinger (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Baskinger updated ZOOKEEPER-767:


Attachment: (was: Lock.java.patch)

 Submitting Demo/Recipe Shared / Exclusive Lock Code
 ---

 Key: ZOOKEEPER-767
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-767
 Project: Zookeeper
  Issue Type: Improvement
  Components: recipes
Affects Versions: 3.3.0
Reporter: Sam Baskinger
Assignee: Sam Baskinger
Priority: Minor
 Fix For: 3.4.0


 Networked Insights would like to share-back some code for shared/exclusive 
 locking that we are using in our labs.




[jira] Updated: (ZOOKEEPER-767) Submitting Demo/Recipe Shared / Exclusive Lock Code

2010-05-06 Thread Sam Baskinger (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Baskinger updated ZOOKEEPER-767:


Attachment: ZOOKEEPER-767.patch

Unit tests and new SharedExclusiveLock recipe implementation.




[jira] Commented: (ZOOKEEPER-767) Submitting Demo/Recipe Shared / Exclusive Lock Code

2010-05-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864890#action_12864890
 ] 

Hadoop QA commented on ZOOKEEPER-767:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12443889/ZOOKEEPER-767.patch
  against trunk revision 941521.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 26 release audit warnings 
(more than the trunk's current 24 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/85/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/85/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/85/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/85/console

This message is automatically generated.




[jira] Created: (ZOOKEEPER-770) Slow add_auth calls with multi-threaded client

2010-05-06 Thread Kapil Thangavelu (JIRA)
Slow add_auth calls with multi-threaded client
--

 Key: ZOOKEEPER-770
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-770
 Project: Zookeeper
  Issue Type: Bug
  Components: c client, contrib-bindings
Affects Versions: 3.3.0, 3.4.0
 Environment: ubuntu lucid (10.04), zk trunk (3.4)
Reporter: Kapil Thangavelu


Calls to add_auth are a bit slow from the C client library. The auth callback 
typically takes multiple seconds to fire. I instrumented the Java, C binding, 
and Python binding with a few log statements to find out where the slowness was 
occurring ( 
http://bazaar.launchpad.net/~hazmat/zookeeper/fast-auth-instrumented/revision/647).
 It looks like when the io thread polls, it doesn't register interest in the 
incoming packet, so the auth success message from the server and the auth 
callback are only processed when the poll times out. I tried modifying 
mt_adapter.c so the poll registers interest in both events; this causes 
considerably more wakeups, but it does make add_auth fast. I think the ideal 
solution would be some sort of additional auth handshake state on the handle 
that zookeeper_interest could use to suggest that both POLLIN|POLLOUT are 
wanted for subsequent calls to poll during the auth handshake.

I'm attaching a script that takes 13s or 1.6s for the auth callback depending 
on the session timeout value (which in turn figures into the calculation of 
the poll timeout).
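
To illustrate the idea of a handshake-aware interest mask, here is a rough sketch in Python rather than C. It is not the mt_adapter.c change. interest_during is a hypothetical function, and a local socket pair stands in for the server connection. It only shows that a poll registered with POLLIN wakes as soon as the reply arrives instead of waiting for the timeout.

```python
import select
import socket


def interest_during(base_events, in_auth_handshake):
    """Hypothetical: widen the poll mask while an auth handshake is pending."""
    if in_auth_handshake:
        return base_events | select.POLLIN | select.POLLOUT
    return base_events


# A connected socket pair stands in for the client/server connection.
reader, writer = socket.socketpair()
poller = select.poll()
poller.register(reader, interest_during(0, in_auth_handshake=True))

writer.send(b"auth-ok")               # pretend the server replied
events = dict(poller.poll(1000))      # timeout in milliseconds
saw_input = bool(events.get(reader.fileno(), 0) & select.POLLIN)
print(saw_input)

reader.close()
writer.close()
```

The cost is the one noted above: an idle socket is almost always writable, so asking for POLLOUT makes poll return far more often, which is why limiting the wider mask to the handshake state seems preferable.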





[jira] Updated: (ZOOKEEPER-770) Slow add_auth calls with multi-threaded client

2010-05-06 Thread Kapil Thangavelu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Thangavelu updated ZOOKEEPER-770:
---

Attachment: authtest.py

script that demonstrates that time for auth callbacks is dependent on session 
timeout, which is used to calculate poll timeout.

 Slow add_auth calls with multi-threaded client
 --

 Key: ZOOKEEPER-770
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-770
 Project: Zookeeper
  Issue Type: Bug
  Components: c client, contrib-bindings
Affects Versions: 3.3.0, 3.4.0
 Environment: ubuntu lucid (10.04), zk trunk (3.4)
Reporter: Kapil Thangavelu
Priority: Minor
 Attachments: authtest.py





[jira] Updated: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-06 Thread Sergey Doroshenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Doroshenko updated ZOOKEEPER-769:


Attachment: follower.log
leader.log
observer.log

Logs




[jira] Updated: (ZOOKEEPER-767) Submitting Demo/Recipe Shared / Exclusive Lock Code

2010-05-06 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-767:
---

Status: Open  (was: Patch Available)

Thanks Sam, a release audit failure typically means that you are missing license 
headers in your new source files. Could you update the patch? (Check out the 
existing source files for an example of what the license header should be.)
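
A quick local check before re-uploading could look like the sketch below. It is not the release audit tool itself. missing_license_header is a hypothetical helper, the file names are made up for illustration, and the marker is the opening phrase of the standard ASF source header.

```python
# Sketch only: flag sources whose first lines lack the ASF header marker.
LICENSE_MARKER = "Licensed to the Apache Software Foundation (ASF)"


def missing_license_header(sources, head_lines=20):
    """Return the names whose first head_lines lines lack the marker."""
    missing = []
    for name, text in sources.items():
        head = "\n".join(text.splitlines()[:head_lines])
        if LICENSE_MARKER not in head:
            missing.append(name)
    return sorted(missing)


# Hypothetical file names and contents, for illustration only.
sources = {
    "SharedExclusiveLock.java": (
        "/**\n"
        " * Licensed to the Apache Software Foundation (ASF) under one\n"
        " * or more contributor license agreements.\n"
        " */\n"
        "public class SharedExclusiveLock {}\n"
    ),
    "SharedExclusiveLockTest.java": "public class SharedExclusiveLockTest {}\n",
}
unlicensed = missing_license_header(sources)
print(unlicensed)
```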





[jira] Updated: (ZOOKEEPER-767) Submitting Demo/Recipe Shared / Exclusive Lock Code

2010-05-06 Thread Sam Baskinger (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Baskinger updated ZOOKEEPER-767:


Attachment: (was: ZOOKEEPER-767.patch)




[jira] Commented: (ZOOKEEPER-767) Submitting Demo/Recipe Shared / Exclusive Lock Code

2010-05-06 Thread Sam Baskinger (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864913#action_12864913
 ] 

Sam Baskinger commented on ZOOKEEPER-767:
-

I was wondering about that! Thanks, Patrick. I've got a few moments right now 
to get that updated.

Naive question: do I need to invalidate / reflag the issue as having a patch 
for the build to pick it up? Thank you!

Sam Baskinger





 Submitting Demo/Recipe Shared / Exclusive Lock Code
 ---

 Key: ZOOKEEPER-767
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-767
 Project: Zookeeper
  Issue Type: Improvement
  Components: recipes
Affects Versions: 3.3.0
Reporter: Sam Baskinger
Assignee: Sam Baskinger
Priority: Minor
 Fix For: 3.4.0


 Networked Insights would like to share-back some code for shared/exclusive 
 locking that we are using in our labs.




[jira] Updated: (ZOOKEEPER-767) Submitting Demo/Recipe Shared / Exclusive Lock Code

2010-05-06 Thread Sam Baskinger (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Baskinger updated ZOOKEEPER-767:


Status: Patch Available  (was: Open)

This patch is the same as the previous one of the same name but with the 
license block added to the top of the test class.

 Submitting Demo/Recipe Shared / Exclusive Lock Code
 ---

 Key: ZOOKEEPER-767
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-767
 Project: Zookeeper
  Issue Type: Improvement
  Components: recipes
Affects Versions: 3.3.0
Reporter: Sam Baskinger
Assignee: Sam Baskinger
Priority: Minor
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-767.patch


 Networked Insights would like to share back some code for shared/exclusive 
 locking that we are using in our labs.




[jira] Updated: (ZOOKEEPER-767) Submitting Demo/Recipe Shared / Exclusive Lock Code

2010-05-06 Thread Sam Baskinger (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Baskinger updated ZOOKEEPER-767:


Attachment: ZOOKEEPER-767.patch

Unit tests and recipe implementation of a SharedExclusiveLock. This new 
attachment contains copyright/license information for the test class.





[jira] Commented: (ZOOKEEPER-767) Submitting Demo/Recipe Shared / Exclusive Lock Code

2010-05-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864925#action_12864925
 ] 

Hadoop QA commented on ZOOKEEPER-767:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12443903/ZOOKEEPER-767.patch
  against trunk revision 941521.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/86/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/86/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/86/console

This message is automatically generated.





[jira] Commented: (ZOOKEEPER-767) Submitting Demo/Recipe Shared / Exclusive Lock Code

2010-05-06 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864936#action_12864936
 ] 

Patrick Hunt commented on ZOOKEEPER-767:


Sam, yes, you have to cancel/submit for the system to pick up the new 
attachment. 

Also, you should just upload the new patch with the same name. JIRA will handle 
this properly (and as a result you can see the history of the patch as changes 
are made).





[jira] Commented: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-06 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864953#action_12864953
 ] 

Henry Robinson commented on ZOOKEEPER-769:
--

Sergey - 

In the cfg files for nodes 3 and 5, did you include the following line? 

peerType=observer

See http://hadoop.apache.org/zookeeper/docs/r3.3.0/zookeeperObservers.html for 
details. The observer log contains this line:

2010-05-06 22:46:00,876 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2183:quorump...@642] - FOLLOWING

which is a big red flag because observers should never adopt the FOLLOWING 
state. 

If I don't have that line I can reproduce your issue. If I add it, the 
observers work as expected. Can you check your cfg files?

cheers,
Henry
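
As a rough illustration of the check asked for above, the sketch below tests 
whether one server's cfg text is consistent: a server listed with the 
:observer suffix should also carry the peerType=observer line. The function 
name check_observer_cfg and the sample cfg strings are hypothetical and are 
not part of ZooKeeper; the code only parses key=value lines.

```python
def check_observer_cfg(cfg_text, my_id):
    """Return True if the cfg is consistent for server `my_id`.

    Consistent means: the server is listed with the ":observer" suffix
    if and only if the file also sets peerType=observer.
    """
    props = {}
    for line in cfg_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()

    listed_as_observer = props.get(f"server.{my_id}", "").endswith(":observer")
    declares_observer = props.get("peerType") == "observer"
    return listed_as_observer == declares_observer


# Sample server list taken from the original report (assumed shared by all nodes).
SERVERS = """
server.1=localhost:2881:3881
server.2=localhost:2882:3882
server.3=localhost:2883:3883:observer
server.4=localhost:2884:3884
server.5=localhost:2885:3885:observer
"""

# Node 3 without the peerType line: the misconfiguration suspected here.
print(check_observer_cfg(SERVERS, 3))
# Node 3 with the line added.
print(check_observer_cfg("peerType=observer\n" + SERVERS, 3))
```

A check like this only catches the mismatch between the two settings; it says 
nothing about how the leader counts acknowledgements.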

 Leader can treat observers as quorum members
 

 Key: ZOOKEEPER-769
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.0
 Environment: Ubuntu Karmic x64
Reporter: Sergey Doroshenko
 Fix For: 3.3.0

 Attachments: follower.log, leader.log, observer.log, zoo1.cfg


 In short: it seems leader can treat observers as quorum members.
 Steps to repro:
 1. Server configuration: 3 voters, 2 observers (attached).
 2. Bring up 2 voters and one observer. It's enough for quorum.
 3. Shut down the quorum member that is the follower.
 As I understand it, the expected result is that the leader starts a new election 
 round to regain quorum.
 But what actually happens is that it just says goodbye to that follower and is 
 still operable. (When I shut down the third one, the observer, the leader starts 
 trying to regain a quorum.)
 (As expected, if in step 3 we shut down the leader rather than the follower, 
 the remaining follower starts a new leader election, as it should.)
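
The expected behaviour in the steps above follows from majority arithmetic 
over the voting members only. A minimal sketch, assuming the server ids from 
the original report (voters 1, 2, 4; observers 3, 5); has_quorum is a 
hypothetical helper, not ZooKeeper code:

```python
# Voting members and observers, as listed in the reported configuration.
VOTERS = {1, 2, 4}
OBSERVERS = {3, 5}


def has_quorum(alive, voters=VOTERS):
    """Return True if a strict majority of the voting members is alive.

    Observers in `alive` are ignored because they do not vote.
    """
    return len(set(alive) & voters) > len(voters) // 2


# Step 2: servers 1, 2 and 3 are up, so two of three voters are alive.
print(has_quorum({1, 2, 3}))
# Step 3: follower 2 is shut down, leaving one voter plus one observer.
print(has_quorum({1, 3}))
```

Under this arithmetic the leader in step 3 has lost its majority, which is 
why staying operable looks wrong to the reporter.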




[jira] Commented: (ZOOKEEPER-701) GSoC 2010: Monitoring Recipes and Web-based Administrative Interface

2010-05-06 Thread Savu Andrei (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864970#action_12864970
 ] 

Savu Andrei commented on ZOOKEEPER-701:
---

I have created a wiki page for tracking my work on this project. You can find 
it at the following url:

http://wiki.apache.org/hadoop/ZooKeeper/GSoCMonitoringAndWebInterface



 GSoC 2010: Monitoring Recipes and Web-based Administrative Interface
 

 Key: ZOOKEEPER-701
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-701
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Savu Andrei
 Attachments: milestones.txt


 Monitoring Recipes And Web-based Administrative Interface
 Mentor: Patrick Hunt (ph...@apache.org)
 Requirements:
 Modern web platform - e.g. Django. Some design or UI skills would help. Java 
 for adding methods to ZooKeeper.
 Description:
 ZooKeeper is a complex distributed system. Understanding how well it is 
 running is tremendously important. Patrick Hunt has created a Django-based 
 dashboard (see http://github.com/phunt/zookeeper_dashboard) that allows some 
 insight into how ZooKeeper is running. This is a great foundation on which to 
 build; however, there are improvements that could be made! This project would 
 capture much more information from ZooKeeper, adding hooks to retrieve it 
 where necessary and visualise it in an appealing and useful way. Integration 
 with Ganglia would be a definite plus.
