ZooKeeper ensemble configuration generator

2009-08-05 Thread Patrick Hunt
This is currently more of a developer tool, but I thought it might be 
useful for users as well -- a basic ZooKeeper ensemble configuration 
generator that takes some of the drudge work out of generating configs. 
I got sick of creating these by hand for the various setups I have 
(especially when experimenting), so I decided to build upon an existing 
templating system (python/cheetah). It's up on github; feel free to check 
it out and fork it, send patches, comments, etc...


http://github.com/phunt/zkconf/tree/master
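
For reference, the configs in question are the usual per-server zoo.cfg files;
a minimal hand-written example for a three-server ensemble (hostnames and paths
below are purely illustrative) looks something like:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper/data
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888

Writing one of these per server (plus the matching myid files) is exactly the
drudge work being templated away.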

Patrick


Re: BUILDS ARE BACK NORMAL

2009-08-05 Thread Mahadev Konar
Hi all, 
 As Giri mentioned, the builds are back to normal and so is the patch
process.
http://hudson.zones.apache.org/hudson/view/ZooKeeper/job/Zookeeper-Patch-vesta.apache.org/

The patches are being run against hudson, so you DO NOT need to cancel and
resubmit patches.

Thanks
mahadev


On 8/5/09 9:50 PM, "Giridharan  Kesavan"  wrote:

> Restarted all the build jobs on hudson; Builds are running fine.
> Build failures are due to "/tmp: File system full, swap space limit exceeded"
> 
> Thanks,
> -Giri
> 
>> -Original Message-
>> From: Giridharan Kesavan [mailto:gkesa...@yahoo-inc.com]
>> Sent: Thursday, August 06, 2009 9:16 AM
>> To: mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org;
>> common-...@hadoop.apache.org; pig-...@hadoop.apache.org; zookeeper-
>> d...@hadoop.apache.org
>> Subject: build failures on hudson zones
>> 
>> Build on hudson.zones are failing as the zonestorage for hudson is
>> full.
>> I 've sent an email to the ASF infra team about the space issues on
>> hudson zones.
>> 
>> Once the issues is resolved I would restart hudson for builds.
>> 
>> Thanks,
>> Giri
>> 
> 
> 



[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-05 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Status: Patch Available  (was: Open)

> ZK fataled on me, and ugly
> --
>
> Key: ZOOKEEPER-483
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: ryan rawson
>Assignee: Benjamin Reed
> Fix For: 3.2.1, 3.3.0
>
> Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch
>
>
> here is the part of the log where my zookeeper instance crashed, taking 3 
> out of 5 down, and thus ruining the quorum for all clients:
> 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: 
> Exception causing close of session 0x52276d1d5161350 due to 
> java.io.IOException: Read error
> 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: 
> Exception when following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> at 
> org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494)
> 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x52276d1d5161350 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.168:39489]
> 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb0578 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.159:46797]
> 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa013e NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.153:33998]
> 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: 
> Exception causing close of session 0x52276d1d5160593 due to 
> java.io.IOException: Read error
> 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e02bb NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.158:53758]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa13e4 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.154:58681]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x22276d15e691382 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.162:59967]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb1354 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.163:49957]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa13cd NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.150:34212]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x22276d15e691383 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.159:46813]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb0350 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.162:59956]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e139b NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.156:55138]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e1398 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.167:41257]
> 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x52276d1d5161355 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.153:34032]
> 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x52276d1d516

[jira] Commented: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-05 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739898#action_12739898
 ] 

Benjamin Reed commented on ZOOKEEPER-483:
-

I've addressed 1) in the attached patch.

For 2), we are not eating the IOException; we are actually shutting things down.
The bug is that we are passing it up to the upper layer, which does not know
anything about the follower thread. We need to handle it here.
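
As a generic illustration of that fix direction (a sketch only, with made-up
class names - not the actual ZOOKEEPER-483 patch): the loop that owns the
socket catches the read failure and cleans up itself, instead of letting the
exception escape to a caller such as QuorumPeer.run() that has no context for it.

import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;

class FollowerLoop implements Runnable {       // illustrative, not ZooKeeper's Follower
    private final Socket leaderSocket;

    FollowerLoop(Socket leaderSocket) {
        this.leaderSocket = leaderSocket;
    }

    public void run() {
        try (DataInputStream in = new DataInputStream(leaderSocket.getInputStream())) {
            while (!Thread.currentThread().isInterrupted()) {
                int packetLength = in.readInt();   // throws EOFException when the leader goes away
                byte[] packet = new byte[packetLength];
                in.readFully(packet);
                // ... process the packet ...
            }
        } catch (IOException e) {
            // Handle it here: this thread knows it was following a leader, so it
            // logs and shuts itself down rather than rethrowing to a layer that
            // does not know anything about the follower thread.
            System.err.println("Lost connection to leader, shutting down follower: " + e);
            shutdown();
        }
    }

    private void shutdown() {
        try {
            leaderSocket.close();
        } catch (IOException ignored) {
        }
        // ... rejoin leader election, etc. (omitted) ...
    }
}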

> ZK fataled on me, and ugly
> --
>
> Key: ZOOKEEPER-483
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: ryan rawson
>Assignee: Benjamin Reed
> Fix For: 3.2.1, 3.3.0
>
> Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch
>
>
> here is the part of the log where my zookeeper instance crashed, taking 3 
> out of 5 down, and thus ruining the quorum for all clients:
> 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: 
> Exception causing close of session 0x52276d1d5161350 due to 
> java.io.IOException: Read error
> 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: 
> Exception when following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> at 
> org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494)
> 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x52276d1d5161350 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.168:39489]
> 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb0578 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.159:46797]
> 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa013e NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.153:33998]
> 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: 
> Exception causing close of session 0x52276d1d5160593 due to 
> java.io.IOException: Read error
> 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e02bb NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.158:53758]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa13e4 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.154:58681]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x22276d15e691382 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.162:59967]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb1354 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.163:49957]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa13cd NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.150:34212]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x22276d15e691383 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.159:46813]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb0350 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.162:59956]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e139b NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.156:55138]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e1398 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.167:41257]
> 2009-07-23 12:29:06,810 INFO org.apache.zoo

[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-05 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Attachment: ZOOKEEPER-483.patch

> ZK fataled on me, and ugly
> --
>
> Key: ZOOKEEPER-483
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: ryan rawson
>Assignee: Benjamin Reed
> Fix For: 3.2.1, 3.3.0
>
> Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch
>
>
> here is the part of the log where my zookeeper instance crashed, taking 3 
> out of 5 down, and thus ruining the quorum for all clients:
> 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: 
> Exception causing close of session 0x52276d1d5161350 due to 
> java.io.IOException: Read error
> 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: 
> Exception when following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> at 
> org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494)
> 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x52276d1d5161350 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.168:39489]
> 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb0578 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.159:46797]
> 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa013e NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.153:33998]
> 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: 
> Exception causing close of session 0x52276d1d5160593 due to 
> java.io.IOException: Read error
> 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e02bb NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.158:53758]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa13e4 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.154:58681]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x22276d15e691382 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.162:59967]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb1354 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.163:49957]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa13cd NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.150:34212]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x22276d15e691383 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.159:46813]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb0350 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.162:59956]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e139b NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.156:55138]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e1398 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.167:41257]
> 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x52276d1d5161355 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.153:34032]
> 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x52276d1d516011c 

BUILDS ARE BACK NORMAL

2009-08-05 Thread Giridharan Kesavan
Restarted all the build jobs on hudson; Builds are running fine.
Build failures are due to  " /tmp: File system full, swap space limit exceeded "

Thanks,
-Giri

> -Original Message-
> From: Giridharan Kesavan [mailto:gkesa...@yahoo-inc.com]
> Sent: Thursday, August 06, 2009 9:16 AM
> To: mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org;
> common-...@hadoop.apache.org; pig-...@hadoop.apache.org; zookeeper-
> d...@hadoop.apache.org
> Subject: build failures on hudson zones
> 
> Build on hudson.zones are failing as the zonestorage for hudson is
> full.
> I 've sent an email to the ASF infra team about the space issues on
> hudson zones.
> 
> Once the issues is resolved I would restart hudson for builds.
> 
> Thanks,
> Giri
> 



build failures on hudson zones

2009-08-05 Thread Giridharan Kesavan
Builds on hudson.zones are failing as the zone storage for hudson is full.
I've sent an email to the ASF infra team about the space issues on hudson 
zones.

Once the issue is resolved I will restart hudson for builds.

Thanks,
Giri




[jira] Updated: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-05 Thread Flavio Paiva Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Paiva Junqueira updated ZOOKEEPER-498:
-

Status: Patch Available  (was: Open)

> Unending Leader Elections : WAN configuration
> -
>
> Key: ZOOKEEPER-498
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
> Project: Zookeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.2.0
> Environment: Each machine:
> CentOS 5.2 64-bit
> 2GB ram
> java version "1.6.0_13"
> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
> Network Topology:
> DC : central data center
> POD(N): remote data center
> Zookeeper Topology:
> Leaders may be elected only in DC (weight = 1)
> Only followers are elected in PODS (weight = 0)
>Reporter: Todd Greenwood-Geer
>Assignee: Flavio Paiva Junqueira
>Priority: Critical
> Fix For: 3.2.1, 3.3.0
>
> Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, 
> zk498-test.tar.gz, zoo.cfg, ZOOKEEPER-498.patch
>
>
> In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
> re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
> central DC group of ZK servers that have a voting weight = 1, and a group of 
> servers in remote pods with a voting weight of 0.
> What we expect to see is leaders elected only in the DC, and the pods to 
> contain only followers. What we are seeing is a continuous cycling of 
> leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
> patches (473, 479, 481, 491), and now release 3.2.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-05 Thread Flavio Paiva Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Paiva Junqueira updated ZOOKEEPER-498:
-

Attachment: ZOOKEEPER-498.patch

I have generated a patch for this issue. I verified that I didn't do the 
correct checks in ZOOKEEPER-491, so I have tried to fix that in this patch. I have 
also modified the test to fix the problem with the failing assertion, and I have 
inspected the logs to check that it is behaving as expected. I can see no problem 
with this patch at this time.

If someone else is interested in checking it out, please do.

> Unending Leader Elections : WAN configuration
> -
>
> Key: ZOOKEEPER-498
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
> Project: Zookeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.2.0
> Environment: Each machine:
> CentOS 5.2 64-bit
> 2GB ram
> java version "1.6.0_13"
> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
> Network Topology:
> DC : central data center
> POD(N): remote data center
> Zookeeper Topology:
> Leaders may be elected only in DC (weight = 1)
> Only followers are elected in PODS (weight = 0)
>Reporter: Todd Greenwood-Geer
>Assignee: Flavio Paiva Junqueira
>Priority: Critical
> Fix For: 3.2.1, 3.3.0
>
> Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, 
> zk498-test.tar.gz, zoo.cfg, ZOOKEEPER-498.patch
>
>
> In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
> re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
> central DC group of ZK servers that have a voting weight = 1, and a group of 
> servers in remote pods with a voting weight of 0.
> What we expect to see is leaders elected only in the DC, and the pods to 
> contain only followers. What we are seeing is a continuous cycling of 
> leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
> patches (473, 479, 481, 491), and now release 3.2.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-05 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739891#action_12739891
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-498:
--

Pat, we have a description of how to configure it in the "Cluster options" section 
of the Administrator guide. We are missing an example, which is in the source code 
as you point out.

> Unending Leader Elections : WAN configuration
> -
>
> Key: ZOOKEEPER-498
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
> Project: Zookeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.2.0
> Environment: Each machine:
> CentOS 5.2 64-bit
> 2GB ram
> java version "1.6.0_13"
> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
> Network Topology:
> DC : central data center
> POD(N): remote data center
> Zookeeper Topology:
> Leaders may be elected only in DC (weight = 1)
> Only followers are elected in PODS (weight = 0)
>Reporter: Todd Greenwood-Geer
>Assignee: Flavio Paiva Junqueira
>Priority: Critical
> Fix For: 3.2.1, 3.3.0
>
> Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, 
> zk498-test.tar.gz, zoo.cfg
>
>
> In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
> re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
> central DC group of ZK servers that have a voting weight = 1, and a group of 
> servers in remote pods with a voting weight of 0.
> What we expect to see is leaders elected only in the DC, and the pods to 
> contain only followers. What we are seeing is a continuous cycling of 
> leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
> patches (473, 479, 481, 491), and now release 3.2.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-490) the java docs for session creation are misleading/incomplete

2009-08-05 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-490:
---

Status: Patch Available  (was: Open)

> the java docs for session creation are misleading/incomplete
> 
>
> Key: ZOOKEEPER-490
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-490
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-490.patch
>
>
> the javadoc for ZooKeeper constructor says:
>  * The client object will pick an arbitrary server and try to connect to 
> it.
>  * If failed, it will try the next one in the list, until a connection is
>  * established, or all the servers have been tried.
> the "or all the servers have been tried" phrase is misleading; it should indicate that we 
> retry until success, the connection is closed, or the session expires. 
> we also need to mention that the connection is async: the constructor returns 
> immediately and you need to look for the connection event in the watcher

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-490) the java docs for session creation are misleading/incomplete

2009-08-05 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-490:
---

Attachment: ZOOKEEPER-490.patch

this patch updates the javadoc for ZooKeeper construction: it describes the
async nature of connection establishment and documents thread safety.
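
For illustration, the connection pattern the updated javadoc describes looks
roughly like this (connect string and timeout are placeholders): the constructor
returns immediately, and the watcher is notified once the session is actually
established.

import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ConnectExample {
    public static void main(String[] args) throws Exception {
        final CountDownLatch connected = new CountDownLatch(1);

        // The constructor does not block; it starts connecting in the background.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000,
                new Watcher() {
                    public void process(WatchedEvent event) {
                        if (event.getState() == Event.KeeperState.SyncConnected) {
                            connected.countDown();   // session is now usable
                        }
                    }
                });

        connected.await();   // wait for the connection event before issuing requests
        System.out.println("connected, session 0x" + Long.toHexString(zk.getSessionId()));
        zk.close();
    }
}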


> the java docs for session creation are misleading/incomplete
> 
>
> Key: ZOOKEEPER-490
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-490
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.1.1, 3.2.0
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-490.patch
>
>
> the javadoc for ZooKeeper constructor says:
>  * The client object will pick an arbitrary server and try to connect to 
> it.
>  * If failed, it will try the next one in the list, until a connection is
>  * established, or all the servers have been tried.
> the "or all the servers have been tried" phrase is misleading; it should indicate that we 
> retry until success, the connection is closed, or the session expires. 
> we also need to mention that the connection is async: the constructor returns 
> immediately and you need to look for the connection event in the watcher

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (ZOOKEEPER-500) Async methods shouldnt throw exceptions

2009-08-05 Thread Utkarsh Srivastava (JIRA)
Async methods shouldnt throw exceptions
---

 Key: ZOOKEEPER-500
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-500
 Project: Zookeeper
  Issue Type: Improvement
  Components: contrib-bookkeeper
Reporter: Utkarsh Srivastava


Async methods like asyncLedgerCreate and Open shouldn't be throwing 
InterruptedException and BKExceptions. 

The present method signatures lead to messy application code, since one is 
forced to have error-handling code in two places: inside the callback to handle 
a non-OK return code, and outside for handling the exceptions thrown by the 
call. 

There should be only one way to indicate error conditions, and that should be 
a non-OK return code passed to the callback.
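
A small sketch of the signature style being argued for (class and method names
here are hypothetical, not the actual BookKeeper API): the async call itself
never throws, and every outcome - including what would have surfaced as an
InterruptedException or BKException - arrives as a return code through the one
callback.

interface CreateCallback {
    // rc == OK means success; any other value is an error code. All failures
    // are funnelled through here, so the caller has exactly one error path.
    void createComplete(int rc, LedgerHandle ledger, Object ctx);
}

final class LedgerHandle { /* opaque handle, details omitted */ }

class AsyncLedgerClient {                      // hypothetical client
    static final int OK = 0;
    static final int INTERRUPTED = -1;
    static final int CONNECTION_LOSS = -2;

    void asyncCreateLedger(CreateCallback cb, Object ctx) {
        // Note: no checked exceptions in the signature.
        try {
            LedgerHandle lh = doCreate();      // may block or fail internally
            cb.createComplete(OK, lh, ctx);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            cb.createComplete(INTERRUPTED, null, ctx);
        } catch (Exception e) {
            cb.createComplete(CONNECTION_LOSS, null, ctx);
        }
    }

    private LedgerHandle doCreate() throws InterruptedException {
        return new LedgerHandle();             // stand-in for the real work
    }
}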

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-05 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739796#action_12739796
 ] 

Patrick Hunt commented on ZOOKEEPER-498:


there are docs in the source code that provide good low-level detail on the flex 
quorum implementation.

HOWEVER, there are NO docs in the Ops guide detailing user-level flex quorum 
operation.

we need to add docs (as part of this fix) to forrest detailing how to 
operate/troubleshoot/debug flex quorum.


> Unending Leader Elections : WAN configuration
> -
>
> Key: ZOOKEEPER-498
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
> Project: Zookeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.2.0
> Environment: Each machine:
> CentOS 5.2 64-bit
> 2GB ram
> java version "1.6.0_13"
> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
> Network Topology:
> DC : central data center
> POD(N): remote data center
> Zookeeper Topology:
> Leaders may be elected only in DC (weight = 1)
> Only followers are elected in PODS (weight = 0)
>Reporter: Todd Greenwood-Geer
>Assignee: Flavio Paiva Junqueira
>Priority: Critical
> Fix For: 3.2.1, 3.3.0
>
> Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, 
> zk498-test.tar.gz, zoo.cfg
>
>
> In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
> re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
> central DC group of ZK servers that have a voting weight = 1, and a group of 
> servers in remote pods with a voting weight of 0.
> What we expect to see is leaders elected only in the DC, and the pods to 
> contain only followers. What we are seeing is a continuous cycling of 
> leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
> patches (473, 479, 481, 491), and now release 3.2.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-05 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739789#action_12739789
 ] 

Patrick Hunt commented on ZOOKEEPER-498:


Todd, I did see an issue with your config. It's not:

group.1:1:2:3

rather it's:

group.1=1:2:3

(the separator after the group id should be '=', not ':')


Regardless, even after I fix this it's still not forming a cluster 
properly; we're still looking.
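
For reference, a complete group/weight section of a zoo.cfg for this kind of
topology would look roughly like the following (server ids, hostnames and the
DC/pod split are illustrative; see the "Cluster options" section of the admin
guide for the option definitions):

server.1=dc-host1:2888:3888
server.2=dc-host2:2888:3888
server.3=dc-host3:2888:3888
server.4=pod-host1:2888:3888
server.5=pod-host2:2888:3888

# groups use '=' after the group id; ':' only separates the member server ids
group.1=1:2:3
group.2=4:5

# per-server voting weights: DC servers count, pod servers are zero-weight
weight.1=1
weight.2=1
weight.3=1
weight.4=0
weight.5=0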


> Unending Leader Elections : WAN configuration
> -
>
> Key: ZOOKEEPER-498
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
> Project: Zookeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.2.0
> Environment: Each machine:
> CentOS 5.2 64-bit
> 2GB ram
> java version "1.6.0_13"
> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
> Network Topology:
> DC : central data center
> POD(N): remote data center
> Zookeeper Topology:
> Leaders may be elected only in DC (weight = 1)
> Only followers are elected in PODS (weight = 0)
>Reporter: Todd Greenwood-Geer
>Assignee: Flavio Paiva Junqueira
>Priority: Critical
> Fix For: 3.2.1, 3.3.0
>
> Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, 
> zk498-test.tar.gz, zoo.cfg
>
>
> In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
> re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
> central DC group of ZK servers that have a voting weight = 1, and a group of 
> servers in remote pods with a voting weight of 0.
> What we expect to see is leaders elected only in the DC, and the pods to 
> contain only followers. What we are seeing is a continuous cycling of 
> leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
> patches (473, 479, 481, 491), and now release 3.2.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-05 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739787#action_12739787
 ] 

Patrick Hunt commented on ZOOKEEPER-498:


Please fix the following as well: incorrect logging levels are being used in 
the quorum code, for example:

2009-08-05 15:17:02,733 - ERROR [WorkerSender Thread:quorumcnxmana...@341] - There is a connection for server 1
2009-08-05 15:17:02,753 - ERROR [WorkerSender Thread:quorumcnxmana...@341] - There is a connection for server 2

these should be logged at INFO, not ERROR


> Unending Leader Elections : WAN configuration
> -
>
> Key: ZOOKEEPER-498
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
> Project: Zookeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.2.0
> Environment: Each machine:
> CentOS 5.2 64-bit
> 2GB ram
> java version "1.6.0_13"
> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
> Network Topology:
> DC : central data center
> POD(N): remote data center
> Zookeeper Topology:
> Leaders may be elected only in DC (weight = 1)
> Only followers are elected in PODS (weight = 0)
>Reporter: Todd Greenwood-Geer
>Assignee: Flavio Paiva Junqueira
>Priority: Critical
> Fix For: 3.2.1, 3.3.0
>
> Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, 
> zk498-test.tar.gz, zoo.cfg
>
>
> In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
> re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
> central DC group of ZK servers that have a voting weight = 1, and a group of 
> servers in remote pods with a voting weight of 0.
> What we expect to see is leaders elected only in the DC, and the pods to 
> contain only followers. What we are seeing is a continuous cycling of 
> leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
> patches (473, 479, 481, 491), and now release 3.2.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-05 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-498:
--

Assignee: Flavio Paiva Junqueira  (was: Patrick Hunt)

> Unending Leader Elections : WAN configuration
> -
>
> Key: ZOOKEEPER-498
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
> Project: Zookeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.2.0
> Environment: Each machine:
> CentOS 5.2 64-bit
> 2GB ram
> java version "1.6.0_13"
> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
> Network Topology:
> DC : central data center
> POD(N): remote data center
> Zookeeper Topology:
> Leaders may be elected only in DC (weight = 1)
> Only followers are elected in PODS (weight = 0)
>Reporter: Todd Greenwood-Geer
>Assignee: Flavio Paiva Junqueira
>Priority: Critical
> Fix For: 3.2.1, 3.3.0
>
> Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, 
> zk498-test.tar.gz, zoo.cfg
>
>
> In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
> re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
> central DC group of ZK servers that have a voting weight = 1, and a group of 
> servers in remote pods with a voting weight of 0.
> What we expect to see is leaders elected only in the DC, and the pods to 
> contain only followers. What we are seeing is a continuous cycling of 
> leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
> patches (473, 479, 481, 491), and now release 3.2.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-499) electionAlg should default to FLE (3) - regression

2009-08-05 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-499:
---

Status: Patch Available  (was: Open)

> electionAlg should default to FLE (3) - regression
> --
>
> Key: ZOOKEEPER-499
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-499
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server, tests
>Affects Versions: 3.2.0
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-499.patch, ZOOKEEPER-499_br3.2.patch
>
>
> there's a regression in 3.2 - electionAlg is no longer defaulting to 3 
> (incorrectly defaults to 0)
> also - need to have tests to validate this

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-499) electionAlg should default to FLE (3) - regression

2009-08-05 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-499:
---

Attachment: ZOOKEEPER-499_br3.2.patch
ZOOKEEPER-499.patch

patches to fix on trunk and branch (br3.2 is the branch patch)

this fixes the problem - electionAlg again defaults to 3
it also adds a test to verify FLE is used by default
it also fixes a test that fails if FLE is used (vs algo 0), which is due to a
difference in the way the JDK exposes unresolved host names when using UDP vs TCP.


> electionAlg should default to FLE (3) - regression
> --
>
> Key: ZOOKEEPER-499
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-499
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server, tests
>Affects Versions: 3.2.0
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-499.patch, ZOOKEEPER-499_br3.2.patch
>
>
> there's a regression in 3.2 - electionAlg is no longer defaulting to 3 
> (incorrectly defaults to 0)
> also - need to have tests to validate this

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: Optimized WAN ZooKeeper Config : Multi-Ensemble configuration

2009-08-05 Thread Todd Greenwood
Mahadev, comments inline:

> -Original Message-
> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> Sent: Wednesday, August 05, 2009 1:47 PM
> To: zookeeper-dev@hadoop.apache.org
> Subject: Re: Optimized WAN ZooKeeper Config : Multi-Ensemble configuration
> 
> Todd,
>  Comments in line:
> 
> 
> On 8/5/09 12:10 PM, "Todd Greenwood"  wrote:
> 
> > Flavio/Patrick/Mahadev -
> >
> > Thanks for your support to date. As I understand it, the sticky points
> > w/ respect to WAN deployments are:
> >
> > 1. Leader Election:
> >
> > Leader elections in the WAN config (pod zk server weight = 0) is a bit
> > troublesome (ZOOKEEPER-498)
> Yes, until ZOOKEEPER-498 is fixed, you wont be able to use it with groups
> and zero weight.
> 
> >
> > 2. Network Connectivity Required:
> >
> > ZooKeeper clients cannot read/write to ZK Servers if the Server does not
> > have network connectivity to the quorum. In short, there is a hard
> > requirement to have network connectivity in order for the clients to
> > access the shared memory graph in ZK.
> Yes
> 
> >
> > Alternative
> > ---
> >
> > I have seen some discussion in the past about multi-ensemble
> > solutions. Essentially, put one ensemble in each physical location
> > (POD), and another in your DC, and have a fairly simple process
> > coordinate synchronizing the various ensembles. If the POD writes can be
> > confined to a sub-tree in the master graph, then this should be fairly
> > simple. I'm imagining the following:
> >
> > DC (master) graph:
> > /root/pods/1/data/item1
> > /root/pods/1/data/item2
> > /root/pods/1/data/item3
> > /root/pods/2
> > /root/pods/3
> > ...etc
> > /root/shared/allpods/readonly/data/item1
> > /root/shared/allpods/readonly/data/item2
> > ...etc
> >
> > This has the advantage of minimizing cross pod traffic, which could be a
> > real perf killer in a WAN. It also provides transacted writes in the
> > PODs, even in the disconnected state. Clearly, another portion of the
> > business logic has to reconcile the DC (master) graph such that each of
> > the pods data items are processed, etc.
> >
> > Does anyone have any experience with this (pitfalls, suggestions, etc.?)
> As far as I understand, you mean having a master cluster, with another cluster
> in a different data center syncing with the master (just a subtree)?
> Is that correct?
> 
> If yes, this is what one of our users in Yahoo! Search does. They have a
> master cluster and a smaller cluster in a different datacenter and a bridge
> that copies data from the master cluster (only a subtree) to the smaller one
> and keeps them in sync.
> 

Yes, this is exactly what I'm proposing. With the addition that I'll
sync subtrees in both directions, and have a separate process reconcile
data from the various pods, like so:

#pod1 ensemble
/root/a/b

#pod2 ensemble
/root/a/b

#dc ensemble
/root/shared/foo/bar

# Mapping (modeled after perforce client config)
# [ensemble]:[path] [ensemble]:[path]
# sync pods to dc
[POD1]:/root/... [DC]:/root/pods/POD1/...
[POD2]:/root/... [DC]:/root/pods/POD2/...
# sync dc to pods
[DC]:/root/shared/... [POD1]:/shared/...
[DC]:/root/shared/... [POD2]:/shared/...
[DC]:/root/shared/... [POD3]:/shared/...

Now, for our needs, we'd like the DC data aggregated, so I'll have
another process handle aggregating the pod specific data like so:

POD Data Aggregator: aggregate data in [DC]:/root/pods/POD(N) to
[DC]:/root/aggregated/data.

This is just off the top of my head.
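
As a very rough illustration of the bridge process this mapping implies
(sketch only - zkSrc/zkDst, the paths and the "last write wins" policy are all
assumptions, and a real bridge would also need watches, retries and conflict
handling), one direction of the sync could be a periodic recursive copy using
the ordinary ZooKeeper client API:

import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SubtreeBridge {
    // Copy the subtree rooted at srcPath in the source ensemble to dstPath in
    // the destination ensemble, creating or overwriting nodes as needed.
    // Assumes dstPath's parent already exists and paths do not end with "/".
    static void copySubtree(ZooKeeper zkSrc, String srcPath,
                            ZooKeeper zkDst, String dstPath)
            throws KeeperException, InterruptedException {
        byte[] data = zkSrc.getData(srcPath, false, null);
        if (zkDst.exists(dstPath, false) == null) {
            zkDst.create(dstPath, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } else {
            zkDst.setData(dstPath, data, -1);   // -1: ignore version, last write wins
        }
        List<String> children = zkSrc.getChildren(srcPath, false);
        for (String child : children) {
            copySubtree(zkSrc, srcPath + "/" + child, zkDst, dstPath + "/" + child);
        }
    }
}

Run against the mapping above, the DC-side aggregator could then only ever read
from [DC]:/root/pods/... and write to [DC]:/root/aggregated/....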

-Todd

> 
> Thanks
> mahadev
> >
> > -Todd



Re: Optimized WAN ZooKeeper Config : Multi-Ensemble configuration

2009-08-05 Thread Mahadev Konar
Todd,
 Comments in line:


On 8/5/09 12:10 PM, "Todd Greenwood"  wrote:

> Flavio/Patrick/Mahadev -
> 
> Thanks for your support to date. As I understand it, the sticky points
> w/ respect to WAN deployments are:
> 
> 1. Leader Election:
> 
> Leader elections in the WAN config (pod zk server weight = 0) is a bit
> troublesome (ZOOKEEPER-498)
Yes, until ZOOKEEPER-498 is fixed, you wont be able to use it with groups
and zero weight.

> 
> 2. Network Connectivity Required:
> 
> ZooKeeper clients cannot read/write to ZK Servers if the Server does not
> have network connectivity to the quorum. In short, there is a hard
> requirement to have network connectivity in order for the clients to
> access the shared memory graph in ZK.
Yes

> 
> Alternative
> ---
> 
> I have seen some discussion in the past about multi-ensemble
> solutions. Essentially, put one ensemble in each physical location
> (POD), and another in your DC, and have a fairly simple process
> coordinate synchronizing the various ensembles. If the POD writes can be
> confined to a sub-tree in the master graph, then this should be fairly
> simple. I'm imagining the following:
> 
> DC (master) graph:
> /root/pods/1/data/item1
> /root/pods/1/data/item2
> /root/pods/1/data/item3
> /root/pods/2
> /root/pods/3
> ...etc
> /root/shared/allpods/readonly/data/item1
> /root/shared/allpods/readonly/data/item2
> ...etc
> 
> This has the advantage of minimizing cross pod traffic, which could be a
> real perf killer in a WAN. It also provides transacted writes in the
> PODs, even in the disconnected state. Clearly, another portion of the
> business logic has to reconcile the DC (master) graph such that each of
> the pods data items are processed, etc.
> 
> Does anyone have any experience with this (pitfalls, suggestions, etc.?)
As far as I understand, you mean having a master cluster, with another cluster
in a different data center syncing with the master (just a subtree)?
Is that correct? 

If yes, this is what one of our users in Yahoo! Search does. They have a
master cluster and a smaller cluster in a different datacenter and a bridge
that copies data from the master cluster (only a subtree) to the smaller one
and keeps them in sync.


Thanks
mahadev
> 
> -Todd



[jira] Updated: (ZOOKEEPER-499) electionAlg should default to FLE (3) - regression

2009-08-05 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-499:
---

Release Note: 
workaround in 3.2.0 (this only effects 3.2.0)

set electionAlg=3 in server config files.

> electionAlg should default to FLE (3) - regression
> --
>
> Key: ZOOKEEPER-499
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-499
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server, tests
>Affects Versions: 3.2.0
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
>
> there's a regression in 3.2 - electionAlg is no longer defaulting to 3 
> (incorrectly defaults to 0)
> also - need to have tests to validate this

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-462) Last hint for open ledger

2009-08-05 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-462:


Fix Version/s: 3.3.0

> Last hint for open ledger
> -
>
> Key: ZOOKEEPER-462
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-462
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: contrib-bookkeeper
>Reporter: Flavio Paiva Junqueira
>Assignee: Flavio Paiva Junqueira
> Fix For: 3.3.0
>
> Attachments: ZOOKEEPER-462.patch
>
>
> In some use cases of BookKeeper, it is useful to be able to read from a 
> ledger before closing the ledger. To enable such a feature, the writer has to 
> be able to communicate to a reader how many entries it has been able to write 
> successfully. The main idea of this jira is to continuously update a znode 
> with the number of successful writes, and a reader can, for example, watch 
> the node for changes.
>  I was thinking of having a configuration parameter to state how often a 
> writer should update the hint on ZooKeeper (e.g., every 1000 requests, every 
> 10,000 requests). Clearly updating more often increases the overhead of 
> writing to ZooKeeper, although the impact on the performance of writes to 
> BookKeeper should be minimal given that we make an asynchronous call to 
> update the hint.
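
As a rough sketch of that idea (znode path, update interval and class name are
illustrative only, not the proposed BookKeeper API), the writer side could
publish the hint asynchronously every N successful adds, so the ZooKeeper
update never blocks the write path:

import org.apache.zookeeper.AsyncCallback;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

class LastAddHintPublisher {
    private final ZooKeeper zk;
    private final String hintPath;   // e.g. a per-ledger hint znode (illustrative)
    private final int interval;      // publish every 'interval' successful adds
    private long confirmed;

    LastAddHintPublisher(ZooKeeper zk, String hintPath, int interval) {
        this.zk = zk;
        this.hintPath = hintPath;
        this.interval = interval;
    }

    // Called by the writer after each successfully acknowledged add.
    void addConfirmed() {
        confirmed++;
        if (confirmed % interval == 0) {
            byte[] payload = Long.toString(confirmed).getBytes();
            // Asynchronous setData: the writer is not blocked on ZooKeeper.
            zk.setData(hintPath, payload, -1, new AsyncCallback.StatCallback() {
                public void processResult(int rc, String path, Object ctx, Stat stat) {
                    // A failed hint update is not fatal; the next interval will retry.
                }
            }, null);
        }
    }
}

A reader can then getData() on the same znode with a watch set and re-read
whenever the watch fires.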

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (ZOOKEEPER-499) electionAlg should default to FLE (3) - regression

2009-08-05 Thread Patrick Hunt (JIRA)
electionAlg should default to FLE (3) - regression
--

 Key: ZOOKEEPER-499
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-499
 Project: Zookeeper
  Issue Type: Bug
  Components: server, tests
Affects Versions: 3.2.0
Reporter: Patrick Hunt
Priority: Blocker
 Fix For: 3.2.1, 3.3.0


there's a regression in 3.2 - electionAlg is no longer defaulting to 3 
(incorrectly defaults to 0)

also - need to have tests to validate this

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (ZOOKEEPER-499) electionAlg should default to FLE (3) - regression

2009-08-05 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-499:
--

Assignee: Patrick Hunt

> electionAlg should default to FLE (3) - regression
> --
>
> Key: ZOOKEEPER-499
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-499
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server, tests
>Affects Versions: 3.2.0
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
>
> there's a regression in 3.2 - electionAlg is no longer defaulting to 3 
> (incorrectly defaults to 0)
> also - need to have tests to validate this

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Optimized WAN ZooKeeper Config : Multi-Ensemble configuration

2009-08-05 Thread Todd Greenwood
Flavio/Patrick/Mahadev -

Thanks for your support to date. As I understand it, the sticky points
w/ respect to WAN deployments are:

1. Leader Election: 

Leader elections in the WAN config (pod zk server weight = 0) is a bit
troublesome (ZOOKEEPER-498)

2. Network Connectivity Required: 

ZooKeeper clients cannot read/write to ZK Servers if the Server does not
have network connectivity to the quorum. In short, there is a hard
requirement to have network connectivity in order for the clients to
access the shared memory graph in ZK.

Alternative
---

I have seen some discussion in the past about multi-ensemble
solutions. Essentially, put one ensemble in each physical location
(POD), and another in your DC, and have a fairly simple process
coordinate synchronizing the various ensembles. If the POD writes can be
confined to a sub-tree in the master graph, then this should be fairly
simple. I'm imagining the following:

DC (master) graph:
/root/pods/1/data/item1
/root/pods/1/data/item2
/root/pods/1/data/item3
/root/pods/2
/root/pods/3
...etc
/root/shared/allpods/readonly/data/item1
/root/shared/allpods/readonly/data/item2
...etc

This has the advantage of minimizing cross pod traffic, which could be a
real perf killer in a WAN. It also provides transacted writes in the
PODs, even in the disconnected state. Clearly, another portion of the
business logic has to reconcile the DC (master) graph such that each of
the pods data items are processed, etc.

Does anyone have any experience with this (pitfalls, suggestions, etc.?)

-Todd


RE: Unending Leader Elections in WAN deploy

2009-08-05 Thread Todd Greenwood
IT says yes, there are firewalls, but that yes, there is full
connectivity between each of the zk servers.

> -Original Message-
> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> Sent: Tuesday, August 04, 2009 6:01 PM
> To: zookeeper-dev@hadoop.apache.org
> Subject: Re: Unending Leader Elections in WAN deploy
> 
> Hi todd,
>   I see a lot of
> 
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.Net.connect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
>         at java.nio.channels.SocketChannel.open(SocketChannel.java:146)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:324)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:304)
>         at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:317)
>         at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:290)
>         at java.lang.Thread.run(Thread.java:619)
> 
> 
> Is it possible that there is some firewall? Can all the servers 1-9 connect
> to all the others using ports that you specified in zoo.cfg i.e 2888/3888?
> 
> 
> Thanks
> mahadev
> 
> 
> On 8/4/09 4:56 PM, "Todd Greenwood"  wrote:
> 
> > Looks like we're not getting *any* leader elected now. Logs attached.
> >
> >> -Original Message-
> >> From: Todd Greenwood [mailto:to...@audiencescience.com]
> >> Sent: Tuesday, August 04, 2009 4:07 PM
> >> To: zookeeper-dev@hadoop.apache.org
> >> Subject: RE: Unending Leader Elections in WAN deploy
> >>
> >> Patrick, thanks! I'll forward on to IT and I'll report back to you
> >> shortly...
> >>
> >>> -Original Message-
> >>> From: Patrick Hunt [mailto:ph...@apache.org]
> >>> Sent: Tuesday, August 04, 2009 3:55 PM
> >>> To: zookeeper-dev@hadoop.apache.org
> >>> Subject: Re: Unending Leader Elections in WAN deploy
> >>>
> >>> Todd, Mahadev and I looked at this and it turns out to be a regression.
> >>> Ironically a patch I created for 3.2 branch to add quorum tests actually
> >>> broke the quorum config -- a default value for a config parameter was
> >>> lost. I'm going to submit a patch asap to get the default back, but for
> >>> the time being you can set:
> >>>
> >>> electionAlg=3
> >>>
> >>> in each of your config files.
> >>>
> >>> You should see reference to FastLeaderElection in your log files if this
> >>> parameter is set correctly.
> >>>
> >>> Sorry for the trouble,
> >>>
> >>> Patrick
> >>>
> >>> Todd Greenwood wrote:
> >>>> Mahadev,
> >>>>
> >>>> I just heard from IT that this build behaves in exactly the same way as
> >>>> previous versions, e.g. we get continuous leader elections that
> >>>> disconnect the followers and then get re-elected, and disconnect...etc.
> >>>>
> >>>> This is from a fresh sync to the 3.2 branch:
> >>>>
> >>>> svn co http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 ./branch-3.2
> >>>>
> >>>> CHANGES.TXT shows the various fixes included:
> >>>>
> >>>> to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/original$ head -n 50 branch-3.2/CHANGES.txt
> >>>> Release 3.2.1
> >>>>
> >>>> Backward compatibile changes:
> >>>>
> >>>> BUGFIXES:
> >>>>   ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via flavio)
> >>>>
> >>>>   ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via mahadev)
> >>>>
> >>>>   ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev)
> >>>>
> >>>>   ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via mahadev)
> >>>>
> >>>>   ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure)
> >>>>   (giri via mahadev)
> >>>>
> >>>>   ZOOKEEPER-467. Change log level in BookieHandle (flavio via mahadev)
> >>>>
> >>>>   ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate
> >>>>   failure. (chris via mahadev)
> >>>>
> >>>>   ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via phunt)
> >>>>
> >>>>   ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other)
> >>>>   embedded clients (ryan rawson via phunt)
> >>>>
> >>>>   ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via mahadev)
> >>>>
> >>>>   ZOOKEEPER-479. QuorumHierarchical does not count groups correctly
> >>>>   (flavio via mahadev)
> >>>>
> >>>>   ZOOKEEPER-466. crash on zookeeper_close() when using auth with empty cert
> >>>>   (Chris Darroch via phunt)
> >>>>
> >>>>   ZOOKEEPER-480. FLE should perform leader check when node is not leading and
> >>>>   add vote of follower (flavio via mahadev)
> >>>>
> >>>>   ZOOKEEPER-491. Prevent zero-weight servers from being elected (fl

[jira] Updated: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-05 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-498:
---

Attachment: zk498-test.tar.gz

I attached zk498-test.tar.gz - this is a 5-server config (2 zero-weight) that fails 
to achieve quorum.

Run start.sh/stop.sh and check out the individual logs for details.



> Unending Leader Elections : WAN configuration
> -
>
> Key: ZOOKEEPER-498
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
> Project: Zookeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.2.0
> Environment: Each machine:
> CentOS 5.2 64-bit
> 2GB ram
> java version "1.6.0_13"
> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
> Network Topology:
> DC : central data center
> POD(N): remote data center
> Zookeeper Topology:
> Leaders may be elected only in DC (weight = 1)
> Only followers are elected in PODS (weight = 0)
>Reporter: Todd Greenwood-Geer
>Assignee: Patrick Hunt
>Priority: Critical
> Fix For: 3.2.1, 3.3.0
>
> Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, 
> zk498-test.tar.gz, zoo.cfg
>
>
> In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
> re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
> central DC group of ZK servers that have a voting weight = 1, and a group of 
> servers in remote pods with a voting weight of 0.
> What we expect to see is leaders elected only in the DC, and the pods to 
> contain only followers. What we are seeing is a continuous cycling of 
> leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
> patches (473, 479, 481, 491), and now release 3.2.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-05 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739609#action_12739609
 ] 

Patrick Hunt commented on ZOOKEEPER-498:


Looks to me like zero weight is still busted: FLEZeroWeightTest is actually failing 
on my machine, however it's reported as a success:
- Standard Error -
Exception in thread "Thread-108" junit.framework.AssertionFailedError: Elected 
zero-weight server
at junit.framework.Assert.fail(Assert.java:47)
at 
org.apache.zookeeper.test.FLEZeroWeightTest$LEThread.run(FLEZeroWeightTest.java:138)
-  ---

This is probably because the test is calling assert in a thread other than 
the main test thread, which JUnit will not track or know about (one way to handle 
this is sketched below).
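
A minimal sketch of one way to surface such a failure, assuming JUnit 3-style
tests as used here; the class, method, and variable names are invented for
illustration and this is not the actual test code:

    import junit.framework.Assert;
    import junit.framework.TestCase;

    // Illustrative only: record a failure raised in a worker thread and
    // re-raise it from the main test thread so that JUnit actually reports it.
    public class WorkerThreadFailureSketch extends TestCase {

        private volatile Throwable workerFailure;

        public void testFailuresFromWorkerAreReported() throws Exception {
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    try {
                        // Stand-in for the real election checks done in the worker.
                        boolean electedZeroWeightServer = false;
                        if (electedZeroWeightServer) {
                            Assert.fail("Elected zero-weight server");
                        }
                    } catch (Throwable t) {
                        workerFailure = t;   // remember it instead of losing it
                    }
                }
            });
            worker.start();
            worker.join();
            if (workerFailure != null) {
                // Re-raise in the main thread, where JUnit is watching.
                fail("worker thread failed: " + workerFailure);
            }
        }
    }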

One problem I see with these tests (the zero-weight test I looked at): it doesn't 
have a client attempt to connect to the various servers as part of declaring 
success. Really, we should only consider the test successful (i.e. assert that) if a 
client can connect to each server in the cluster and make/see changes. As part 
of fixing this we really need to do a sanity check by testing the various 
command lines and checking that a client can connect (see the sketch below).
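
A rough sketch of that kind of per-server sanity check using the ZooKeeper
client API; hosts, ports, timeouts, and paths are invented for illustration:

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.TimeUnit;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Connect a client to each server in turn and verify it can write and read.
    public class QuorumSanityCheck {
        public static void main(String[] args) throws Exception {
            String[] servers = { "host1:2181", "host2:2181", "host3:2181" };
            for (String server : servers) {
                final CountDownLatch connected = new CountDownLatch(1);
                ZooKeeper zk = new ZooKeeper(server, 30000, new Watcher() {
                    public void process(WatchedEvent event) {
                        if (event.getState() == Event.KeeperState.SyncConnected) {
                            connected.countDown();
                        }
                    }
                });
                try {
                    if (!connected.await(30, TimeUnit.SECONDS)) {
                        throw new IllegalStateException("could not connect to " + server);
                    }
                    // Write via this server...
                    String path = zk.create("/sanity-", new byte[0],
                            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
                    // ...and confirm the change is visible through the same server.
                    zk.getData(path, false, null);
                    System.out.println(server + " ok: " + path);
                } finally {
                    zk.close();
                }
            }
        }
    }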

I'm not even sure FLENewEpochTest/FLETest/etc. are passing either; new epoch 
seems to just thrash...

Also, I tried 3- and 5-server quorums by hand from the command line with zero weight, 
and they show similar issues to what Todd is seeing.

This is happening for me on both the trunk and the 3.2 branch source.

> Unending Leader Elections : WAN configuration
> -
>
> Key: ZOOKEEPER-498
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
> Project: Zookeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.2.0
> Environment: Each machine:
> CentOS 5.2 64-bit
> 2GB ram
> java version "1.6.0_13"
> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
> Network Topology:
> DC : central data center
> POD(N): remote data center
> Zookeeper Topology:
> Leaders may be elected only in DC (weight = 1)
> Only followers are elected in PODS (weight = 0)
>Reporter: Todd Greenwood-Geer
>Assignee: Patrick Hunt
>Priority: Critical
> Fix For: 3.2.1, 3.3.0
>
> Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, zoo.cfg
>
>
> In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
> re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
> central DC group of ZK servers that have a voting weight = 1, and a group of 
> servers in remote pods with a voting weight of 0.
> What we expect to see is leaders elected only in the DC, and the pods to 
> contain only followers. What we are seeing is a continuous cycling of 
> leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
> patches (473, 479, 481, 491), and now release 3.2.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-484) Clients get SESSION MOVED exception when switching from follower to a leader.

2009-08-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739588#action_12739588
 ] 

Hadoop QA commented on ZOOKEEPER-484:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12415556/ZOOKEEPER-484.patch
  against trunk revision 800990.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/167/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/167/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/167/console

This message is automatically generated.

> Clients get SESSION MOVED exception when switching from follower to a leader.
> -
>
> Key: ZOOKEEPER-484
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-484
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.2.0
>Reporter: Mahadev konar
>Assignee: Mahadev konar
>Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
> Attachments: sessionTest.patch, ZOOKEEPER-484.patch
>
>
> When a client is connected to a follower, gets disconnected, and connects to the 
> leader, it gets a SESSION MOVED exception. This is because of a bug in the new 
> feature of ZOOKEEPER-417 that we added in 3.2. All the releases before 3.2 DO 
> NOT have this problem. The fix is to make sure the ownership of a connection 
> gets changed when a session moves from a follower to the leader. The workaround 
> in 3.2.0 is to switch off connections from clients to the leader; 
> take a look at the *leaderServes* Java property in 
> http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
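
For reference, the workaround described above amounts to telling the leader not
to serve client connections, e.g. by passing a system property to each server's
JVM. This is a sketch only; the file name and variable shown are assumptions,
and the linked admin doc is the authoritative reference for the option:

    # illustrative only -- e.g. via conf/java.env, or however you pass JVM options
    JVMFLAGS="-Dzookeeper.leaderServes=no"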



Re: hudson patch build back to normal

2009-08-05 Thread Patrick Hunt

Thanks Giri!

Patrick

Giridharan Kesavan wrote:

If you have changed the JIRA status to Patch Available in the last couple of 
days, please resubmit your patch so that Hudson picks it up for testing.
-Giri


-Original Message-
From: Giridharan Kesavan [mailto:gkesa...@yahoo-inc.com]
Sent: Wednesday, August 05, 2009 7:18 PM
To: zookeeper-dev@hadoop.apache.org
Cc: Nigel Daley
Subject: hudson patch build back to normal

Sendmail issues on hudson.zones is fixed now and patch build for
zookeeper is restarted.

Regards,
Giri


RE: hudson patch build back to normal

2009-08-05 Thread Giridharan Kesavan
If you have changed the JIRA status to Patch Available in the last couple of 
days, please resubmit your patch so that Hudson picks it up for testing.
-Giri

> -Original Message-
> From: Giridharan Kesavan [mailto:gkesa...@yahoo-inc.com]
> Sent: Wednesday, August 05, 2009 7:18 PM
> To: zookeeper-dev@hadoop.apache.org
> Cc: Nigel Daley
> Subject: hudson patch build back to normal
> 
> Sendmail issues on hudson.zones is fixed now and patch build for
> zookeeper is restarted.
> 
> Regards,
> Giri


hudson patch build back to normal

2009-08-05 Thread Giridharan Kesavan
Sendmail issues on hudson.zones is fixed now and patch build for zookeeper is 
restarted.

Regards,
Giri


[jira] Updated: (ZOOKEEPER-484) Clients get SESSION MOVED exception when switching from follower to a leader.

2009-08-05 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated ZOOKEEPER-484:
-

Status: Patch Available  (was: Open)

> Clients get SESSION MOVED exception when switching from follower to a leader.
> -
>
> Key: ZOOKEEPER-484
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-484
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.2.0
>Reporter: Mahadev konar
>Assignee: Mahadev konar
>Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
> Attachments: sessionTest.patch, ZOOKEEPER-484.patch
>
>
> When a client is connected to a follower, gets disconnected, and connects to the 
> leader, it gets a SESSION MOVED exception. This is because of a bug in the new 
> feature of ZOOKEEPER-417 that we added in 3.2. All the releases before 3.2 DO 
> NOT have this problem. The fix is to make sure the ownership of a connection 
> gets changed when a session moves from a follower to the leader. The workaround 
> in 3.2.0 is to switch off connections from clients to the leader; 
> take a look at the *leaderServes* Java property in 
> http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-484) Clients get SESSION MOVED exception when switching from follower to a leader.

2009-08-05 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated ZOOKEEPER-484:
-

Status: Open  (was: Patch Available)

Resubmitting the patch to the patch queue.

> Clients get SESSION MOVED exception when switching from follower to a leader.
> -
>
> Key: ZOOKEEPER-484
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-484
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.2.0
>Reporter: Mahadev konar
>Assignee: Mahadev konar
>Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
> Attachments: sessionTest.patch, ZOOKEEPER-484.patch
>
>
> When a client is connected to a follower, gets disconnected, and connects to the 
> leader, it gets a SESSION MOVED exception. This is because of a bug in the new 
> feature of ZOOKEEPER-417 that we added in 3.2. All the releases before 3.2 DO 
> NOT have this problem. The fix is to make sure the ownership of a connection 
> gets changed when a session moves from a follower to the leader. The workaround 
> in 3.2.0 is to switch off connections from clients to the leader; 
> take a look at the *leaderServes* Java property in 
> http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-447) zkServer.sh doesn't allow different config files to be specified on the command line

2009-08-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739429#action_12739429
 ] 

Hudson commented on ZOOKEEPER-447:
--

Integrated in ZooKeeper-trunk #405 (See 
[http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/405/])
ZOOKEEPER-447. zkServer.sh doesn't allow different config files to be specified on 
the command line


> zkServer.sh doesn't allow different config files to be specified on the 
> command line
> 
>
> Key: ZOOKEEPER-447
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-447
> Project: Zookeeper
>  Issue Type: Improvement
>Affects Versions: 3.1.1, 3.2.0
>Reporter: Henry Robinson
>Assignee: Henry Robinson
>Priority: Minor
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-447.patch
>
>
> Unless I'm missing something, you can change the directory that the zoo.cfg 
> file is in by setting ZOOCFGDIR but not the name of the file itself.
> I find it convenient myself to specify the config file on the command line, 
> but we should also let it be specified by environment variable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
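
Concretely, the convenience being asked for would look roughly like this; a
sketch only, since the exact argument handling depends on the final patch:

    # with the patch: name the config file explicitly on the command line
    bin/zkServer.sh start /path/to/wan-zoo.cfg

    # today: only the directory containing zoo.cfg can be overridden
    ZOOCFGDIR=/path/to/conf bin/zkServer.sh start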



[jira] Commented: (ZOOKEEPER-480) FLE should perform leader check when node is not leading and add vote of follower

2009-08-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739427#action_12739427
 ] 

Hudson commented on ZOOKEEPER-480:
--

Integrated in ZooKeeper-trunk #405 (See 
[http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/405/])
ZOOKEEPER-480. FLE should perform leader check when node is not leading and add 
vote of follower (flavio via mahadev)


> FLE should perform leader check when node is not leading and add vote of 
> follower
> -
>
> Key: ZOOKEEPER-480
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-480
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Flavio Paiva Junqueira
>Assignee: Flavio Paiva Junqueira
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-480-3.2branch.patch, 
> ZOOKEEPER-480-3.2branch.patch, ZOOKEEPER-480.patch, ZOOKEEPER-480.patch, 
> ZOOKEEPER-480.patch, ZOOKEEPER-480.patch, ZOOKEEPER-480.patch
>
>
> As a server may join leader election while others have already elected a 
> leader, it is necessary that a server handles some special cases of leader 
> election when notifications are from servers that are either LEADING or 
> FOLLOWING. In such special cases, we check if we have received a message from 
> the leader to declare a leader elected. This check does not consider the case 
> that the process performing the check might be a recently elected leader, and 
> consequently the check fails.
> This patch also adds a new case, which corresponds to adding a vote to 
> recvset when the notification is from a process LEADING or FOLLOWING. This 
> fixes the case raised in ZOOKEEPER-475.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
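
To make the described check concrete, here is a small self-contained sketch of
the idea. The types and names are simplified inventions; this is not the actual
FastLeaderElection code or the patch itself:

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative only -- simplified stand-ins for the real FLE classes.
    public class LeaderCheckSketch {
        enum State { LOOKING, FOLLOWING, LEADING }

        static class Vote {
            final long leader; final long electionEpoch; final State state;
            Vote(long leader, long electionEpoch, State state) {
                this.leader = leader; this.electionEpoch = electionEpoch; this.state = state;
            }
        }

        // Declare "leader elected" only if (a) we ourselves are the proposed
        // leader, or (b) the proposed leader has sent us a LEADING vote in the
        // same election epoch. Case (a) is what the original check missed.
        static boolean checkLeader(Map<Long, Vote> received, long myId,
                                   long proposedLeader, long epoch) {
            if (proposedLeader == myId) {
                return true;
            }
            Vote v = received.get(proposedLeader);
            return v != null && v.state == State.LEADING && v.electionEpoch == epoch;
        }

        public static void main(String[] args) {
            Map<Long, Vote> received = new HashMap<Long, Vote>();
            received.put(2L, new Vote(1L, 5L, State.FOLLOWING));
            // Server 1 checking a proposal that elects itself: passes immediately.
            System.out.println(checkLeader(received, 1L, 1L, 5L)); // true
            // Server 3 checking the same proposal without hearing from server 1: fails.
            System.out.println(checkLeader(received, 3L, 1L, 5L)); // false
        }
    }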



[jira] Commented: (ZOOKEEPER-493) patch for command line setquota

2009-08-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739428#action_12739428
 ] 

Hudson commented on ZOOKEEPER-493:
--

Integrated in ZooKeeper-trunk #405 (See 
[http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/405/])
ZOOKEEPER-493. patch for command line setquota


> patch for command line setquota 
> 
>
> Key: ZOOKEEPER-493
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-493
> Project: Zookeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.2.0
>Reporter: steve bendiola
>Assignee: steve bendiola
>Priority: Minor
> Fix For: 3.2.1, 3.3.0
>
> Attachments: quotafix.patch, ZOOKEEPER-493.patch
>
>
> the command line "setquota" tries to use argument 3 as both a path and a value

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
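
For context, the intended usage separates the quota value from the path. The
flags shown are my understanding of the CLI and should be treated as an
assumption, roughly:

    setquota -n 1000 /some/path       # limit the number of znodes under /some/path
    setquota -b 1048576 /some/path    # limit the total data bytes under /some/path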



[jira] Commented: (ZOOKEEPER-491) Prevent zero-weight servers from being elected

2009-08-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739426#action_12739426
 ] 

Hudson commented on ZOOKEEPER-491:
--

Integrated in ZooKeeper-trunk #405 (See 
[http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/405/])
ZOOKEEPER-491. Prevent zero-weight servers from being elected. (flavio via mahadev)


> Prevent zero-weight servers from being elected
> --
>
> Key: ZOOKEEPER-491
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-491
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: leaderElection
>Affects Versions: 3.2.0
>Reporter: Flavio Paiva Junqueira
>Assignee: Flavio Paiva Junqueira
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-491-3.2branch.patch, ZOOKEEPER-491.patch
>
>
> This is a fix to prevent zero-weight servers from being elected leaders. This 
> will allow in wide-area scenarios to restrict the set of servers that can 
> lead the ensemble.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.