[jira] [Created] (ZOOKEEPER-3922) Add support for two server ZooKeeper with hardware oracle

2020-08-27 Thread Benjamin Reed (Jira)
Benjamin Reed created ZOOKEEPER-3922:


 Summary: Add support for two server ZooKeeper with hardware oracle
 Key: ZOOKEEPER-3922
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3922
 Project: ZooKeeper
  Issue Type: New Feature
Reporter: Benjamin Reed


Currently, we cannot really have ZooKeeper ensembles of fewer than 3 servers 
and still tolerate failures. However, with hardware support for failure 
detection, we could support a two-server ensemble and still tolerate the 
failure of one machine.
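
A hypothetical sketch of what such an oracle hook might look like (this issue 
does not define an API; every name below is illustrative):

{code}
// Hypothetical sketch only: this issue does not define an API, and all
// names here are illustrative. The idea: when one server of a two-node
// ensemble loses contact with its peer, it consults an external hardware
// oracle before deciding it may keep serving with only 1 of 2 votes.
public interface FailureOracle {
    /**
     * @param peerSid server id of the unreachable peer
     * @return true if the hardware oracle confirms the peer has failed
     */
    boolean confirmPeerFailure(long peerSid);
}
{code}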



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-3108) use a new property server.id in the zoo.cfg to substitute for myid file

2018-09-14 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615566#comment-16615566
 ] 

Benjamin Reed commented on ZOOKEEPER-3108:
--

the reason we kept myid out of the config is so that all the servers can use 
the same configuration file. the id would then be tied to the data.

the id of the server should be a rather permanent thing. for example, if you 
have an ensemble with ids host1=1, host2=2, host3=3 and an observer with id 
host4=4, then today, to make host3 an observer and host4 a participant, you 
have to go through reconfiguration. with this id configuration option it is 
tempting to just swap the ids (host4=3 and host3=4). this can result in data 
loss or corruption.

it's not a show stopper, but we do need to document it properly: even though 
the id can be set via the configuration file, it should be considered bound to 
the data directory.
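
to make the hazard concrete, a sketch of the tempting edit (the server.id 
property is the one proposed in this issue; everything else is illustrative):

{code}
# Illustrative per-server zoo.cfg fragments for the example above.
#
# host3 (participant) before the swap:
server.id=3
# host4 (observer) before the swap would have: server.id=4
#
# The tempting edit -- give host3 server.id=4 and host4 server.id=3 --
# detaches each id from the dataDir that was written under it: host4's
# possibly-lagging observer data would now be served under participant
# id 3, which can lose or corrupt history. The safe path is reconfig.
{code}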

> use a new property server.id in the zoo.cfg to substitute for myid file
> ---
>
> Key: ZOOKEEPER-3108
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3108
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.5.0
>Reporter: maoling
>Assignee: maoling
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When using ZK in distributed mode, we need to touch a myid file in 
> dataDir and then write a unique number to it. This is inconvenient and not 
> user-friendly. Look at an example from another distributed system such as 
> Kafka: it just uses broker.id=0 in server.properties to identify a unique 
> server node. This issue proposes abandoning the myid file and using a new 
> property such as server.id=0 in zoo.cfg. The fix will be applied to the 
> master branch and branch-3.5+,
> keeping branch-3.4 unchanged.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3104) Potential data inconsistency due to NEWLEADER packet being sent too early during SNAP sync

2018-08-03 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-3104.
--
   Resolution: Fixed
Fix Version/s: 3.6.0

Issue resolved by pull request 583
[https://github.com/apache/zookeeper/pull/583]

> Potential data inconsistency due to NEWLEADER packet being sent too early 
> during SNAP sync
> --
>
> Key: ZOOKEEPER-3104
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3104
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.4, 3.6.0, 3.4.13
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently, in SNAP sync, the leader starts queuing the proposals/commits 
> and the NEWLEADER packet before sending the snapshot over the wire. So it's 
> possible that the zxid associated with the snapshot is higher than that of 
> all the packets queued before NEWLEADER.
>  
> When the follower receives the snapshot, it applies all the txns queued 
> before NEWLEADER, which may not cover all the txns up to the zxid in the 
> snapshot. After that, it writes the snapshot out to disk with the zxid 
> associated with the snapshot. If the server crashes after writing this 
> out, then when loading the data from disk it will use the zxid of the 
> snapshot file to sync with the leader, and this can cause data 
> inconsistency, because only part of the historical data was replayed 
> during the previous sync.
>  
> The NEWLEADER packet means the learner now has a correct and almost 
> up-to-date state relative to the leader, so it makes more sense to move the 
> NEWLEADER packet to after the snapshot is sent, and this is what we did in 
> the fix.
>  
> Besides this, the socket timeout is changed to the smaller sync timeout 
> after the NEWLEADER ack is received. In ensembles with high write traffic 
> and large snapshots, the follower might be timed out by the leader before 
> it finishes sending the queued txns after writing the snapshot out, which 
> could leave the follower stuck in the syncing state forever. Moving the 
> NEWLEADER packet to after the snapshot is sent avoids this issue as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3036) Unexpected exception in zookeeper

2018-07-30 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561968#comment-16561968
 ] 

Benjamin Reed commented on ZOOKEEPER-3036:
--

can you give a bit more detail as to what happened? there were 3 servers. it 
sounds like one of the followers failed, right? the leader should keep working 
with the other follower alive. did the leader actually shut down as well?

> Unexpected exception in zookeeper
> -
>
> Key: ZOOKEEPER-3036
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3036
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum, server
>Affects Versions: 3.4.10
> Environment: 3 Zookeepers, 5 kafka servers
>Reporter: Oded
>Priority: Critical
>
> We got an issue with one of the ZooKeepers (the Leader), causing the entire 
> Kafka cluster to fail:
> 2018-05-09 02:29:01,730 [myid:3] - ERROR 
> [LearnerHandler-/192.168.0.91:42490:LearnerHandler@648] - Unexpected 
> exception causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method)
>     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>     at java.net.SocketInputStream.read(SocketInputStream.java:171)
>     at java.net.SocketInputStream.read(SocketInputStream.java:141)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
>     at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>     at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>     at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>     at 
> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
> 2018-05-09 02:29:01,730 [myid:3] - WARN  
> [LearnerHandler-/192.168.0.91:42490:LearnerHandler@661] - *** GOODBYE 
> /192.168.0.91:42490 
>  
> We would expect that zookeeper will choose another Leader and the Kafka 
> cluster will continue to work as expected, but that was not the case.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3095) Connect string fix for non-existent hosts

2018-07-27 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-3095.
--
Resolution: Fixed

Issue resolved by pull request 579
[https://github.com/apache/zookeeper/pull/579]

> Connect string fix for non-existent hosts
> -
>
> Key: ZOOKEEPER-3095
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3095
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: other
>Affects Versions: 3.4.0
>Reporter: Mohamed Jeelani
>Assignee: Mohamed Jeelani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Connect string fix for non-existent hosts



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3072) Race condition in throttling

2018-07-27 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-3072.
--
Resolution: Fixed

> Race condition in throttling
> 
>
> Key: ZOOKEEPER-3072
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3072
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0, 3.5.1, 3.5.2, 3.5.3, 3.5.4
>Reporter: Botond Hejj
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.4
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> There is a race condition in the server throttling code. It is possible that 
> the disableRecv is called after enableRecv.
> Basically, the I/O work thread does this in processPacket: 
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L1102]
>  
>     submitRequest(si);
>     }
>     }
>     cnxn.incrOutstandingRequests(h);
>     }
>  
> incrOutstandingRequests() checks for limit breach, and potentially turns on 
> throttling, 
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L384]
>  
> submitRequest() creates a logical request and enqueues it so that the 
> processor thread can pick it up. After dequeuing it, the processor thread 
> does the necessary handling and then calls this 
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java#L459]
>  :
>  
>     cnxn.sendResponse(hdr, rsp, "response");
>  
> and in sendResponse(), it first appends to outgoing buffer, and then checks 
> if un-throttle is needed:  
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L708]
>  
> However, if there is a context switch between submitRequest() and 
> cnxn.incrOutstandingRequests(), such that the processor thread completes the 
> cnxn.sendResponse() call before the I/O thread switches back, then 
> enableRecv() happens before disableRecv(): enableRecv() fails its CAS op 
> while disableRecv() succeeds, resulting in a deadlock. Un-throttling is 
> needed to let requests in, and sendResponse() is needed to trigger 
> un-throttling, but sendResponse() requires an incoming message. From that 
> point on, the ZK server will no longer select the affected client socket for 
> read, leading to the client-side failure observed in the subject.
> If you would like to reproduce this, setting globalOutstandingLimit down 
> to 1 makes it easier to reproduce, as throttling starts with fewer 
> requests. 
>  
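
The interleaving is easier to see in a stripped-down sketch (not the actual 
server code; names mirror the description above):

{code}
// Minimal sketch of the race described above (not the ZooKeeper source;
// names mirror the description). If the processor thread's enableRecv()
// runs before the I/O thread's disableRecv(), reads stay disabled with
// nothing left to re-enable them.
import java.util.concurrent.atomic.AtomicBoolean;

class ThrottleSketch {
    private final AtomicBoolean throttled = new AtomicBoolean(false);

    // I/O thread: called via incrOutstandingRequests() on a limit breach
    void disableRecv() {
        if (throttled.compareAndSet(false, true)) {
            // stop selecting this socket for read
        }
    }

    // Processor thread: called from sendResponse() once under the limit
    void enableRecv() {
        if (throttled.compareAndSet(true, false)) {
            // resume selecting this socket for read
        }
    }
    // Race: enableRecv() runs first (CAS true->false fails, no-op), then
    // disableRecv() succeeds -- the socket is now throttled, and the only
    // un-throttle trigger (another response) can never arrive.
}
{code}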



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3072) Race condition in throttling

2018-07-27 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-3072:
-
Fix Version/s: 3.5.4
   3.6.0

> Race condition in throttling
> 
>
> Key: ZOOKEEPER-3072
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3072
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0, 3.5.1, 3.5.2, 3.5.3, 3.5.4
>Reporter: Botond Hejj
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.4, 3.6.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> There is a race condition in the server throttling code. It is possible that 
> the disableRecv is called after enableRecv.
> Basically, the I/O work thread does this in processPacket: 
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L1102]
>  
>     submitRequest(si);
>     }
>     }
>     cnxn.incrOutstandingRequests(h);
>     }
>  
> incrOutstandingRequests() checks for limit breach, and potentially turns on 
> throttling, 
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L384]
>  
> submitRequest() creates a logical request and enqueues it so that the 
> processor thread can pick it up. After dequeuing it, the processor thread 
> does the necessary handling and then calls this 
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java#L459]
>  :
>  
>     cnxn.sendResponse(hdr, rsp, "response");
>  
> and in sendResponse(), it first appends to outgoing buffer, and then checks 
> if un-throttle is needed:  
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L708]
>  
> However, if there is a context switch between submitRequest() and 
> cnxn.incrOutstandingRequests(), such that the processor thread completes the 
> cnxn.sendResponse() call before the I/O thread switches back, then 
> enableRecv() happens before disableRecv(): enableRecv() fails its CAS op 
> while disableRecv() succeeds, resulting in a deadlock. Un-throttling is 
> needed to let requests in, and sendResponse() is needed to trigger 
> un-throttling, but sendResponse() requires an incoming message. From that 
> point on, the ZK server will no longer select the affected client socket for 
> read, leading to the client-side failure observed in the subject.
> If you would like to reproduce this, setting globalOutstandingLimit down 
> to 1 makes it easier to reproduce, as throttling starts with fewer 
> requests. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3061) add more details to 'Unhandled scenario for peer' log.warn message

2018-07-27 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-3061.
--
   Resolution: Fixed
Fix Version/s: 3.6.0

Issue resolved by pull request 555
[https://github.com/apache/zookeeper/pull/555]

> add more details to 'Unhandled scenario for peer' log.warn message
> --
>
> Key: ZOOKEEPER-3061
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3061
> Project: ZooKeeper
>  Issue Type: Task
>Reporter: Christine Poerschke
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-3061.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> A few lines earlier, the {{LOG.info("Synchronizing with Follower sid: ...}} 
> logging already contains most of the relevant details, but it would be 
> convenient to have the full details directly in the {{LOG.warn("Unhandled 
> scenario for peer sid: ...}} message itself.
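
For illustration, the warning could carry the same fields as that info line; a 
hedged sketch (the variable names are assumptions, not the committed patch):

{code}
// Hedged sketch only; the committed patch may differ. The variables are
// assumptions mirroring the "Synchronizing with Follower sid: ..." line.
LOG.warn("Unhandled scenario for peer sid: {} peerLastZxid: 0x{}"
        + " maxCommittedLog: 0x{} minCommittedLog: 0x{} lastProcessedZxid: 0x{}",
        sid,
        Long.toHexString(peerLastZxid),
        Long.toHexString(maxCommittedLog),
        Long.toHexString(minCommittedLog),
        Long.toHexString(lastProcessedZxid));
{code}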



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3073) fix couple of typos

2018-07-10 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-3073.
--
Resolution: Fixed

> fix couple of typos
> ---
>
> Key: ZOOKEEPER-3073
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3073
> Project: ZooKeeper
>  Issue Type: Wish
>Reporter: Christine Poerschke
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Saw a number of open pull requests concerning typos but without an 
> associated JIRA ticket, so here I'm taking the opportunity to gather them up 
> (where not already otherwise taken care of), plus a couple of additions I 
> noticed whilst my other code was doing its compiling-and-testing thing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3078) Remove unused print_completion_queue function

2018-07-10 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-3078.
--
Resolution: Fixed

> Remove unused print_completion_queue function
> -
>
> Key: ZOOKEEPER-3078
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3078
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: c client
>Affects Versions: 3.5.4
>Reporter: Kent R. Spillner
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The function print_completion_queue in zookeeper.c causes compilation errors 
> with GCC 8.  However, this function is unused and can safely be removed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3079) Fix unsafe use of sprintf(3) for creating IP address strings

2018-07-10 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-3079.
--
Resolution: Fixed

> Fix unsafe use of sprintf(3) for creating IP address strings
> 
>
> Key: ZOOKEEPER-3079
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3079
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.5.4
>Reporter: Kent R. Spillner
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The function format_endpoint_info in zookeeper.c causes compiler errors when 
> building with GCC 8 due to a potentially unsafe use of sprintf(3).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-2886) Permanent session moved error in multi-op only connections

2018-07-10 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-2886.
--
Resolution: Fixed

> Permanent session moved error in multi-op only connections
> --
>
> Key: ZOOKEEPER-2886
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2886
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> If there are slow followers, it's possible that the leader and the client 
> disagree on which server the client is connected to, so the client keeps 
> getting "Session Moved" errors. Part of the issue was fixed in 
> ZOOKEEPER-710, but the problem remains for multi-op-only connections. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3056) Fails to load database with missing snapshot file but valid transaction log file

2018-06-11 Thread Benjamin Reed (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508441#comment-16508441
 ] 

Benjamin Reed commented on ZOOKEEPER-3056:
--

how are you getting into a state where you have a log file but no snapshot? is 
it that a machine starts up with no data and then diff syncs with the leader? 
or is there another case that i'm missing?

trying to use a txn log with no base snapshot seems fraught with danger.

 

> Fails to load database with missing snapshot file but valid transaction log 
> file
> 
>
> Key: ZOOKEEPER-3056
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3056
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3, 3.5.4
>Reporter: Michael Han
>Priority: Critical
>
> [An 
> issue|https://lists.apache.org/thread.html/cc17af6ef05d42318f74148f1a704f16934d1253f14721a93b4b@%3Cdev.zookeeper.apache.org%3E]
>  was reported when a user failed to upgrade from 3.4.10 to 3.5.4 because of 
> a missing snapshot file.
> The code that complains about the missing snapshot file is 
> [here|https://github.com/apache/zookeeper/blob/release-3.5.4/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java#L206]
>  and was introduced as part of ZOOKEEPER-2325.
> With this check, ZK will not load the db without a snapshot file, even if 
> the transaction log files are present and valid. This could be a problem for 
> restoring a ZK instance which does not have a snapshot file but has a sound 
> state (e.g. it crashed before being able to take its first snapshot, with a 
> large snapCount parameter configured).
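
A paraphrased sketch of the check being discussed, as a fragment in the 
context of FileTxnSnapLog.restore() (grounded in the ZOOKEEPER-2325 
description quoted later in this digest; the exact source and message may 
differ):

{code}
// Paraphrased sketch of the startup check under discussion; the exact
// source may differ. FileSnap.deserialize() returns -1L when no valid
// snapshot is found, and ZOOKEEPER-2325 made that fatal when txn logs
// exist -- which is exactly what blocks this upgrade scenario.
long deserializeResult = snapLog.deserialize(dt, sessions);
if (deserializeResult == -1L && txnLog.getLastLoggedZxid() > 0) {
    // refuse to replay txn logs on top of an empty (zxid 0) tree
    throw new IOException("No snapshot found, but there are log entries; "
            + "something is broken!");
}
{code}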



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-2772) Delete node command does not honor Acl policy

2017-05-17 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-2772.
--
Resolution: Not A Bug

> Delete node command does not honor Acl policy
> -
>
> Key: ZOOKEEPER-2772
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2772
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.4.8, 3.4.10
>Reporter: joe smith
>
> I set the acl to not be able to delete a node - but was able to delete 
> regardless.
> I am not familiar with the code, but a reply from Martin in the user@ mailing 
> list seems to confirm the issue.  I will paste his response below - sorry for 
> the long listing.
> Martin's replies are inline, prefixed with MG>
> --
> From: joe smith 
> Sent: Tuesday, May 2, 2017 8:40 AM
> To: u...@zookeeper.apache.org
> Subject: Acl block delete not working
> Hi,
> I'm using 3.4.10 and setting a custom acl to block deletion of a znode.
> However, I'm able to delete the node even after I've set the acl from cdrwa 
> to cra.
> Can anyone point out if I missed a step?
> Thanks for the help
> Here is the trace:
> [zk: localhost:2181(CONNECTED) 0] ls /
> [zookeeper]
> [zk: localhost:2181(CONNECTED) 1] create /test "data"
> Created /test
> [zk: localhost:2181(CONNECTED) 2] ls /
> [zookeeper, test]
> [zk: localhost:2181(CONNECTED) 3] addauth myfqdn localhost
> [zk: localhost:2181(CONNECTED) 4] setAcl /test myfqdn:localhost:cra
> cZxid = 0x2
> ctime = Tue May 02 08:28:42 EDT 2017
> mZxid = 0x2
> mtime = Tue May 02 08:28:42 EDT 2017
> pZxid = 0x2
> cversion = 0
> dataVersion = 0
> aclVersion = 1
> ephemeralOwner = 0x0
> dataLength = 4
> numChildren = 0
> MG>in SetAclCommand you can see the acl being parsed and acl being set by 
> setAcl into zk object
> List<ACL> acl = AclParser.parse(aclStr);
> int version;
> if (cl.hasOption("v")) {
> version = Integer.parseInt(cl.getOptionValue("v"));
> } else {
> version = -1;
> }
> try {
> Stat stat = zk.setACL(path, acl, version);
> MG>later on in DeleteCommand there is no check for aforementioned acl 
> parameter
>   public boolean exec() throws KeeperException, InterruptedException {
> String path = args[1];
> int version;
> if (cl.hasOption("v")) {
> version = Integer.parseInt(cl.getOptionValue("v"));
> } else {
> version = -1;
> }
> try {
> zk.delete(path, version);
> } catch(KeeperException.BadVersionException ex) {
> err.println(ex.getMessage());
> }
> return false;
> MG>as seen here the testCase works properly saving the Zookeeper object
> LsCommand entity = new LsCommand();
> entity.setZk(zk);
> MG>but setACL does not save the zookeeper object anywhere but instead seems 
> to discard zookeeper object with accompanying ACLs
> MG>can you report this bug to Zookeeper?
> https://issues.apache.org/jira/browse/ZOOKEEPER/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel
> MG>Thanks Joe!
> [zk: localhost:2181(CONNECTED) 5] getAcl /test
> 'myfqdn,'localhost
> : cra
> [zk: localhost:2181(CONNECTED) 6] get /test
> "data"
> cZxid = 0x2
> ctime = Tue May 02 08:28:42 EDT 2017
> mZxid = 0x2
> mtime = Tue May 02 08:28:42 EDT 2017
> pZxid = 0x2
> cversion = 0
> dataVersion = 0
> aclVersion = 1
> ephemeralOwner = 0x0
> dataLength = 4
> numChildren = 0
> [zk: localhost:2181(CONNECTED) 7] set /test "testwrite"
> Authentication is not valid : /test
> [zk: localhost:2181(CONNECTED) 8] delete /test
> [zk: localhost:2181(CONNECTED) 9] ls /
> [zookeeper]
> [zk: localhost:2181(CONNECTED) 10]
> The auth provider impl is here: 
> http://s000.tinyupload.com/?file_id=42827186839577179157



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ZOOKEEPER-2772) Delete node command does not honor Acl policy

2017-05-13 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16009215#comment-16009215
 ] 

Benjamin Reed commented on ZOOKEEPER-2772:
--

this appears to be a misunderstanding of what the DELETE acl protects. CREATE 
and DELETE are about restricting operations on children of the znode, not the 
znode itself.
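
a sketch of what that means in practice (illustrative, not code from the 
ticket; the AclSketch class and protectChildren method are made up for the 
example):

{code}
// Illustrative sketch: DELETE in a znode's ACL governs deleting its
// *children*, so protecting /test from deletion means denying DELETE on
// its parent "/", not on /test itself.
import java.util.Arrays;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Perms;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.ACL;
import org.apache.zookeeper.data.Id;

class AclSketch {
    // Drop DELETE from the parent's ACL; /test's own ACL is irrelevant
    // to zk.delete("/test", ...).
    static void protectChildren(ZooKeeper zk)
            throws KeeperException, InterruptedException {
        Id anyone = new Id("world", "anyone");
        ACL noChildDelete = new ACL(
                Perms.READ | Perms.WRITE | Perms.CREATE | Perms.ADMIN, anyone);
        zk.setACL("/", Arrays.asList(noChildDelete), -1);
        // zk.delete("/test", -1) now fails with NoAuth for everyone.
    }
}
{code}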

> Delete node command does not honor Acl policy
> -
>
> Key: ZOOKEEPER-2772
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2772
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.4.8, 3.4.10
>Reporter: joe smith
>
> I set the acl to not be able to delete a node - but was able to delete 
> regardless.
> I am not familiar with the code, but a reply from Martin in the user@ mailing 
> list seems to confirm the issue.  I will paste his response below - sorry for 
> the long listing.
> Martin's replies are inline, prefixed with MG>
> --
> From: joe smith 
> Sent: Tuesday, May 2, 2017 8:40 AM
> To: u...@zookeeper.apache.org
> Subject: Acl block delete not working
> Hi,
> I'm using 3.4.10 and setting a custom acl to block deletion of a znode.
> However, I'm able to delete the node even after I've set the acl from cdrwa 
> to cra.
> Can anyone point out if I missed a step?
> Thanks for the help
> Here is the trace:
> [zk: localhost:2181(CONNECTED) 0] ls /
> [zookeeper]
> [zk: localhost:2181(CONNECTED) 1] create /test "data"
> Created /test
> [zk: localhost:2181(CONNECTED) 2] ls /
> [zookeeper, test]
> [zk: localhost:2181(CONNECTED) 3] addauth myfqdn localhost
> [zk: localhost:2181(CONNECTED) 4] setAcl /test myfqdn:localhost:cra
> cZxid = 0x2
> ctime = Tue May 02 08:28:42 EDT 2017
> mZxid = 0x2
> mtime = Tue May 02 08:28:42 EDT 2017
> pZxid = 0x2
> cversion = 0
> dataVersion = 0
> aclVersion = 1
> ephemeralOwner = 0x0
> dataLength = 4
> numChildren = 0
> MG>in SetAclCommand you can see the acl being parsed and acl being set by 
> setAcl into zk object
> List<ACL> acl = AclParser.parse(aclStr);
> int version;
> if (cl.hasOption("v")) {
> version = Integer.parseInt(cl.getOptionValue("v"));
> } else {
> version = -1;
> }
> try {
> Stat stat = zk.setACL(path, acl, version);
> MG>later on in DeleteCommand there is no check for aforementioned acl 
> parameter
>   public boolean exec() throws KeeperException, InterruptedException {
> String path = args[1];
> int version;
> if (cl.hasOption("v")) {
> version = Integer.parseInt(cl.getOptionValue("v"));
> } else {
> version = -1;
> }
> try {
> zk.delete(path, version);
> } catch(KeeperException.BadVersionException ex) {
> err.println(ex.getMessage());
> }
> return false;
> MG>as seen here the testCase works properly saving the Zookeeper object
> LsCommand entity = new LsCommand();
> entity.setZk(zk);
> MG>but setACL does not save the zookeeper object anywhere but instead seems 
> to discard zookeeper object with accompanying ACLs
> MG>can you report this bug to Zookeeper?
> https://issues.apache.org/jira/browse/ZOOKEEPER/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel
> MG>Thanks Joe!
> [zk: localhost:2181(CONNECTED) 5] getAcl /test
> 'myfqdn,'localhost
> : cra
> [zk: localhost:2181(CONNECTED) 6] get /test
> "data"
> cZxid = 0x2
> ctime = Tue May 02 08:28:42 EDT 2017
> mZxid = 0x2
> mtime = Tue May 02 08:28:42 EDT 2017
> pZxid = 0x2
> cversion = 0
> dataVersion = 0
> aclVersion = 1
> ephemeralOwner = 0x0
> dataLength = 4
> numChildren = 0
> [zk: localhost:2181(CONNECTED) 7] set /test "testwrite"
> Authentication is not valid : /test
> [zk: localhost:2181(CONNECTED) 8] delete /test
> [zk: localhost:2181(CONNECTED) 9] ls /
> [zookeeper]
> [zk: localhost:2181(CONNECTED) 10]
> The auth provider impl is here: 
> http://s000.tinyupload.com/?file_id=42827186839577179157



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ZOOKEEPER-2748) Admin command to voluntarily drop client connections

2017-04-11 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15964730#comment-15964730
 ] 

Benjamin Reed commented on ZOOKEEPER-2748:
--

we do this using the JMX interface. i think that is better since you avoid 
security issues. well, at least you push the security issues to JMX.

> Admin command to voluntarily drop client connections
> 
>
> Key: ZOOKEEPER-2748
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2748
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: server
>Reporter: Marco P.
>Assignee: Marco P.
>Priority: Minor
>
> In certain circumstances, it would be useful to be able to move clients from 
> one server to another.
> One example: a quorum that consists of 3 servers (A, B, C) with 1000 active 
> client sessions, where 900 clients are connected to server A and the 
> remaining 100 are split over B and C (see below for an example of how 
> this can happen).
> A will do a lot more work than B, C. 
> Overall throughput will benefit by having the clients more evenly divided.
> If A fails, all its clients will create an avalanche by migrating en 
> masse to a different server.
> There are other possible use cases for a mechanism to move clients: 
>  - Migrate away all clients before a server restart
>  - Migrate away part of clients in response to runtime metrics (CPU/Memory 
> usage, ...)
>  - Shuffle clients after adding more server capacity (i.e. adding Observer 
> nodes)
> The simplest form of rebalancing which does not require major changes of 
> protocol or client code consists of requesting a server to voluntarily drop 
> some number of connections.
> Clients should be able to transparently move to a different server.
> Patch introducing 4-letter commands to shed clients:
> https://github.com/apache/zookeeper/pull/215
> -- -- --
> How client imbalance happens in the first place, an example.
> Imagine servers A, B, C and 1000 clients connected.
> Initially clients are spread evenly (i.e. 333 clients per server).
> A: 333 (restarts: 0)
> B: 333 (restarts: 0)
> C: 334 (restarts: 0)
> Now restart servers a few times, always in A, B, C order (e.g. to pick up a 
> software upgrades or configuration changes).
> Restart A:
> A: 0 (restarts: 1)
> B: 499 (restarts: 0)
> C: 500 (restarts: 0)
> Restart B:
> A: 250 (restarts: 1)
> B: 0 (restarts: 1)
> C: 750 (restarts: 0)
> Restart C:
> A: 625 (restarts: 1)
> B: 375 (restarts: 1)
> C: 0 (restarts: 1)
> The imbalance is pretty bad already. C is idle while A has a lot of work.
> A second round of restarts makes the situation even worse:
> Restart A:
> A: 0 (restarts: 2)
> B: 688 (restarts: 1)
> C: 313 (restarts: 1)
> Restart B:
> A: 344 (restarts: 2)
> B: 0 (restarts: 2)
> C: 657 (restarts: 1)
> Restart C:
> A: 673 (restarts: 2)
> B: 328 (restarts: 2)
> C: 0 (restarts: 2)
> Large cluster (5, 7, 9 servers) make the imbalance even more evident.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ZOOKEEPER-2693) DOS attack on wchp/wchc four letter words (4lw)

2017-03-16 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929091#comment-15929091
 ] 

Benjamin Reed commented on ZOOKEEPER-2693:
--

can someone put a good link to the exploit in the description? a cache isn't an 
appropriate link to use.

> DOS attack on wchp/wchc four letter words (4lw)
> ---
>
> Key: ZOOKEEPER-2693
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2693
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: security, server
>Affects Versions: 3.4.0, 3.5.1, 3.5.2
>Reporter: Patrick Hunt
>Assignee: Michael Han
>Priority: Blocker
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2693-01.patch
>
>
> The wchp/wchc four letter words can be exploited in a DOS attack on the ZK 
> client port - typically 2181. The following POC attack was recently published 
> on the web:
> https://webcache.googleusercontent.com/search?q=cache:_CNGIz10PRYJ:https://www.exploit-db.com/exploits/41277/+=14=en=clnk=us
> The most straightforward way to block this attack is to not allow access to 
> the client port to non-trusted clients - i.e. firewall the ZooKeeper service 
> and only allow access to trusted applications using it for coordination.
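
Beyond firewalling, the four-letter-word surface itself can be narrowed; a 
hedged zoo.cfg sketch using the whitelist property introduced for this issue 
(verify the property name against your release):

{code}
# Hedged sketch: only answer the listed four-letter words; wchp/wchc stay
# disabled unless explicitly whitelisted. Verify against your release.
4lw.commands.whitelist=stat, ruok, conf, isro
{code}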



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ZOOKEEPER-2184) Zookeeper Client should re-resolve hosts when connection attempts fail

2017-02-19 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873747#comment-15873747
 ] 

Benjamin Reed commented on ZOOKEEPER-2184:
--

another option would be to have a background worker that periodically wakes up 
and re-resolves hosts every few minutes. if we ever get a connection failure, 
we could use that to kick the background worker to run right away.
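
a hedged sketch of that idea (all names hypothetical, not the committed fix):

{code}
// Hedged sketch of the idea above; names are hypothetical, not the
// committed fix. A background task re-resolves the connect-string hosts
// every few minutes, and a connection failure kicks it to run right away.
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class ReResolver {
    private final ScheduledExecutorService exec =
            Executors.newSingleThreadScheduledExecutor();
    private final String[] hosts;

    ReResolver(String[] hosts) {
        this.hosts = hosts;
        // baseline: wake up every few minutes
        exec.scheduleWithFixedDelay(this::resolveAll, 5, 5, TimeUnit.MINUTES);
    }

    void onConnectionFailure() {
        exec.execute(this::resolveAll); // run immediately on failure
    }

    private void resolveAll() {
        for (String h : hosts) {
            try {
                InetAddress[] fresh = InetAddress.getAllByName(h);
                // ... hand the fresh addresses to the host provider ...
            } catch (UnknownHostException ignored) {
                // keep the previous addresses until DNS recovers
            }
        }
    }
}
{code}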

> Zookeeper Client should re-resolve hosts when connection attempts fail
> --
>
> Key: ZOOKEEPER-2184
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2184
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.6, 3.5.0
> Environment: Ubuntu 14.04 host, Docker containers for Zookeeper & 
> Kafka
>Reporter: Robert P. Thille
>Assignee: Flavio Junqueira
>  Labels: easyfix, patch
> Fix For: 3.5.3, 3.4.11
>
> Attachments: ZOOKEEPER-2184.patch
>
>
> Testing in a Docker environment with a single Kafka instance using a single 
> Zookeeper instance. Restarting the Zookeeper container will cause it to 
> receive a new IP address. Kafka will never be able to reconnect to Zookeeper 
> and will hang indefinitely. Updating DNS or /etc/hosts with the new IP 
> address will not help the client to reconnect as the 
> zookeeper/client/StaticHostProvider resolves the connection string hosts at 
> creation time and never re-resolves.
> A solution would be for the client to notice that connection attempts fail 
> and attempt to re-resolve the hostnames in the connectString.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (ZOOKEEPER-27) Unique DB identifiers for servers and clients

2017-02-06 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-27?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed reassigned ZOOKEEPER-27:
--

 Assignee: (was: Mahadev konar)
Fix Version/s: (was: 3.0.0)
   3.6.0

> Unique DB identifiers for servers and clients
> -
>
> Key: ZOOKEEPER-27
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-27
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, server
>Reporter: Patrick Hunt
> Fix For: 3.6.0
>
>
> Moved from SourceForge to Apache.
> http://sourceforge.net/tracker/index.php?func=detail=1937075_id=209147=1008547
> here is the text from sourceforge:
> There should be a persistent unique identifier for an instance of ZooKeeper. 
> Currently, if you bring a cluster down without stopping clients and 
> reinitialize the servers, the servers will start logging client zxid errors 
> because the clients have seen a later transaction than the server has. In 
> reality the clients should detect that they are now talking to a new instance 
> of the database and close the session.
> A similar problem occurs when a server fails in a cluster of three machines, 
> and the other two machines are reinitialized and restarted. If the failed 
> machine starts up again, there is a chance that the old machine may get 
> elected leader (since it will have the highest zxid) and overwrite new data.
> A unique random id should probably get generated when a new cluster comes up. 
> (It is easy to detect since the zxid will be zero.) Leader Election and the 
> Leader should validate that the peers have the same database id. Clients 
> should also validate that they are talking to servers with the same database 
> id during a session.
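
A hypothetical sketch of the proposal (nothing below exists in the codebase; 
all names are illustrative):

{code}
// Hypothetical sketch; nothing here exists in the codebase and all names
// are illustrative. Mint a random database id when a fresh ensemble
// starts (detectable because the zxid is zero), then validate it during
// leader election and on session establishment.
import java.security.SecureRandom;

class DbIdSketch {
    static long establishDbId(long lastProcessedZxid, long storedDbId) {
        if (lastProcessedZxid == 0 && storedDbId == 0) {
            return new SecureRandom().nextLong(); // brand-new database
        }
        return storedDbId;                        // keep the persisted id
    }

    static void validatePeer(long myDbId, long peerDbId) {
        if (myDbId != peerDbId) {
            // different database instance: refuse the vote / drop the session
            throw new IllegalStateException("database id mismatch");
        }
    }
}
{code}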



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ZOOKEEPER-27) Unique DB identifiers for servers and clients

2017-02-06 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854717#comment-15854717
 ] 

Benjamin Reed commented on ZOOKEEPER-27:


had the joy of running into this problem today. this issue was prematurely 
closed.

> Unique DB identifiers for servers and clients
> -
>
> Key: ZOOKEEPER-27
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-27
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, server
>Reporter: Patrick Hunt
>Assignee: Mahadev konar
> Fix For: 3.0.0
>
>
> Moved from SourceForge to Apache.
> http://sourceforge.net/tracker/index.php?func=detail=1937075_id=209147=1008547
> here is the text from sourceforge:
> There should be a persistent unique identifier for an instance of ZooKeeper. 
> Currently, if you bring a cluster down without stopping clients and 
> reinitialize the servers, the servers will start logging client zxid errors 
> because the clients have seen a later transaction than the server has. In 
> reality the clients should detect that they are now talking to a new instance 
> of the database and close the session.
> A similar problem occurs when a server fails in a cluster of three machines, 
> and the other two machines are reinitialized and restarted. If the failed 
> machine starts up again, there is a chance that the old machine may get 
> elected leader (since it will have the highest zxid) and overwrite new data.
> A unique random id should probably get generated when a new cluster comes up. 
> (It is easy to detect since the zxid will be zero.) Leader Election and the 
> Leader should validate that the peers have the same database id. Clients 
> should also validate that they are talking to servers with the same database 
> id during a session.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (ZOOKEEPER-27) Unique DB identifiers for servers and clients

2017-02-06 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-27?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-27:
---
Description: 
Moved from SourceForge to Apache.
http://sourceforge.net/tracker/index.php?func=detail=1937075_id=209147=1008547

here is the text from sourceforge:

There should be a persistent unique identifier for an instance of ZooKeeper. 
Currently, if you bring a cluster down without stopping clients and 
reinitialize the servers, the servers will start logging client zxid errors 
because the clients have seen a later transaction than the server has. In 
reality the clients should detect that they are now talking to a new instance 
of the database and close the session.

A similar problem occurs when a server fails in a cluster of three machines, 
and the other two machines are reinitialized and restarted. If the failed 
machine starts up again, there is a chance that the old machine may get elected 
leader (since it will have the highest zxid) and overwrite new data.

A unique random id should probably get generated when a new cluster comes up. 
(It is easy to detect since the zxid will be zero.) Leader Election and the 
Leader should validate that the peers have the same database id. Clients should 
also validate that they are talking to servers with the same database id during 
a session.

  was:
Moved from SourceForge to Apache.
http://sourceforge.net/tracker/index.php?func=detail=1937075_id=209147=1008547


> Unique DB identifiers for servers and clients
> -
>
> Key: ZOOKEEPER-27
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-27
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, server
>Reporter: Patrick Hunt
>Assignee: Mahadev konar
> Fix For: 3.0.0
>
>
> Moved from SourceForge to Apache.
> http://sourceforge.net/tracker/index.php?func=detail=1937075_id=209147=1008547
> here is the text from sourceforge:
> There should be a persistent unique identifier for an instance of ZooKeeper. 
> Currently, if you bring a cluster down without stopping clients and 
> reinitialize the servers, the servers will start logging client zxid errors 
> because the clients have seen a later transaction than the server has. In 
> reality the clients should detect that they are now talking to a new instance 
> of the database and close the session.
> A similar problem occurs when a server fails in a cluster of three machines, 
> and the other two machines are reinitialized and restarted. If the failed 
> machine starts up again, there is a chance that the old machine may get 
> elected leader (since it will have the highest zxid) and overwrite new data.
> A unique random id should probably get generated when a new cluster comes up. 
> (It is easy to detect since the zxid will be zero.) Leader Election and the 
> Leader should validate that the peers have the same database id. Clients 
> should also validate that they are talking to servers with the same database 
> id during a session.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election

2017-01-13 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-261.
-
Resolution: Fixed

committed to master

> Reinitialized servers should not participate in leader election
> ---
>
> Key: ZOOKEEPER-261
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-261
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: leaderElection, quorum
>Reporter: Benjamin Reed
>
> A server that has lost its data should not participate in leader election 
> until it has resynced with a leader. Our leader election algorithm and 
> NEW_LEADER commit assume that the followers voting on a leader have not lost 
> any of their data. We should have a flag in the data directory saying 
> whether or not the data is preserved, such that the flag will be cleared if 
> the data is ever cleared.
> Here is the problematic scenario: you have an ensemble of machines A, B, 
> and C. C is down. the last transaction seen by C is z. a transaction, z+1, 
> is committed on A and B. now there is a power outage. B's data gets 
> reinitialized. when power comes back up, B and C come up, but A does not. C 
> will be elected leader and transaction z+1 is lost. (note, this can happen 
> even if all three machines are up and C just responds quickly; in that case 
> C would tell A to truncate z+1 from its log.) in theory we haven't violated 
> our 2f+1 guarantee, since A is failed and B still hasn't recovered from 
> failure, but it would be nice if, when we don't have a quorum, the system 
> stopped working rather than working incorrectly.
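
A hedged sketch of the data-directory flag idea (the file name and logic are 
illustrative, not a committed design):

{code}
// Hedged sketch; the marker file name and logic are illustrative. The
// marker exists only after the server has synced with a leader, so a
// freshly (re)initialized data directory abstains from leader election.
import java.io.File;
import java.io.IOException;

class SyncedFlagSketch {
    private final File marker;

    SyncedFlagSketch(File dataDir) {
        this.marker = new File(dataDir, "syncedWithLeader"); // hypothetical
    }

    boolean mayVoteInElection() {
        return marker.exists(); // reinitialized dirs have no marker
    }

    void onSyncedWithLeader() throws IOException {
        marker.createNewFile(); // set once we hold the leader's history
    }
}
{code}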



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-12-15 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752196#comment-15752196
 ] 

Benjamin Reed commented on ZOOKEEPER-2325:
--

[~rgs] can you commit this? we need it to get ZOOKEEPER-261 in.

> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Assignee: Andrew Grasso
>Priority: Critical
> Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, 
> zk.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When loading state from snapshots on startup, FileTxnSnapLog.java ignores the 
> result of FileSnap.deserialize, which is -1L if no valid snapshots are found. 
> Recovery proceeds with dt.lastProcessed == 0, its initial value.
> The result is that Zookeeper will process the transaction logs and then begin 
> serving requests with a different state than the rest of the ensemble.
> To reproduce:
> In a healthy zookeeper cluster of size >= 3, shut down one node.
> Either delete all snapshots for this node or change all to be empty files.
> Restart the node.
> We believe this can happen organically if a node runs out of disk space.
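
The shape of the bug, paraphrased as a fragment in the context of 
FileTxnSnapLog (not the exact source):

{code}
// Paraphrased sketch of the bug, not the exact source: the -1L
// "no valid snapshot" result is dropped, so recovery silently proceeds
// from zxid 0 and replays the txn logs onto an empty tree.
long deserializeResult = snapLog.deserialize(dt, sessions); // -1L if none found
// BUG: deserializeResult is ignored; dt.lastProcessedZxid stays 0.
// One possible fix: fail fast instead of serving divergent state.
if (deserializeResult == -1L) {
    throw new IOException("no valid snapshot found; refusing to rebuild "
            + "state from transaction logs alone");
}
{code}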



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2640) fix test coverage for single threaded C-API

2016-12-04 Thread Benjamin Reed (JIRA)
Benjamin Reed created ZOOKEEPER-2640:


 Summary: fix test coverage for single threaded C-API
 Key: ZOOKEEPER-2640
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2640
 Project: ZooKeeper
  Issue Type: Test
  Components: c client, tests
Reporter: Benjamin Reed


the tests for the C-API are mostly for the multithreaded API. we need to get 
better coverage for the single threaded API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-11-29 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706785#comment-15706785
 ] 

Benjamin Reed commented on ZOOKEEPER-2325:
--

hey andrew, i've merged all the patches into a pull request. can you take a 
look and make sure everything looks ok?

> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Assignee: Andrew Grasso
>Priority: Critical
> Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, 
> zk.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When loading state from snapshots on startup, FileTxnSnapLog.java ignores the 
> result of FileSnap.deserialize, which is -1L if no valid snapshots are found. 
> Recovery proceeds with dt.lastProcessed == 0, its initial value.
> The result is that Zookeeper will process the transaction logs and then begin 
> serving requests with a different state than the rest of the ensemble.
> To reproduce:
> In a healthy zookeeper cluster of size >= 3, shut down one node.
> Either delete all snapshots for this node or change all to be empty files.
> Restart the node.
> We believe this can happen organically if a node runs out of disk space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2014) Only admin should be allowed to reconfig a cluster

2016-11-07 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645962#comment-15645962
 ] 

Benjamin Reed commented on ZOOKEEPER-2014:
--

shall i commit it or are we waiting on something else?


> Only admin should be allowed to reconfig a cluster
> --
>
> Key: ZOOKEEPER-2014
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2014
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0
>Reporter: Raul Gutierrez Segales
>Assignee: Michael Han
>Priority: Blocker
> Fix For: 3.5.3
>
> Attachments: ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch
>
>
> ZOOKEEPER-107 introduces reconfiguration support via the reconfig() call. We 
> should, at the very least, ensure that only the Admin can reconfigure a 
> cluster. Perhaps restricting access to /zookeeper/config as well, though this 
> is debatable. Surely one could ensure Admin only access via an ACL, but that 
> would leave everyone who doesn't use ACLs unprotected. We could also force a 
> default ACL to make it a bit more consistent (maybe).
> Finally, making reconfig() only available to Admins means they have to run 
> with zookeeper.DigestAuthenticationProvider.superDigest (which I am not sure 
> if everyone does, or how would it work with other authentication providers). 
> Review board https://reviews.apache.org/r/51546/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2014) Only admin should be allowed to reconfig a cluster

2016-11-03 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635265#comment-15635265
 ] 

Benjamin Reed commented on ZOOKEEPER-2014:
--

ah that makes sense. i didn't dig deep enough :) it is sad that an exception 
"that should never happen" has such a big impact on the code. shouldn't we have 
thrown a runtime exception? i think it would have eliminated a lot of this 
patch...

this is just an observation not a vote :)

> Only admin should be allowed to reconfig a cluster
> --
>
> Key: ZOOKEEPER-2014
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2014
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0
>Reporter: Raul Gutierrez Segales
>Assignee: Michael Han
>Priority: Blocker
> Fix For: 3.5.3
>
> Attachments: ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch
>
>
> ZOOKEEPER-107 introduces reconfiguration support via the reconfig() call. We 
> should, at the very least, ensure that only the Admin can reconfigure a 
> cluster. Perhaps restricting access to /zookeeper/config as well, though this 
> is debatable. Surely one could ensure Admin only access via an ACL, but that 
> would leave everyone who doesn't use ACLs unprotected. We could also force a 
> default ACL to make it a bit more consistent (maybe).
> Finally, making reconfig() only available to Admins means they have to run 
> with zookeeper.DigestAuthenticationProvider.superDigest (which I am not sure 
> if everyone does, or how would it work with other authentication providers). 
> Review board https://reviews.apache.org/r/51546/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2014) Only admin should be allowed to reconfig a cluster

2016-11-03 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634422#comment-15634422
 ] 

Benjamin Reed commented on ZOOKEEPER-2014:
--

just a curious observer: why are we propagating the NoNodeException everywhere? 
i wasn't clear from the patch why that suddenly popped up as part of the change.

> Only admin should be allowed to reconfig a cluster
> --
>
> Key: ZOOKEEPER-2014
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2014
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0
>Reporter: Raul Gutierrez Segales
>Assignee: Michael Han
>Priority: Blocker
> Fix For: 3.5.3
>
> Attachments: ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch, 
> ZOOKEEPER-2014.patch, ZOOKEEPER-2014.patch
>
>
> ZOOKEEPER-107 introduces reconfiguration support via the reconfig() call. We 
> should, at the very least, ensure that only the Admin can reconfigure a 
> cluster. Perhaps restricting access to /zookeeper/config as well, though this 
> is debatable. Surely one could ensure Admin only access via an ACL, but that 
> would leave everyone who doesn't use ACLs unprotected. We could also force a 
> default ACL to make it a bit more consistent (maybe).
> Finally, making reconfig() only available to Admins means they have to run 
> with zookeeper.DigestAuthenticationProvider.superDigest (which I am not sure 
> if everyone does, or how would it work with other authentication providers). 
> Review board https://reviews.apache.org/r/51546/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2619) Client library reconnecting breaks FIFO client order

2016-11-03 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15632820#comment-15632820
 ] 

Benjamin Reed commented on ZOOKEEPER-2619:
--

i would love to see ZOOKEEPER-22 fixed, but i don't think it will be fixed 
anytime soon. (it would be awesome to be surprised though :)

@diego perhaps you could implement your idea in your go client implementation 
and propose it again if it works out well? i like the getConnection proposal.

> Client library reconnecting breaks FIFO client order
> 
>
> Key: ZOOKEEPER-2619
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2619
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Diego Ongaro
>
> According to the USENIX ATC 2010 
> [paper|https://www.usenix.org/conference/usenix-atc-10/zookeeper-wait-free-coordination-internet-scale-systems],
>  ZooKeeper provides "FIFO client order: all requests from a given client are 
> executed in the order that they were sent by the client." I believe 
> applications written using the Java client library are unable to rely on this 
> guarantee, and any current application that does so is broken. Other client 
> libraries are also likely to be affected.
> Consider this application, which is simplified from the algorithm described 
> on Page 4 (right column) of the paper:
> {code}
>   zk = new ZooKeeper(...)
>   zk.createAsync("/data-23857", "...", callback)
>   zk.createSync("/pointer", "/data-23857")
> {code}
> Assume an empty ZooKeeper database to begin with and no other writers. 
> Applying the above definition, if the ZooKeeper database contains /pointer, 
> it must also contain /data-23857.
> Now consider this series of unfortunate events:
> {code}
>   zk = new ZooKeeper(...)
>   // The library establishes a TCP connection.
>   zk.createAsync("/data-23857", "...", callback)
>   // The library/kernel closes the TCP connection because it times out, and
>   // the create of /data-23857 is doomed to fail with ConnectionLoss. Suppose
>   // that it never reaches the server.
>   // The library establishes a new TCP connection.
>   zk.createSync("/pointer", "/data-23857")
>   // The create of /pointer succeeds.
> {code}
> That's the problem: subsequent operations get assigned to the new connection 
> and succeed, while earlier operations fail.
> In general, I believe it's impossible to have a system with the following 
> three properties:
>  # FIFO client order for asynchronous operations,
>  # Failing operations when connections are lost, AND
>  # Transparently reconnecting when connections are lost.
> To argue this, consider an application that issues a series of pipelined 
> operations, then upon noticing a connection loss, issues a series of recovery 
> operations, repeating the recovery procedure as necessary. If a pipelined 
> operation fails, all subsequent operations in the pipeline must also fail. 
> Yet the client must also carry on eventually: the recovery operations cannot 
> be trivially failed forever. Unfortunately, the client library does not know 
> where the pipelined operations end and the recovery operations begin. At the 
> time of a connection loss, subsequent pipelined operations may or may not be 
> queued in the library; others might be upcoming in the application thread. If 
> the library re-establishes a connection too early, it will send pipelined 
> operations out of FIFO client order.
> I considered a possible workaround of having the client diligently check its 
> callbacks and watchers for connection loss events, and do its best to stop 
> the subsequent pipelined operations at the first sign of a connection loss. 
> In addition to being a large burden for the application, this does not solve 
> the problem all the time. In particular, if the callback thread is delayed 
> significantly (as can happen due to excessive computation or scheduling 
> hiccups), the application may not learn about the connection loss event until 
> after the connection has been re-established and after dependent pipelined 
> operations have already been transmitted over the new connection.
> I suggest the following API changes to fix the problem:
>  - Add a method ZooKeeper.getConnection() returning a ZKConnection object. 
> ZKConnection would wrap a TCP connection. It would include all synchronous 
> and asynchronous operations currently defined on the ZooKeeper class. Upon a 
> connection loss on a ZKConnection, all subsequent operations on the same 
> ZKConnection would return a Connection Loss error. Upon noticing, the client 
> would need to call ZooKeeper.getConnection() again to get a working 
> ZKConnection object, and it would execute its recovery procedure on this new 
> connection.
>  - Deprecate all asynchronous methods on the ZooKeeper object. These are 
> unsafe to use if the caller 

[jira] [Commented] (ZOOKEEPER-2619) Client library reconnecting breaks FIFO client order

2016-11-03 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15632811#comment-15632811
 ] 

Benjamin Reed commented on ZOOKEEPER-2619:
--

multi will handle some of the use cases, but a simple one that it doesn't 
handle is if you want to implement swap:

zk.getData(znode, ...)
zk.setData(znode, ...)

you can't do that with multi (and i don't think we should extend multi to do it 
:)

multi also doesn't handle the case where you are updating lots of data and would 
go over the max packet size.
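
To spell out the swap example: the value written depends on the value just 
read, which multi cannot express in one round trip. A minimal sketch of the 
usual client-side workaround, an optimistic read-modify-write loop keyed on 
the znode version (SwapExample and transform are names invented for this 
illustration):

{code}
import java.util.function.UnaryOperator;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

class SwapExample {
    // Illustrative only: optimistic swap via version-checked setData.
    static byte[] swap(ZooKeeper zk, String path, UnaryOperator<byte[]> transform)
            throws KeeperException, InterruptedException {
        while (true) {
            Stat stat = new Stat();
            byte[] old = zk.getData(path, false, stat);   // read value and version
            try {
                zk.setData(path, transform.apply(old), stat.getVersion()); // conditional write
                return old;                               // swap committed
            } catch (KeeperException.BadVersionException e) {
                // concurrent update between our read and write; retry
            }
        }
    }
}
{code}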



> Client library reconnecting breaks FIFO client order
> 
>
> Key: ZOOKEEPER-2619
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2619
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Diego Ongaro
>
> According to the USENIX ATC 2010 
> [paper|https://www.usenix.org/conference/usenix-atc-10/zookeeper-wait-free-coordination-internet-scale-systems],
>  ZooKeeper provides "FIFO client order: all requests from a given client are 
> executed in the order that they were sent by the client." I believe 
> applications written using the Java client library are unable to rely on this 
> guarantee, and any current application that does so is broken. Other client 
> libraries are also likely to be affected.
> Consider this application, which is simplified from the algorithm described 
> on Page 4 (right column) of the paper:
> {code}
>   zk = new ZooKeeper(...)
>   zk.createAsync("/data-23857", "...", callback)
>   zk.createSync("/pointer", "/data-23857")
> {code}
> Assume an empty ZooKeeper database to begin with and no other writers. 
> Applying the above definition, if the ZooKeeper database contains /pointer, 
> it must also contain /data-23857.
> Now consider this series of unfortunate events:
> {code}
>   zk = new ZooKeeper(...)
>   // The library establishes a TCP connection.
>   zk.createAsync("/data-23857", "...", callback)
>   // The library/kernel closes the TCP connection because it times out, and
>   // the create of /data-23857 is doomed to fail with ConnectionLoss. Suppose
>   // that it never reaches the server.
>   // The library establishes a new TCP connection.
>   zk.createSync("/pointer", "/data-23857")
>   // The create of /pointer succeeds.
> {code}
> That's the problem: subsequent operations get assigned to the new connection 
> and succeed, while earlier operations fail.
> In general, I believe it's impossible to have a system with the following 
> three properties:
>  # FIFO client order for asynchronous operations,
>  # Failing operations when connections are lost, AND
>  # Transparently reconnecting when connections are lost.
> To argue this, consider an application that issues a series of pipelined 
> operations, then upon noticing a connection loss, issues a series of recovery 
> operations, repeating the recovery procedure as necessary. If a pipelined 
> operation fails, all subsequent operations in the pipeline must also fail. 
> Yet the client must also carry on eventually: the recovery operations cannot 
> be trivially failed forever. Unfortunately, the client library does not know 
> where the pipelined operations end and the recovery operations begin. At the 
> time of a connection loss, subsequent pipelined operations may or may not be 
> queued in the library; others might be upcoming in the application thread. If 
> the library re-establishes a connection too early, it will send pipelined 
> operations out of FIFO client order.
> I considered a possible workaround of having the client diligently check its 
> callbacks and watchers for connection loss events, and do its best to stop 
> the subsequent pipelined operations at the first sign of a connection loss. 
> In addition to being a large burden for the application, this does not solve 
> the problem all the time. In particular, if the callback thread is delayed 
> significantly (as can happen due to excessive computation or scheduling 
> hiccups), the application may not learn about the connection loss event until 
> after the connection has been re-established and after dependent pipelined 
> operations have already been transmitted over the new connection.
> I suggest the following API changes to fix the problem:
>  - Add a method ZooKeeper.getConnection() returning a ZKConnection object. 
> ZKConnection would wrap a TCP connection. It would include all synchronous 
> and asynchronous operations currently defined on the ZooKeeper class. Upon a 
> connection loss on a ZKConnection, all subsequent operations on the same 
> ZKConnection would return a Connection Loss error. Upon noticing, the client 
> would need to call ZooKeeper.getConnection() again to get a working 
> ZKConnection object, and it would execute its recovery procedure on this new 
> connection.
>  - Deprecate all asynchronous methods on the 

[jira] [Commented] (ZOOKEEPER-2623) CheckVersion outside of Multi causes NullPointerException

2016-11-03 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15632776#comment-15632776
 ] 

Benjamin Reed commented on ZOOKEEPER-2623:
--

i agree that we should handle this gracefully :) we should fix this.

> CheckVersion outside of Multi causes NullPointerException
> -
>
> Key: ZOOKEEPER-2623
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2623
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Diego Ongaro
>Priority: Minor
>
> I wasn't sure if check version (opcode 13) was permitted outside of a multi 
> op, so I tried it. My server crashed with a NullPointerException and became 
> unusable until restarted. I guess it's not allowed, but perhaps the server 
> should handle this more gracefully?
> Here are the server logs:
> {noformat}
> Accepted socket connection from /0:0:0:0:0:0:0:1:51737
> Session establishment request from client /0:0:0:0:0:0:0:1:51737 client's 
> lastZxid is 0x0
> Connection request from old client /0:0:0:0:0:0:0:1:51737; will be dropped if 
> server is in r-o mode
> Client attempting to establish new session at /0:0:0:0:0:0:0:1:51737
> :Fsessionid:0x10025651faa type:createSession cxid:0x0 
> zxid:0xfffe txntype:unknown reqpath:n/a
> Processing request:: sessionid:0x10025651faa type:createSession cxid:0x0 
> zxid:0xfffe txntype:unknown reqpath:n/a
> Got zxid 0x6065e expected 0x1
> Creating new log file: log.6065e
> Committing request:: sessionid:0x10025651faa type:createSession cxid:0x0 
> zxid:0x6065e txntype:-10 reqpath:n/a
> Processing request:: sessionid:0x10025651faa type:createSession cxid:0x0 
> zxid:0x6065e txntype:-10 reqpath:n/a
> :Esessionid:0x10025651faa type:createSession cxid:0x0 zxid:0x6065e 
> txntype:-10 reqpath:n/a
> sessionid:0x10025651faa type:createSession cxid:0x0 zxid:0x6065e 
> txntype:-10 reqpath:n/a
> Add a buffer to outgoingBuffers, sk sun.nio.ch.SelectionKeyImpl@28e9f397 is 
> valid: true
> Established session 0x10025651faa with negotiated timeout 2 for 
> client /0:0:0:0:0:0:0:1:51737
> :Fsessionid:0x10025651faa type:check cxid:0x1 zxid:0xfffe 
> txntype:unknown reqpath:/
> Processing request:: sessionid:0x10025651faa type:check cxid:0x1 
> zxid:0xfffe txntype:unknown reqpath:/
> Processing request:: sessionid:0x10025651faa type:check cxid:0x1 
> zxid:0xfffe txntype:unknown reqpath:/
> Exception causing close of session 0x10025651faa: Connection reset by peer
> :Esessionid:0x10025651faa type:check cxid:0x1 zxid:0xfffe 
> txntype:unknown reqpath:/
> IOException stack trace
> java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>   at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>   at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>   at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>   at 
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:320)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:530)
>   at 
> org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:162)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Unexpected exception
> java.lang.NullPointerException
>   at 
> org.apache.zookeeper.server.ZKDatabase.addCommittedProposal(ZKDatabase.java:252)
>   at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:127)
>   at 
> org.apache.zookeeper.server.quorum.CommitProcessor$CommitWorkRequest.doWork(CommitProcessor.java:362)
>   at 
> org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:162)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Committing request:: sessionid:0x10025651faa type:error cxid:0x1 
> zxid:0x6065f txntype:-1 reqpath:n/a
> Unregister MBean 
> [org.apache.ZooKeeperService:name0=ReplicatedServer_id1,name1=replica.1,name2=Follower,name3=Connections,name4="0:0:0:0:0:0:0:1",name5=0x10025651faa]
> Exception thrown by downstream processor, unable to continue.
> CommitProcessor exited loop!
> Closed socket connection for client /0:0:0:0:0:0:0:1:51737 which had 
> sessionid 0x10025651faa
> {noformat}

[jira] [Commented] (ZOOKEEPER-2592) Zookeeper is not recoverable once running system (machine on which zookeeper is running) is out of space

2016-11-03 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15632763#comment-15632763
 ] 

Benjamin Reed commented on ZOOKEEPER-2592:
--

it sounds like we should close this as a duplicate. right?

> Zookeeper is not recoverable once running system (machine on which zookeeper 
> is running) is out of space
> 
>
> Key: ZOOKEEPER-2592
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2592
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.1, 3.5.2
>Reporter: Rakesh Kumar Singh
>Priority: Critical
>
> Zookeeper is not recoverable once running system (machine on which zookeeper 
> is running) is out of space 
> Steps to reproduce:-
> 1. Install zookeeper in standalone mode and start zookeeper
> 2. Fill the machine's disk completely
> 3. Connect a client to zookeeper and try to create some znodes with some data.
> 4. After some time, further znode creation fails because the disk is full
> 5. Now free up space on that machine
> 6. Connect through a client again. The connection succeeds, but executing any 
> command like "ls /" fails even though more than 11 GB of space is now free
> Client log:-
> BLR107042:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin 
> # df -h
> Filesystem  Size  Used Avail Use% Mounted on
> /dev/xvda2   36G   24G   11G  70% /
> udev1.9G  116K  1.9G   1% /dev
> tmpfs   1.9G 0  1.9G   0% /dev/shm
> BLR107042:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin 
> # ./zkCli.sh
> Connecting to localhost:2181
> 2016-09-19 22:50:20,227 [myid:] - INFO  [main:Environment@109] - Client 
> environment:zookeeper.version=3.5.1-alpha--1, built on 08/18/2016 08:20 GMT
> 2016-09-19 22:50:20,231 [myid:] - INFO  [main:Environment@109] - Client 
> environment:host.name=BLR107042
> 2016-09-19 22:50:20,231 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.version=1.7.0_79
> 2016-09-19 22:50:20,234 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.vendor=Oracle Corporation
> 2016-09-19 22:50:20,234 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.home=/usr/java/jdk1.7.0_79/jre
> 2016-09-19 22:50:20,234 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.class.path=/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../build/classes:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../build/lib/*.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/slf4j-log4j12-1.7.5.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/slf4j-api-1.7.5.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/servlet-api-2.5-20081211.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/netty-3.7.0.Final.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/log4j-1.2.16.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/jline-2.11.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/jetty-util-6.1.26.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/jetty-6.1.26.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/javacc.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/jackson-mapper-asl-1.9.11.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/jackson-core-asl-1.9.11.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/commons-cli-1.2.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/ant-eclipse-1.0-jvm1.2.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../zookeeper-3.5.1-alpha.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../src/java/lib/ant-eclipse-1.0-jvm1.2.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../conf:/usr/java/jdk1.7.0_79/lib
> 2016-09-19 22:50:20,234 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
> 2016-09-19 22:50:20,234 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.io.tmpdir=/tmp
> 2016-09-19 22:50:20,234 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.compiler=
> 2016-09-19 22:50:20,235 [myid:] - INFO  [main:Environment@109] - Client 
> environment:os.name=Linux
> 2016-09-19 22:50:20,235 [myid:] - INFO  [main:Environment@109] - Client 
> environment:os.arch=amd64
> 2016-09-19 22:50:20,235 [myid:] - INFO  [main:Environment@109] - Client 
> environment:os.version=3.0.76-0.11-default
> 

[jira] [Commented] (ZOOKEEPER-761) Remove *synchronous* calls from the *single-threaded* C client API, since they are documented not to work

2016-10-27 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612002#comment-15612002
 ] 

Benjamin Reed commented on ZOOKEEPER-761:
-

not yet, i have a mac and tests don't seem to work on a mac. i haven't had a 
chance to test on linux yet.

> Remove *synchronous* calls from the *single-threaded* C client API, since 
> they are documented not to work
> --
>
> Key: ZOOKEEPER-761
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-761
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: c client
>Affects Versions: 3.1.1, 3.2.2
> Environment: RHEL 4u8 (Linux).  The issue is not OS-specific though.
>Reporter: Jozef Hatala
>Assignee: Benjamin Reed
>Priority: Minor
> Fix For: 3.5.3, 3.6.0
>
> Attachments: fix-sync-apis-in-st-adaptor.patch, 
> fix-sync-apis-in-st-adaptor.v2.patch
>
>
> Since the synchronous calls are 
> [known|http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#Using+the+C+Client]
>  to be unimplemented in the single threaded version of the client library 
> libzookeeper_st.so, I believe that it would be helpful towards users of the 
> library if that information was also obvious from the header file.
> Anecdotally more than one of us here made the mistake of starting by using 
> the synchronous calls with the single-threaded library, and we found 
> ourselves debugging it.  An early warning would have been greatly appreciated.
> 1. Could you please add warnings to the doxygen blocks of all synchronous 
> calls saying that they are not available in the single-threaded API.  This 
> cannot be safely done with {{#ifdef THREADED}}, obviously, because the same 
> header file is included whichever client library implementation one is 
> compiling for.
> 2. Could you please bracket the implementation of all synchronous calls in 
> zookeeper.c with {{#ifdef THREADED}} and {{#endif}}, so that those symbols 
> are not present in libzookeeper_st.so?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-25 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607124#comment-15607124
 ] 

Benjamin Reed commented on ZOOKEEPER-2597:
--

btw thanx for the script edward! even though there were problems it made the 
process very easy!

> Add script to merge PR from Apache git repo to Github
> -
>
> Key: ZOOKEEPER-2597
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2597
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Edward Ribeiro
>Assignee: Edward Ribeiro
>Priority: Minor
> Attachments: ZOOKEEPER-2597.patch
>
>
> A port of kafka-merge-pr.py to work on the ZooKeeper repo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-25 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-2597.
--
Resolution: Fixed

> Add script to merge PR from Apache git repo to Github
> -
>
> Key: ZOOKEEPER-2597
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2597
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Edward Ribeiro
>Assignee: Edward Ribeiro
>Priority: Minor
> Attachments: ZOOKEEPER-2597.patch
>
>
> A port of kafka-merge-pr.py to work on the ZooKeeper repo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-25 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607121#comment-15607121
 ] 

Benjamin Reed commented on ZOOKEEPER-2597:
--

i used the script to commit the pull request. i also followed your 
instructions. here are a couple of things that went wrong:

1) you have the wrong repo line for adding the apache repo. it should be:
{code:borderStyle=solid}
  git remote add apache https://git-wip-us.apache.org/repos/asf/zookeeper.git
{code}

2) when things go bad it doesn't delete the branches it creates. i'm not sure 
if that is a bug or a feature. we should document that you need to remove the 
temporary branches before rerunning the script.

3) the script asks {{List pull request commits in squashed commit message? 
(y/n):}} i think the answer should be {{n}}

4) after the script ran i was very disappointed that the jira integration 
didn't work. we should make sure we run the following before running the script:

{code}
sudo pip install jira
{code}

> Add script to merge PR from Apache git repo to Github
> -
>
> Key: ZOOKEEPER-2597
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2597
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Edward Ribeiro
>Assignee: Edward Ribeiro
>Priority: Minor
> Attachments: ZOOKEEPER-2597.patch
>
>
> A port of kafka-merge-pr.py to work on the ZooKeeper repo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-761) Remove *synchronous* calls from the *single-threaded* C client API, since they are documented not to work

2016-10-25 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15605468#comment-15605468
 ] 

Benjamin Reed commented on ZOOKEEPER-761:
-

for some reason my comments/changes do not get bridged. i've updated the pr to 
move zoo_remove_watchers into the #ifdef

> Remove *synchronous* calls from the *single-threaded* C client API, since 
> they are documented not to work
> --
>
> Key: ZOOKEEPER-761
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-761
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: c client
>Affects Versions: 3.1.1, 3.2.2
> Environment: RHEL 4u8 (Linux).  The issue is not OS-specific though.
>Reporter: Jozef Hatala
>Assignee: Benjamin Reed
>Priority: Minor
> Fix For: 3.5.3, 3.6.0
>
> Attachments: fix-sync-apis-in-st-adaptor.patch, 
> fix-sync-apis-in-st-adaptor.v2.patch
>
>
> Since the synchronous calls are 
> [known|http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#Using+the+C+Client]
>  to be unimplemented in the single threaded version of the client library 
> libzookeeper_st.so, I believe that it would be helpful towards users of the 
> library if that information was also obvious from the header file.
> Anecdotally more than one of us here made the mistake of starting by using 
> the synchronous calls with the single-threaded library, and we found 
> ourselves debugging it.  An early warning would have been greatly appreciated.
> 1. Could you please add warnings to the doxygen blocks of all synchronous 
> calls saying that they are not available in the single-threaded API.  This 
> cannot be safely done with {{#ifdef THREADED}}, obviously, because the same 
> header file is included whichever client library implementation one is 
> compiling for.
> 2. Could you please bracket the implementation of all synchronous calls in 
> zookeeper.c with {{#ifdef THREADED}} and {{#endif}}, so that those symbols 
> are not present in libzookeeper_st.so?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-25 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15605414#comment-15605414
 ] 

Benjamin Reed commented on ZOOKEEPER-2597:
--

i'll commit at end of day unless someone has an objection.

edward, can you put up the instructions? i'll follow them to do the commit :)

> Add script to merge PR from Apache git repo to Github
> -
>
> Key: ZOOKEEPER-2597
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2597
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Edward Ribeiro
>Assignee: Edward Ribeiro
>Priority: Minor
> Attachments: ZOOKEEPER-2597.patch
>
>
> A port of kafka-merge-pr.py to work on the ZooKeeper repo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-25 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15605408#comment-15605408
 ] 

Benjamin Reed commented on ZOOKEEPER-2597:
--

for some reason my comments were not bridged over:

+1

i think there are still quite a few improvements that can be made to this 
script. for example, it assumes that the remote 'apache-github' is set up, so it 
would be nice to check for that at the start of the script and then print out 
how to set it up if it isn't.

i'm thinking that we should get this checked in and then iterate on it as we 
use it. what do others think?

> Add script to merge PR from Apache git repo to Github
> -
>
> Key: ZOOKEEPER-2597
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2597
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Edward Ribeiro
>Assignee: Edward Ribeiro
>Priority: Minor
> Attachments: ZOOKEEPER-2597.patch
>
>
> A port of kafka-merge-pr.py to work on the ZooKeeper repo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2619) Client library reconnecting breaks FIFO client order

2016-10-24 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15604093#comment-15604093
 ] 

Benjamin Reed commented on ZOOKEEPER-2619:
--

thanx diego, you did express well what i was trying to say. i also like your 
proposal. there are probably more details to work out, like how it would look 
for the C api? i like how it nicely encapsulates the relationship between a 
sequence of operations, and your example does make a compelling argument for 
also including the sync api.

do we have some applications that we can use to validate the api? it would be 
nice to validate the design before we standardize it.

what i meant by "i think it's a good idea to document this issue in this jira" 
is that it's good that we have this jira to discuss the problem and potential 
solutions.

> Client library reconnecting breaks FIFO client order
> 
>
> Key: ZOOKEEPER-2619
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2619
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Diego Ongaro
>
> According to the USENIX ATC 2010 
> [paper|https://www.usenix.org/conference/usenix-atc-10/zookeeper-wait-free-coordination-internet-scale-systems],
>  ZooKeeper provides "FIFO client order: all requests from a given client are 
> executed in the order that they were sent by the client." I believe 
> applications written using the Java client library are unable to rely on this 
> guarantee, and any current application that does so is broken. Other client 
> libraries are also likely to be affected.
> Consider this application, which is simplified from the algorithm described 
> on Page 4 (right column) of the paper:
> {code}
>   zk = new ZooKeeper(...)
>   zk.createAsync("/data-23857", "...", callback)
>   zk.createSync("/pointer", "/data-23857")
> {code}
> Assume an empty ZooKeeper database to begin with and no other writers. 
> Applying the above definition, if the ZooKeeper database contains /pointer, 
> it must also contain /data-23857.
> Now consider this series of unfortunate events:
> {code}
>   zk = new ZooKeeper(...)
>   // The library establishes a TCP connection.
>   zk.createAsync("/data-23857", "...", callback)
>   // The library/kernel closes the TCP connection because it times out, and
>   // the create of /data-23857 is doomed to fail with ConnectionLoss. Suppose
>   // that it never reaches the server.
>   // The library establishes a new TCP connection.
>   zk.createSync("/pointer", "/data-23857")
>   // The create of /pointer succeeds.
> {code}
> That's the problem: subsequent operations get assigned to the new connection 
> and succeed, while earlier operations fail.
> In general, I believe it's impossible to have a system with the following 
> three properties:
>  # FIFO client order for asynchronous operations,
>  # Failing operations when connections are lost, AND
>  # Transparently reconnecting when connections are lost.
> To argue this, consider an application that issues a series of pipelined 
> operations, then upon noticing a connection loss, issues a series of recovery 
> operations, repeating the recovery procedure as necessary. If a pipelined 
> operation fails, all subsequent operations in the pipeline must also fail. 
> Yet the client must also carry on eventually: the recovery operations cannot 
> be trivially failed forever. Unfortunately, the client library does not know 
> where the pipelined operations end and the recovery operations begin. At the 
> time of a connection loss, subsequent pipelined operations may or may not be 
> queued in the library; others might be upcoming in the application thread. If 
> the library re-establishes a connection too early, it will send pipelined 
> operations out of FIFO client order.
> I considered a possible workaround of having the client diligently check its 
> callbacks and watchers for connection loss events, and do its best to stop 
> the subsequent pipelined operations at the first sign of a connection loss. 
> In addition to being a large burden for the application, this does not solve 
> the problem all the time. In particular, if the callback thread is delayed 
> significantly (as can happen due to excessive computation or scheduling 
> hiccups), the application may not learn about the connection loss event until 
> after the connection has been re-established and after dependent pipelined 
> operations have already been transmitted over the new connection.
> I suggest the following API changes to fix the problem:
>  - Add a method ZooKeeper.getConnection() returning a ZKConnection object. 
> ZKConnection would wrap a TCP connection. It would include all synchronous 
> and asynchronous operations currently defined on the ZooKeeper class. Upon a 
> connection loss on a ZKConnection, all subsequent operations on the same 
> 

[jira] [Commented] (ZOOKEEPER-2619) Client library reconnecting breaks FIFO client order

2016-10-22 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15598842#comment-15598842
 ] 

Benjamin Reed commented on ZOOKEEPER-2619:
--

i think it's a good idea to document this issue in this jira. it would be 
really nice to surface this to clients in a way that they both realize the 
problem and they have a way to deal with it.

the nice thing about it is that it is a client side issue. the server maintains 
its guarantees. since you are implementing your own client you can actually 
experiment with different ideas.

it sounds to me that getConnection() and reenableOps() are basically the same. 
right? or are you proposing that when you get a ZKConnection object you can 
invoke the zookeeper operations on that?

i think this is really only an issue for async methods, since synchronous 
methods execute ... synchronously, thus one at a time. i kind of like the idea 
of getting an object that only has async methods so that you have a strong 
guarantee of FIFO execution.

one problem i see with reenableOps is that it affects everything using the 
zookeeper handle, not just the ops in question.
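
For readers following the thread, a sketch of how the proposed API might read. 
ZKConnection, getConnection(), and the op names here come from the proposal 
itself; this is hypothetical pseudocode, not part of any released client:

{code}
// HYPOTHETICAL API from the proposal -- not a shipped client interface.
ZKConnection conn = zk.getConnection();          // pin subsequent ops to one TCP connection
conn.createAsync("/data-23857", data, callback); // FIFO order holds within this connection
conn.createSync("/pointer", "/data-23857");
// On connection loss, every later op on 'conn' also fails with ConnectionLoss;
// the application fetches a fresh ZKConnection and runs its recovery procedure
// there, so recovery ops can never interleave with the doomed pipelined ones.
{code}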

> Client library reconnecting breaks FIFO client order
> 
>
> Key: ZOOKEEPER-2619
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2619
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Diego Ongaro
>
> According to the USENIX ATC 2010 
> [paper|https://www.usenix.org/conference/usenix-atc-10/zookeeper-wait-free-coordination-internet-scale-systems],
>  ZooKeeper provides "FIFO client order: all requests from a given client are 
> executed in the order that they were sent by the client." I believe 
> applications written using the Java client library are unable to rely on this 
> guarantee, and any current application that does so is broken. Other client 
> libraries are also likely to be affected.
> Consider this application, which is simplified from the algorithm described 
> on Page 4 (right column) of the paper:
> {code}
>   zk = new ZooKeeper(...)
>   zk.createAsync("/data-23857", "...", callback)
>   zk.createSync("/pointer", "/data-23857")
> {code}
> Assume an empty ZooKeeper database to begin with and no other writers. 
> Applying the above definition, if the ZooKeeper database contains /pointer, 
> it must also contain /data-23857.
> Now consider this series of unfortunate events:
> {code}
>   zk = new ZooKeeper(...)
>   // The library establishes a TCP connection.
>   zk.createAsync("/data-23857", "...", callback)
>   // The library/kernel closes the TCP connection because it times out, and
>   // the create of /data-23857 is doomed to fail with ConnectionLoss. Suppose
>   // that it never reaches the server.
>   // The library establishes a new TCP connection.
>   zk.createSync("/pointer", "/data-23857")
>   // The create of /pointer succeeds.
> {code}
> That's the problem: subsequent operations get assigned to the new connection 
> and succeed, while earlier operations fail.
> In general, I believe it's impossible to have a system with the following 
> three properties:
>  # FIFO client order for asynchronous operations,
>  # Failing operations when connections are lost, AND
>  # Transparently reconnecting when connections are lost.
> To argue this, consider an application that issues a series of pipelined 
> operations, then upon noticing a connection loss, issues a series of recovery 
> operations, repeating the recovery procedure as necessary. If a pipelined 
> operation fails, all subsequent operations in the pipeline must also fail. 
> Yet the client must also carry on eventually: the recovery operations cannot 
> be trivially failed forever. Unfortunately, the client library does not know 
> where the pipelined operations end and the recovery operations begin. At the 
> time of a connection loss, subsequent pipelined operations may or may not be 
> queued in the library; others might be upcoming in the application thread. If 
> the library re-establishes a connection too early, it will send pipelined 
> operations out of FIFO client order.
> I considered a possible workaround of having the client diligently check its 
> callbacks and watchers for connection loss events, and do its best to stop 
> the subsequent pipelined operations at the first sign of a connection loss. 
> In addition to being a large burden for the application, this does not solve 
> the problem all the time. In particular, if the callback thread is delayed 
> significantly (as can happen due to excessive computation or scheduling 
> hiccups), the application may not learn about the connection loss event until 
> after the connection has been re-established and after dependent pipelined 
> operations have already been transmitted over the new connection.
> I suggest the following API changes to fix 

[jira] [Commented] (ZOOKEEPER-761) Remove *synchronous* calls from the *single-threaded* C client API, since they are documented not to work

2016-10-18 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586819#comment-15586819
 ] 

Benjamin Reed commented on ZOOKEEPER-761:
-

this is a pretty good patch to make things start working, but i think we should 
deprecate the single-threaded API altogether. what do others think?

concretely, i propose that we add #ifdef THREADED around the sync APIs and also 
add a warning that the non-THREADED API is deprecated.

> Remove *synchronous* calls from the *single-threaded* C client API, since 
> they are documented not to work
> --
>
> Key: ZOOKEEPER-761
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-761
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: c client
>Affects Versions: 3.1.1, 3.2.2
> Environment: RHEL 4u8 (Linux).  The issue is not OS-specific though.
>Reporter: Jozef Hatala
>Assignee: Pierre Habouzit
>Priority: Minor
> Fix For: 3.5.3, 3.6.0
>
> Attachments: fix-sync-apis-in-st-adaptor.patch, 
> fix-sync-apis-in-st-adaptor.v2.patch
>
>
> Since the synchronous calls are 
> [known|http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#Using+the+C+Client]
>  to be unimplemented in the single threaded version of the client library 
> libzookeeper_st.so, I believe that it would be helpful towards users of the 
> library if that information was also obvious from the header file.
> Anecdotally more than one of us here made the mistake of starting by using 
> the synchronous calls with the single-threaded library, and we found 
> ourselves debugging it.  An early warning would have been greatly appreciated.
> 1. Could you please add warnings to the doxygen blocks of all synchronous 
> calls saying that they are not available in the single-threaded API.  This 
> cannot be safely done with {{#ifdef THREADED}}, obviously, because the same 
> header file is included whichever client library implementation one is 
> compiling for.
> 2. Could you please bracket the implementation of all synchronous calls in 
> zookeeper.c with {{#ifdef THREADED}} and {{#endif}}, so that those symbols 
> are not present in libzookeeper_st.so?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-11 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567516#comment-15567516
 ] 

Benjamin Reed commented on ZOOKEEPER-2597:
--

cool i opened an infra jira to see if they can turn on the bridging: INFRA-12752

> Add script to merge PR from Apache git repo to Github
> -
>
> Key: ZOOKEEPER-2597
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2597
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Edward Ribeiro
>Priority: Minor
> Attachments: ZOOKEEPER-2597.patch
>
>
> A port of kafka-merge-pr.py to work on the ZooKeeper repo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-11 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567338#comment-15567338
 ] 

Benjamin Reed commented on ZOOKEEPER-2597:
--

are you still working on this edward? do you want me to try and implement the 
changes i suggested?

> Add script to merge PR from Apache git repo to Github
> -
>
> Key: ZOOKEEPER-2597
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2597
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Edward Ribeiro
>Assignee: Edward Ribeiro
>Priority: Minor
> Attachments: ZOOKEEPER-2597.patch
>
>
> A port of kafka-merge-pr.py to work on the ZooKeeper repo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-08 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559081#comment-15559081
 ] 

Benjamin Reed edited comment on ZOOKEEPER-2597 at 10/9/16 2:16 AM:
---

no problem. i made some reviews. (i thought that they would be bridged to 
jira...)


was (Author: breed):
no problem. i made some reviews. (i though that they would be bridged to 
jira...)

> Add script to merge PR from Apache git repo to Github
> -
>
> Key: ZOOKEEPER-2597
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2597
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Edward Ribeiro
>Assignee: Edward Ribeiro
>Priority: Minor
> Attachments: ZOOKEEPER-2597.patch
>
>
> A port of kafka-merge-pr.py to work on the ZooKeeper repo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-08 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559081#comment-15559081
 ] 

Benjamin Reed commented on ZOOKEEPER-2597:
--

no problem. i made some reviews. (i though that they would be bridged to 
jira...)

> Add script to merge PR from Apache git repo to Github
> -
>
> Key: ZOOKEEPER-2597
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2597
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Edward Ribeiro
>Assignee: Edward Ribeiro
>Priority: Minor
> Attachments: ZOOKEEPER-2597.patch
>
>
> A port of kafka-merge-pr.py to work on the ZooKeeper repo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-08 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15558730#comment-15558730
 ] 

Benjamin Reed commented on ZOOKEEPER-2597:
--

did you put up the pull request?

> Add script to merge PR from Apache git repo to Github
> -
>
> Key: ZOOKEEPER-2597
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2597
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Edward Ribeiro
>Assignee: Edward Ribeiro
>Priority: Minor
> Attachments: ZOOKEEPER-2597.patch
>
>
> A port of kafka-merge-pr.py to work on the ZooKeeper repo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-07 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556598#comment-15556598
 ] 

Benjamin Reed commented on ZOOKEEPER-2597:
--

great thanx! i have a couple of questions about it and it would be nice to be 
able to comment on the diff in github :) i too would like to get this in asap!

> Add script to merge PR from Apache git repo to Github
> -
>
> Key: ZOOKEEPER-2597
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2597
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Edward Ribeiro
>Assignee: Edward Ribeiro
>Priority: Minor
> Attachments: ZOOKEEPER-2597.patch
>
>
> A port of kafka-merge-pr.py to work on the ZooKeeper repo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2597) Add script to merge PR from Apache git repo to Github

2016-10-06 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552617#comment-15552617
 ] 

Benjamin Reed commented on ZOOKEEPER-2597:
--

i'm just starting to look at this script. it's kind of ironic that this isn't a 
pull request ;)

> Add script to merge PR from Apache git repo to Github
> -
>
> Key: ZOOKEEPER-2597
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2597
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Edward Ribeiro
>Assignee: Edward Ribeiro
>Priority: Minor
> Attachments: ZOOKEEPER-2597.patch
>
>
> A port of kafka-merge-pr.py to work on the ZooKeeper repo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (ZOOKEEPER-2600) dangling ephemerals on overloaded server with local sessions

2016-09-24 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed resolved ZOOKEEPER-2600.
--
Resolution: Cannot Reproduce

> dangling ephemerals on overloaded server with local sessions
> 
>
> Key: ZOOKEEPER-2600
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2600
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Reporter: Benjamin Reed
>
> we had the following strange production bug:
> there was an ephemeral znode for a session that was no longer active.  it 
> happened even in the absence of failures.
> we are running with local sessions enabled and slightly different logic than 
> the open source zookeeper, but code inspection shows that the problem is also 
> in open source.
> the triggering condition was server overload. we had a traffic burst and 
> we were having commit latencies of over 30 seconds.
> after digging through logs/code we realized from the logs that the create 
> session txn for the ephemeral node started (in the PrepRequestProcessor) at 
> 11:23:04 and committed at 11:23:38 (the "Adding global session" is output in 
> the commit processor). it took 34 seconds to commit the createSession, during 
> that time the session expired. due to delays it appears that the interleave 
> was as follows:
> 1) create session hits prep request processor and create session txn 
> generated 11:23:04
> 2) time passes as the create session is going through zab
> 3) the session expires, close session is generated, and close session txn 
> generated 11:23:23
> 4) the create session gets committed and the session gets re-added to the 
> sessionTracker 11:23:38
> 5) the create ephemeral node hits prep request processor and a create txn 
> generated 11:23:40
> 6) the close session gets committed (all ephemeral nodes for the session are 
> deleted) and the session is deleted from sessionTracker
> 7) the create ephemeral node gets committed
> the root cause seems to be that the global sessions are managed by both the 
> PrepRequestProcessor and the CommitProcessor. also with the local session 
> upgrading we can have changes in flight before our session commits. i think
> there are probably two places to fix:
> 1) changes to session tracker should not happen in prep request processor.
> 2) we should not have requests in flight while create session is in process. 
> there are two options to prevent this:
> a) when a create session is generated in makeUpgradeRequest, we need to start 
> queuing the requests from the clients and only submit them once the create 
> session is committed
> b) the client should explicitly detect that it needs to change from local 
> session to global session and explicitly open a global session and get the 
> commit before it sends an ephemeral create request
> option 2a) is a more transparent fix, but architecturally and in the long 
> term i think 2b) might be better.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2600) dangling ephemerals on overloaded server with local sessions

2016-09-24 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15518765#comment-15518765
 ] 

Benjamin Reed commented on ZOOKEEPER-2600:
--

i investigated this further and it appears that a local change that has not 
been upstreamed is causing this problem. closing the bug.

> dangling ephemerals on overloaded server with local sessions
> 
>
> Key: ZOOKEEPER-2600
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2600
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Reporter: Benjamin Reed
>
> we had the following strange production bug:
> there was an ephemeral znode for a session that was no longer active.  it 
> happened even in the absence of failures.
> we are running with local sessions enabled and slightly different logic than 
> the open source zookeeper, but code inspection shows that the problem is also 
> in open source.
> the triggering condition was server overload. we had a traffic burst and 
> we were having commit latencies of over 30 seconds.
> after digging through logs/code we realized from the logs that the create 
> session txn for the ephemeral node started (in the PrepRequestProcessor) at 
> 11:23:04 and committed at 11:23:38 (the "Adding global session" is output in 
> the commit processor). it took 34 seconds to commit the createSession, during 
> that time the session expired. due to delays it appears that the interleave 
> was as follows:
> 1) create session hits prep request processor and create session txn 
> generated 11:23:04
> 2) time passes as the create session is going through zab
> 3) the session expires, close session is generated, and close session txn 
> generated 11:23:23
> 4) the create session gets committed and the session gets re-added to the 
> sessionTracker 11:23:38
> 5) the create ephemeral node hits prep request processor and a create txn 
> generated 11:23:40
> 6) the close session gets committed (all ephemeral nodes for the session are 
> deleted) and the session is deleted from sessionTracker
> 7) the create ephemeral node gets committed
> the root cause seems to be that the global sessions are managed by both the 
> PrepRequestProcessor and the CommitProcessor. also with the local session 
> upgrading we can have changes in flight before our session commits. i think 
> there are probably two places to fix:
> 1) changes to session tracker should not happen in prep request processor.
> 2) we should not have requests in flight while create session is in process. 
> there are two options to prevent this:
> a) when a create session is generated in makeUpgradeRequest, we need to start 
> queuing the requests from the clients and only submit them once the create 
> session is committed
> b) the client should explicitly detect that it needs to change from local 
> session to global session and explicitly open a global session and get the 
> commit before it sends an ephemeral create request
> option 2a) is a more transparent fix, but architecturally and in the long 
> term i think 2b) might be better.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2600) dangling ephemerals on overloaded server with local sessions

2016-09-22 Thread Benjamin Reed (JIRA)
Benjamin Reed created ZOOKEEPER-2600:


 Summary: dangling ephemerals on overloaded server with local 
sessions
 Key: ZOOKEEPER-2600
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2600
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Reporter: Benjamin Reed


we had the following strange production bug:

there was an ephemeral znode for a session that was no longer active.  it 
happened even in the absence of failures.

we are running with local sessions enabled and slightly different logic than 
the open source zookeeper, but code inspection shows that the problem is also 
in open source.

the triggering condition was server overload. we had a traffic burst and we 
were having commit latencies of over 30 seconds.

after digging through logs/code we realized from the logs that the create 
session txn for the ephemeral node started (in the PrepRequestProcessor) at 
11:23:04 and committed at 11:23:38 (the "Adding global session" is output in 
the commit processor). it took 34 seconds to commit the createSession, during 
that time the session expired. due to delays it appears that the interleave was 
as follows:

1) create session hits prep request processor and create session txn generated 
11:23:04
2) time passes as the create session is going through zab
3) the session expires, close session is generated, and close session txn 
generated 11:23:23
4) the create session gets committed and the session gets re-added to the 
sessionTracker 11:23:38
5) the create ephemeral node hits prep request processor and a create txn 
generated 11:23:40
6) the close session gets committed (all ephemeral nodes for the session are 
deleted) and the session is deleted from sessionTracker
7) the create ephemeral node gets committed

the root cause seems to be that the global sessions are managed by both the 
PrepRequestProcessor and the CommitProcessor. also with the local session 
upgrading we can have changes in flight before our session commits. i think 
there are probably two places to fix:

1) changes to session tracker should not happen in prep request processor.
2) we should not have requests in flight while create session is in process. 
there are two options to prevent this:
a) when a create session is generated in makeUpgradeRequest, we need to start 
queuing the requests from the clients and only submit them once the create 
session is committed
b) the client should explicitly detect that it needs to change from local 
session to global session and explicitly open a global session and get the 
commit before it sends an ephemeral create request

option 2a) is a more transparent fix, but architecturally and in the long term 
i think 2b) might be better.
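
to make option 2a) concrete, a rough sketch of the request-gating idea. the 
names here (UpgradeGate, the Runnable-based queue) are invented for the 
illustration and are not actual ZooKeeper server code:

{code}
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative only: buffer a client's requests from the moment
// makeUpgradeRequest issues createSession until that txn commits, so no
// ephemeral create can race ahead of the session's own lifecycle txns.
class UpgradeGate {
    private final Queue<Runnable> pending = new ArrayDeque<>();
    private boolean upgrading = false;

    synchronized void beginUpgrade() { upgrading = true; }

    // Called for every incoming request from this client.
    synchronized void submit(Runnable request) {
        if (upgrading) {
            pending.add(request);      // hold until createSession commits
        } else {
            request.run();             // normal fast path
        }
    }

    // Called from the commit path once the createSession txn is applied.
    synchronized void onCreateSessionCommitted() {
        upgrading = false;
        Runnable r;
        while ((r = pending.poll()) != null) {
            r.run();                   // release buffered requests in FIFO order
        }
    }
}
{code}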



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2465) Documentation copyright notice is out of date.

2016-09-12 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15485391#comment-15485391
 ] 

Benjamin Reed commented on ZOOKEEPER-2465:
--

the last time a similar issue came up on a different project, an unnamed 
company's legal team pointed out that you don't need to keep updating the date. 
after a long and heated discussion among engineers with no legal background, we 
decided to follow that advice and just leave the date.

i don't have the legal background to make a definitive statement, but it does 
make maintenance easier if we don't have to keep updating the year. we can just 
make it 2008 and not worry about changing it.

> Documentation copyright notice is out of date.
> --
>
> Key: ZOOKEEPER-2465
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2465
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: documentation
>Reporter: Chris Nauroth
>Assignee: Edward Ribeiro
>Priority: Blocker
> Fix For: 3.5.3
>
>
> As reported by [~eribeiro], all of the documentation pages show a copyright 
> notice dating "2008-2013".  This issue tracks updating the copyright notice 
> on all documentation pages to show the current year.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-09-06 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15469376#comment-15469376
 ] 

Benjamin Reed commented on ZOOKEEPER-2169:
--

can't you just do a stat to find this out?


> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2536) When providing a path for "dataDir" with a trailing space, the correct path (with the space truncated) is used for the snapshot but a temporary file with a junk folder name is created for zookeeper_server.pid

2016-09-01 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15457459#comment-15457459
 ] 

Benjamin Reed commented on ZOOKEEPER-2536:
--

BTW, i think the patch is not applying because you didn't do the diff relative 
to the root of the zookeeper repo.

> When providing a path for "dataDir" with a trailing space, the correct 
> path (with the space truncated) is used for the snapshot, but a temporary file 
> with a junk folder name is created for zookeeper_server.pid
> 
>
> Key: ZOOKEEPER-2536
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2536
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.1, 3.5.2
>Reporter: Rakesh Kumar Singh
> Fix For: 3.5.1, 3.5.2
>
> Attachments: zkServer.sh.patch
>
>
> Scenario 1:-
> When providing a path for "dataDir" with a trailing space, the correct path 
> (with the space truncated) is used for the snapshot, but a temporary file with 
> a junk folder name is created for zookeeper_server.pid
> Steps to reproduce:-
> 1. Configure the dataDir
> dataDir=/home/Rakesh/Zookeeper/18_Aug/zookeeper-3.5.1-alpha/data 
> Here there is a space after /data 
> 2. Start Zookeeper Server
> 3. The snapshot is created at the location mentioned above (with the trailing 
> space truncated), but
> one temp folder with a junk name (like -> D29D4X~J) is created for 
> zookeeper_server.pid
> Scenario 2:-
> When both leading and trailing spaces are configured in the above scenario, 
> the temp folder is created in the zookeeper/bin folder



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2537) When providing a path for "dataDir" with a leading space, the correct path (with the space truncated) is used for the snapshot but zookeeper_server.pid is created in the root (/) folder

2016-09-01 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15457457#comment-15457457
 ] 

Benjamin Reed commented on ZOOKEEPER-2537:
--

isn't this the same as ZOOKEEPER-2536?

> When providing a path for "dataDir" with a leading space, the correct path 
> (with the space truncated) is used for the snapshot, but zookeeper_server.pid 
> is created in the root (/) folder
> --
>
> Key: ZOOKEEPER-2537
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2537
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.1, 3.5.2
>Reporter: Rakesh Kumar Singh
> Fix For: 3.5.1, 3.5.2
>
> Attachments: zkServer.sh.patch
>
>
> Scenario 1 :-
> When providing a path for "dataDir" with a leading space, the correct path 
> (with the space truncated) is used for the snapshot, but zookeeper_server.pid 
> is created in the root (/) folder
> Steps to reproduce:-
> 1. Configure the dataDir
> dataDir= /home/Rakesh/Zookeeper/18_Aug/zookeeper-3.5.1-alpha/data
> Here there is a space after dataDir=
> 2. Start Zookeeper Server
> 3. The snapshot is created at the location mentioned above (with the leading 
> space truncated), but
> zookeeper_server.pid is created in the root (/) folder



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2536) When providing a path for "dataDir" with a trailing space, the correct path (with the space truncated) is used for the snapshot but a temporary file with a junk folder name is created for zookeeper_server.pid

2016-09-01 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15457447#comment-15457447
 ] 

Benjamin Reed commented on ZOOKEEPER-2536:
--

+1 LGTM

> When providing a path for "dataDir" with a trailing space, the correct 
> path (with the space truncated) is used for the snapshot, but a temporary file 
> with a junk folder name is created for zookeeper_server.pid
> 
>
> Key: ZOOKEEPER-2536
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2536
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.1, 3.5.2
>Reporter: Rakesh Kumar Singh
> Fix For: 3.5.1, 3.5.2
>
> Attachments: zkServer.sh.patch
>
>
> Scenario 1:-
> When providing a path for "dataDir" with a trailing space, the correct path 
> (with the space truncated) is used for the snapshot, but a temporary file with 
> a junk folder name is created for zookeeper_server.pid
> Steps to reproduce:-
> 1. Configure the dataDir
> dataDir=/home/Rakesh/Zookeeper/18_Aug/zookeeper-3.5.1-alpha/data 
> Here there is a space after /data 
> 2. Start Zookeeper Server
> 3. The snapshot is created at the location mentioned above (with the trailing 
> space truncated), but
> one temp folder with a junk name (like -> D29D4X~J) is created for 
> zookeeper_server.pid
> Scenario 2:-
> When both leading and trailing spaces are configured in the above scenario, 
> the temp folder is created in the zookeeper/bin folder



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2539) Throwing a NullPointerException when running the command "config -c" when the client port is configured separately and not in the new style

2016-09-01 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15457427#comment-15457427
 ] 

Benjamin Reed commented on ZOOKEEPER-2539:
--

+1 looks good, just a formatting nit: you need to indent the line after the 
if() and put it in {}'s since it is on a separate line.
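
for illustration, a made-up example of the requested style (not the actual 
patch code):

{code}
// made-up example of the requested style; not the actual patch code
class StyleNit {
    static String normalize(String clientPort) {
        // discouraged: an un-braced, un-indented body on its own line
        //   if (clientPort == null)
        //   return null;

        // requested: indent the body and wrap it in {}'s
        if (clientPort == null) {
            return null;
        }
        return clientPort.trim();
    }
}
{code}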

> Throwing a NullPointerException when running the command "config -c" when the 
> client port is configured separately and not in the new style
> ---
>
> Key: ZOOKEEPER-2539
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2539
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.5.1, 3.5.2
>Reporter: Rakesh Kumar Singh
>Priority: Minor
> Fix For: 3.5.1, 3.5.2
>
> Attachments: ConfigUtils.java.patch
>
>
> Throwing a NullPointerException when running the command "config -c" when the 
> client port is configured separately and not in the new style
> 1. Configure the zookeeper to start in cluster mode like below-
> clientPort=2181
> server.1=10.18.101.80:2888:3888
> server.2=10.18.219.50:2888:3888
> server.3=10.18.221.194:2888:3888
> and not like below:-
> server.1=10.18.101.80:2888:3888:participant;2181
> server.2=10.18.219.50:2888:3888:participant;2181
> server.3=10.18.221.194:2888:3888:participant;2181
> 2. Start the cluster and one client using >zkCli.sh
> 3. execute command "config -c"
> It is throwing a NullPointerException:-
> root@BLR110865:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin#
>  ./zkCli.sh 
> Connecting to localhost:2181
> 2016-08-29 21:45:19,558 [myid:] - INFO  [main:Environment@109] - Client 
> environment:zookeeper.version=3.5.1-alpha--1, built on 08/18/2016 08:20 GMT
> 2016-08-29 21:45:19,561 [myid:] - INFO  [main:Environment@109] - Client 
> environment:host.name=BLR110865
> 2016-08-29 21:45:19,562 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.version=1.7.0_17
> 2016-08-29 21:45:19,564 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.vendor=Oracle Corporation
> 2016-08-29 21:45:19,564 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.home=/usr/lib/jvm/oracle_jdk7/jre
> 2016-08-29 21:45:19,564 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.class.path=/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../build/classes:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../build/lib/*.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/slf4j-log4j12-1.7.5.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/slf4j-api-1.7.5.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/servlet-api-2.5-20081211.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/netty-3.7.0.Final.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/log4j-1.2.16.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/jline-2.11.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/jetty-util-6.1.26.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/jetty-6.1.26.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/javacc.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/jackson-mapper-asl-1.9.11.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/jackson-core-asl-1.9.11.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/commons-cli-1.2.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../lib/ant-eclipse-1.0-jvm1.2.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../zookeeper-3.5.1-alpha.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../src/java/lib/ant-eclipse-1.0-jvm1.2.jar:/home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/bin/../conf:
> 2016-08-29 21:45:19,564 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
> 2016-08-29 21:45:19,564 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.io.tmpdir=/tmp
> 2016-08-29 21:45:19,564 [myid:] - INFO  [main:Environment@109] - Client 
> environment:java.compiler=
> 2016-08-29 21:45:19,565 [myid:] - INFO  [main:Environment@109] - Client 
> environment:os.name=Linux
> 2016-08-29 21:45:19,565 [myid:] - INFO  [main:Environment@109] - Client 
> environment:os.arch=amd64
> 2016-08-29 21:45:19,565 [myid:] - INFO  [main:Environment@109] - Client 
> environment:os.version=4.4.0-31-generic
> 2016-08-29 21:45:19,565 [myid:] - INFO  [main:Environment@109] - Client 
> environment:user.name=root
> 

[jira] [Commented] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-08-28 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15444326#comment-15444326
 ] 

Benjamin Reed commented on ZOOKEEPER-2169:
--

i observed that all the container check tests happen with the server running 
the whole time. there are no recovery tests. sometimes weird interactions 
happen there.

the more i think about it, the more i realize that since this is the first 
functionality that uses a notion of a global clock, and the first that treats 
mtime as something more than an informational hint, there are more conditions 
to test: 1) what happens if the new server has a clock that is far behind or 
ahead? it's clear that if the new server has a clock that is ahead (or the old 
server had a clock that was behind) we may violate the TTL, but there may be 
other failure modes. 2) a similar thing might happen if the time is adjusted on 
the machine the server is running on. i imagine there are others to test; this 
is just off the top of my head, but since this is a new feature we can learn as 
people use it.

i'm fine letting it go in as is.
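
to make the clock-skew concern in 1) concrete, a toy example (the numbers and 
the isExpired helper are illustrative, not the actual TTL implementation):

{code}
// toy illustration of TTL expiry under clock skew; not the real implementation
public final class TtlSkewDemo {
    // a TTL check ultimately compares the node's mtime with the local clock
    static boolean isExpired(long mtimeMillis, long ttlMillis, long nowMillis) {
        return nowMillis - mtimeMillis > ttlMillis;
    }

    public static void main(String[] args) {
        long mtime = 1_000_000L;   // stamped by the old server's clock
        long ttl   = 60_000L;      // 60 second TTL
        // only 30s of real time passes, but the new server's clock is 45s ahead:
        long skewedNow = mtime + 30_000L + 45_000L;
        System.out.println(isExpired(mtime, ttl, skewedNow)); // true: expired early
    }
}
{code}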

> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-08-19 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428477#comment-15428477
 ] 

Benjamin Reed commented on ZOOKEEPER-2169:
--

this looks pretty cool. it doesn't look like you have test coverage for 
expiring nodes whose TTLs have already passed when you are bringing up the 
server. right?

> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-08-12 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-2325:
-
Attachment: zk.patch

patch on top of ZOOKEEPER-2325.001.patch to make sure the initial case is 
handled correctly: the server starts up, logs a few transactions, and then 
restarts. (found because ZOOKEEPER-2325.001.patch doesn't pass the unit tests.)
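
the bug quoted below is that FileSnap.deserialize returns -1L when no valid 
snapshot exists and that result is ignored. a minimal sketch of the kind of 
guard implied, with illustrative wrapper code (not the committed patch):

{code}
import java.io.IOException;

// illustrative guard, not the committed patch: deserialize() returns -1L when
// no valid snapshot exists, and recovery must not silently proceed from zxid 0
final class SnapshotGuard {
    static long restore(long deserializeResult, boolean txnLogHasEntries)
            throws IOException {
        if (deserializeResult == -1L) {
            if (txnLogHasEntries) {
                // txn logs exist but no snapshot: refusing is safer than
                // replaying onto an empty tree and diverging from the ensemble
                throw new IOException("No snapshot found, but there are log entries");
            }
            return 0L; // genuinely fresh server: start from zxid 0
        }
        return deserializeResult; // zxid recovered from the snapshot
    }
}
{code}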

> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Assignee: Andrew Grasso
>Priority: Critical
> Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, 
> zk.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When loading state from snapshots on startup, FileTxnSnapLog.java ignores the 
> result of FileSnap.deserialize, which is -1L if no valid snapshots are found. 
> Recovery proceeds with dt.lastProcessed == 0, its initial value.
> The result is that Zookeeper will process the transaction logs and then begin 
> serving requests with a different state than the rest of the ensemble.
> To reproduce:
> In a healthy zookeeper cluster of size >= 3, shut down one node.
> Either delete all snapshots for this node or change all to be empty files.
> Restart the node.
> We believe this can happen organically if a node runs out of disk space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2014-04-13 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13968053#comment-13968053
 ] 

Benjamin Reed commented on ZOOKEEPER-1807:
--

+1 looks good to me. it would be nice if [~fpj] gave it a glance though :)

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Alexander Shraer
Priority: Blocker
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch, 
 ZOOKEEPER-1807-ver3.patch, ZOOKEEPER-1807-ver4.patch, 
 ZOOKEEPER-1807-ver5.patch, ZOOKEEPER-1807.patch, notifications-loop.png


 Hey [~shralex],
 I noticed today that my Observers are spamming each other trying to open 
 connections to the election port. I've got tons of these:
 {noformat}
 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
 connection already for server 9
 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
 connection already for server 10
 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
 connection already for server 6
 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
 connection already for server 12
 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
 connection already for server 14
 {noformat}
 and so on and so on, ad nauseam. 
 Now, looking around I found this inside FastLeaderElection.java from when you 
 committed ZOOKEEPER-107:
 {noformat}
  private void sendNotifications() {
 -for (QuorumServer server : self.getVotingView().values()) {
 -long sid = server.id;
 -
 +for (long sid : self.getAllKnownServerIds()) {
 +QuorumVerifier qv = self.getQuorumVerifier();
 {noformat}
 Is that really desired? I suspect that is what's causing Observers to try to 
 connect to each other (as opposed as just connecting to participants). I'll 
 give it a try now and let you know. (Also, we use observer ids that are  0, 
 and I saw some parts of the code that might not deal with that assumption - 
 so it could be that too..). 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ZOOKEEPER-1699) Leader should timeout and give up leadership when losing quorum of last proposed configuration

2014-04-13 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13968059#comment-13968059
 ] 

Benjamin Reed commented on ZOOKEEPER-1699:
--

you have some tabs that should be spaces.

why are you catching the interrupted exceptions? if you take too long, an 
InterruptedException will be thrown after the while loop anyway.
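
a sketch of the pattern being suggested, with illustrative names (not the patch 
itself): declare InterruptedException and let it propagate rather than catching 
it inside the wait loop:

{code}
// illustrative: the waiting method declares InterruptedException instead of
// catching it inside the loop; an interrupt simply aborts the wait
final class QuorumWait {
    private final Object lock = new Object();
    private boolean quorumAcked = false;

    void waitForQuorum(long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        synchronized (lock) {
            while (!quorumAcked) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    break; // timed out: the caller gives up leadership
                }
                lock.wait(remaining); // may throw InterruptedException to caller
            }
        }
    }
}
{code}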

 Leader should timeout and give up leadership when losing quorum of last 
 proposed configuration
 --

 Key: ZOOKEEPER-1699
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1699
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.5.0
Reporter: Alexander Shraer
Assignee: Alexander Shraer
Priority: Blocker
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1699-v1.patch, ZOOKEEPER-1699-v2.patch, 
 ZOOKEEPER-1699.patch


 A leader gives up leadership when losing a quorum of the current 
 configuration.
 This doesn't take into account any proposed configuration. So, if
 a reconfig operation is in progress and a quorum of the new configuration is 
 not
 responsive, the leader will just get stuck waiting for it to ACK the reconfig 
 operation, and will never timeout. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ZOOKEEPER-832) Invalid session id causes infinite loop during automatic reconnect

2013-11-26 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13833545#comment-13833545
 ] 

Benjamin Reed commented on ZOOKEEPER-832:
-

i think ZOOKEEPER-1794 would mostly cover the issues, but i don't think it 
handles the scenario of this jira :)

the scenario is that the zxid seen by the client is greater than that of the ZK 
servers, so the clients spin. that would still happen with ZOOKEEPER-1794.

using the dbid (assuming it is sent in the connect) the clients could fail fast 
and hard.

 Invalid session id causes infinite loop during automatic reconnect
 --

 Key: ZOOKEEPER-832
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.1, 3.4.5, 3.5.0
 Environment: All
Reporter: Ryan Holmes
Assignee: Germán Blanco
 Fix For: 3.4.7, 3.5.0

 Attachments: ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, 
 ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch


 Steps to reproduce:
 1.) Connect to a standalone server using the Java client.
 2.) Stop the server.
 3.) Delete the contents of the data directory (i.e. the persisted session 
 data).
 4.) Start the server.
 The client now automatically tries to reconnect but the server refuses the 
 connection because the session id is invalid. The client and server are now 
 in an infinite loop of attempted and rejected connections. While this 
 situation represents a catastrophic failure and the current behavior is not 
 incorrect, it appears that there is no way to detect this situation on the 
 client and therefore no way to recover.
 The suggested improvement is to send an event to the default watcher 
 indicating that the current state is session invalid, similar to how the 
 session expired state is handled.
 Server log output (repeats indefinitely):
 2010-08-05 11:48:08,283 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - 
 Accepted socket connection from /127.0.0.1:63292
 2010-08-05 11:48:08,284 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing 
 session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last 
 zxid is 0x0 client must try another server
 2010-08-05 11:48:08,284 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed 
 socket connection for client /127.0.0.1:63292 (no session established for 
 client)
 Client log output (repeats indefinitely):
 11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 - 
 Opening socket connection to server localhost/127.0.0.1:2181
 11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session 
 0x12a3ae4e893000a for server null, unexpected error, closing socket 
 connection and attempting reconnect
 java.net.ConnectException: Connection refused
   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
   at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring 
 exception during shutdown input
 java.nio.channels.ClosedChannelException
   at 
 sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
   at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
   at 
 org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164)
   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)
 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring 
 exception during shutdown output
 java.nio.channels.ClosedChannelException
   at 
 sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
   at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
   at 
 org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1171)
   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-832) Invalid session id causes infinite loop during automatic reconnect

2013-11-25 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832373#comment-13832373
 ] 

Benjamin Reed commented on ZOOKEEPER-832:
-

i think the solution to this is to use the dbid. it would fix a couple of 
things.

the dbid is supposed to be a globally unique id that tags the zookeeper data. 
we have it in the file headers, but we never really finished integrating it. 
one key purpose was to make it so the clients could always know that they are 
talking to a server that is using the same zookeeper data instance as all the 
other servers they have been talking to. it would also ensure that the 
zookeeper servers are using the same db instance. in addition i think it could 
be extended to your purposes.

for issue 1) the nice thing about using a ramdisk is that everything gets lost 
on reboot, so if you don't have anything you know it. now when a server tries 
to connect to the ensemble it can signal that it will not participate in quorum 
establishment because its dbid is 0. once it connects to an established leader 
it can sync with the leader and take on its dbid.

for issue 2) we would need to add the dbid to the ConnectRequest (something 
that we've needed to do anyway). if the client connects to a server with a 
different dbid, the server can reliably tell the client that the zookeeper 
instance it was connected to is gone and it should close its handle. this would 
work even if the server has a later zxid than the last seen by the client.
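
a minimal sketch of the issue-2 check; note the dbid field on ConnectRequest 
does not exist today, so the parameters below are illustrative:

{code}
// illustrative only: ConnectRequest has no dbid field today, so this models
// the proposed server-side check with plain parameters
final class DbidCheck {
    static final long UNKNOWN_DBID = 0L;

    // true if the connection should be refused with a "session invalid" style
    // error that tells the client to close its handle rather than retry
    static boolean mustInvalidate(long clientDbid, long serverDbid) {
        // a mismatched dbid means the data instance the client knew is gone,
        // even if this server's zxid happens to be ahead of the client's
        return clientDbid != UNKNOWN_DBID && clientDbid != serverDbid;
    }
}
{code}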

 Invalid session id causes infinite loop during automatic reconnect
 --

 Key: ZOOKEEPER-832
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.1, 3.4.5, 3.5.0
 Environment: All
Reporter: Ryan Holmes
Assignee: Germán Blanco
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, 
 ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch


 Steps to reproduce:
 1.) Connect to a standalone server using the Java client.
 2.) Stop the server.
 3.) Delete the contents of the data directory (i.e. the persisted session 
 data).
 4.) Start the server.
 The client now automatically tries to reconnect but the server refuses the 
 connection because the session id is invalid. The client and server are now 
 in an infinite loop of attempted and rejected connections. While this 
 situation represents a catastrophic failure and the current behavior is not 
 incorrect, it appears that there is no way to detect this situation on the 
 client and therefore no way to recover.
 The suggested improvement is to send an event to the default watcher 
 indicating that the current state is session invalid, similar to how the 
 session expired state is handled.
 Server log output (repeats indefinitely):
 2010-08-05 11:48:08,283 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - 
 Accepted socket connection from /127.0.0.1:63292
 2010-08-05 11:48:08,284 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing 
 session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last 
 zxid is 0x0 client must try another server
 2010-08-05 11:48:08,284 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed 
 socket connection for client /127.0.0.1:63292 (no session established for 
 client)
 Client log output (repeats indefinitely):
 11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 - 
 Opening socket connection to server localhost/127.0.0.1:2181
 11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session 
 0x12a3ae4e893000a for server null, unexpected error, closing socket 
 connection and attempting reconnect
 java.net.ConnectException: Connection refused
   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
   at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring 
 exception during shutdown input
 java.nio.channels.ClosedChannelException
   at 
 sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
   at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
   at 
 org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164)
   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)
 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring 
 exception during shutdown output
 java.nio.channels.ClosedChannelException
   at 
 

[jira] [Commented] (ZOOKEEPER-832) Invalid session id causes infinite loop during automatic reconnect

2013-11-24 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13831147#comment-13831147
 ] 

Benjamin Reed commented on ZOOKEEPER-832:
-

i can't quite see how any of the fixes talked about can really be used in a 
reasonable way. i think there are two issues:

1) if a server's data gets reset, it should not come up as if nothing happened. 
in reality it should not vote or participate in a quorum until it can sync with 
an active leader. otherwise we can lose data.

2) if we do get in a situation where data loss has happened (for example, too 
many servers lose data, or we ignore issue 1), do we want active clients to 
stay active? i would think we would want to invalidate all client sessions.

the discussions on this issue seem to look at working around (while ignoring) 
these two issues and letting clients that are able to keep going continue to 
work as normal.

is this an accurate assessment? (sorry, i lost track of this thread.)

 Invalid session id causes infinite loop during automatic reconnect
 --

 Key: ZOOKEEPER-832
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.1, 3.4.5, 3.5.0
 Environment: All
Reporter: Ryan Holmes
Assignee: Germán Blanco
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, 
 ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch


 Steps to reproduce:
 1.) Connect to a standalone server using the Java client.
 2.) Stop the server.
 3.) Delete the contents of the data directory (i.e. the persisted session 
 data).
 4.) Start the server.
 The client now automatically tries to reconnect but the server refuses the 
 connection because the session id is invalid. The client and server are now 
 in an infinite loop of attempted and rejected connections. While this 
 situation represents a catastrophic failure and the current behavior is not 
 incorrect, it appears that there is no way to detect this situation on the 
 client and therefore no way to recover.
 The suggested improvement is to send an event to the default watcher 
 indicating that the current state is session invalid, similar to how the 
 session expired state is handled.
 Server log output (repeats indefinitely):
 2010-08-05 11:48:08,283 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - 
 Accepted socket connection from /127.0.0.1:63292
 2010-08-05 11:48:08,284 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing 
 session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last 
 zxid is 0x0 client must try another server
 2010-08-05 11:48:08,284 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed 
 socket connection for client /127.0.0.1:63292 (no session established for 
 client)
 Client log output (repeats indefinitely):
 11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 - 
 Opening socket connection to server localhost/127.0.0.1:2181
 11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session 
 0x12a3ae4e893000a for server null, unexpected error, closing socket 
 connection and attempting reconnect
 java.net.ConnectException: Connection refused
   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
   at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring 
 exception during shutdown input
 java.nio.channels.ClosedChannelException
   at 
 sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
   at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
   at 
 org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164)
   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)
 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring 
 exception during shutdown output
 java.nio.channels.ClosedChannelException
   at 
 sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
   at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
   at 
 org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1171)
   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-1742) make check doesn't work on macos

2013-11-23 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830747#comment-13830747
 ] 

Benjamin Reed commented on ZOOKEEPER-1742:
--

here is the message that failed to post last night: i took a look at simply 
disabling the tests that don't work. for zktest-st i could get all but a dozen 
tests to pass, but so many tests fail on zktest-mt that it becomes worthless. i 
suggest that we ignore this issue for the next release.

in short, i concur with german.

 make check doesn't work on macos
 --

 Key: ZOOKEEPER-1742
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1742
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.5, 3.5.0
Reporter: Flavio Junqueira
Assignee: Germán Blanco
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1742-3.4.patch, ZOOKEEPER-1742-3.4.patch, 
 ZOOKEEPER-1742-3.4.patch, ZOOKEEPER-1742-3.4.patch, ZOOKEEPER-1742-3.4.patch, 
 ZOOKEEPER-1742.patch, ZOOKEEPER-1742.patch, ZOOKEEPER-1742.patch, 
 ZOOKEEPER-1742.patch, ZOOKEEPER-1742.patch


 There are two problems I have spotted when running make check with the C 
 client. First, it complains that the sleep call is not defined in two test 
 files: tests/ZooKeeperQuorumServer.cc and tests/TestReconfigServer.cc. 
 Including unistd.h works. The second problem is with linker options. It 
 complains that --wrap is not a valid. I'm not sure how to deal with this 
 one yet, since I'm not sure why we are using it.  



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-1742) make check doesn't work on macos

2013-11-22 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830577#comment-13830577
 ] 

Benjamin Reed commented on ZOOKEEPER-1742:
--

it doesn't work on 3.4 either :(

 make check doesn't work on macos
 --

 Key: ZOOKEEPER-1742
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1742
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.5, 3.5.0
Reporter: Flavio Junqueira
Assignee: Germán Blanco
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1742-3.4.patch, ZOOKEEPER-1742-3.4.patch, 
 ZOOKEEPER-1742-3.4.patch, ZOOKEEPER-1742-3.4.patch, ZOOKEEPER-1742.patch, 
 ZOOKEEPER-1742.patch, ZOOKEEPER-1742.patch, ZOOKEEPER-1742.patch, 
 ZOOKEEPER-1742.patch


 There are two problems I have spotted when running make check with the C 
 client. First, it complains that the sleep call is not defined in two test 
 files: tests/ZooKeeperQuorumServer.cc and tests/TestReconfigServer.cc. 
 Including unistd.h works. The second problem is with linker options. It 
 complains that --wrap is not a valid. I'm not sure how to deal with this 
 one yet, since I'm not sure why we are using it.  



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-1742) make check doesn't work on macos

2013-11-21 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829739#comment-13829739
 ] 

Benjamin Reed commented on ZOOKEEPER-1742:
--

this still fails for me on my mac. the time mocking doesn't seem to be working 
properly.

 make check doesn't work on macos
 --

 Key: ZOOKEEPER-1742
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1742
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.5, 3.5.0
Reporter: Flavio Junqueira
Assignee: Germán Blanco
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1742-3.4.patch, ZOOKEEPER-1742-3.4.patch, 
 ZOOKEEPER-1742-3.4.patch, ZOOKEEPER-1742-3.4.patch, ZOOKEEPER-1742.patch, 
 ZOOKEEPER-1742.patch, ZOOKEEPER-1742.patch, ZOOKEEPER-1742.patch, 
 ZOOKEEPER-1742.patch


 There are two problems I have spotted when running make check with the C 
 client. First, it complains that the sleep call is not defined in two test 
 files: tests/ZooKeeperQuorumServer.cc and tests/TestReconfigServer.cc. 
 Including unistd.h works. The second problem is with linker options. It 
 complains that --wrap is not a valid. I'm not sure how to deal with this 
 one yet, since I'm not sure why we are using it.  



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-1742) make check doesn't work on macos

2013-11-11 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13819839#comment-13819839
 ] 

Benjamin Reed commented on ZOOKEEPER-1742:
--

doing the tests on mac osx is hard because the linker does not have the 
functionality that we need to hook into the mock framework. it's only a subset 
of the tests, so we would still have some coverage on osx. to really fix it we 
should probably move to a robust cross-platform mock framework.

 make check doesn't work on macos
 --

 Key: ZOOKEEPER-1742
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1742
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Flavio Junqueira
Assignee: Benjamin Reed
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1742-3.4.patch, ZOOKEEPER-1742.patch


 There are two problems I have spotted when running make check with the C 
 client. First, it complains that the sleep call is not defined in two test 
 files: tests/ZooKeeperQuorumServer.cc and tests/TestReconfigServer.cc. 
 Including unistd.h works. The second problem is with linker options. It 
 complains that --wrap is not a valid. I'm not sure how to deal with this 
 one yet, since I'm not sure why we are using it.  



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-1742) make check doesn't work on macos

2013-11-01 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811314#comment-13811314
 ] 

Benjamin Reed commented on ZOOKEEPER-1742:
--

i propose that i open up another issue to not build these tests on the Mac and 
do a patch there and then lower the priority of this issue. does that sound ok?

 make check doesn't work on macos
 --

 Key: ZOOKEEPER-1742
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1742
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Flavio Junqueira
Assignee: Benjamin Reed
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1742-3.4.patch, ZOOKEEPER-1742.patch


 There are two problems I have spotted when running make check with the C 
 client. First, it complains that the sleep call is not defined in two test 
 files: tests/ZooKeeperQuorumServer.cc and tests/TestReconfigServer.cc. 
 Including unistd.h works. The second problem is with linker options. It 
 complains that --wrap is not a valid. I'm not sure how to deal with this 
 one yet, since I'm not sure why we are using it.  



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-1783) Distinguish initial configuration from first established configuration

2013-10-15 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796361#comment-13796361
 ] 

Benjamin Reed commented on ZOOKEEPER-1783:
--

looks great alex! sorry to nitpick, but you have at least one stray tab, a 
couple of lines with an indentation of 2 spaces, and a couple with what looks 
like 8 spaces (or it might be a tab)

 Distinguish initial configuration from first established configuration
 --

 Key: ZOOKEEPER-1783
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1783
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum, server
Affects Versions: 3.5.0
Reporter: Alexander Shraer
Assignee: Alexander Shraer
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1783.patch, ZOOKEEPER-1783-ver1.patch, 
 ZOOKEEPER-1783-ver2.patch, ZOOKEEPER-1783-ver3.patch, 
 ZOOKEEPER-1783-ver4.patch, ZOOKEEPER-1783-ver5.patch, 
 ZOOKEEPER-1783-ver6.patch, ZOOKEEPER-1783-ver7.patch


 We need a way to distinguish an initial config of a server and an initial 
 config of a running ensemble (before any reconfigs happen). Currently both 
 have version 0. 
 The version of a config increases with each reconfiguration, so the problem 
 is just with the initial config.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-1783) Distinguish initial configuration from first established configuration

2013-10-15 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796412#comment-13796412
 ] 

Benjamin Reed commented on ZOOKEEPER-1783:
--

+1 looks great thanx for sticking with it.

 Distinguish initial configuration from first established configuration
 --

 Key: ZOOKEEPER-1783
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1783
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum, server
Affects Versions: 3.5.0
Reporter: Alexander Shraer
Assignee: Alexander Shraer
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1783.patch, ZOOKEEPER-1783-ver1.patch, 
 ZOOKEEPER-1783-ver2.patch, ZOOKEEPER-1783-ver3.patch, 
 ZOOKEEPER-1783-ver4.patch, ZOOKEEPER-1783-ver5.patch, 
 ZOOKEEPER-1783-ver6.patch, ZOOKEEPER-1783-ver7.patch, 
 ZOOKEEPER-1783-ver8.patch


 We need a way to distinguish an initial config of a server and an initial 
 config of a running ensemble (before any reconfigs happen). Currently both 
 have version 0. 
 The version of a config increases with each reconfiguration, so the problem 
 is just with the initial config.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-1783) Distinguish initial configuration from first established configuration

2013-10-13 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793918#comment-13793918
 ] 

Benjamin Reed commented on ZOOKEEPER-1783:
--

in Leader.java you create a new quorum verifier and update it with the current 
zxid. couldn't you have just updated the current quorum verifier?

also, can you explain the changes in QuorumPeer? i'm not sure how they are 
related.

 Distinguish initial configuration from first established configuration
 --

 Key: ZOOKEEPER-1783
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1783
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum, server
Affects Versions: 3.5.0
Reporter: Alexander Shraer
Assignee: Alexander Shraer
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1783.patch, ZOOKEEPER-1783-ver1.patch, 
 ZOOKEEPER-1783-ver2.patch, ZOOKEEPER-1783-ver3.patch, 
 ZOOKEEPER-1783-ver4.patch, ZOOKEEPER-1783-ver5.patch, 
 ZOOKEEPER-1783-ver6.patch


 We need a way to distinguish an initial config of a server and an initial 
 config of a running ensemble (before any reconfigs happen). Currently both 
 have version 0. 
 The version of a config increases with each reconfiguration, so the problem 
 is just with the initial config.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-1742) make check doesn't work on macos

2013-10-13 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793925#comment-13793925
 ] 

Benjamin Reed commented on ZOOKEEPER-1742:
--

the --wrap is only used with the tests, so it shouldn't be a problem in 
general. i think the reason the dlsym isn't working is because we are linking 
with the static libraries as you mention. we should really be linking to the 
shared libraries.

 make check doesn't work on macos
 --

 Key: ZOOKEEPER-1742
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1742
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Flavio Junqueira
Assignee: Benjamin Reed
 Fix For: 3.4.6, 3.5.0


 There are two problems I have spotted when running make check with the C 
 client. First, it complains that the sleep call is not defined in two test 
 files: tests/ZooKeeperQuorumServer.cc and tests/TestReconfigServer.cc. 
 Including unistd.h works. The second problem is with linker options. It 
 complains that --wrap is not a valid. I'm not sure how to deal with this 
 one yet, since I'm not sure why we are using it.  



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-1742) make check doesn't work on macos

2013-10-13 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793927#comment-13793927
 ] 

Benjamin Reed commented on ZOOKEEPER-1742:
--

@pat you should split out the ubuntu build problem. there are just some 
unistd.h includes missing

 make check doesn't work on macos
 --

 Key: ZOOKEEPER-1742
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1742
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Flavio Junqueira
Assignee: Benjamin Reed
 Fix For: 3.4.6, 3.5.0


 There are two problems I have spotted when running make check with the C 
 client. First, it complains that the sleep call is not defined in two test 
 files: tests/ZooKeeperQuorumServer.cc and tests/TestReconfigServer.cc. 
 Including unistd.h works. The second problem is with linker options. It 
 complains that --wrap is not a valid. I'm not sure how to deal with this 
 one yet, since I'm not sure why we are using it.  



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-1783) Distinguish initial configuration from first established configuration

2013-10-12 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793513#comment-13793513
 ] 

Benjamin Reed commented on ZOOKEEPER-1783:
--

can you describe the problem a bit more? is the scenario that you have an 
initial ensemble and then you are adding a new server and it also has a 
configuration version 0?

 Distinguish initial configuration from first established configuration
 --

 Key: ZOOKEEPER-1783
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1783
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum, server
Affects Versions: 3.5.0
Reporter: Alexander Shraer
Assignee: Alexander Shraer
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1783.patch, ZOOKEEPER-1783-ver1.patch, 
 ZOOKEEPER-1783-ver2.patch, ZOOKEEPER-1783-ver3.patch, 
 ZOOKEEPER-1783-ver4.patch, ZOOKEEPER-1783-ver5.patch


 We need a way to distinguish an initial config of a server and an initial 
 config of a running ensemble (before any reconfigs happen). Currently both 
 have version 0. 
 The version of a config increases with each reconfiguration, so the problem 
 is just with the initial config.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-1499) clientPort config changes not backwards-compatible

2013-10-12 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793517#comment-13793517
 ] 

Benjamin Reed commented on ZOOKEEPER-1499:
--

+1 looks good alex!

 clientPort config changes not backwards-compatible
 --

 Key: ZOOKEEPER-1499
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1499
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.5.0
Reporter: Camille Fournier
Assignee: Alexander Shraer
Priority: Blocker
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1499.patch, ZOOKEEPER-1499-ver1.java, 
 ZOOKEEPER-1499-ver2.java, ZOOKEEPER-1499-ver3.patch


 With the new reconfig logic, clientPort=2181 in the zoo.cfg file no longer 
 gets read, and clients can't connect without adding ;2181 to the end of their 
 server lines. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-1652) zookeeper java client does a reverse dns lookup when connecting

2013-10-12 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793518#comment-13793518
 ] 

Benjamin Reed commented on ZOOKEEPER-1652:
--

+1 nice fix

 zookeeper java client does a reverse dns lookup when connecting
 ---

 Key: ZOOKEEPER-1652
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1652
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.5
Reporter: Sean Bridges
Assignee: Sean Bridges
Priority: Critical
 Attachments: ZOOKEEPER-1652.patch


 When connecting to zookeeper, the client does a reverse dns lookup on the 
 hostname.  In our environment, the reverse dns lookup takes 5 seconds to 
 fail, causing zookeeper clients to connect slowly.
 The reverse dns lookup occurs in ClientCnx in the calls to adr.getHostName()
 {code}
 setName(getName().replaceAll("\\(.*\\)",
 "(" + addr.getHostName() + ":" + addr.getPort() + ")"));
 try {
 zooKeeperSaslClient = new
 ZooKeeperSaslClient("zookeeper/" + addr.getHostName());
 } catch (LoginException e) {
 {code}
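
aside: java.net.InetSocketAddress.getHostString() (Java 7+) returns the string 
the address was created with and does not trigger a reverse lookup, so it is 
the usual way to avoid this cost; a minimal illustration:

{code}
import java.net.InetSocketAddress;

public class HostStringDemo {
    public static void main(String[] args) {
        InetSocketAddress addr = new InetSocketAddress("10.0.0.5", 2181);
        System.out.println(addr.getHostString()); // "10.0.0.5", no DNS query
        System.out.println(addr.getHostName());   // may do a reverse DNS lookup
    }
}
{code}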



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-1666) Avoid Reverse DNS lookup if the hostname in connection string is literal IP address.

2013-10-12 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793520#comment-13793520
 ] 

Benjamin Reed commented on ZOOKEEPER-1666:
--

+1 i think this is a java-only problem. we don't do ip-to-hostname resolutions 
in C.

 Avoid Reverse DNS lookup if the hostname in connection string is literal IP 
 address.
 

 Key: ZOOKEEPER-1666
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1666
 Project: ZooKeeper
  Issue Type: Improvement
  Components: java client
Reporter: George Cao
Assignee: George Cao
  Labels: patch, test
 Attachments: ZOOKEEPER-1666.patch, ZOOKEEPER-1666.patch


 In our ENV, if InetSocketAddress.getHostName() is called and the host 
 names in the connection string are literal IP addresses, the call will 
 trigger a reverse DNS lookup, which is very slow.
 And in this situation, the host name can simply be set to the IP without 
 causing any problem. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (ZOOKEEPER-1499) clientPort config changes not backwards-compatible

2013-10-08 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-1499:
-

Attachment: ZOOKEEPER-1499.patch

isn't the problem that we aren't using the wildcard address if the client 
address isn't specified? this patch should fix it.

 clientPort config changes not backwards-compatible
 --

 Key: ZOOKEEPER-1499
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1499
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.5.0
Reporter: Camille Fournier
Assignee: Alexander Shraer
Priority: Blocker
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1499.patch


 With the new reconfig logic, clientPort=2181 in the zoo.cfg file no longer 
 gets read, and clients can't connect without adding ;2181 to the end of their 
 server lines. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-1774) QuorumPeerMainTest fails consistently with complains about host assertion failure

2013-10-07 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787936#comment-13787936
 ] 

Benjamin Reed commented on ZOOKEEPER-1774:
--

it could be dns timeout issues. we only wait about 5 seconds for the error to 
happen, but your DNS setup may have timeouts longer than that. perhaps we 
should bump the timeout on line 404 to be more like 15 seconds.

 QuorumPeerMainTest fails consistently with complains about host assertion 
 failure
 ---

 Key: ZOOKEEPER-1774
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1774
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum, tests
Affects Versions: 3.4.6
 Environment: Ubuntu 13.04 
 Linux version 3.8.0-30-generic (buildd@akateko) (gcc version 4.7.3 
 (Ubuntu/Linaro 4.7.3-1ubuntu1) ) #44-Ubuntu SMP Thu Aug 22 20:54:42 UTC 2013
 java -version
 java version 1.6.0_45
 Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
 Java HotSpot(TM) Server VM (build 20.45-b01, mixed mode)
Reporter: Patrick Hunt
Priority: Blocker
 Fix For: 3.4.6, 3.5.0


 QuorumPeerMainTest fails consistently with complains about host assertion 
 failure.
 {noformat}
 2013-10-01 16:09:17,962 [myid:] - INFO  
 [main:JUnit4ZKTestRunner$LoggedInvokeMethod@54] - TEST METHOD FAILED 
 testBadPeerAddressInQuorum
 java.lang.AssertionError: complains about host
   at org.junit.Assert.fail(Assert.java:91)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at 
 org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testBadPeerAddressInQuorum(QuorumPeerMainTest.java:434)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
   at 
 org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
   at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
   at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
   at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
   at 
 org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
   at 
 org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
   at 
 org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)
 2013-10-01 16:09:17,963 [myid:] - INFO  [main:ZKTestCase$1@65] - FAILED 
 testBadPeerAddressInQuorum
 java.lang.AssertionError: complains about host
   at org.junit.Assert.fail(Assert.java:91)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at 
 org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testBadPeerAddressInQuorum(QuorumPeerMainTest.java:434)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
   at 
 org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
   at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
   at 
 

[jira] [Commented] (ZOOKEEPER-832) Invalid session id causes infinite loop during automatic reconnect

2013-10-06 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787921#comment-13787921
 ] 

Benjamin Reed commented on ZOOKEEPER-832:
-

i think approach 1 might be better. a session validation flushes the proposals 
and commits to a follower, so if the session validates but the client zxid is 
still too high, then something is obviously wrong and the client should be 
killed.

although, the scenario in which your session data is intact but your data is 
reset is very bogus to start with. nothing good will happen in that scenario. 
if a server finds session data but the data has been reset, it should also 
reset the session information.
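
for reference, a minimal sketch of the check behind the "Refusing session 
request" behavior in the log below, extended per approach 1 (this is not the 
actual NIOServerCnxn code):
{code:java}
public class SessionZxidCheck {
    // approach 1: if the client has seen a zxid beyond ours even though its
    // session validates, something is obviously wrong -- kill the session
    static void validate(long clientLastZxid, long serverLastZxid) {
        if (clientLastZxid > serverLastZxid) {
            throw new IllegalStateException(
                "client has seen zxid 0x" + Long.toHexString(clientLastZxid)
                + " but our last zxid is 0x" + Long.toHexString(serverLastZxid)
                + "; killing the session");
        }
    }
}
{code}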

 Invalid session id causes infinite loop during automatic reconnect
 --

 Key: ZOOKEEPER-832
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832
 Project: ZooKeeper
  Issue Type: Improvement
  Components: c client, java client
Affects Versions: 3.3.1
 Environment: Mac OS X 10.6.4
 JVM 1.6.0_20
Reporter: Ryan Holmes
Assignee: Germán Blanco
 Fix For: 3.5.0


 Steps to reproduce:
 1.) Connect to a standalone server using the Java client.
 2.) Stop the server.
 3.) Delete the contents of the data directory (i.e. the persisted session 
 data).
 4.) Start the server.
 The client now automatically tries to reconnect but the server refuses the 
 connection because the session id is invalid. The client and server are now 
 in an infinite loop of attempted and rejected connections. While this 
 situation represents a catastrophic failure and the current behavior is not 
 incorrect, it appears that there is no way to detect this situation on the 
 client and therefore no way to recover.
 The suggested improvement is to send an event to the default watcher 
 indicating that the current state is session invalid, similar to how the 
 session expired state is handled.
 Server log output (repeats indefinitely):
 2010-08-05 11:48:08,283 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - 
 Accepted socket connection from /127.0.0.1:63292
 2010-08-05 11:48:08,284 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing 
 session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last 
 zxid is 0x0 client must try another server
 2010-08-05 11:48:08,284 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed 
 socket connection for client /127.0.0.1:63292 (no session established for 
 client)
 Client log output (repeats indefinitely):
 11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 - 
 Opening socket connection to server localhost/127.0.0.1:2181
 11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session 
 0x12a3ae4e893000a for server null, unexpected error, closing socket 
 connection and attempting reconnect
 java.net.ConnectException: Connection refused
   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
   at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring 
 exception during shutdown input
 java.nio.channels.ClosedChannelException
   at 
 sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
   at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
   at 
 org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164)
   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)
 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring 
 exception during shutdown output
 java.nio.channels.ClosedChannelException
   at 
 sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
   at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
   at 
 org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1171)
   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-1777) Missing ephemeral nodes in one of the members of the ensemble

2013-10-06 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787928#comment-13787928
 ] 

Benjamin Reed commented on ZOOKEEPER-1777:
--

i'm curious about the goal here. the scenario is that we have zookeeper servers 
that have suffered permanent data loss (since they were using ram disks) and 
restart with empty data. in effect they are lying: they are voting as if they 
didn't suffer a failure, so our quorum protocols lose their guarantee.

the fix should be to detect the lie and halt. correct?

if you instead detect inconsistent followers and force them to sync up, you may 
get consistency in the ensemble, but you may be inconsistent with reality and 
with the clients' view.
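
a hedged sketch of "detect the lie and halt": keep a marker on storage that 
survives reboots, and refuse to start if the marker says we had data before 
but the (ram disk) data dir is now empty. the marker file and its location are 
assumptions, not ZooKeeper's actual on-disk layout.
{code:java}
import java.io.File;

public class DataLossGuard {
    static void checkOnStartup(File dataDir, File persistentMarker) {
        String[] files = dataDir.list();
        boolean dataEmpty = (files == null || files.length == 0);
        // marker survived but the data did not: we lost state, so halt
        // instead of voting as if nothing happened
        if (persistentMarker.exists() && dataEmpty) {
            throw new IllegalStateException(
                "data directory lost since last run; halting instead of rejoining");
        }
    }
}
{code}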

 Missing ephemeral nodes in one of the members of the ensemble
 -

 Key: ZOOKEEPER-1777
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.4.5
 Environment: Linux, Java 1.7
Reporter: Germán Blanco
Assignee: Germán Blanco
Priority: Blocker
 Fix For: 3.4.6, 3.5.0

 Attachments: logs_trunk.tar.gz, snaps.tar, ZOOKEEPER-1777-3.4.patch, 
 ZOOKEEPER-1777.patch, ZOOKEEPER-1777.patch, ZOOKEEPER-1777.tar.gz


 In a 3-servers ensemble, one of the followers doesn't see part of the 
 ephemeral nodes that are present in the leader and the other follower. 
 The 8 missing nodes in the follower that is not ok were created in the end 
 of epoch 1, the ensemble is running in epoch 2.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-1781) ZooKeeper Server fails if snapCount is set to 1

2013-10-06 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787929#comment-13787929
 ] 

Benjamin Reed commented on ZOOKEEPER-1781:
--

+1 good work!

 ZooKeeper Server fails if snapCount is set to 1 
 

 Key: ZOOKEEPER-1781
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1781
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.4.5
Reporter: Takashi Ohnishi
Priority: Minor
 Attachments: ZOOKEEPER-1781.patch, ZOOKEEPER-1781.patch


 If snapCount is set to 1, ZooKeeper Server can start but it fails with the 
 below error:
 2013-10-02 18:09:07,600 [myid:1] - ERROR 
 [SyncThread:1:SyncRequestProcessor@151] - Severe unrecoverable error, exiting
 java.lang.IllegalArgumentException: n must be positive
 at java.util.Random.nextInt(Random.java:300)
 at 
 org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:93)
 In the source code, it seems to be assumed that snapCount must be 2 or more:
 {code:title=org.apache.zookeeper.server.SyncRequestProcessor.java|borderStyle=solid}
 // we do this in an attempt to ensure that not all of the servers
 // in the ensemble take a snapshot at the same time
 int randRoll = r.nextInt(snapCount/2);
 {code}
 I think this assumption is not bad, because snapCount = 1 is not a realistic 
 setting...
 But it may be better to mention this restriction in the documentation or add 
 validation in the source code.
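 a minimal sketch of the guard suggested above; since r.nextInt(snapCount/2) 
 throws for snapCount below 2, either validate the setting up front or clamp 
 the roll (the helper name is illustrative):
 {code:java}
 import java.util.Random;

 public class SnapCountGuard {
     static int randRoll(Random r, int snapCount) {
         // snapCount = 1 would make nextInt(0) throw; treat it as "no jitter"
         if (snapCount < 2) {
             return 0;
         }
         return r.nextInt(snapCount / 2);
     }
 }
 {code}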



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-1763) Upgrade the netty version

2013-09-24 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777096#comment-13777096
 ] 

Benjamin Reed commented on ZOOKEEPER-1763:
--

+1 looks good to me

 Upgrade the netty version
 -

 Key: ZOOKEEPER-1763
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1763
 Project: ZooKeeper
  Issue Type: Improvement
Affects Versions: 3.4.5
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Minor
 Fix For: 3.5.0

 Attachments: 1763.v1.patch


 2 years ago (in https://github.com/netty/netty/issues/103), Netty changed 
 their group id from org.jboss.netty to io.netty. ZooKeeper is still on 
 3.2.5, so applications using 3.3 cannot use the maven dependencyManagement 
 feature, as the group ids differ. HBase & Hadoop 2 are on the 3.3+ branch, 
 with the new group id.
 Note that Netty 4 changes the package name as well. That's not the case 
 for Netty 3.3+.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1760) Provide an interface for check version of a node

2013-09-24 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777132#comment-13777132
 ] 

Benjamin Reed commented on ZOOKEEPER-1760:
--

i agree with flavio. you get this functionality and more with the stat command.
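
the stat route in code, for illustration (connection handling omitted; the 
path and expected version are placeholders):
{code:java}
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class CheckVersionSketch {
    // exists() already returns the Stat, so a "check version" is just a
    // comparison on the version field of the returned stat
    static boolean checkVersion(ZooKeeper zk, String path, int expectedVersion)
            throws Exception {
        Stat stat = zk.exists(path, false);
        return stat != null && stat.getVersion() == expectedVersion;
    }
}
{code}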

 Provide an interface for check version of a node
 

 Key: ZOOKEEPER-1760
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1760
 Project: ZooKeeper
  Issue Type: New Feature
  Components: java client
Reporter: Rakesh R
Assignee: Rakesh R
 Fix For: 3.5.0


 The idea of this JIRA is to discuss the check version interface, which is used 
 to check the existence of a node with the specified version. Presently only the 
 multi transaction api has this interface; this umbrella JIRA is to make the 
 'check version' api part of ZooKeeper's main apis and cli commands.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1629) testTransactionLogCorruption occasionally fails

2013-06-13 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682645#comment-13682645
 ] 

Benjamin Reed commented on ZOOKEEPER-1629:
--

i think alex is right. i think we could test this case with a simpler, more 
reliable test.

 testTransactionLogCorruption occasionally fails
 ---

 Key: ZOOKEEPER-1629
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1629
 Project: ZooKeeper
  Issue Type: Bug
  Components: tests
Reporter: Alexander Shraer
Assignee: Alexander Shraer
 Fix For: 3.5.0, 3.4.6

 Attachments: TruncateCorruptionTest-patch.patch


 It seems that testTransactionLogCorruption is very flaky; for example, it 
 fails here:
 https://builds.apache.org/job/ZooKeeper-trunk-jdk7/500/
 https://builds.apache.org/job/ZooKeeper-trunk-jdk7/502/
 https://builds.apache.org/job/ZooKeeper-trunk-jdk7/503/#showFailuresLink
 also fails for older builds (no longer on the website), for example all 
 builds from 381 to 399.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1697) large snapshots can cause continuous quorum failure

2013-05-02 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647703#comment-13647703
 ] 

Benjamin Reed commented on ZOOKEEPER-1697:
--

ZOOKEEPER-1324 is probably one of the issues, but i think after that gets fixed 
we can run into the issue that pat identified. we don't set tickOfLastAck until 
after we receive the first message from the follower after the leader becomes 
active. after the leader becomes active it will sleep for tickTime/2 and then 
check to see if the followers are synced, which uses tickOfLastAck. that gives 
followers a very small window in which to send the first message. perhaps, when 
the leader starts, a simple fix would be to set the tickOfLastAck of all its 
followers to the current tick.
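
a hedged sketch of that fix: stamp each follower's tickOfLastAck with the 
leader's current tick when the leader becomes active, so the first synced() 
check doesn't see a stale zero. field and method names are illustrative, not 
the actual LearnerHandler code.
{code:java}
public class SyncCheckSketch {
    volatile long tick;          // the leader's tick counter
    volatile long tickOfLastAck; // last tick at which this follower acked
    long syncLimit;

    void onLeaderActive() {
        // start the clock now rather than at the follower's first ack
        tickOfLastAck = tick;
    }

    boolean synced() {
        return (tick - tickOfLastAck) <= syncLimit;
    }
}
{code}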

 large snapshots can cause continuous quorum failure
 ---

 Key: ZOOKEEPER-1697
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1697
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.3
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Critical
 Fix For: 3.5.0, 3.4.6


 I keep seeing this on the leader:
 2013-04-30 01:18:39,754 INFO
 org.apache.zookeeper.server.quorum.Leader: Shutdown called
 java.lang.Exception: shutdown Leader! reason: Only 0 followers, need 2
 at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:447)
 at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:422)
 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
 The followers are downloading the snapshot when this happens, and are
 trying to do their first ACK to the leader, the ack fails with broken
 pipe.
 In this case the snapshots are large and the config has increased the
 initLimit. syncLimit is small - 10 or so with ticktime of 2000. Note
 this is 3.4.3 with ZOOKEEPER-1521 applied.
 I originally speculated that
 https://issues.apache.org/jira/browse/ZOOKEEPER-1521 might be related.
 I thought I might have broken something for this environment. That
 doesn't look to be the case.
 As it looks now it seems that 1521 didn't go far enough. The leader
 verifies that all followers have ACK'd to the leader within the last
 syncLimit time period. This runs all the time in the background on
 the leader to identify the case where a follower drops. In this case
 the followers take so long to load the snapshot that this check fails
 the very first time, as a result the leader drops (not enough ack'd
 followers w/in the sync limit) and re-election happens. This repeats
 forever. (the above error)
 this is the call that's at odds:
 org.apache.zookeeper.server.quorum.LearnerHandler.synced()
 look at the setting of tickOfLastAck in
 org.apache.zookeeper.server.quorum.LearnerHandler.run()
 It's not set until the follower first acks - in this case I can see
 that the followers are not getting to the ack prior to the leader
 shutting down due to the error log above.
 It seems that synced() should probably use the init limit until the
 first ack comes in from the follower. I also see that while tickOfLastAck and 
 leader.self.tick are shared between two threads, there is no synchronization 
 of the shared resources.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1629) testTransactionLogCorruption occasionally fails

2013-04-02 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619961#comment-13619961
 ] 

Benjamin Reed commented on ZOOKEEPER-1629:
--

ah you are right. it would be good to have the steps as comments in the code. 
that should make it rock solid reproducible. we shouldn't disable snapshotting. 
if it really is causing a problem, we should make sure it happens so we can fix 
the bug.

 testTransactionLogCorruption occasionally fails
 ---

 Key: ZOOKEEPER-1629
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1629
 Project: ZooKeeper
  Issue Type: Bug
  Components: tests
Reporter: Alexander Shraer
 Attachments: TruncateCorruptionTest-patch.patch


 It seems that testTransactionLogCorruption is very flaky; for example, it 
 fails here:
 https://builds.apache.org/job/ZooKeeper-trunk-jdk7/500/
 https://builds.apache.org/job/ZooKeeper-trunk-jdk7/502/
 https://builds.apache.org/job/ZooKeeper-trunk-jdk7/503/#showFailuresLink
 also fails for older builds (no longer on the website), for example all 
 builds from 381 to 399.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (ZOOKEEPER-1629) testTransactionLogCorruption occasionally fails

2013-04-01 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-1629:
-

Summary: testTransactionLogCorruption occasionally fails  (was: 
testTrancationLogCorruption occasionally fails)

 testTransactionLogCorruption occasionally fails
 ---

 Key: ZOOKEEPER-1629
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1629
 Project: ZooKeeper
  Issue Type: Bug
  Components: tests
Reporter: Alexander Shraer
 Attachments: TruncateCorruptionTest-patch.patch


 It seems that testTransactionLogCorruption is very flaky; for example, it 
 fails here:
 https://builds.apache.org/job/ZooKeeper-trunk-jdk7/500/
 https://builds.apache.org/job/ZooKeeper-trunk-jdk7/502/
 https://builds.apache.org/job/ZooKeeper-trunk-jdk7/503/#showFailuresLink
 also fails for older builds (no longer on the website), for example all 
 builds from 381 to 399.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1629) testTransactionLogCorruption occasionally fails

2013-04-01 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618850#comment-13618850
 ] 

Benjamin Reed commented on ZOOKEEPER-1629:
--

can we get a few more comments in the code on the overall strategy of the test? 
wrapper1 uses a different port than the rest, and then there is some port 
forwarding going on. i understand that the objective is to do an add after a 
trunc, but i'm not sure how/why it is achieved with this code.


 testTransactionLogCorruption occasionally fails
 ---

 Key: ZOOKEEPER-1629
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1629
 Project: ZooKeeper
  Issue Type: Bug
  Components: tests
Reporter: Alexander Shraer
 Attachments: TruncateCorruptionTest-patch.patch


 It seems that testTransactionLogCorruption is very flaky; for example, it 
 fails here:
 https://builds.apache.org/job/ZooKeeper-trunk-jdk7/500/
 https://builds.apache.org/job/ZooKeeper-trunk-jdk7/502/
 https://builds.apache.org/job/ZooKeeper-trunk-jdk7/503/#showFailuresLink
 also fails for older builds (no longer on the website), for example all 
 builds from 381 to 399.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

2013-03-11 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13599688#comment-13599688
 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-

it passed jenkins right before the commit. how did that happen? is jenkins not 
running properly or do we have a race condition?

 Allow dynamic changes to server cluster membership
 --

 Key: ZOOKEEPER-107
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Patrick Hunt
Assignee: Alexander Shraer
 Fix For: 3.5.0

 Attachments: SimpleAddition.rtf, zkreconfig-usenixatc-final.pdf, 
 ZOOKEEPER-107-14-Jan.patch, ZOOKEEPER-107-14-Oct.patch, 
 ZOOKEEPER-107-15-Oct.patch, ZOOKEEPER-107-15-Oct-ver1.patch, 
 ZOOKEEPER-107-15-Oct-ver2.patch, ZOOKEEPER-107-15-Oct-ver3.patch, 
 ZOOKEEPER-107-16-Jan.patch, ZOOKEEPER-107-17-Jan.patch, 
 ZOOKEEPER-107-18-Jan.patch, ZOOKEEPER-107-18-Jan-ver2.patch, 
 ZOOKEEPER-107-1-Mar.patch, ZOOKEEPER-107-20-Jan.patch, 
 ZOOKEEPER-107-20-July.patch, ZOOKEEPER-107-21-July.patch, 
 ZOOKEEPER-107-22-Apr.patch, ZOOKEEPER-107-23-SEP.patch, 
 ZOOKEEPER-107-24-Jan.patch, ZOOKEEPER-107-28-Feb.patch, 
 ZOOKEEPER-107-28-Feb.patch, ZOOKEEPER-107-28-NOV-ver2.patch, 
 ZOOKEEPER-107-29-Feb.patch, ZOOKEEPER-107-2-Mar.patch, 
 ZOOKEEPER-107-3-Oct.patch, ZOOKEEPER-107-4-Feb.patch, 
 ZOOKEEPER-107-4-Feb-ver1.patch, ZOOKEEPER-107-4-Feb-ver2.patch, 
 ZOOKEEPER-107-4-Feb-ver2.patch, ZOOKEEPER-107-5-Mar.patch, 
 ZOOKEEPER-107-5-Mar-ver2.patch, ZOOKEEPER-107-6-NOV-2.patch, 
 ZOOKEEPER-107-7-NOV.patch, ZOOKEEPER-107-7-NOV-ver1.patch, 
 ZOOKEEPER-107-7-NOV-ver2.patch, ZOOKEEPER-107-Aug-20.patch, 
 ZOOKEEPER-107-Aug-20-ver1.patch, ZOOKEEPER-107-Aug-25.patch, 
 ZOOKEEPER-107.patch, ZOOKEEPER-107.patch, zookeeper-3.4.0.jar, 
 zookeeper-dev-fatjar.jar, zookeeper-reconfig-sep11.patch, 
 zookeeper-reconfig-sep12.patch, zoo_replicated1.cfg, zoo_replicated1.members


 Currently cluster membership is statically defined, adding/removing hosts 
 to/from the server cluster dynamically needs to be supported.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

2013-03-06 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13595583#comment-13595583
 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-

Committed revision 1453693.
great work everyone!

 Allow dynamic changes to server cluster membership
 --

 Key: ZOOKEEPER-107
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Patrick Hunt
Assignee: Alexander Shraer
 Fix For: 3.5.0

 Attachments: SimpleAddition.rtf, zkreconfig-usenixatc-final.pdf, 
 ZOOKEEPER-107-14-Jan.patch, ZOOKEEPER-107-14-Oct.patch, 
 ZOOKEEPER-107-15-Oct.patch, ZOOKEEPER-107-15-Oct-ver1.patch, 
 ZOOKEEPER-107-15-Oct-ver2.patch, ZOOKEEPER-107-15-Oct-ver3.patch, 
 ZOOKEEPER-107-16-Jan.patch, ZOOKEEPER-107-17-Jan.patch, 
 ZOOKEEPER-107-18-Jan.patch, ZOOKEEPER-107-18-Jan-ver2.patch, 
 ZOOKEEPER-107-1-Mar.patch, ZOOKEEPER-107-20-Jan.patch, 
 ZOOKEEPER-107-20-July.patch, ZOOKEEPER-107-21-July.patch, 
 ZOOKEEPER-107-22-Apr.patch, ZOOKEEPER-107-23-SEP.patch, 
 ZOOKEEPER-107-24-Jan.patch, ZOOKEEPER-107-28-Feb.patch, 
 ZOOKEEPER-107-28-Feb.patch, ZOOKEEPER-107-28-NOV-ver2.patch, 
 ZOOKEEPER-107-29-Feb.patch, ZOOKEEPER-107-2-Mar.patch, 
 ZOOKEEPER-107-3-Oct.patch, ZOOKEEPER-107-4-Feb.patch, 
 ZOOKEEPER-107-4-Feb-ver1.patch, ZOOKEEPER-107-4-Feb-ver2.patch, 
 ZOOKEEPER-107-4-Feb-ver2.patch, ZOOKEEPER-107-5-Mar.patch, 
 ZOOKEEPER-107-5-Mar-ver2.patch, ZOOKEEPER-107-6-NOV-2.patch, 
 ZOOKEEPER-107-7-NOV.patch, ZOOKEEPER-107-7-NOV-ver1.patch, 
 ZOOKEEPER-107-7-NOV-ver2.patch, ZOOKEEPER-107-Aug-20.patch, 
 ZOOKEEPER-107-Aug-20-ver1.patch, ZOOKEEPER-107-Aug-25.patch, 
 ZOOKEEPER-107.patch, zookeeper-3.4.0.jar, zookeeper-dev-fatjar.jar, 
 zookeeper-reconfig-sep11.patch, zookeeper-reconfig-sep12.patch, 
 zoo_replicated1.cfg, zoo_replicated1.members


 Currently cluster membership is statically defined, adding/removing hosts 
 to/from the server cluster dynamically needs to be supported.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

2013-03-05 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594396#comment-13594396
 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-

+1 awesome work you guys! looks ready to me. any objections?

 Allow dynamic changes to server cluster membership
 --

 Key: ZOOKEEPER-107
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Patrick Hunt
Assignee: Alexander Shraer
 Fix For: 3.5.0

 Attachments: SimpleAddition.rtf, zkreconfig-usenixatc-final.pdf, 
 ZOOKEEPER-107-14-Jan.patch, ZOOKEEPER-107-14-Oct.patch, 
 ZOOKEEPER-107-15-Oct.patch, ZOOKEEPER-107-15-Oct-ver1.patch, 
 ZOOKEEPER-107-15-Oct-ver2.patch, ZOOKEEPER-107-15-Oct-ver3.patch, 
 ZOOKEEPER-107-16-Jan.patch, ZOOKEEPER-107-17-Jan.patch, 
 ZOOKEEPER-107-18-Jan.patch, ZOOKEEPER-107-18-Jan-ver2.patch, 
 ZOOKEEPER-107-1-Mar.patch, ZOOKEEPER-107-20-Jan.patch, 
 ZOOKEEPER-107-20-July.patch, ZOOKEEPER-107-21-July.patch, 
 ZOOKEEPER-107-22-Apr.patch, ZOOKEEPER-107-23-SEP.patch, 
 ZOOKEEPER-107-24-Jan.patch, ZOOKEEPER-107-28-Feb.patch, 
 ZOOKEEPER-107-28-Feb.patch, ZOOKEEPER-107-28-NOV-ver2.patch, 
 ZOOKEEPER-107-29-Feb.patch, ZOOKEEPER-107-2-Mar.patch, 
 ZOOKEEPER-107-3-Oct.patch, ZOOKEEPER-107-4-Feb.patch, 
 ZOOKEEPER-107-4-Feb-ver1.patch, ZOOKEEPER-107-4-Feb-ver2.patch, 
 ZOOKEEPER-107-4-Feb-ver2.patch, ZOOKEEPER-107-5-Mar.patch, 
 ZOOKEEPER-107-5-Mar-ver2.patch, ZOOKEEPER-107-6-NOV-2.patch, 
 ZOOKEEPER-107-7-NOV.patch, ZOOKEEPER-107-7-NOV-ver1.patch, 
 ZOOKEEPER-107-7-NOV-ver2.patch, ZOOKEEPER-107-Aug-20.patch, 
 ZOOKEEPER-107-Aug-20-ver1.patch, ZOOKEEPER-107-Aug-25.patch, 
 ZOOKEEPER-107.patch, zookeeper-3.4.0.jar, zookeeper-dev-fatjar.jar, 
 zookeeper-reconfig-sep11.patch, zookeeper-reconfig-sep12.patch, 
 zoo_replicated1.cfg, zoo_replicated1.members


 Currently cluster membership is statically defined, adding/removing hosts 
 to/from the server cluster dynamically needs to be supported.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1366) Zookeeper should be tolerant of clock adjustments

2013-02-11 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575664#comment-13575664
 ] 

Benjamin Reed commented on ZOOKEEPER-1366:
--

i was just reviewing this issue and reading my comments that i had recently 
made. and they sounded like me but i couldn't remember making them last month. 
then i realized that i made them in 2012!!! we really need to get this patch 
in. is someone working on it?

 Zookeeper should be tolerant of clock adjustments
 -

 Key: ZOOKEEPER-1366
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1366
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Dunning
Assignee: Ted Dunning
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1366-3.3.3.patch, ZOOKEEPER-1366.patch, 
 ZOOKEEPER-1366.patch, ZOOKEEPER-1366.patch, ZOOKEEPER-1366.patch


 If you want to wreak havoc on a ZK based system just do [date -s +1hour] 
 and watch the mayhem as all sessions expire at once.
 This shouldn't happen.  Zookeeper could easily handle elapsed times as 
 elapsed times rather than as differences between absolute times.  The 
 absolute times are subject to adjustment when the clock is set while a timer 
 is not subject to this problem.  In Java, System.currentTimeMillis() gives 
 you absolute time while System.nanoTime() gives you time based on a timer 
 from an arbitrary epoch.
 I have done this and have been running tests now for some tens of minutes 
 with no failures.  I will set up a test machine to redo the build again on 
 Ubuntu and post a patch here for discussion.
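 a minimal sketch of the elapsed-time pattern described here: measure session 
 timeouts against the monotonic System.nanoTime() instead of the wall clock, 
 so a date adjustment cannot expire every session at once (the class name is 
 illustrative):
 {code:java}
 public class ElapsedTimer {
     private final long startNanos = System.nanoTime();

     // elapsed time from a timer; unaffected by clock (date -s) adjustments
     long elapsedMillis() {
         return (System.nanoTime() - startNanos) / 1_000_000L;
     }

     boolean expired(long timeoutMillis) {
         return elapsedMillis() >= timeoutMillis;
     }
 }
 {code}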

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1643) Windows: fetch_and_add not 64bit-compatible, may not be correct

2013-02-10 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575593#comment-13575593
 ] 

Benjamin Reed commented on ZOOKEEPER-1643:
--

i think it would be great to use intrinsics here! michi is the one that wrote 
that code. the first lock addl with an operand of 0 is interesting. i'm not 
sure why that is needed. do you know, michi? it is not incorrect though. the 
lock makes the operation atomic.
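
for reference, the semantics the helper is after are exactly those of Java's 
AtomicInteger.getAndAdd: atomically add and return the prior value. this is a 
semantic illustration only, not a stand-in for the C client code.
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

public class FetchAndAddSketch {
    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger(0);
        int before = counter.getAndAdd(5); // returns 0; counter is now 5
        System.out.println(before + " -> " + counter.get());
    }
}
{code}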

 Windows: fetch_and_add not 64bit-compatible, may not be correct
 ---

 Key: ZOOKEEPER-1643
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1643
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.3
 Environment: Windows 7
 Microsoft Visual Studio 2005
Reporter: Erik Anderson

 Note: While I am using a really old version of ZK, I did do enough SVN 
 Blame operations to realize that this code hasn't changed.
 I am currently attempting to compile the C client under MSVC 2005 arch=x64.  
 There are three things I can see with fetch_and_add() inside of 
 /src/c/src/mt_adapter.c
 (1) MSVC 2005 64bit will not compile inline _asm sections.  I'm moderately 
 sure this code is x86-specific, so I'm unsure whether it should even attempt 
 to.
 (2) The Windows intrinsic InterlockedExchangeAdd 
 [http://msdn.microsoft.com/en-us/library/windows/desktop/ms683597(v=vs.85).aspx]
  appears to do the same thing this code is attempting to do
 (3) I'm really rusty on my assembly, but why are we doing two separate XADD 
 operations here, and is the code as-written anything approaching atomicity?
 If you want an official patch I likely can do an SVN checkout and submit a 
 patch that replaces the entire #else on lines 495-505 with a return 
 InterlockedExchangeAdd(operand, incr);
 Usually when I'm scratching my head this badly there's something I'm missing 
 though.  As far as I can tell there has been no prior discussion on this code.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1643) Windows: fetch_and_add not 64bit-compatible, may not be correct

2013-02-10 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575594#comment-13575594
 ] 

Benjamin Reed commented on ZOOKEEPER-1643:
--

we should also use __sync_fetch_and_add in the non-windows case :)

 Windows: fetch_and_add not 64bit-compatible, may not be correct
 ---

 Key: ZOOKEEPER-1643
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1643
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.3
 Environment: Windows 7
 Microsoft Visual Studio 2005
Reporter: Erik Anderson

 Note: While I am using a really old version of ZK, I did do enough SVN 
 Blame operations to realize that this code hasn't changed.
 I am currently attempting to compile the C client under MSVC 2005 arch=x64.  
 There are three things I can see with fetch_and_add() inside of 
 /src/c/src/mt_adapter.c
 (1) MSVC 2005 64bit will not compile inline _asm sections.  I'm moderately 
 sure this code is x86-specific, so I'm unsure whether it should even attempt 
 to.
 (2) The Windows intrinsic InterlockedExchangeAdd 
 [http://msdn.microsoft.com/en-us/library/windows/desktop/ms683597(v=vs.85).aspx]
  appears to do the same thing this code is attempting to do
 (3) I'm really rusty on my assembly, but why are we doing two separate XADD 
 operations here, and is the code as-written anything approaching atomicity?
 If you want an official patch I likely can do an SVN checkout and submit a 
 patch that replaces the entire #else on lines 495-505 with a return 
 InterlockedExchangeAdd(operand, incr);
 Usually when I'm scratching my head this badly there's something I'm missing 
 though.  As far as I can tell there has been no prior discussion on this code.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1643) Windows: fetch_and_add not 64bit-compatible, may not be correct

2013-02-10 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575628#comment-13575628
 ] 

Benjamin Reed commented on ZOOKEEPER-1643:
--

i really think you only need the one. (that is the case for non-windows.) 
you'll notice that the first xadd is adding 0, so there is no double add.

 Windows: fetch_and_add not 64bit-compatible, may not be correct
 ---

 Key: ZOOKEEPER-1643
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1643
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.3
 Environment: Windows 7
 Microsoft Visual Studio 2005
Reporter: Erik Anderson

 Note: While I am using a really old version of ZK, I did do enough SVN 
 Blame operations to realize that this code hasn't changed.
 I am currently attempting to compile the C client under MSVC 2005 arch=x64.  
 There are three things I can see with fetch_and_add() inside of 
 /src/c/src/mt_adapter.c
 (1) MSVC 2005 64bit will not compile inline _asm sections.  I'm moderately 
 sure this code is x86-specific, so I'm unsure whether it should even attempt 
 to.
 (2) The Windows intrinsic InterlockedExchangeAdd 
 [http://msdn.microsoft.com/en-us/library/windows/desktop/ms683597(v=vs.85).aspx]
  appears to do the same thing this code is attempting to do
 (3) I'm really rusty on my assembly, but why are we doing two separate XADD 
 operations here, and is the code as-written anything approaching atomicity?
 If you want an official patch I likely can do an SVN checkout and submit a 
 patch that replaces the entire #else on lines 495-505 with a return 
 InterlockedExchangeAdd(operand, incr);
 Usually when I'm scratching my head this badly there's something I'm missing 
 though.  As far as I can tell there has been no prior discussion on this code.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

2013-02-06 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572606#comment-13572606
 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-

great work you guys! can you comment on why FdLeakTest was removed? also, i 
think we should have added LearnerInfoVX, where X is the protocol number, 
rather than changing LearnerInfo. (we did bump the protocol, right?)

 Allow dynamic changes to server cluster membership
 --

 Key: ZOOKEEPER-107
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Patrick Hunt
Assignee: Alexander Shraer
 Fix For: 3.5.0

 Attachments: SimpleAddition.rtf, zkreconfig-usenixatc-final.pdf, 
 ZOOKEEPER-107-14-Jan.patch, ZOOKEEPER-107-14-Oct.patch, 
 ZOOKEEPER-107-15-Oct.patch, ZOOKEEPER-107-15-Oct-ver1.patch, 
 ZOOKEEPER-107-15-Oct-ver2.patch, ZOOKEEPER-107-15-Oct-ver3.patch, 
 ZOOKEEPER-107-16-Jan.patch, ZOOKEEPER-107-17-Jan.patch, 
 ZOOKEEPER-107-18-Jan.patch, ZOOKEEPER-107-18-Jan-ver2.patch, 
 ZOOKEEPER-107-1-Mar.patch, ZOOKEEPER-107-20-Jan.patch, 
 ZOOKEEPER-107-20-July.patch, ZOOKEEPER-107-21-July.patch, 
 ZOOKEEPER-107-22-Apr.patch, ZOOKEEPER-107-23-SEP.patch, 
 ZOOKEEPER-107-24-Jan.patch, ZOOKEEPER-107-28-Feb.patch, 
 ZOOKEEPER-107-28-Feb.patch, ZOOKEEPER-107-28-NOV-ver2.patch, 
 ZOOKEEPER-107-29-Feb.patch, ZOOKEEPER-107-3-Oct.patch, 
 ZOOKEEPER-107-4-Feb.patch, ZOOKEEPER-107-4-Feb-ver1.patch, 
 ZOOKEEPER-107-4-Feb-ver2.patch, ZOOKEEPER-107-4-Feb-ver2.patch, 
 ZOOKEEPER-107-6-NOV-2.patch, ZOOKEEPER-107-7-NOV.patch, 
 ZOOKEEPER-107-7-NOV-ver1.patch, ZOOKEEPER-107-7-NOV-ver2.patch, 
 ZOOKEEPER-107-Aug-20.patch, ZOOKEEPER-107-Aug-20-ver1.patch, 
 ZOOKEEPER-107-Aug-25.patch, zookeeper-3.4.0.jar, zookeeper-dev-fatjar.jar, 
 zookeeper-reconfig-sep11.patch, zookeeper-reconfig-sep12.patch, 
 zoo_replicated1.cfg, zoo_replicated1.members


 Currently cluster membership is statically defined, adding/removing hosts 
 to/from the server cluster dynamically needs to be supported.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

