from:"Flavio Junqueira \(JIRA\)"

[jira] [Updated] (ZOOKEEPER-2389) read-only observer doesn't load transaction log when transitioning to read-only

2016-03-16 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2389:

Fix Version/s: 3.6.0
   3.5.2
   3.4.9

> read-only observer doesn't load transaction log when transitioning to 
> read-only
> ---
>
> Key: ZOOKEEPER-2389
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2389
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.8
> Reporter: Jason Rosenberg
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
>
> I have rediscovered an issue that was apparently reported a while back (link 
> below). If I configure an Observer node with read-only mode enabled and 
> syncEnabled = true, it properly syncs its transaction log with the quorum. 
> However, if I shut down the quorum participants and the Observer 
> automatically transitions to read-only mode, it does not load the saved 
> transaction log, and thus rejects any client connection with a zxid > 0. 
> If I restart the Observer node, it reloads its persisted transaction log 
> and serves read-only requests at the latest zxid. Is this the correct 
> behavior? Things run fine if I do the same with a read-only participant 
> instead of an observer. In that case, it transitions without issue to a 
> read-only server and serves the current transaction log.
> It seems to me this issue renders read-only observers completely useless. 
> What am I missing here?
> I'm seeing this with 3.4.8.
> It seems this was discovered and reported a long time ago here:
> http://grokbase.com/t/zookeeper/user/14c16b1d22/issue-with-zxid-during-observer-failover-to-read-only



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (ZOOKEEPER-2044) CancelledKeyException in zookeeper 3.4.5

2016-03-13 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192378#comment-15192378
 ] 

Flavio Junqueira commented on ZOOKEEPER-2044:
-

I think it is very low risk, but I also don't think it is strictly necessary to 
have it in 3.4. If you're uneasy about having this one in 3.4 because it is 
mostly an improvement, I won't oppose [~phunt].
  

> CancelledKeyException in zookeeper 3.4.5
> 
>
> Key: ZOOKEEPER-2044
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2044
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.6
> Environment: Red Hat Enterprise Linux Server release 6.2
> Reporter: shamjith antholi
> Assignee: Flavio Junqueira
> Priority: Minor
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2044.patch, ZOOKEEPER-2044.patch
>
>
> I am getting a CancelledKeyException in ZooKeeper (version 3.4.5). Please see 
> the log below. When this error is thrown, the connected Solr shard goes 
> down with the error "Failed to index metadata in 
> Solr,StackTrace=SolrError: HTTP status 503.Reason: 
> {"responseHeader":{"status":503,"QTime":204},"error":{"msg":"ClusterState 
> says we are the leader, but locally we don't think so","code":503"  and 
> ultimately the current activity fails. Could you please suggest a 
> solution?
> ZooKeeper log 
> --
> 2014-09-16 02:58:47,799 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client attempting to renew session 0x24868e7ca980003 at /172.22.0.5:58587
> 2014-09-16 02:58:47,800 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating client: 0x24868e7ca980003
> 2014-09-16 02:58:47,802 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@588] - Invalid session 0x24868e7ca980003 for client /172.22.0.5:58587, probably expired
> 2014-09-16 02:58:47,803 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /172.22.0.5:58587 which had sessionid 0x24868e7ca980003
> 2014-09-16 02:58:47,810 [myid:1] - ERROR [CommitProcessor:1:NIOServerCnxn@180] - Unexpected Exception:
> java.nio.channels.CancelledKeyException
>     at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
>     at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
>     at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
>     at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
>     at org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1113)
>     at org.apache.zookeeper.server.DataTree.setWatches(DataTree.java:1327)
>     at org.apache.zookeeper.server.ZKDatabase.setWatches(ZKDatabase.java:384)
>     at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:304)
>     at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
>  
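
The trace above shows SelectionKeyImpl.interestOps failing because the key had already been cancelled, a few milliseconds after the log reports the socket being closed. As a minimal, self-contained sketch of that failure mode (this is not the attached patch; the class CancelledKeyDemo and its methods are hypothetical names used only for illustration), the following Java shows that touching a cancelled key throws, and one way a caller might guard against it:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

// Hypothetical demo class; it is not part of ZooKeeper.
class CancelledKeyDemo {

    // Registers a channel with a selector, then cancels the key,
    // mimicking a connection that was closed before a response is sent.
    static SelectionKey cancelledKey() {
        try {
            Selector selector = Selector.open();
            ServerSocketChannel channel = ServerSocketChannel.open();
            channel.configureBlocking(false);
            SelectionKey key = channel.register(selector, SelectionKey.OP_ACCEPT);
            key.cancel();
            return key;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Unguarded update: returns true if interestOps() threw CancelledKeyException.
    static boolean updateUnguarded(SelectionKey key) {
        try {
            key.interestOps(SelectionKey.OP_ACCEPT);
            return false;
        } catch (CancelledKeyException e) {
            return true;
        }
    }

    // Guarded update: returns true if the interest set was updated,
    // false if the key was no longer valid and the update was skipped.
    static boolean updateGuarded(SelectionKey key) {
        if (!key.isValid()) {
            return false;
        }
        try {
            key.interestOps(SelectionKey.OP_ACCEPT);
            return true;
        } catch (CancelledKeyException e) {
            // The key can still be cancelled between the check and the call.
            return false;
        }
    }
}
```

The isValid() check alone does not remove the race, which is why the guarded variant still catches the exception; whether skipping the update is acceptable depends on the caller.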




[jira] [Updated] (ZOOKEEPER-2044) CancelledKeyException in zookeeper 3.4.5

2016-03-13 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2044:

Fix Version/s: 3.6.0
   3.5.2


[jira] [Commented] (ZOOKEEPER-1604) remove rpm/deb/... packaging

2016-03-13 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192349#comment-15192349
 ] 

Flavio Junqueira commented on ZOOKEEPER-1604:
-

[~eyang] at this point, I'm a bit confused about what you'd like to happen. You 
seem to be dissatisfied with BigTop, and you seem to want this community to move 
in a different direction with respect to packaging. I believe we are 
all trying to move forward as a community, so why don't you start a thread on 
the dev list explaining what is troubling you and what you'd like to happen? 
I'm suggesting it because the scope of this jira is too narrow for what you 
seem to be trying to convey. I also suggest that you try to be specific about 
what you'd like to happen and how you'd like to contribute for it to happen, 
and let's see how the community reacts. Is that fair?

> remove rpm/deb/... packaging
> 
>
> Key: ZOOKEEPER-1604
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1604
> Project: ZooKeeper
> Issue Type: Task
> Components: build
> Affects Versions: 3.3.0
> Reporter: Patrick Hunt
> Assignee: Chris Nauroth
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-1604.001.patch, ZOOKEEPER-1604.patch
>
>
> Remove rpm/deb/... packaging from our source repo. Now that BigTop is 
> available and fully supporting ZK it's no longer necessary for us to attempt 
> to include this.




[jira] [Commented] (ZOOKEEPER-2384) Support atomic increment / decrement of znode value

2016-03-09 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188132#comment-15188132
 ] 

Flavio Junqueira commented on ZOOKEEPER-2384:
-

What I suggested isn't particularly hard to do, but it can be inefficient in 
the presence of contention. If there are many clients trying to do that, then 
they can keep stepping on each other and causing the conditional write to fail.
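
To make the read, update, conditional-write loop and its behavior under contention concrete, here is a minimal sketch. It deliberately does not use the ZooKeeper client: VersionedCell is a hypothetical in-memory stand-in for a single znode, and OptimisticCounter, add, and runConcurrently are illustrative names. With the real client, the analogous calls would be getData with a Stat and setData with the version taken from that Stat, which fails with KeeperException.BadVersionException when the version has changed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory stand-in for one znode; this is not the ZooKeeper API.
class VersionedCell {
    private int value;
    private int version;

    // Returns {value, version}, analogous to reading data together with its version.
    synchronized int[] read() {
        return new int[] { value, version };
    }

    // Conditional write: applies only if the version is still the one that was read.
    synchronized boolean setIfVersion(int newValue, int expectedVersion) {
        if (version != expectedVersion) {
            return false; // another writer got there first
        }
        value = newValue;
        version++;
        return true;
    }
}

class OptimisticCounter {
    // Number of failed conditional writes, to make the cost of contention visible.
    static final AtomicInteger failedWrites = new AtomicInteger();

    // Read, compute, write back conditionally; retry until the write succeeds.
    static int add(VersionedCell cell, int delta) {
        while (true) {
            int[] snapshot = cell.read();
            int updated = snapshot[0] + delta;
            if (cell.setIfVersion(updated, snapshot[1])) {
                return updated;
            }
            failedWrites.incrementAndGet();
        }
    }

    // Runs `threads` workers that each add 1 `perThread` times; returns the final value.
    static int runConcurrently(int threads, int perThread) {
        VersionedCell cell = new VersionedCell();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    add(cell, 1);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return cell.read()[0];
    }
}
```

No update is lost, but every failed conditional write costs an extra read and write; against a real ensemble each of those is a network round trip, which is the inefficiency described above.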

> Support atomic increment / decrement of znode value
> ---
>
> Key: ZOOKEEPER-2384
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2384
> Project: ZooKeeper
> Issue Type: Improvement
> Reporter: Ted Yu
>
> The use case is to store a reference count (integer type) in a znode.
> It is desirable to provide support for atomic increment / decrement of the 
> znode value.
> Suggestion from Flavio:
> you can read the znode, keep the version of the znode, update the value, 
> write back conditionally. The condition for the setData operation to succeed 
> is that the version is the same as the one that was read.
> While the above is feasible, the developer has to implement the retry logic 
> themselves. It is not easy to combine increment / decrement with other 
> operations using multi.




[jira] [Updated] (ZOOKEEPER-2383) Startup race in ZooKeeperServer

2016-03-08 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2383:

Priority: Blocker  (was: Major)

> Startup race in ZooKeeperServer
> ---
>
> Key: ZOOKEEPER-2383
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383
> Project: ZooKeeper
> Issue Type: Bug
> Components: jmx, server
> Affects Versions: 3.4.8
> Reporter: Steve Rowe
> Priority: Blocker
> Fix For: 3.4.9
>
> Attachments: TestZkStandaloneJMXRegistrationRaceConcurrent.java, 
> release-3.4.8-extra-logging.patch, zk-3.4.8-MBeanRegistry.log, 
> zk-3.4.8-NPE.log
>
>
> In attempting to upgrade Solr's ZooKeeper dependency from 3.4.6 to 3.4.8 
> (SOLR-8724) I ran into test failures where attempts to create a node in a 
> newly started standalone ZooKeeperServer were failing because of an assertion 
> in MBeanRegistry.
> ZooKeeperServer.startup() first sets up its request processor chain then 
> registers itself in JMX, but if a connection comes in before the server's JMX 
> registration happens, registration of the connection will fail because it 
> trips the assertion that (effectively) its parent (the server) has already 
> registered itself.
> {code:java|title=ZooKeeperServer.java}
> public synchronized void startup() {
>     if (sessionTracker == null) {
>         createSessionTracker();
>     }
>     startSessionTracker();
>     setupRequestProcessors();
>     registerJMX();
>     state = State.RUNNING;
>     notifyAll();
> }
> {code}
> {code:java|title=MBeanRegistry.java}
> public void register(ZKMBeanInfo bean, ZKMBeanInfo parent)
>     throws JMException
> {
>     assert bean != null;
>     String path = null;
>     if (parent != null) {
>         path = mapBean2Path.get(parent);
>         assert path != null;
>     }
> {code}
> This problem appears to be new with ZK 3.4.8 - AFAIK Solr never had this 
> issue with ZK 3.4.6. 
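
The ordering dependency described above can be illustrated with a small, deterministic sketch. This is not ZooKeeper's MBeanRegistry: MiniRegistry and StartupOrderDemo are hypothetical, beans are plain strings, and an IllegalStateException stands in for the assertion. It only shows why a child registration that arrives before its parent's registration fails, and that the same call succeeds once the parent is present.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified registry; not ZooKeeper's MBeanRegistry.
class MiniRegistry {
    private final Map<String, String> beanToPath = new HashMap<>();

    // Registers a bean under its parent's path; the parent must already be registered.
    synchronized void register(String bean, String parent) {
        String path = "";
        if (parent != null) {
            path = beanToPath.get(parent);
            if (path == null) {
                // Stand-in for the failing "assert path != null" in the excerpt above.
                throw new IllegalStateException("parent not registered: " + parent);
            }
        }
        beanToPath.put(bean, path + "/" + bean);
    }

    synchronized String pathOf(String bean) {
        return beanToPath.get(bean);
    }
}

class StartupOrderDemo {
    // A connection registers before the server has registered itself.
    static boolean connectionBeforeServerFails() {
        MiniRegistry registry = new MiniRegistry();
        try {
            registry.register("connection-1", "server");
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // The server registers first, then the connection; returns the connection's path.
    static String connectionAfterServer() {
        MiniRegistry registry = new MiniRegistry();
        registry.register("server", null);
        registry.register("connection-1", "server");
        return registry.pathOf("connection-1");
    }
}
```

In the sketch the order is fixed by the caller; in the reported scenario it depends on whether a connection arrives between setupRequestProcessors() and registerJMX().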




[jira] [Updated] (ZOOKEEPER-2383) Startup race in ZooKeeperServer

2016-03-08 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2383:

Fix Version/s: 3.4.9


[jira] [Comment Edited] (ZOOKEEPER-2383) Startup race in ZooKeeperServer

2016-03-08 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185323#comment-15185323
 ] 

Flavio Junqueira edited comment on ZOOKEEPER-2383 at 3/8/16 5:39 PM:
-

[~steve_rowe] Thanks for reporting this issue. According to git blame, the 
latest changes around the startup method in ZooKeeperServer are due to 
ZOOKEEPER-1907, which actually turned out to be quite problematic, so this 
could be another issue caused by that patch, though I'm not sure.

{noformat}
91f579e4 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (Hongchao Deng  2015-08-17 20:52:07 +  411) public synchronized void startup() {
55b03fce src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (Mahadev Konar  2012-01-31 06:50:06 +  412) if (sessionTracker == null) {
55b03fce src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (Mahadev Konar  2012-01-31 06:50:06 +  413) createSessionTracker();
55b03fce src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (Mahadev Konar  2012-01-31 06:50:06 +  414) }
55b03fce src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (Mahadev Konar  2012-01-31 06:50:06 +  415) startSessionTracker();
097b7979 zookeeper/java/src/com/yahoo/zookeeper/server/ZooKeeperServer.java (Benjamin Reed  2008-05-12 23:01:25 +  416) setupRequestProcessors();
87e1e030 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (Patrick D. Hunt 2009-01-15 22:57:14 +  417) 
87e1e030 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (Patrick D. Hunt 2009-01-15 22:57:14 +  418) registerJMX();
87e1e030 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (Patrick D. Hunt 2009-01-15 22:57:14 +  419) 
91f579e4 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (Hongchao Deng  2015-08-17 20:52:07 +  420) state = State.RUNNING;
91f579e4 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (Hongchao Deng  2015-08-17 20:52:07 +  421) notifyAll();
097b7979 zookeeper/java/src/com/yahoo/zookeeper/server/ZooKeeperServer.java (Benjamin Reed  2008-05-12 23:01:25 +  422) }

{noformat}

{noformat}
commit 91f579e40755de870ed9123c8fd55925517d9aa6
Author: Hongchao Deng 
Date:   Mon Aug 17 20:52:07 2015 +

ZOOKEEPER-1907 Improve Thread handling (Rakesh R via hdeng)

git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1696337 13f79535-47bb-0310-9956-ffa450edef68
{noformat}

[~rakesh_r] could you have a look, please?

CC [~rgs] [~phunt]



[jira] [Commented] (ZOOKEEPER-2383) Startup race in ZooKeeperServer

2016-03-08 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185323#comment-15185323
 ] 

Flavio Junqueira commented on ZOOKEEPER-2383:
-

[~steve_rowe] Thanks for reporting this issue. According to git blame, the 
latest changes around the startup method in ZooKeeperServer are due to 
ZOOKEEPER-1907, which actually turned out to be quite problematic, so this 
could be another issue caused by that patch, though I'm not sure.

{noformat}
91f579e4 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (Hongchao Deng  2015-08-17 20:52:07 +  411) public synchronized void startup() {
55b03fce src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (Mahadev Konar  2012-01-31 06:50:06 +  412) if (sessionTracker == null) {
55b03fce src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (Mahadev Konar  2012-01-31 06:50:06 +  413) createSessionTracker();
55b03fce src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (Mahadev Konar  2012-01-31 06:50:06 +  414) }
55b03fce src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (Mahadev Konar  2012-01-31 06:50:06 +  415) startSessionTracker();
097b7979 zookeeper/java/src/com/yahoo/zookeeper/server/ZooKeeperServer.java (Benjamin Reed  2008-05-12 23:01:25 +  416) setupRequestProcessors();
87e1e030 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (Patrick D. Hunt 2009-01-15 22:57:14 +  417) 
87e1e030 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (Patrick D. Hunt 2009-01-15 22:57:14 +  418) registerJMX();
87e1e030 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (Patrick D. Hunt 2009-01-15 22:57:14 +  419) 
91f579e4 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (Hongchao Deng  2015-08-17 20:52:07 +  420) state = State.RUNNING;
91f579e4 src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (Hongchao Deng  2015-08-17 20:52:07 +  421) notifyAll();
097b7979 zookeeper/java/src/com/yahoo/zookeeper/server/ZooKeeperServer.java (Benjamin Reed  2008-05-12 23:01:25 +  422) }

{noformat}

{noformat}
commit 91f579e40755de870ed9123c8fd55925517d9aa6
Author: Hongchao Deng 
Date:   Mon Aug 17 20:52:07 2015 +

ZOOKEEPER-1907 Improve Thread handling (Rakesh R via hdeng)

git-svn-id: https://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4@1696337 13f79535-47bb-0310-9956-ffa450edef68
{noformat}

[~rakesh_r] could you have a look, please?

CC [~rgs] [~phunt]

> Startup race in ZooKeeperServer
> ---
>
> Key: ZOOKEEPER-2383
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383
> Project: ZooKeeper
> Issue Type: Bug
> Components: jmx, server
> Affects Versions: 3.4.8
> Reporter: Steve Rowe
> Attachments: TestZkStandaloneJMXRegistrationRaceConcurrent.java, 
> release-3.4.8-extra-logging.patch, zk-3.4.8-MBeanRegistry.log, 
> zk-3.4.8-NPE.log
>

[jira] [Updated] (ZOOKEEPER-2044) CancelledKeyException in zookeeper 3.4.5

2016-03-05 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2044:

Attachment: ZOOKEEPER-2044.patch

Here is a patch with a test case. [~abranzyck] [~rgs] Let me know what you 
think.

> CancelledKeyException in zookeeper 3.4.5
> 
>
> Key: ZOOKEEPER-2044
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2044
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.6
> Environment: Red Hat Enterprise Linux Server release 6.2
> Reporter: shamjith antholi
> Assignee: Flavio Junqueira
> Priority: Minor
> Fix For: 3.4.9
>
> Attachments: ZOOKEEPER-2044.patch, ZOOKEEPER-2044.patch
>

[jira] [Commented] (ZOOKEEPER-2154) NPE in KeeperException

2016-03-05 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15181743#comment-15181743
 ] 

Flavio Junqueira commented on ZOOKEEPER-2154:
-

[~surendrasingh] thanks for the patch. Since we already throw that same 
exception in the default case of the switch, would it make sense to avoid 
duplicating it? I was thinking that we could have a code for NULL, check for 
null, and assign code to NULL. Another idea is to use {{assert code != null;}}.

Let me know what you think, please.
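
As a rough illustration of the first idea, here is a minimal sketch in which a null code is mapped to a dedicated constant so that the existing default branch handles it. The enum, the factory, and the messages are hypothetical and much smaller than the real KeeperException; the sketch only relies on the fact that a Java switch over a null enum reference throws NullPointerException.

```java
// Hypothetical, reduced model of a code-to-exception factory; not the real KeeperException.
class CodeFactoryDemo {

    // NULL is the extra constant suggested above; the other names are illustrative.
    enum Code { NO_NODE, BAD_VERSION, NULL }

    static RuntimeException create(Code code) {
        // Switching directly on a null enum reference would throw NullPointerException,
        // so normalise null to the dedicated constant first.
        Code effective = (code == null) ? Code.NULL : code;
        switch (effective) {
            case NO_NODE:
                return new IllegalStateException("no node");
            case BAD_VERSION:
                return new IllegalStateException("bad version");
            default:
                // Single place where unknown or missing codes are rejected.
                throw new IllegalArgumentException("invalid exception code");
        }
    }

    // Returns true if a null code is rejected with IllegalArgumentException
    // rather than surfacing as a NullPointerException.
    static boolean rejectsNullCleanly() {
        try {
            create(null);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }
}
```

The assert alternative is shorter, but it only takes effect when the JVM runs with assertions enabled, so the two options behave differently in a default production setup.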

> NPE in KeeperException
> --
>
> Key: ZOOKEEPER-2154
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2154
> Project: ZooKeeper
> Issue Type: Bug
> Components: java client
> Affects Versions: 3.4.6
> Reporter: Surendra Singh Lilhore
> Assignee: Surendra Singh Lilhore
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2154.patch
>
>
> KeeperException should handle the exception if code is null...




[jira] [Assigned] (ZOOKEEPER-2136) Sync() should get quorum acks.

2016-03-04 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira reassigned ZOOKEEPER-2136:
---

Assignee: Flavio Junqueira

> Sync() should get quorum acks.
> --
>
> Key: ZOOKEEPER-2136
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2136
> Project: ZooKeeper
> Issue Type: Sub-task
> Reporter: Hongchao Deng
> Assignee: Flavio Junqueira
> Priority: Blocker
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2000.patch
>
>
> Currently, if the sync packet goes to the leader, it doesn't get quorum acks. 
> This is a problem during reconfig and leader changes. The flaky failure of 
> testPortChange() is caused by such a case.
> I propose changing the sync() semantics to require quorum acks in all cases.




[jira] [Commented] (ZOOKEEPER-2235) License update

2016-02-25 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166989#comment-15166989
 ] 

Flavio Junqueira commented on ZOOKEEPER-2235:
-

This is in the MANIFEST.MF file: 
{{Bundle-License: http://www.apache.org/licenses/LICENSE-2.0}}

> License update
> --
>
> Key: ZOOKEEPER-2235
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2235
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.6, 3.5.0
> Reporter: Flavio Junqueira
> Assignee: Flavio Junqueira
> Priority: Blocker
> Fix For: 3.4.7, 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-2235-3.4.patch, ZOOKEEPER-2235-3.5.patch, 
> ZOOKEEPER-2235.patch, ZOOKEEPER-2235.patch, ZOOKEEPER-2235.patch, 
> ZOOKEEPER-2235.patch, ZOOKEEPER-2235.patch, notice-dependencies.txt
>
>
> Updating license files and notice.txt as needed. Here is a list of the jars 
> we are currently bundling with the release artifact with the corresponding 
> license:
> # commons-cli-1.2.jar -- ASF
> # javacc.jar -- BSD license
> # jline-2.11.jar -- BSD license
> # servlet-api-2.5-20081211.jar - CDDL
> # jackson-core-asl-1.9.11.jar -- ALv2 
> # jetty-6.1.26.jar -- ALv2   
> # log4j-1.2.16.jar -- ALv2   
> # jackson-mapper-asl-1.9.11.jar -- ALv2
> # jetty-util-6.1.26.jar -- ALv2
> # netty-3.7.0.Final.jar -- ALv2
> # slf4j-log4j12-1.7.5.jar -- MIT 




[jira] [Commented] (ZOOKEEPER-2374) Can not telnet 2181 port on aws ec2 server

2016-02-24 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15162907#comment-15162907
 ] 

Flavio Junqueira commented on ZOOKEEPER-2374:
-

Is the standalone zookeeper even running? Could you run {{bin/zkServer.sh 
start-foreground}} and see what it spits out?

> Can not telnet 2181 port on aws ec2 server
> --
>
> Key: ZOOKEEPER-2374
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2374
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.6
> Reporter: zhupengfei
> Priority: Blocker
>
> This is the second time I have faced this problem on EC2. My ActiveMQ STOMP 
> port has the same problem, but the TCP message port works fine.
> I have checked zookeeper.out and found no error log. AWS technical support 
> tells me it may be caused by ZooKeeper.
> OS Type:
> Amazon Linux AMI
> Network Test Result:
> -bash-4.1$ netstat | grep 2181
> -bash-4.1$ telnet localhost 2181
> Trying 127.0.0.1...
> ^C
> -bash-4.1$ netstat -tunpl|grep 2181
> (Not all processes could be identified, non-owned process info
>  will not be shown, you would have to be root to see it all.)
> tcp        0      0 :::2181                     :::*                        LISTEN      17923/java
> -bash-4.1$ netstat -an |grep 2181
> tcp        0      1 172.12.10.152:60171         172.12.10.152:2181          SYN_SENT
> tcp        0      0 :::2181                     :::*                        LISTEN
> tcp        0      1 :::127.0.0.1:36032          :::127.0.0.1:2181           SYN_SENT




[jira] [Reopened] (ZOOKEEPER-2235) License update

2016-02-24 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira reopened ZOOKEEPER-2235:
-

It sounds like we got the license of servlet-api wrong, and we need to fix it. 
It turns out that it is ALv2 rather than CDDL. The fix needs to be for 3.5 and 
trunk, not 3.4. Branch 3.4 is good.

> License update
> --
>
> Key: ZOOKEEPER-2235
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2235
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Flavio Junqueira
>Assignee: Flavio Junqueira
>Priority: Blocker
> Fix For: 3.4.7, 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-2235-3.4.patch, ZOOKEEPER-2235-3.5.patch, 
> ZOOKEEPER-2235.patch, ZOOKEEPER-2235.patch, ZOOKEEPER-2235.patch, 
> ZOOKEEPER-2235.patch, ZOOKEEPER-2235.patch, notice-dependencies.txt
>
>
> Updating license files and notice.txt as needed. Here is a list of the jars 
> we are currently bundling with the release artifact with the corresponding 
> license:
> # commons-cli-1.2.jar -- ASF
> # javacc.jar -- BSD license
> # jline-2.11.jar -- BSD license
> # servlet-api-2.5-20081211.jar - CDDL
> # jackson-core-asl-1.9.11.jar -- ALv2 
> # jetty-6.1.26.jar -- ALv2   
> # log4j-1.2.16.jar -- ALv2   
> # jackson-mapper-asl-1.9.11.jar -- ALv2
> # jetty-util-6.1.26.jar -- ALv2
> # netty-3.7.0.Final.jar -- ALv2
> # slf4j-log4j12-1.7.5.jar -- MIT 




[jira] [Updated] (ZOOKEEPER-2373) Licenses section missing from pom file

2016-02-23 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2373:

Fix Version/s: 3.6.0
   3.5.2
   3.4.9

> Licenses section missing from pom file
> --
>
> Key: ZOOKEEPER-2373
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2373
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Flavio Junqueira
>Priority: Blocker
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
>
> The pom file here:
> https://repo1.maven.org/maven2/org/apache/zookeeper/zookeeper/3.4.8/zookeeper-3.4.8.pom
> should have a section like this:
> {noformat}
> <licenses>
>   <license>
>     <name>The Apache Software License, Version 2.0</name>
>     <url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
>     <distribution>repo</distribution>
>   </license>
> </licenses>
> {noformat}




[jira] [Created] (ZOOKEEPER-2373) Licenses section missing from pom file

2016-02-23 Thread Flavio Junqueira (JIRA)

Flavio Junqueira created ZOOKEEPER-2373:
---

 Summary: Licenses section missing from pom file
 Key: ZOOKEEPER-2373
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2373
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Flavio Junqueira


The pom file here:

https://repo1.maven.org/maven2/org/apache/zookeeper/zookeeper/3.4.8/zookeeper-3.4.8.pom

should have a section like this:

{noformat}
<licenses>
  <license>
    <name>The Apache Software License, Version 2.0</name>
    <url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
    <distribution>repo</distribution>
  </license>
</licenses>
{noformat}




[jira] [Updated] (ZOOKEEPER-2373) Licenses section missing from pom file

2016-02-23 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2373:

Priority: Blocker  (was: Major)

> Licenses section missing from pom file
> --
>
> Key: ZOOKEEPER-2373
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2373
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Flavio Junqueira
>Priority: Blocker
>
> The pom file here:
> https://repo1.maven.org/maven2/org/apache/zookeeper/zookeeper/3.4.8/zookeeper-3.4.8.pom
> should have a section like this:
> {noformat}
> <licenses>
>   <license>
>     <name>The Apache Software License, Version 2.0</name>
>     <url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
>     <distribution>repo</distribution>
>   </license>
> </licenses>
> {noformat}




[jira] [Commented] (ZOOKEEPER-2370) Can't access Znodes after adding ACL with SASL

2016-02-23 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159090#comment-15159090
 ] 

Flavio Junqueira commented on ZOOKEEPER-2370:
-

hey [~csun], I suggest you check the client logs to see if the client is 
authenticating successfully.

> Can't access Znodes after adding ACL with SASL
> --
>
> Key: ZOOKEEPER-2370
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2370
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.5
>Reporter: Chao Sun
>
> (My apology if this is not a bug.)
> I'm trying to use a ZK client which has successfully authenticated with a 
> secure ZK server using principal {{me/hostn...@example.com}}. However, the 
> following simple commands failed:
> {code}
> [zk: hostname(CONNECTED) 0] create /zk-test "1"
> Created /zk-test
> [zk: hostname(CONNECTED) 1] setAcl /zk-test sasl:me/hostn...@example.com:cdrwa
> cZxid = 0x3e3b
> ctime = Mon Feb 22 23:10:36 PST 2016
> mZxid = 0x3e3b
> mtime = Mon Feb 22 23:10:36 PST 2016
> pZxid = 0x3e3b
> cversion = 0
> dataVersion = 0
> aclVersion = 1
> ephemeralOwner = 0x0
> dataLength = 3
> numChildren = 0
> [zk: hostname(CONNECTED) 2] getAcl /zk-test
> 'sasl,'me/hostn...@example.com
> : cdrwa
> [zk: hostname(CONNECTED) 3] ls /zk-test
> Authentication is not valid : /zk-test
> [zk: hostname(CONNECTED) 4] create /zk-test/c "2"
> Authentication is not valid : /zk-test/c
> {code}
> I wonder what I did wrong here, or is this behavior intentional? how can I 
> delete the znodes? Thanks.




[jira] [Resolved] (ZOOKEEPER-2370) Can't access Znodes after adding ACL with SASL

2016-02-23 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira resolved ZOOKEEPER-2370.
-
Resolution: Not A Problem





[jira] [Commented] (ZOOKEEPER-2370) Can't access Znodes after adding ACL with SASL

2016-02-23 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158759#comment-15158759
 ] 

Flavio Junqueira commented on ZOOKEEPER-2370:
-

I think you've set the ACL but your client hasn't really authenticated. You 
need to configure the client with a JAAS file and such.
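
A minimal sketch of the client-side setup this refers to, assuming a Kerberos keytab login. The keytab path and principal below are hypothetical placeholders, and the class and method names are made up for illustration. {{java.security.auth.login.config}} is the standard JVM property that points at a JAAS file, and the ZooKeeper client looks up the {{Client}} section by default.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class JaasClientSetup {
    // Hypothetical placeholders: substitute the real keytab and principal.
    static final String KEYTAB = "/path/to/zkclient.keytab";
    static final String PRINCIPAL = "zkclient@EXAMPLE.COM";

    /** Builds a JAAS "Client" section for a keytab-based Kerberos login. */
    static String clientSection(String keytab, String principal) {
        return "Client {\n"
             + "  com.sun.security.auth.module.Krb5LoginModule required\n"
             + "  useKeyTab=true\n"
             + "  keyTab=\"" + keytab + "\"\n"
             + "  storeKey=true\n"
             + "  useTicketCache=false\n"
             + "  principal=\"" + principal + "\";\n"
             + "};\n";
    }

    /** Writes the section to a temp file and points the JVM at it. */
    static Path install(String keytab, String principal) throws IOException {
        Path jaas = Files.createTempFile("zk-client-jaas", ".conf");
        Files.write(jaas, clientSection(keytab, principal).getBytes(StandardCharsets.UTF_8));
        // Has to be set before the ZooKeeper client object is created.
        System.setProperty("java.security.auth.login.config", jaas.toString());
        return jaas;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("JAAS file: " + install(KEYTAB, PRINCIPAL));
    }
}
```

The same property is more commonly passed on the command line with {{-Djava.security.auth.login.config=...}}; the client logs should then show whether the SASL login succeeded.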





[jira] [Resolved] (ZOOKEEPER-1740) Zookeeper 3.3.4 loses ephemeral nodes under stress

2016-02-06 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira resolved ZOOKEEPER-1740.
-
Resolution: Fixed

The corresponding Kafka issue has been resolved, so I'm resolving this one as 
fixed.

> Zookeeper 3.3.4 loses ephemeral nodes under stress
> --
>
> Key: ZOOKEEPER-1740
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1740
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.4
>Reporter: Neha Narkhede
>Assignee: Flavio Junqueira
>Priority: Critical
>
> The current behavior of zookeeper for ephemeral nodes is that session 
> expiration and ephemeral node deletion is not an atomic operation. 
> The side-effect of the above zookeeper behavior in Kafka, for certain corner 
> cases, is that ephemeral nodes can be lost even if the session is not 
> expired. The sequence of events that can lead to lossy ephemeral nodes is as 
> follows - 
> 1. The session expires on the client, it assumes the ephemeral nodes are 
> deleted, so it establishes a new session with zookeeper and tries to 
> re-create the ephemeral nodes. 
> 2. However, when it tries to re-create the ephemeral node, zookeeper throws 
> back a NodeExists error code. Now this is legitimate during a session 
> disconnect event (since zkclient automatically retries the 
> operation and raises a NodeExists error). Also by design, Kafka server 
> doesn't have multiple zookeeper clients create the same ephemeral node, so 
> Kafka server assumes the NodeExists is normal. 
> 3. However, after a few seconds zookeeper deletes that ephemeral node. So 
> from the client's perspective, even though the client has a new valid 
> session, its ephemeral node is gone. 
> This behavior is triggered due to very long fsync operations on the zookeeper 
> leader. When the leader wakes up from such a long fsync operation, it has 
> several sessions to expire. And the time between the session expiration and 
> the ephemeral node deletion is magnified. Between these 2 operations, a 
> zookeeper client can issue an ephemeral node creation operation that could've 
> appeared to have succeeded, but the leader later deletes the ephemeral node 
> leading to permanent ephemeral node loss from the client's perspective. 
> Thread from zookeeper mailing list: 
> http://zookeeper.markmail.org/search/?q=Zookeeper+3.3.4#query:Zookeeper%203.3.4%20date%3A201307%20+page:1+mid:zma242a2qgp6gxvx+state:results
> The way to reproduce this behavior is as follows -
> 1. Bring up a zookeeper 3.3.4 cluster and create several sessions with 
> ephemeral nodes on it using zkclient. Make sure the session expiration 
> callback is implemented and it re-registers the ephemeral node.
> 2. Run the following script on the zookeeper leader -
> while true
>  do
>kill -STOP $1
>sleep 8
>kill -CONT $1
>sleep 60
>  done
> 3. Run another script to check for existence of ephemeral nodes.
> This script shows that zookeeper loses the ephemeral nodes and the clients 
> still have a valid session.




[jira] [Resolved] (ZOOKEEPER-1912) Leader election lets 2 leaders happen

2016-02-06 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira resolved ZOOKEEPER-1912.
-
Resolution: Not A Problem

> Leader election lets 2 leaders happen
> -
>
> Key: ZOOKEEPER-1912
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1912
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.4.6
> Environment: Ubuntu 12.04, OpenJDK 1.6
>Reporter: Tanakorn Leesatapornwongsa
>Assignee: Flavio Junqueira
>Priority: Critical
> Attachments: conf.zip, log.zip
>
>
> In a 3-node cluster, when 2 nodes die and reboot during leader 
> election, the system can end up with 2 leaders. Eventually, the leader 
> that has no follower support quits being leader, but in the meantime we 
> lose some availability.
> I am building a tool that can reorder messages and disk writes, and also 
> inject node crashes into the system, and it found this bug.
> These are the events that my tool executes in sequence, leading to 2 
> leaders at the end.
> My zookeeper nodes have id = 0,1,2
> packetsend from=0 to=1 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=0
> packetsend from=0 to=2 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=0
> packetsend from=2 to=0 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
> packetsend from=2 to=1 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
> packetsend from=1 to=0 state=0 leader=1 zxid=0 electionEpoch=1 peerEpoch=0
> packetsend from=1 to=2 state=0 leader=1 zxid=0 electionEpoch=1 peerEpoch=0
> packetsend from=1 to=0 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
> packetsend from=0 to=1 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
> packetsend from=1 to=2 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
> packetsend from=0 to=2 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
> diskwrite nodeId=0 write=currentEpoch
> nodecrash id=0
> nodecrash id=1
> nodestart id=0
> nodestart id=1
> diskwrite nodeId=2 write=currentEpoch
> packetsend from=2 to=0 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
> packetsend from=0 to=2 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
> packetsend from=2 to=1 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
> packetsend from=0 to=1 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
> packetsend from=1 to=0 state=0 leader=1 zxid=0 electionEpoch=1 peerEpoch=0
> packetsend from=1 to=2 state=0 leader=1 zxid=0 electionEpoch=1 peerEpoch=0
> packetsend from=2 to=0 state=2 leader=2 zxid=0 electionEpoch=1 peerEpoch=1
> packetsend from=1 to=0 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
> packetsend from=2 to=1 state=2 leader=2 zxid=0 electionEpoch=1 peerEpoch=1
> packetsend from=1 to=2 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
> packetsend from=2 to=1 state=2 leader=2 zxid=0 electionEpoch=1 peerEpoch=1
> packetsend from=1 to=0 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
> packetsend from=1 to=2 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
> packetsend from=0 to=1 state=2 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
> packetsend from=2 to=1 state=2 leader=2 zxid=0 electionEpoch=1 peerEpoch=1
> packetsend from=1 to=0 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
> packetsend from=1 to=2 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
> packetsend from=0 to=1 state=2 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
> packetsend from=2 to=1 state=2 leader=2 zxid=0 electionEpoch=1 peerEpoch=1
> packetsend from=1 to=0 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
> packetsend from=1 to=2 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
> packetsend from=0 to=1 state=2 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
> packetsend from=2 to=0 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
> packetsend from=2 to=1 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
> packetsend from=0 to=2 state=2 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
> packetsend from=2 to=0 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
> packetsend from=1 to=0 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
> packetsend from=1 to=2 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
> packetsend from=2 to=1 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
> packetsend from=0 to=2 state=2 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
> packetsend from=2 to=0 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
> packetsend from=0 to=1 state=2 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
> packetsend from=2 to=1 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
> packetsend from=0 to=2 state=2 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
> diskwrite nodeId=2 write=currentEpoch
> diskwrite nodeId=1 write=currentEpoch




[jira] [Commented] (ZOOKEEPER-1356) Avoid permanent caching of server IPs in the client

2016-02-04 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133184#comment-15133184
 ] 

Flavio Junqueira commented on ZOOKEEPER-1356:
-

[~bric3] It does look like ZOOKEEPER-1506 resolved it.

> Avoid permanent caching of server IPs in the client 
> 
>
> Key: ZOOKEEPER-1356
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1356
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.3.4, 3.4.2
>Reporter: Neha Narkhede
>Assignee: Neha Narkhede
>
> Relevant conversation on the dev mailing list - 
> https://email.corp.linkedin.com/owa/redir.aspx?C=87f3d1e78c96438c8115e450f410d010=http%3a%2f%2fmarkmail.org%2fmessage%2f3vzynx6rgurubf3p%3fq%3dPerforming%2bno%2bdowntime%2bhardware%2bchanges%2bto%2ba%2blive%2bzookeeper%2bcluster%2blist%3aorg%252Eapache%252Ehadoop%252Ezookeeper-dev
> Basically, the client caches the list of server IPs internally and maintains 
> that list for the entire lifetime of the client. This limits the ability to 
> remove/change a server node from a zookeeper cluster, without having to 
> restart every client. Also, two levels of IP caching, one in the JVM and one 
> in the zookeeper client code, seem unnecessary.
> It would be ideal to provide a config option that would turn off this IP 
> caching in the client and re-resolve the host names during the reconnect. 
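
A minimal sketch of the re-resolution idea described above, assuming the client keeps host names rather than resolved addresses and asks the resolver again on each reconnect. The class {{ReResolvingHostList}} and its method are hypothetical and are not part of the ZooKeeper client.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReResolvingHostList {
    private final List<String> hosts; // host names only, never cached IPs
    private final int port;

    ReResolvingHostList(List<String> hosts, int port) {
        this.hosts = new ArrayList<>(hosts);
        this.port = port;
    }

    /** Meant to be called on every reconnect: resolves the names again. */
    List<InetSocketAddress> resolveNow() {
        List<InetSocketAddress> out = new ArrayList<>();
        for (String host : hosts) {
            try {
                for (InetAddress a : InetAddress.getAllByName(host)) {
                    out.add(new InetSocketAddress(a, port));
                }
            } catch (UnknownHostException e) {
                // Skip hosts that do not resolve right now; they may come back later.
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(new ReResolvingHostList(Arrays.asList("localhost"), 2181).resolveNow());
    }
}
```

This only removes the client-level cache. The JVM-level cache is governed separately by the {{networkaddress.cache.ttl}} security property.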




[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-02-04 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132615#comment-15132615
 ] 

Flavio Junqueira commented on ZOOKEEPER-2247:
-

[~rakesh_r] Sure, we can revisit it later, perhaps create a jira so that we 
don't forget? Keep in mind that we will need patches for all three branches, 
please.

[~cnauroth] are you +1 on this patch? It'd be good to give it a last look 
before we check this in.



> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Rakesh R
>Priority: Critical
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch, 
> ZOOKEEPER-2247-10.patch, ZOOKEEPER-2247-11.patch, ZOOKEEPER-2247-12.patch, 
> ZOOKEEPER-2247-13.patch, ZOOKEEPER-2247-b3.5.patch
>
>
> Zookeeper service becomes unavailable when leader fails to write transaction 
> log. Below are the exceptions:
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR 
> [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, 
> from thread : SyncThread:100
> java.io.IOException: Input/output error
>   at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>   at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
>   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
>   at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread 
> SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor 
> complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 
> cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception Leader server still remains leader. After this non 
> recoverable exception the leader should go down and let other followers 
> become leader.




[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-02-04 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132544#comment-15132544
 ] 

Flavio Junqueira commented on ZOOKEEPER-2247:
-

[~rakesh_r] The patch is looking much better now. There are a couple of small 
points that I think we can still fix:

# Could you review the methods you're adding and remove the public modifier if 
it isn't necessary? For example, a method to set the state shouldn't really be 
public, it should be at least package protected if not protected/private. I 
know we aren't super consistent about the modifiers in our code base, but we 
should try to improve it when possible.
# Did you mean to have an annotation here {{// VisibleForTesting}}? Perhaps you 
should just have a comment that this method exists for testing. 
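
To illustrate the first point, here is a small sketch assuming a class that exposes its state for reading but restricts who can change it. The class name {{ServerStateHolder}} and the state values are illustrative only and are not taken from the patch.

```java
class ServerStateHolder {
    enum State { INITIAL, RUNNING, SHUTDOWN, ERROR }

    private volatile State state = State.INITIAL;

    /** Public read access is fine: callers may need to observe the state. */
    public State getState() {
        return state;
    }

    /**
     * Package-private on purpose: only classes in the same package,
     * including tests, can change the state. This method exists for testing.
     */
    void setState(State newState) {
        this.state = newState;
    }

    public static void main(String[] args) {
        ServerStateHolder holder = new ServerStateHolder();
        holder.setState(State.RUNNING);
        System.out.println(holder.getState());
    }
}
```

A plain Javadoc sentence, as in the setter above, documents the intent without depending on an annotation library.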





[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-02-04 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132761#comment-15132761
 ] 

Flavio Junqueira commented on ZOOKEEPER-2247:
-

By leaving {{containerManager}} and {{adminServer}} references, do you mean not 
stopping and shutting down the respective instances? I think you're proposing 
to replace this:

{noformat}
ZooKeeperServerMain.java
+// connection factory will take care of shutting down rest
+// of the services
+if (cnxnFactory != null) {
+cnxnFactory.shutdown();
+}
+if (secureCnxnFactory != null) {
+secureCnxnFactory.shutdown();
+}
{noformat}

with a call to the existing method {{ZooKeeperServerMain.shutdown()}}, is that 
right? I haven't checked if calling {{adminServer.shutdown()}} can produce any 
issue like an NPE here, but it doesn't look like it.
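
As a rough sketch of the NPE concern, assuming each component may still be null when shutdown is requested: the {{Service}} interface and {{NullSafeShutdown}} class are hypothetical stand-ins, and this is not the actual body of {{ZooKeeperServerMain.shutdown()}}.

```java
class NullSafeShutdown {
    /** Hypothetical stand-in for any component that can be shut down. */
    interface Service { void shutdown(); }

    // Any of these may be null if the component was never configured or started.
    Service containerManager;
    Service cnxnFactory;
    Service secureCnxnFactory;
    Service adminServer;

    /** Single entry point: shuts down whatever exists and skips the rest. */
    void shutdown() {
        Service[] all = {containerManager, cnxnFactory, secureCnxnFactory, adminServer};
        for (Service s : all) {
            if (s != null) {
                s.shutdown();
            }
        }
    }

    public static void main(String[] args) {
        NullSafeShutdown main = new NullSafeShutdown();
        main.cnxnFactory = () -> System.out.println("cnxnFactory shut down");
        main.shutdown(); // adminServer and the others are null: no NPE
    }
}
```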







[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-02-04 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132987#comment-15132987
 ] 

Flavio Junqueira commented on ZOOKEEPER-2247:
-

bq. am thinking to add a new running flag in ZooKeeperServerMain class to avoid 
multiple shutdown calls.

Avoiding multiple shutdown calls sounds good, even though I'd expect the 
shutdown call to be idempotent. I'd rather avoid having yet another flag if 
possible, though. 
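
One way to get an idempotent shutdown without a separate boolean flag is to make the state transition itself the guard. The sketch below is only an illustration of that idea; the class name, the two-value {{State}} enum, and the teardown counter are hypothetical and are not taken from the ZooKeeper code base or from any of the attached patches.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a server; not the real ZooKeeperServer class.
final class IdempotentShutdownSketch {
    // Assumed, simplified set of states for illustration only.
    enum State { RUNNING, SHUTDOWN }

    private final AtomicReference<State> state =
            new AtomicReference<>(State.RUNNING);

    // Counts how often the teardown body really ran (illustration only).
    private final AtomicInteger teardownRuns = new AtomicInteger();

    /** Safe to call any number of times, from any thread. */
    void shutdown() {
        // Only the caller that wins the RUNNING -> SHUTDOWN transition
        // performs the teardown; every later call returns immediately.
        if (!state.compareAndSet(State.RUNNING, State.SHUTDOWN)) {
            return;
        }
        teardownRuns.incrementAndGet(); // placeholder for releasing resources
    }

    boolean isRunning() {
        return state.get() == State.RUNNING;
    }

    int teardownRuns() {
        return teardownRuns.get();
    }
}
```

With this shape, callers do not need to coordinate: calling {{shutdown()}} three times in a row runs the teardown body once, and no extra flag has to be kept in sync with the state.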

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Rakesh R
>Priority: Critical
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch, 
> ZOOKEEPER-2247-10.patch, ZOOKEEPER-2247-11.patch, ZOOKEEPER-2247-12.patch, 
> ZOOKEEPER-2247-13.patch, ZOOKEEPER-2247-b3.5.patch
>
>
> Zookeeper service becomes unavailable when the leader fails to write the 
> transaction log. Below are the exceptions:
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR 
> [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, 
> from thread : SyncThread:100
> java.io.IOException: Input/output error
>   at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>   at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
>   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
>   at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread 
> SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor 
> complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 
> cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception the leader server still remains leader. After this 
> non-recoverable exception, the leader should go down and let another 
> follower become leader.




[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-02-03 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131328#comment-15131328
 ] 

Flavio Junqueira commented on ZOOKEEPER-2247:
-

[~rakesh_r] One simple thing I'd like to have changed is the timeout of the 
test cases. Can we please use 30s by default?

I also had the same observation about the exception that [~cnauroth] made, and 
there are a couple of other things I don't understand. In this loop:

{noformat}
+while (zkServer.isStateRunning()) {
+    try {
+        Thread.sleep(1000); // watch interval
+    } catch (InterruptedException ie) {
+        LOG.info("Thread interrupted");
+    }
+}
{noformat}

Shouldn't the loop condition be {{zkServer.isRunning()}} instead?

For the leader and learner, why is it {{isStateRunning}} here:

{noformat}
+public boolean isRunning() {
+    return self.isRunning() && zk.isStateRunning();
+}
{noformat}

and not this:

{noformat}
+public boolean isRunning() {
+    return self.isRunning() && zk.isRunning();
+}
{noformat}

The rationale is that we are running if both the peer is running and the server 
is running, so just checking if the state is running isn't sufficient.
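
To make the rationale concrete, here is a minimal sketch of the combined predicate. It is not code from the patch: the class name is made up, and the two {{BooleanSupplier}} fields are hypothetical stand-ins for {{self.isRunning()}} (the peer) and {{zk.isRunning()}} (the server).

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: the suppliers stand in for self (the peer) and
// zk (the server) from the snippets above.
final class CombinedRunningCheck {
    private final BooleanSupplier peerRunning;
    private final BooleanSupplier serverRunning;

    CombinedRunningCheck(BooleanSupplier peerRunning,
                         BooleanSupplier serverRunning) {
        this.peerRunning = peerRunning;
        this.serverRunning = serverRunning;
    }

    /** True only while both the peer and the server report running. */
    boolean isRunning() {
        return peerRunning.getAsBoolean() && serverRunning.getAsBoolean();
    }
}
```

If either side stops, the predicate turns false, so a loop guarded by it exits in both cases.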

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
>Priority: Critical
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch, 
> ZOOKEEPER-2247-10.patch, ZOOKEEPER-2247-11.patch, ZOOKEEPER-2247-12.patch, 
> ZOOKEEPER-2247-b3.5.patch
>
>

[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-02-03 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131739#comment-15131739
 ] 

Flavio Junqueira commented on ZOOKEEPER-2247:
-

[~rakesh_r] Thanks for the clarification, but I'm still finding the predicates 
a bit confusing, please bear with me. {{isRunning()}} should return true if the 
server is running and the main loop should keep going as long as the call to 
{{isRunning()}} returns true. If there is an error in one of the processors, 
then the server isn't really running and we want the main loop to exit if the 
server isn't running.

I proposed {{isStateRunning}} before because in the shutdown methods you 
pointed out above for learner, observer, and RO we need to know whether the 
server needs to be shut down. However, it sounds like it would be better to 
have a call like {{needsShutdown()}} instead of {{isStateRunning}}, with a body 
like {{return state == State.RUNNING || state == State.ERROR}}. The method 
{{isRunning()}} should go back to {{state == State.RUNNING}}.

Let me know if this makes sense. 
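
As a rough sketch of the two predicates described above: the class below is hypothetical, and the {{INITIAL}} and {{SHUTDOWN}} values are assumptions added only so that the example has states for which both predicates are false.

```java
// Hypothetical sketch; only RUNNING and ERROR are named in the comment
// above. INITIAL and SHUTDOWN are assumed for illustration.
final class ServerStateSketch {
    enum State { INITIAL, RUNNING, ERROR, SHUTDOWN }

    private volatile State state = State.INITIAL;

    void setState(State newState) {
        state = newState;
    }

    /** The main loop keeps going only while this returns true. */
    boolean isRunning() {
        return state == State.RUNNING;
    }

    /** Shutdown logic is still required after an internal error. */
    boolean needsShutdown() {
        return state == State.RUNNING || state == State.ERROR;
    }
}
```

In the {{ERROR}} state {{isRunning()}} is false, so the main loop exits, while {{needsShutdown()}} is still true, so the shutdown methods do their work.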

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
>Priority: Critical
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch, 
> ZOOKEEPER-2247-10.patch, ZOOKEEPER-2247-11.patch, ZOOKEEPER-2247-12.patch, 
> ZOOKEEPER-2247-b3.5.patch
>
>

[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-02-02 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128821#comment-15128821
 ] 

Flavio Junqueira commented on ZOOKEEPER-2247:
-

You're absolutely right [~rakesh_r], it changes the behavior so we need to fix 
it. 

Here is my rationale. {{ZooKeeperServer.isRunning()}} should return true if the 
server is running. If there has been an error that made the server stop, then 
it isn't running, even if the state is {{RUNNING}}. There are a couple of 
options I see to fix this:

# We add a new state {{ERROR}}, which means that the server is in a limbo 
state: it isn't shut down, but it came across an internal error that made it 
stop. If the server is in this state, then we proceed with the shutdown logic 
you mention above. We would make the server transition to this state when we 
hit an error, and if we do, then I think we don't need the 
{{hasInternalError()}} call any longer.
# We add a call like {{isStateRunning()}}, which is basically {{return state == 
State.RUNNING}}. If we do this, then we are essentially saying that 
{{isRunning()}} determines whether the server is running or not by checking the 
state and the internal error flag, while {{isStateRunning()}} simply determines 
whether the state of the server is {{State.RUNNING}}. We replace the 
{{if(!isRunning())}} in the code you mentioned above with 
{{if(!isStateRunning())}}.

Option 1 sounds cleaner to me, but happy to hear opinions.
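
For comparison, option 2 could look roughly like the sketch below. It is an illustration only: the class, the {{reportInternalError()}} helper, and the boolean return value of {{shutdown()}} are hypothetical, and the set of states is assumed.

```java
// Hypothetical sketch of option 2: the state stays RUNNING after an
// error, and a separate flag records that something went wrong.
final class FlagBasedRunningSketch {
    enum State { RUNNING, SHUTDOWN } // assumed, simplified states

    private volatile State state = State.RUNNING;
    private volatile boolean internalError = false;

    void reportInternalError() {
        internalError = true;
    }

    boolean hasInternalError() {
        return internalError;
    }

    /** Looks at the state only. */
    boolean isStateRunning() {
        return state == State.RUNNING;
    }

    /** Looks at the state and the internal error flag. */
    boolean isRunning() {
        return isStateRunning() && !hasInternalError();
    }

    /** Returns true if this call performed the shutdown. */
    boolean shutdown() {
        // Guard on the state alone, so a server that hit an internal
        // error (isRunning() == false) is still shut down.
        if (!isStateRunning()) {
            return false;
        }
        state = State.SHUTDOWN;
        return true;
    }
}
```

The sketch also shows why option 1 may read as cleaner: here two pieces of information (state and flag) have to be consulted together, whereas an {{ERROR}} state carries both in one value.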

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
>Priority: Critical
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch, 
> ZOOKEEPER-2247-10.patch, ZOOKEEPER-2247-11.patch, ZOOKEEPER-2247-b3.5.patch
>
>

[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-02-02 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128663#comment-15128663
 ] 

Flavio Junqueira commented on ZOOKEEPER-2247:
-

[~rakesh_r] I wasn't proposing a final patch, just a change to your patch. I 
might have missed a file, so feel free to incorporate the changes into your 
patch. I'd rather have you propose it so that I can review it. :-)

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
>Priority: Critical
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch, 
> ZOOKEEPER-2247-10.patch, ZOOKEEPER-2247-11.patch, ZOOKEEPER-2247-b3.5.patch
>
>

[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-02-01 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127034#comment-15127034
 ] 

Flavio Junqueira commented on ZOOKEEPER-2247:
-

I was thinking the same thing, +1 to [~cnauroth]'s suggestion.

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
>Priority: Critical
> Fix For: 3.4.9, 3.5.2
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch, 
> ZOOKEEPER-2247-10.patch
>
>

[jira] [Updated] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-02-01 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2247:

Fix Version/s: 3.6.0

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
>Priority: Critical
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch, 
> ZOOKEEPER-2247-10.patch, ZOOKEEPER-2247-11.patch
>
>

[jira] [Updated] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-02-01 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2247:

Attachment: ZOOKEEPER-2247-b3.5.patch

[~rakesh_r] What about the changes in the patch I'm uploading? Please keep in 
mind that we'll need a patch for 3.4. I'm assuming the patch for 3.5 will also 
apply to trunk; if not, we'll need a third patch.

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
>Priority: Critical
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch, 
> ZOOKEEPER-2247-10.patch, ZOOKEEPER-2247-11.patch, ZOOKEEPER-2247-b3.5.patch
>
>
> Zookeeper service becomes unavailable when leader fails to write transaction 
> log. Bellow are the exceptions
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR 
> [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, 
> from thread : SyncThread:100
> java.io.IOException: Input/output error
>   at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>   at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
>   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
>   at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread 
> SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor 
> complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 
> cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception the leader server still remains leader. After this 
> non-recoverable exception, the leader should go down and let another follower 
> become leader.




[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-01-29 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124723#comment-15124723
 ] 

Flavio Junqueira commented on ZOOKEEPER-2247:
-

[~rakesh_r] Thanks for updating the patch. There are two things I believe we 
can improve here:

# Instead of doing this {{while (self.isRunning() && this.isRunning())}}, why 
don't you do this {{while (this.isRunning())}} and check {{self.isRunning()}} in 
{{this.isRunning()}}?
# I don't think we need the health monitor thread. It is just shutting down the 
cnxn factories and you could do it immediately after the server loops. For 
example, in Follower.java, add the cnxn factory shutdown calls after the 
{{while (this.isRunning()) {...} }}. Does it work? 
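
The two suggestions above could be sketched roughly as follows. This is only a minimal, self-contained illustration and not the actual patch: {{PeerSketch}}, {{FollowerSketch}} and {{CnxnFactorySketch}} are hypothetical stand-ins for QuorumPeer, Follower and the cnxn factory, and the internal-error flag and {{failAfter}} parameter are assumptions made to simulate a failure.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for QuorumPeer: owns the peer-wide running flag.
class PeerSketch {
    private final AtomicBoolean running = new AtomicBoolean(true);
    boolean isRunning() { return running.get(); }
}

// Hypothetical stand-in for a cnxn factory; it only records the shutdown call.
class CnxnFactorySketch {
    private boolean shutDown = false;
    void shutdown() { shutDown = true; }
    boolean isShutDown() { return shutDown; }
}

// Hypothetical stand-in for Follower.
class FollowerSketch {
    private final PeerSketch self;
    private final CnxnFactorySketch cnxnFactory;
    private final int failAfter; // assumed knob: simulate an error after N packets
    private final AtomicBoolean internalError = new AtomicBoolean(false);
    private int packetsProcessed = 0;

    FollowerSketch(PeerSketch self, CnxnFactorySketch cnxnFactory, int failAfter) {
        this.self = self;
        this.cnxnFactory = cnxnFactory;
        this.failAfter = failAfter;
    }

    // Suggestion 1: the peer check lives inside the role's own predicate.
    boolean isRunning() {
        return self.isRunning() && !internalError.get();
    }

    void followLeader() {
        while (this.isRunning()) {
            packetsProcessed++; // placeholder for reading and processing a packet
            if (packetsProcessed >= failAfter) {
                internalError.set(true); // simulated unrecoverable error
            }
        }
        // Suggestion 2: shut the factory down right after the loop,
        // with no separate health monitor thread.
        cnxnFactory.shutdown();
    }

    int packetsProcessed() { return packetsProcessed; }
}
```

In this sketch the peer-wide flag is never touched: only the follower loop ends, and the factory shutdown happens on the same thread that ran the loop.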

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
>Priority: Critical
> Fix For: 3.4.9, 3.5.2
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch
>
>
> Zookeeper service becomes unavailable when leader fails to write transaction 
> log. Below are the exceptions
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR 
> [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, 
> from thread : SyncThread:100
> java.io.IOException: Input/output error
>   at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>   at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
>   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
>   at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread 
> SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor 
> complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 
> cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception the leader server still remains leader. After this 
> non-recoverable exception, the leader should go down and let another follower 
> become leader.




[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-01-28 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122420#comment-15122420
 ] 

Flavio Junqueira commented on ZOOKEEPER-2247:
-

Guys, we need to wrap up 3.4.8. If we can't get a patch ready by the end of 
this week, I'd suggest we leave it for the next release.

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
>Priority: Critical
> Fix For: 3.4.8, 3.5.2
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch
>
>
> Zookeeper service becomes unavailable when leader fails to write transaction 
> log. Below are the exceptions
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR 
> [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, 
> from thread : SyncThread:100
> java.io.IOException: Input/output error
>   at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>   at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
>   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
>   at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread 
> SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor 
> complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 
> cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception the leader server still remains leader. After this 
> non-recoverable exception, the leader should go down and let another follower 
> become leader.




[jira] [Commented] (ZOOKEEPER-2358) NettyServerCnxn leaks watches upon close

2016-01-28 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122380#comment-15122380
 ] 

Flavio Junqueira commented on ZOOKEEPER-2358:
-

Yeah, it is a bit odd to set up the cnxn factory with Netty in the setUp method, 
given that it is only necessary for one test case. The two options I see 
are to put the test case you're adding in a new class, or to not use a setUp 
method and do all of that in testWatchLeakNetty. 

> NettyServerCnxn leaks watches upon close
> 
>
> Key: ZOOKEEPER-2358
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2358
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7, 3.5.1
>Reporter: Ian Dimayuga
>Assignee: Ian Dimayuga
> Fix For: 3.4.8, 3.5.2
>
> Attachments: ZOOKEEPER-2358-3.4.patch, ZOOKEEPER-2358.patch
>
>
> NettyServerCnxn.close() neglects to call zkServer.removeCnxn the way 
> NIOServerCnxn.close() does. Also, WatchLeakTest does not test watch leaks in 
> Netty.




[jira] [Commented] (ZOOKEEPER-2358) NettyServerCnxn leaks watches upon close

2016-01-28 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122322#comment-15122322
 ] 

Flavio Junqueira commented on ZOOKEEPER-2358:
-

Thanks for the patch, [~iandi]. It looks good, but I wonder if we should make 
WatchLeakTest parameterized and run it against both NIO and Netty. What do you 
think?
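
To make the idea concrete, here is a dependency-free sketch of running one check against both connection factories. It is an illustration only: a real change would more likely use JUnit's Parameterized runner, {{runWatchLeakCheck()}} is a hypothetical placeholder for the actual test body, and I'm assuming the factory is selected through the {{zookeeper.serverCnxnFactory}} system property.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WatchLeakParamSketch {
    // Assumed system property used to select the server cnxn factory.
    static final String FACTORY_PROP = "zookeeper.serverCnxnFactory";

    // The two "parameters": one run per connection factory implementation.
    static final List<String> FACTORIES = Arrays.asList(
            "org.apache.zookeeper.server.NIOServerCnxnFactory",
            "org.apache.zookeeper.server.NettyServerCnxnFactory");

    // Hypothetical placeholder for the real test body; it only reports
    // which factory the current run would use.
    static String runWatchLeakCheck() {
        return System.getProperty(FACTORY_PROP);
    }

    // Runs the same check once per factory and restores the property afterwards.
    static List<String> runAll() {
        List<String> seen = new ArrayList<>();
        String previous = System.getProperty(FACTORY_PROP);
        try {
            for (String factory : FACTORIES) {
                System.setProperty(FACTORY_PROP, factory);
                seen.add(runWatchLeakCheck());
            }
        } finally {
            if (previous == null) {
                System.clearProperty(FACTORY_PROP);
            } else {
                System.setProperty(FACTORY_PROP, previous);
            }
        }
        return seen;
    }
}
```

The point of the sketch is that the test body stays the same and only the factory selection varies per run, so no Netty-specific setUp is needed.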

> NettyServerCnxn leaks watches upon close
> 
>
> Key: ZOOKEEPER-2358
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2358
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7, 3.5.1
>Reporter: Ian Dimayuga
>Assignee: Ian Dimayuga
> Fix For: 3.4.8, 3.5.2
>
> Attachments: ZOOKEEPER-2358-3.4.patch, ZOOKEEPER-2358.patch
>
>
> NettyServerCnxn.close() neglects to call zkServer.removeCnxn the way 
> NIOServerCnxn.close() does. Also, WatchLeakTest does not test watch leaks in 
> Netty.




[jira] [Commented] (ZOOKEEPER-2355) Ephemeral node is never deleted if follower fails while reading the proposal packet

2016-01-25 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116752#comment-15116752
 ] 

Flavio Junqueira commented on ZOOKEEPER-2355:
-

[~arshad.mohammad] could you have a look at the txn logs (using log formatter) 
and see what is in there in the situation you describe above?

> Ephemeral node is never deleted if follower fails while reading the proposal 
> packet
> ---
>
> Key: ZOOKEEPER-2355
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2355
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum, server
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
>Priority: Critical
> Attachments: ZOOKEEPER-2355-01.patch
>
>
> ZooKeeper ephemeral node is never deleted if a follower fails while reading the 
> proposal packet.
> The scenario is as follows:
> # Configure a three-node ZooKeeper cluster, let's say the nodes are A, B and C; 
> start all of them, and assume A is the leader and B and C are followers.
> # Connect to any of the servers and create ephemeral node /e1.
> # Close the session; ephemeral node /e1 will go for deletion.
> # While it is receiving the delete proposal, make Follower B fail with 
> {{SocketTimeoutException}}. We need to do this to reproduce the scenario; 
> in a production environment it happens because of a network fault.
> # Remove the fault and check that the faulted follower is now connected to the 
> quorum.
> # Connect to any of the servers and create the same ephemeral node /e1; the 
> creation succeeds.
> # Close the session; ephemeral node /e1 will go for deletion.
> # {color:red}/e1 is not deleted from the faulted Follower B. It should have 
> been deleted, as it was created again with another session.{color}
> # {color:green}/e1 is deleted from Leader A and the other Follower C.{color}




[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-01-25 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115230#comment-15115230
 ] 

Flavio Junqueira commented on ZOOKEEPER-2247:
-

[~rakesh_r] It sounds ok to add to the predicate a call to 
{{!zk.hasInternalError()}} as you propose, but why can't we simply make 
{{self.isRunning()}} return false in the case of an error by setting running to 
false? That's what we want, that the server stops running in the case of an 
error, right? 

{{QuorumPeer.isRunning()}} returns the value of {{QuorumPeer.running}}, which 
is the condition to keep running the main loop, so we don't want to set it to 
false. It sounds like using {{QuorumPeer.isRunning()}} as is with follower, 
observer, learner, and leader isn't great because there are scenarios (like the 
one discussed here) in which we want to shut down a participant/observer, but 
not the quorum peer. We may want to have an {{isRunning()}} for the follower, 
observer, learner, and leader classes that returns something like {{running && 
!zk.hasInternalError()}}. We may need to implement an {{isRunning()}} method for 
each one of those classes because they might eventually have different 
predicates to determine whether they are running or not. 

Does it make sense? 
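
Roughly, what I have in mind is something like the sketch below. It is not taken from any patch: {{ServerSketch}}, {{RoleSketch}} and the two role classes are hypothetical names, and the extra {{connectedToLeader}} condition on the observer is invented purely to show that the predicates may diverge.

```java
// Hypothetical stand-in for the ZooKeeper server owned by a role.
class ServerSketch {
    private volatile boolean internalError = false;
    boolean hasInternalError() { return internalError; }
    void markInternalError() { internalError = true; }
}

// Each role implements its own isRunning(), so predicates can differ later.
abstract class RoleSketch {
    protected volatile boolean running = true;
    protected final ServerSketch zk;

    RoleSketch(ServerSketch zk) { this.zk = zk; }

    abstract boolean isRunning();
}

class LeaderRoleSketch extends RoleSketch {
    LeaderRoleSketch(ServerSketch zk) { super(zk); }

    @Override
    boolean isRunning() {
        // The predicate proposed in the comment above.
        return running && !zk.hasInternalError();
    }
}

class ObserverRoleSketch extends RoleSketch {
    // Invented extra condition, only to illustrate a role-specific predicate.
    volatile boolean connectedToLeader = true;

    ObserverRoleSketch(ServerSketch zk) { super(zk); }

    @Override
    boolean isRunning() {
        return running && !zk.hasInternalError() && connectedToLeader;
    }
}
```

Note that nothing here changes the quorum peer's own running flag; an internal error only makes the role's predicate false, so the role loop can end while the peer keeps running.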

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
>Priority: Critical
> Fix For: 3.4.8, 3.5.2
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch
>
>
> Zookeeper service becomes unavailable when leader fails to write transaction 
> log. Below are the exceptions
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR 
> [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, 
> from thread : SyncThread:100
> java.io.IOException: Input/output error
>   at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>   at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
>   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
>   at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread 
> SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor 
> complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 
> cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception the leader server still remains leader. After this 
> non-recoverable exception, the leader should go down and let another follower 
> become leader.




[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2016-01-23 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113849#comment-15113849
 ] 

Flavio Junqueira commented on ZOOKEEPER-2247:
-

I can't convince myself that this is the right way to fix this issue. In fact, 
the critical thread addition might not have been done in the way I'd expect. If 
you check the run method in QuorumPeer, around line 1093, I'd expect that we 
deal with the problem reported here by simply returning from 
{{observeLeader()}}, {{followLeader()}}, and {{lead()}}. Instead, it sounds 
like the {{handleException()}} call is taking a different call path and the 
finally blocks for {{observeLeader()}}, {{followLeader()}}, and {{lead()}} 
aren't being executed, which would have avoided this present issue in the first 
place. Could anyone explain to me why we aren't simply relying on the finally 
blocks? If we can do it, I'd much rather have this option implemented rather 
than multiple code paths that change the state of the server.
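
For reference, the shape I'd expect is sketched below. This is a simplified illustration and not the QuorumPeer code: {{FinallySketch}}, its {{internalError}} flag and the {{events}} list are hypothetical, and the failed write is simulated by a counter.

```java
import java.util.ArrayList;
import java.util.List;

class FinallySketch {
    enum State { LOOKING, LEADING }

    State state = State.LEADING;
    final List<String> events = new ArrayList<>();
    private volatile boolean internalError = false;

    // Stand-in for lead(): it simply returns once an error has been flagged,
    // instead of a critical thread tearing the server down on its own path.
    void lead() {
        int ticks = 0;
        while (!internalError) {
            ticks++;
            if (ticks == 2) {
                internalError = true; // simulated failed transaction log write
            }
        }
        events.add("lead() returned");
    }

    // One iteration of the peer's main loop for the LEADING case.
    void runOnce() {
        try {
            lead();
        } finally {
            // Single cleanup path: runs whether lead() returns or throws.
            events.add("leader shut down");
            state = State.LOOKING;
        }
    }
}
```

With this shape there is a single place that changes the server state, and the peer goes back to LOOKING so that another server can be elected.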

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
>Priority: Critical
> Fix For: 3.4.8, 3.5.2
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch
>
>
> Zookeeper service becomes unavailable when leader fails to write transaction 
> log. Below are the exceptions
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR 
> [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, 
> from thread : SyncThread:100
> java.io.IOException: Input/output error
>   at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>   at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
>   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
>   at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread 
> SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor 
> complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 
> cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception the leader server still remains leader. After this 
> non-recoverable exception, the leader should go down and let another follower 
> become leader.




[jira] [Commented] (ZOOKEEPER-2297) NPE is thrown while creating "key manager" and "trust manager"

2016-01-22 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15112602#comment-15112602
 ] 

Flavio Junqueira commented on ZOOKEEPER-2297:
-

bq. Do we need to expose a new configuration to enable/disable SSL?

Is there a parameter that the user always needs to pass or set when enabling 
SSL? If so, then we can use it to infer that the user wants SSL. If all 
parameters so far are optional, then we need a switch to make it clear. There 
must be at least one non-optional parameter. 
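
As a rough sketch of the inference, assuming for the sake of the example that the key store location is the parameter a user must always set when enabling SSL (the property name {{zookeeper.ssl.keyStore.location}} and the {{SslIntentSketch}} class are assumptions for illustration, not a statement about what the patch should use):

```java
import java.util.Properties;

class SslIntentSketch {
    // Assumed name of a parameter that is always set when SSL is wanted.
    static final String KEYSTORE_LOCATION = "zookeeper.ssl.keyStore.location";

    // Infers the user's intent: SSL is requested only if the parameter
    // is present and not blank.
    static boolean sslRequested(Properties config) {
        String location = config.getProperty(KEYSTORE_LOCATION);
        return location != null && !location.trim().isEmpty();
    }
}
```

A server could then skip creating the key manager and trust manager whenever {{sslRequested}} returns false, which is the non-secure setup described in this jira.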

> NPE is thrown while creating "key manager" and "trust manager" 
> ---
>
> Key: ZOOKEEPER-2297
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2297
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.1
> Environment: Suse 11 sp 3
>Reporter: Anushri
>Assignee: Arshad Mohammad
>Priority: Blocker
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2297-01.patch, ZOOKEEPER-2297-02.patch, 
> ZOOKEEPER-2297-03.patch
>
>
> NPE is thrown while creating "key manager" and "trust manager" , even though 
> the zk setup is in non-secure mode
> bq. 2015-10-19 12:54:12,278 [myid:2] - ERROR [ProcessThread(sid:2 
> cport:-1)::X509AuthenticationProvider@78] - Failed to create key manager
> bq. org.apache.zookeeper.common.X509Exception$KeyManagerException: 
> java.lang.NullPointerException
> at org.apache.zookeeper.common.X509Util.createKeyManager(X509Util.java:129)
> at 
> org.apache.zookeeper.server.auth.X509AuthenticationProvider.<init>(X509AuthenticationProvider.java:75)
> at 
> org.apache.zookeeper.server.auth.ProviderRegistry.initialize(ProviderRegistry.java:42)
> at 
> org.apache.zookeeper.server.auth.ProviderRegistry.getProvider(ProviderRegistry.java:68)
> at 
> org.apache.zookeeper.server.PrepRequestProcessor.fixupACL(PrepRequestProcessor.java:952)
> at 
> org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:379)
> at 
> org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:716)
> at 
> org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:144)
> Caused by: java.lang.NullPointerException
> at org.apache.zookeeper.common.X509Util.createKeyManager(X509Util.java:113)
> ... 7 more
> bq. 2015-10-19 12:54:12,279 [myid:2] - ERROR [ProcessThread(sid:2 
> cport:-1)::X509AuthenticationProvider@90] - Failed to create trust manager
> bq.  org.apache.zookeeper.common.X509Exception$TrustManagerException: 
> java.lang.NullPointerException
> at org.apache.zookeeper.common.X509Util.createTrustManager(X509Util.java:158)
> at 
> org.apache.zookeeper.server.auth.X509AuthenticationProvider.<init>(X509AuthenticationProvider.java:87)
> at 
> org.apache.zookeeper.server.auth.ProviderRegistry.initialize(ProviderRegistry.java:42)
> at 
> org.apache.zookeeper.server.auth.ProviderRegistry.getProvider(ProviderRegistry.java:68)
> at 
> org.apache.zookeeper.server.PrepRequestProcessor.fixupACL(PrepRequestProcessor.java:952)
> at 
> org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:379)
> at 
> org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:716)
> at 
> org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:144)
> Caused by: java.lang.NullPointerException
> at org.apache.zookeeper.common.X509Util.createTrustManager(X509Util.java:143)
> ... 7 more




[jira] [Commented] (ZOOKEEPER-2297) NPE is thrown while creating "key manager" and "trust manager"

2016-01-21 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110482#comment-15110482
 ] 

Flavio Junqueira commented on ZOOKEEPER-2297:
-

bq. I will send a mail in the user/dev mailing list about this once agree upon 
the changes.

I'd hold on the e-mail until we converge on a proposal.

bq. To make it clear, this jira is changing zookeeper server side configuration 
only. Now, with this change user need to mandatory configure the SSL scheme 
name "x509" along with the other SSL configurations. Earlier "x509" was 
instantiated by default, irrespective of secure or non-secure. So user not 
required to configure this explicitly. 

I got that and if we think that we will have other providers in the future, 
then we certainly need a way of configuring it.

bq. The proposed change is similar to the way configuring the SASL auth 
mechanism.

More or less. For authentication, we need to specify the provider among IP, 
Digest/Passwd, SASL. In that case, we do need that parameter explicitly, and if 
I'm passing a SaslAuthProvider parameter, then it is pretty clear that I want 
SASL authentication. Passing a X509AuthenticationProvider parameter doesn't 
make it clear the intent of the user with respect to SSL and given that we only 
have one option at the moment, sounds unnecessary.

bq. I failed to find any dependency with SASL

if you check the stack trace in the description of this jira, then this 
provider issue has arisen with a call to fixupACL in prep request processor. 
The ACL stuff depends on the authentication to work, and actually, I should 
have said authentication in general rather than just SASL. It'd be good to test 
both SSL and SASL together.  



> NPE is thrown while creating "key manager" and "trust manager" 
> ---
>
> Key: ZOOKEEPER-2297
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2297
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.1
> Environment: Suse 11 sp 3
>Reporter: Anushri
>Assignee: Arshad Mohammad
>Priority: Blocker
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2297-01.patch, ZOOKEEPER-2297-02.patch, 
> ZOOKEEPER-2297-03.patch
>
>
> NPE is thrown while creating "key manager" and "trust manager" , even though 
> the zk setup is in non-secure mode
> bq. 2015-10-19 12:54:12,278 [myid:2] - ERROR [ProcessThread(sid:2 
> cport:-1)::X509AuthenticationProvider@78] - Failed to create key manager
> bq. org.apache.zookeeper.common.X509Exception$KeyManagerException: 
> java.lang.NullPointerException
> at org.apache.zookeeper.common.X509Util.createKeyManager(X509Util.java:129)
> at 
> org.apache.zookeeper.server.auth.X509AuthenticationProvider.<init>(X509AuthenticationProvider.java:75)
> at 
> org.apache.zookeeper.server.auth.ProviderRegistry.initialize(ProviderRegistry.java:42)
> at 
> org.apache.zookeeper.server.auth.ProviderRegistry.getProvider(ProviderRegistry.java:68)
> at 
> org.apache.zookeeper.server.PrepRequestProcessor.fixupACL(PrepRequestProcessor.java:952)
> at 
> org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:379)
> at 
> org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:716)
> at 
> org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:144)
> Caused by: java.lang.NullPointerException
> at org.apache.zookeeper.common.X509Util.createKeyManager(X509Util.java:113)
> ... 7 more
> bq. 2015-10-19 12:54:12,279 [myid:2] - ERROR [ProcessThread(sid:2 
> cport:-1)::X509AuthenticationProvider@90] - Failed to create trust manager
> bq.  org.apache.zookeeper.common.X509Exception$TrustManagerException: 
> java.lang.NullPointerException
> at org.apache.zookeeper.common.X509Util.createTrustManager(X509Util.java:158)
> at 
> org.apache.zookeeper.server.auth.X509AuthenticationProvider.<init>(X509AuthenticationProvider.java:87)
> at 
> org.apache.zookeeper.server.auth.ProviderRegistry.initialize(ProviderRegistry.java:42)
> at 
> org.apache.zookeeper.server.auth.ProviderRegistry.getProvider(ProviderRegistry.java:68)
> at 
> org.apache.zookeeper.server.PrepRequestProcessor.fixupACL(PrepRequestProcessor.java:952)
> at 
> org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:379)
> at 
> org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:716)
> at 
> org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:144)
> Caused by: java.lang.NullPointerException
> at org.apache.zookeeper.common.X509Util.createTrustManager(X509Util.java:143)
> ... 7 more




[jira] [Comment Edited] (ZOOKEEPER-2297) NPE is thrown while creating "key manager" and "trust manager"

2016-01-21 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110482#comment-15110482
 ] 

Flavio Junqueira edited comment on ZOOKEEPER-2297 at 1/21/16 11:59 AM:
---

bq. I will send a mail in the user/dev mailing list about this once agree upon 
the changes.

I'd hold on to the e-mail until we converge on a proposal.

bq. To make it clear, this jira is changing zookeeper server side configuration 
only. Now, with this change user need to mandatory configure the SSL scheme 
name "x509" along with the other SSL configurations. Earlier "x509" was 
instantiated by default, irrespective of secure or non-secure. So user not 
required to configure this explicitly. 

I got that and if we think that we will have other providers in the future, 
then we certainly need a way of configuring it.

bq. The proposed change is similar to the way configuring the SASL auth 
mechanism.

More or less. For authentication, we need to specify the provider among IP, 
Digest/Passwd, SASL. In that case, we do need that parameter explicitly, and if 
I'm passing a SaslAuthProvider parameter, then it is pretty clear that I want 
SASL authentication. Passing a X509AuthenticationProvider parameter doesn't 
make it clear the intent of the user with respect to SSL and given that we only 
have one option at the moment, sounds unnecessary.

bq. I failed to find any dependency with SASL

if you check the stack trace in the description of this jira, then this 
provider issue has arisen with a call to fixupACL in prep request processor. 
The ACL stuff depends on the authentication to work, and actually, I should 
have said authentication in general rather than just SASL. It'd be good to test 
both SSL and SASL together.  







> NPE is thrown while creating "key manager" and "trust manager" 
> ---
>
> Key: ZOOKEEPER-2297
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2297
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.5.1
> Environment: Suse 11 sp 3
> Reporter: Anushri
> Assignee: Arshad Mohammad
> Priority: Blocker
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2297-01.patch, ZOOKEEPER-2297-02.patch, 
> ZOOKEEPER-2297-03.patch
>
>
> NPE is thrown while creating "key manager" and "trust manager", even though 
> the zk setup is in non-secure mode
> bq. 2015-10-19 12:54:12,278 [myid:2] - ERROR [ProcessThread(sid:2 cport:-1)::X509AuthenticationProvider@78] - Failed to create key manager
> bq. org.apache.zookeeper.common.X509Exception$KeyManagerException: java.lang.NullPointerException
> at org.apache.zookeeper.common.X509Util.createKeyManager(X509Util.java:129)
> at org.apache.zookeeper.server.auth.X509AuthenticationProvider.<init>(X509AuthenticationProvider.java:75)
> at org.apache.zookeeper.server.auth.ProviderRegistry.initialize(ProviderRegistry.java:42)
> at org.apache.zookeeper.server.auth.ProviderRegistry.getProvider(ProviderRegistry.java:68)
> at org.apache.zookeeper.server.PrepRequestProcessor.fixupACL(PrepRequestProcessor.java:952)
> at 
>

[jira] [Updated] (ZOOKEEPER-2297) NPE is thrown while creating "key manager" and "trust manager"

2016-01-21 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2297:

Priority: Blocker  (was: Minor)

> NPE is thrown while creating "key manager" and "trust manager" 
> ---
>
> Key: ZOOKEEPER-2297
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2297
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.5.1
> Environment: Suse 11 sp 3
> Reporter: Anushri
> Assignee: Arshad Mohammad
> Priority: Blocker
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2297-01.patch, ZOOKEEPER-2297-02.patch, 
> ZOOKEEPER-2297-03.patch
>
>
> NPE is thrown while creating "key manager" and "trust manager", even though 
> the zk setup is in non-secure mode
> bq. 2015-10-19 12:54:12,278 [myid:2] - ERROR [ProcessThread(sid:2 cport:-1)::X509AuthenticationProvider@78] - Failed to create key manager
> bq. org.apache.zookeeper.common.X509Exception$KeyManagerException: java.lang.NullPointerException
> at org.apache.zookeeper.common.X509Util.createKeyManager(X509Util.java:129)
> at org.apache.zookeeper.server.auth.X509AuthenticationProvider.<init>(X509AuthenticationProvider.java:75)
> at org.apache.zookeeper.server.auth.ProviderRegistry.initialize(ProviderRegistry.java:42)
> at org.apache.zookeeper.server.auth.ProviderRegistry.getProvider(ProviderRegistry.java:68)
> at org.apache.zookeeper.server.PrepRequestProcessor.fixupACL(PrepRequestProcessor.java:952)
> at org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:379)
> at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:716)
> at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:144)
> Caused by: java.lang.NullPointerException
> at org.apache.zookeeper.common.X509Util.createKeyManager(X509Util.java:113)
> ... 7 more
> bq. 2015-10-19 12:54:12,279 [myid:2] - ERROR [ProcessThread(sid:2 cport:-1)::X509AuthenticationProvider@90] - Failed to create trust manager
> bq. org.apache.zookeeper.common.X509Exception$TrustManagerException: java.lang.NullPointerException
> at org.apache.zookeeper.common.X509Util.createTrustManager(X509Util.java:158)
> at org.apache.zookeeper.server.auth.X509AuthenticationProvider.<init>(X509AuthenticationProvider.java:87)
> at org.apache.zookeeper.server.auth.ProviderRegistry.initialize(ProviderRegistry.java:42)
> at org.apache.zookeeper.server.auth.ProviderRegistry.getProvider(ProviderRegistry.java:68)
> at org.apache.zookeeper.server.PrepRequestProcessor.fixupACL(PrepRequestProcessor.java:952)
> at org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:379)
> at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:716)
> at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:144)
> Caused by: java.lang.NullPointerException
> at org.apache.zookeeper.common.X509Util.createTrustManager(X509Util.java:143)
> ... 7 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (ZOOKEEPER-2297) NPE is thrown while creating "key manager" and "trust manager"

2016-01-21 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110355#comment-15110355
 ] 

Flavio Junqueira edited comment on ZOOKEEPER-2297 at 1/21/16 9:53 AM:
--

I'm sorry guys for chiming in late, but as it stands, I'm -1 on this change. 
There are two points that are bothering me here:

# If we are to change configuration, even when the branch is not yet stable 
like 3.5, we need to bring the issue up on the user list to collect feedback. 
We can't expect users to be aware of discussions in jiras like this one, and 
the change affects them.
# I don't like the configuration change. We could use the secure client port 
parameter to determine whether the user is trying to configure secure 
communication, or even create a boolean parameter to make it more explicit, like 
having {{zookeeper.client.secure}} on the server side as well. The bottom line 
is that I'd rather infer from a configuration parameter that the user is trying 
to make it secure than force the user to set such a cumbersome variable.

Also, this is focusing on SSL, but this change affects SASL as well, yes?

My suggestion is to work on those points and, for the second, to produce a new 
patch that fixes the configuration.



> NPE is thrown while creating "key manager" and "trust manager" 
> ---
>
> Key: ZOOKEEPER-2297
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2297
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.5.1
> Environment: Suse 11 sp 3
> Reporter: Anushri
> Assignee: Arshad Mohammad
> Priority: Minor
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2297-01.patch, ZOOKEEPER-2297-02.patch, 
> ZOOKEEPER-2297-03.patch
>
>

[jira] [Commented] (ZOOKEEPER-2297) NPE is thrown while creating "key manager" and "trust manager"

2016-01-21 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110355#comment-15110355
 ] 

Flavio Junqueira commented on ZOOKEEPER-2297:
-

I'm sorry guys for chiming in late, but as it stands, I'm -1 on this change. 
There are two points that are bothering me here:

# If we are to change configuration, even when the branch is not yet stable 
like 3.5, we need to bring the issue up on the user list to collect feedback. 
We can't expect users to be aware of discussions in jiras like this one, and 
the change affects them.
# I don't like the configuration change. We could use the secure client port 
parameter to determine whether the user is trying to configure secure 
communication, or even create a boolean parameter to make it more explicit, like 
having {{zookeeper.client.secure}} on the server side as well. The bottom line 
is that I'd rather infer from a configuration parameter that the user is trying 
to make it secure than force the user to set such a cumbersome variable.

Also, this is focusing on SSL, but this change affects SASL as well, yes?

My suggestion is to work on those points and, for the second, to produce a new 
patch that fixes the configuration.
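
To make the second point concrete, here is a minimal sketch of inferring 
secure mode from existing configuration instead of requiring an explicit 
scheme setting. It is not taken from any of the attached patches. The class 
and method names are hypothetical, and the property key {{secureClientPort}} 
is an assumption about how the secure client port is configured; only 
{{zookeeper.client.secure}} and the "x509" scheme name come from this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, for illustration only: decides which auth schemes
// to register based on whether the configuration asks for secure mode.
final class SecureModeSketch {
    // Assumed property keys; adjust to the real configuration names.
    static final String SECURE_CLIENT_PORT = "secureClientPort";
    static final String CLIENT_SECURE = "zookeeper.client.secure";

    // Secure mode is inferred if a secure client port is set or the
    // boolean flag is true; no separate "x509" setting is needed.
    static boolean isSecureRequested(Properties cfg) {
        String port = cfg.getProperty(SECURE_CLIENT_PORT);
        boolean hasSecurePort = port != null && !port.trim().isEmpty();
        boolean flag = Boolean.parseBoolean(cfg.getProperty(CLIENT_SECURE, "false"));
        return hasSecurePort || flag;
    }

    // The x509 provider is only listed when secure mode is requested, so a
    // non-secure setup never tries to build key or trust managers.
    static List<String> schemesToRegister(Properties cfg) {
        List<String> schemes = new ArrayList<>();
        schemes.add("ip");
        schemes.add("digest");
        if (isSecureRequested(cfg)) {
            schemes.add("x509");
        }
        return schemes;
    }
}
```

With an empty configuration the sketch registers only the non-SSL schemes; 
setting either property adds "x509" without any further user input.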

> NPE is thrown while creating "key manager" and "trust manager" 
> ---
>
> Key: ZOOKEEPER-2297
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2297
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.5.1
> Environment: Suse 11 sp 3
> Reporter: Anushri
> Assignee: Arshad Mohammad
> Priority: Minor
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2297-01.patch, ZOOKEEPER-2297-02.patch, 
> ZOOKEEPER-2297-03.patch
>
>




[jira] [Commented] (ZOOKEEPER-2297) NPE is thrown while creating "key manager" and "trust manager"

2016-01-21 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110365#comment-15110365
 ] 

Flavio Junqueira commented on ZOOKEEPER-2297:
-

Changing to blocker because we can't release without having this sorted out.

> NPE is thrown while creating "key manager" and "trust manager" 
> ---
>
> Key: ZOOKEEPER-2297
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2297
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.5.1
> Environment: Suse 11 sp 3
> Reporter: Anushri
> Assignee: Arshad Mohammad
> Priority: Blocker
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2297-01.patch, ZOOKEEPER-2297-02.patch, 
> ZOOKEEPER-2297-03.patch
>
>




[jira] [Commented] (ZOOKEEPER-1469) Adding Cross-Realm support for secure Zookeeper client authentication

2016-01-16 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103197#comment-15103197
 ] 

Flavio Junqueira commented on ZOOKEEPER-1469:
-

Is the only thing remaining in this jira to document it on the wiki? CC 
[~phunt] [~ekoontz]

> Adding Cross-Realm support for secure Zookeeper client authentication
> -
>
> Key: ZOOKEEPER-1469
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1469
> Project: ZooKeeper
> Issue Type: Improvement
> Components: documentation
> Affects Versions: 3.4.3
> Reporter: Himanshu Vashishtha
> Assignee: Eugene Koontz
> Fix For: 3.5.2, 3.6.0
>
> Attachments: SaslServerCallBackHandlerException.patch
>
>
> There is a use case where one needs to support cross realm authentication for 
> a zookeeper cluster. One use case is HBase Replication: HBase supports 
> replicating data to multiple slave clusters, where the latter might be running 
> in different realms. With current zookeeper security, the region servers of the 
> master HBase cluster are not able to query the zookeeper quorum members of 
> the slave cluster. This jira is about adding such Xrealm support.




[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-15 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101984#comment-15101984
 ] 

Flavio Junqueira commented on ZOOKEEPER-1936:
-

What happens if the call to mkdir legitimately fails? It looks like we would 
assume that the directory exists and would move on. I think we need to 
differentiate the directory already existing from other problems when creating 
it. Does that sound reasonable?
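
One way to make that distinction is sketched below. It is not the code from 
the attached patches, and the class and method names are hypothetical. The 
idea is that {{File.mkdirs()}} returns false both when another thread created 
the directory first and when creation really failed, so the result is only 
treated as an error if the path is still not a directory afterwards.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, for illustration only.
final class DataDirSketch {
    // Creates dir if needed. A concurrent creator winning the race is fine;
    // any other failure (permissions, a regular file in the way, ...) throws.
    static void ensureDirectory(File dir) throws IOException {
        if (dir.isDirectory()) {
            return; // already there, nothing to do
        }
        // mkdirs() returning false is ambiguous, so re-check the outcome.
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Unable to create data directory " + dir);
        }
    }
}
```

With this shape, two threads racing to create the same directory both 
succeed, while a path blocked by a regular file or a permission problem still 
ends in an exception.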

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.6, 3.5.0
> Reporter: Harald Musum
> Assignee: Andrew Purtell
> Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch
>
>
> We sometimes see the ZooKeeper server fail to start, with this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), it might 
> happen at the same time as starting the server itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time, and mkdir fails and server exits the JVM.




[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-15 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102294#comment-15102294
 ] 

Flavio Junqueira commented on ZOOKEEPER-1936:
-

But [~ted_yu] said that even with his patch {{dataDir}} wasn't created, and if 
what you suggest in step 1 fixed it, then the directory would be there, no? I'm 
actually wondering if there is something else causing trouble.

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.6, 3.5.0
> Reporter: Harald Musum
> Assignee: Andrew Purtell
> Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, 
> ZOOKEEPER-1936.v3.patch
>
>




[jira] [Commented] (ZOOKEEPER-2353) QuorumCnxManager protocol needs to be upgradable with-in a specific Version

2016-01-14 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098143#comment-15098143
 ] 

Flavio Junqueira commented on ZOOKEEPER-2353:
-

[~suda] I'd really like to have us using a different serialization framework, 
even if we do it ourselves. There are actually two issues with the current 
serialization for leader election. First, it is mixed with the election code and 
I'd rather separate it out. Second, it is not using the same serialization as the 
other protocols (Zab, client-server). We should fix it, but it is enough work 
to be done as a separate jira rather than a secondary task here. An important 
question is which framework to use, and every time we raise this issue, we have 
different opinions on what to use. It'd be great if we could agree this time 
around and have it finally done.

> QuorumCnxManager protocol needs to be upgradable with-in a specific Version
> ---
>
> Key: ZOOKEEPER-2353
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2353
> Project: ZooKeeper
> Issue Type: Improvement
> Affects Versions: 3.4.7, 3.5.1
> Reporter: Powell Molleti
>
> Currently 3.5.X sends its hdr as follows:
> {code:title=QuorumCnxManager.java|borderStyle=solid}
> dout.writeLong(PROTOCOL_VERSION);
> dout.writeLong(self.getId());
> String addr = self.getElectionAddress().getHostString() + ":" + 
> self.getElectionAddress().getPort();
> byte[] addr_bytes = addr.getBytes();
> dout.writeInt(addr_bytes.length);
> dout.write(addr_bytes);
> dout.flush();
> {code}
> Since it writes the length of the host and port byte string, there is no simple 
> way to append new fields to this hdr anymore. I.e., the receiving side has to 
> consider all bytes after sid for host and port parsing, which is what it does here:
> [QuorumCnxManager.InitialMessage.parse(): http://bit.ly/1Q0znpW]
> {code:title=QuorumCnxManager.java|borderStyle=solid}
> sid = din.readLong();
> int remaining = din.readInt();
> if (remaining <= 0 || remaining > maxBuffer) {
>     throw new InitialMessageException(
>         "Unreasonable buffer length: %s", remaining);
> }
> byte[] b = new byte[remaining];
> int num_read = din.read(b);
> if (num_read != remaining) {
>     throw new InitialMessageException(
>         "Read only %s bytes out of %s sent by server %s",
>         num_read, remaining, sid);
> }
> // FIXME: IPv6 is not supported. Using something like Guava's HostAndPort
> // parser would be good.
> String addr = new String(b);
> String[] host_port = addr.split(":");
> {code}
> This has been captured in the discussion here: ZOOKEEPER-2186.
> Though it is possible to circumvent this problem by various means, the request 
> here is to design messages with hdr such that there is no need to bump the 
> version number or hack certain fields (i.e., figure out if it's the length of 
> host/port or the length of a different message, etc., in the above case).
> This is the idea here as captured in ZOOKEEPER-2186.
> {code:java}
> dout.writeLong(PROTOCOL_VERSION);
> String addr = self.getElectionAddress().getHostString() + ":" + 
> self.getElectionAddress().getPort();
> byte[] addr_bytes = addr.getBytes();
> // After version write the total length of msg sent by sender.
> dout.writeInt(Long.BYTES + addr_bytes.length);   
> // Write sid afterwards
> dout.writeLong(self.getId());
> // Write length of host/port string   
> dout.writeInt(addr_bytes.length);
> // Write host/port string   
> dout.write(addr_bytes); 
> {code}
> Since the total length of the message and the length of each variable field are 
> also present, it is quite easy to provide backward compatibility w.r.t. parsing 
> of the message. 
> Older code will read the length of message it knows and ignore the rest. 
> Newer revision(s) that want to keep things compatible will only append to 
> hdr and not change the meaning of current fields.
> I am guessing this was the original intent w.r.t. the introduction of protocol 
> version here: ZOOKEEPER-1633
> Since 3.4.x code does not parse this and 3.5.x is still in alpha mode, perhaps 
> it is possible to consider this change now?
> Also, I would like to propose to carefully consider the option of using 
> protobufs for the next protocol version bump. This will prevent issues like 
> this in the future.
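
The receiving side of the length-prefixed header described above could look 
roughly like the following sketch. It is not code from QuorumCnxManager: the 
class and method names are hypothetical, and it assumes the total length 
covers sid, the address-length field, the address bytes, and anything 
appended later. It uses {{readFully}} because a plain {{read}} may return 
fewer bytes than requested.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a header that newer versions can extend.
final class LengthPrefixedHeader {
    final long sid;
    final String addr;

    LengthPrefixedHeader(long sid, String addr) {
        this.sid = sid;
        this.addr = addr;
    }

    // Writes version, total body length, sid, address length, address, and
    // then any extra bytes a newer sender might append.
    static byte[] encode(long version, long sid, String addr, byte[] extra) throws IOException {
        byte[] addrBytes = addr.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dout = new DataOutputStream(bos);
        dout.writeLong(version);
        dout.writeInt(Long.BYTES + Integer.BYTES + addrBytes.length + extra.length);
        dout.writeLong(sid);
        dout.writeInt(addrBytes.length);
        dout.write(addrBytes);
        dout.write(extra);
        dout.flush();
        return bos.toByteArray();
    }

    // Reads the fields this version knows about and skips the rest of the
    // body, so appended fields do not break older readers.
    static LengthPrefixedHeader decode(DataInputStream din, int maxBuffer) throws IOException {
        din.readLong(); // protocol version, not interpreted in this sketch
        int total = din.readInt();
        if (total < Long.BYTES + Integer.BYTES || total > maxBuffer) {
            throw new IOException("Unreasonable header length: " + total);
        }
        byte[] body = new byte[total];
        din.readFully(body); // consumes known and unknown fields alike
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(body));
        long sid = in.readLong();
        int addrLen = in.readInt();
        if (addrLen < 0 || addrLen > total - Long.BYTES - Integer.BYTES) {
            throw new IOException("Unreasonable address length: " + addrLen);
        }
        byte[] addrBytes = new byte[addrLen];
        in.readFully(addrBytes);
        // Bytes left in body belong to fields added by newer versions.
        return new LengthPrefixedHeader(sid, new String(addrBytes, StandardCharsets.UTF_8));
    }
}
```

Because the whole body is consumed up front, the stream is positioned at the 
next message even when the sender appended fields this reader does not know.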




[jira] [Commented] (ZOOKEEPER-2353) QuorumCnxManager protocol needs to be upgradable with-in a specific Version

2016-01-13 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15095986#comment-15095986
 ] 

Flavio Junqueira commented on ZOOKEEPER-2353:
-

[~maalto] said the following in ZOOKEEPER-2186:

bq. Now with the change it would fail to accept connections from members having 
different protocol version, and I see it will be quite difficult (or 
impossible) to do rolling upgrades in production systems.

I'd like to understand how this breaks compatibility and whether this jira 
needs to be a blocker and a bug fix rather than an improvement. If it is a 
blocker bug fix, then we need to ship it with 3.4.8 immediately.

CC [~rgs]

> QuorumCnxManager protocol needs to be upgradable with-in a specific Version
> ---
>
> Key: ZOOKEEPER-2353
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2353
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.7, 3.5.1
>Reporter: Powell Molleti
>
> Currently 3.5.X sends its hdr as follows:
> {code:title=QuorumCnxManager.java|borderStyle=solid}
> dout.writeLong(PROTOCOL_VERSION);
> dout.writeLong(self.getId());
> String addr = self.getElectionAddress().getHostString() + ":" + 
> self.getElectionAddress().getPort();
> byte[] addr_bytes = addr.getBytes();
> dout.writeInt(addr_bytes.length);
> dout.write(addr_bytes);
> dout.flush();
> {code}
> Since it writes the length of the host and port byte string, there is no simple way to 
> append new fields to this hdr anymore. I.e., the rx side has to consider all 
> bytes after sid for host and port parsing, which is what it does here:
> [QuorumCnxManager.InitialMessage.parse(): http://bit.ly/1Q0znpW]
> {code:title=QuorumCnxManager.java|borderStyle=solid}
> sid = din.readLong();
> int remaining = din.readInt();
> if (remaining <= 0 || remaining > maxBuffer) {
>     throw new InitialMessageException(
>         "Unreasonable buffer length: %s", remaining);
> }
> byte[] b = new byte[remaining];
> int num_read = din.read(b);
> if (num_read != remaining) {
>     throw new InitialMessageException(
>         "Read only %s bytes out of %s sent by server %s",
>         num_read, remaining, sid);
> }
> // FIXME: IPv6 is not supported. Using something like Guava's HostAndPort
> // parser would be good.
> String addr = new String(b);
> String[] host_port = addr.split(":");
> {code}
> This has been captured in the discussion here: ZOOKEEPER-2186.
> Though it is possible to circumvent this problem by various means, the request 
> here is to design messages with a hdr such that there is no need to bump the 
> version number or hack certain fields (i.e., figure out whether it is the length of 
> host/port or the length of a different message, etc., in the above case).
> This is the idea here as captured in ZOOKEEPER-2186.
> {code:java}
> dout.writeLong(PROTOCOL_VERSION);
> String addr = self.getElectionAddress().getHostString() + ":" + 
> self.getElectionAddress().getPort();
> byte[] addr_bytes = addr.getBytes();
> // After version write the total length of msg sent by sender.
> dout.writeInt(Long.BYTES + addr_bytes.length);   
> // Write sid afterwards
> dout.writeLong(self.getId());
> // Write length of host/port string   
> dout.writeInt(addr_bytes.length);
> // Write host/port string   
> dout.write(addr_bytes); 
> {code}
> Since the total length of the message and the length of each variable field are also 
> present, it is quite easy to provide backward compatibility w.r.t. parsing 
> of the message. 
> Older code will read the length of message it knows and ignore the rest. 
> Newer revisions that want to keep things compatible will only append to the 
> hdr and not change the meaning of current fields.
> I am guessing this was the original intent w.r.t. the introduction of the protocol 
> version here: ZOOKEEPER-1633
> Since 3.4.x code does not parse this and 3.5.x is still in alpha mode, perhaps 
> it is possible to consider this change now?
> Also I would like to propose to carefully consider the option of using 
> protobufs for the next protocol version bump. This will prevent issues like 
> this in the future.
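
The layout proposed in the description can be sketched end to end as follows. This is a minimal, hypothetical illustration and not the actual QuorumCnxManager code: the class name HeaderSketch, the MAX_BUFFER bound, the placeholder PROTOCOL_VERSION value and the opaque newerFields suffix are all assumptions made for the example. Unlike the quoted snippet, the sketch also counts the 4-byte address length prefix in the total length, on the assumption that the total is meant to cover every byte that follows it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a length-prefixed initial message; not ZooKeeper code.
class HeaderSketch {
    static final long PROTOCOL_VERSION = -65536L; // placeholder value for the sketch
    static final int MAX_BUFFER = 2048;           // assumed sanity bound

    /** The fields that a reader of the current format understands. */
    static final class Header {
        final long sid;
        final String addr;

        Header(long sid, String addr) {
            this.sid = sid;
            this.addr = addr;
        }
    }

    /** Writes version, total length, sid, address length, address, then any newer fields. */
    static byte[] write(long sid, String addr, byte[] newerFields) throws IOException {
        byte[] addrBytes = addr.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream dout = new DataOutputStream(bytes);
        dout.writeLong(PROTOCOL_VERSION);
        // Total length of everything after this int, including the address length prefix.
        dout.writeInt(Long.BYTES + Integer.BYTES + addrBytes.length + newerFields.length);
        dout.writeLong(sid);
        dout.writeInt(addrBytes.length);
        dout.write(addrBytes);
        dout.write(newerFields); // fields appended by a later revision
        dout.flush();
        return bytes.toByteArray();
    }

    /** Parses the fields it knows and ignores whatever a newer sender appended. */
    static Header parse(byte[] message) throws IOException {
        DataInputStream din = new DataInputStream(new ByteArrayInputStream(message));
        long version = din.readLong();
        if (version != PROTOCOL_VERSION) {
            throw new IOException("Unexpected protocol version: " + version);
        }
        int total = din.readInt();
        if (total < Long.BYTES + Integer.BYTES || total > MAX_BUFFER) {
            throw new IOException("Unreasonable message length: " + total);
        }
        byte[] body = new byte[total];
        din.readFully(body); // consumes the whole message, known fields or not
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(body));
        long sid = in.readLong();
        int addrLen = in.readInt();
        if (addrLen <= 0 || addrLen > total - Long.BYTES - Integer.BYTES) {
            throw new IOException("Unreasonable address length: " + addrLen);
        }
        byte[] addrBytes = new byte[addrLen];
        in.readFully(addrBytes);
        // Any bytes left in body belong to newer fields and are skipped here.
        return new Header(sid, new String(addrBytes, StandardCharsets.UTF_8));
    }
}
```

Because the reader consumes exactly the announced total, the stream stays aligned for the next message even when the sender has appended fields the reader does not know about.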




[jira] [Commented] (ZOOKEEPER-2353) QuorumCnxManager protocol needs to be upgradable with-in a specific Version

2016-01-13 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097762#comment-15097762
 ] 

Flavio Junqueira commented on ZOOKEEPER-2353:
-

That's an independent codebase [~suda], I'm not sure it'd help much here. 


[jira] [Commented] (ZOOKEEPER-2159) Pluggable SASL Authentication

2016-01-13 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097096#comment-15097096
 ] 

Flavio Junqueira commented on ZOOKEEPER-2159:
-

[~yufeldman] Thanks for the reminder and patience, I'll get to it some time 
this week.

> Pluggable SASL Authentication
> -
>
> Key: ZOOKEEPER-2159
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2159
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client, server
>Reporter: Yuliya Feldman
>Assignee: Yuliya Feldman
> Fix For: 3.5.2, 3.6.0
>
> Attachments: PluggableZookeeperAuthentication (1).pdf, 
> PluggableZookeeperAuthentication.pdf
>
>
> Today SASLAuthenticationProvider is used for all SASL-based authentication, 
> which creates some "if/else" statements in the ZookeeperSaslClient and 
> ZookeeperSaslServer code with just Kerberos and Digest.
> We want to use yet another SASL-based authentication mechanism, and adding one 
> more "if/else" with code specific to just that new mechanism does not make 
> much sense.
> The proposal is to allow plugging in custom SASL authentication mechanism(s) without 
> further changes in the ZooKeeper code.




[jira] [Commented] (ZOOKEEPER-2186) QuorumCnxManager#receiveConnection may crash with random input

2016-01-09 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090660#comment-15090660
 ] 

Flavio Junqueira commented on ZOOKEEPER-2186:
-

I'm still waiting for [~rgs] to give his opinion on this one, but given that 
this change is out in both 3.4.7 and 3.5.1, I'd rather have this discussed 
in a separate jira. Could one of you, [~geek101] or [~maalto], please start 
another jira? 

If this issue really breaks compatibility, then it needs to be a blocker.

> QuorumCnxManager#receiveConnection may crash with random input
> --
>
> Key: ZOOKEEPER-2186
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2186
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Raul Gutierrez Segales
>Assignee: Raul Gutierrez Segales
> Fix For: 3.4.7, 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-2186-v3.4.patch, ZOOKEEPER-2186.patch, 
> ZOOKEEPER-2186.patch, ZOOKEEPER-2186.patch
>
>
> This will allocate an arbitrarily large byte buffer (and try to read it!):
> {code}
> public boolean receiveConnection(Socket sock) {
>     Long sid = null;
>     ...
>     sid = din.readLong();
>     // next comes the #bytes in the remainder of the message
>     int num_remaining_bytes = din.readInt();
>     byte[] b = new byte[num_remaining_bytes];
>     // remove the remainder of the message from din
>     int num_read = din.read(b);
> {code}
> This will crash the QuorumCnxManager thread, so the cluster will keep going 
> but future elections might fail to converge (ditto for leaving/joining 
> members). 
> Patch coming up in a bit.
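
A minimal sketch of the kind of guard that avoids the unbounded allocation described above. It is not the committed patch: the class name BoundedReadSketch and the MAX_BUFFER value are assumptions for illustration. The idea is to validate the length before allocating, and to use readFully so that a short read surfaces as an exception instead of a partially filled buffer.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical sketch; names and the bound are illustrative assumptions.
class BoundedReadSketch {
    static final int MAX_BUFFER = 2048; // assumed upper bound on the remainder

    /** Reads a length-prefixed remainder, rejecting lengths outside (0, MAX_BUFFER]. */
    static byte[] readRemainder(DataInputStream din) throws IOException {
        int numRemainingBytes = din.readInt();
        if (numRemainingBytes <= 0 || numRemainingBytes > MAX_BUFFER) {
            // Refuse before allocating, so random input cannot request a huge buffer.
            throw new IOException("Unreasonable buffer length: " + numRemainingBytes);
        }
        byte[] b = new byte[numRemainingBytes];
        // readFully either fills the buffer or throws EOFException on short input.
        din.readFully(b);
        return b;
    }
}
```

The caller can then treat an IOException as a malformed connection attempt and close the socket rather than let the receiving thread die.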




[jira] [Commented] (ZOOKEEPER-2347) Deadlock shutting down zookeeper

2016-01-07 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088110#comment-15088110
 ] 

Flavio Junqueira commented on ZOOKEEPER-2347:
-

[~rakeshr] looks good, I just have a few minor asks, please replace accordingly:

# “Tests to verify that ZooKeeper server should be able to shutdown properly…” 
-> “Test case to verify that ZooKeeper server is able to shutdown properly…”
# “errOccurred” -> “interrupted”
# “InterruptedException while waiting to process request!” -> “Interrupted 
while waiting to process request”

I would still like another committer to have a look at this for a 
second opinion. Any volunteers, please?

> Deadlock shutting down zookeeper
> 
>
> Key: ZOOKEEPER-2347
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7
>Reporter: Ted Yu
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.8
>
> Attachments: ZOOKEEPER-2347-br-3.4.patch, 
> ZOOKEEPER-2347-br-3.4.patch, ZOOKEEPER-2347-br-3.4.patch, 
> testSplitLogManager.stack
>
>
> HBase recently upgraded to zookeeper 3.4.7
> In one of the tests, TestSplitLogManager, there is reproducible hang at the 
> end of the test.
> Below is snippet from stack trace related to zookeeper:
> {code}
> "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on 
> condition [0x00011834b000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c5b8d3a0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 
> nid=0x9513 waiting on condition [0x000118042000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor 
> entry [0x0001170ac000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
>   - waiting to lock <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on 
> condition [0x000117a3]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c9b106b8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() 
> [0x000108aa1000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1281)
>   - locked <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1355)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
>   - locked <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
>   at 
>

[jira] [Commented] (ZOOKEEPER-2186) QuorumCnxManager#receiveConnection may crash with random input

2016-01-05 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083172#comment-15083172
 ] 

Flavio Junqueira commented on ZOOKEEPER-2186:
-

[~maalto] as far as I can see in the patch, the change is supposed to be backward 
compatible. Could you explain more concretely why you think it isn't? Perhaps [~rgs] 
can shed some light here.


[jira] [Commented] (ZOOKEEPER-2347) Deadlock shutting down zookeeper

2016-01-05 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082931#comment-15082931
 ] 

Flavio Junqueira commented on ZOOKEEPER-2347:
-

[~rakeshr] thanks for the update. It looks much better. I have tested it, and the 
new test case does hang without the other changes, but there are a few small 
points I want to raise:

# Do we really need a timeout of 90s? I'd rather have something like 30s or 
less.
# Typo in {{LOG.error("Exception while waiting to proess req", e);}}
# Please add a description of the dependency cycle that we are testing for. For 
example, in step 7, you could say that we are testing that 
SyncRequestProcessor#shutdown holds a lock and waits on FinalRequestProcessor 
to complete a pending operation, which in turn also needs the ZooKeeperServer 
lock. This is to emphasize where the problem was and make it very clear.
# Replace {{"Waiting for FinalReqProcessor to be called"}} with {{"Waiting for 
FinalRequestProcessor to start processing request"}} and {{"Waiting for 
SyncReqProcessor#shutdown to be called"}} with {{"Waiting for 
SyncRequestProcessor to shut down"}}.
# There are a couple of exceptions that we catch but do nothing about, because we rely 
on the timeout. It is better to fail the test case directly when there is a 
failure rather than rely on a timeout. If you don't like the idea of calling 
{{fail()}} from an auxiliary class, then we need to at least propagate the 
exception so that we can catch it and fail rather than wait.

I also would feel more comfortable if we get another review here. I'm fairly 
confident, but given that we've missed this issue before, I'd rather have 
another +1 before we check in.

> Deadlock shutting down zookeeper
> 
>
> Key: ZOOKEEPER-2347
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7
>Reporter: Ted Yu
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.8
>
> Attachments: ZOOKEEPER-2347-br-3.4.patch, 
> ZOOKEEPER-2347-br-3.4.patch, testSplitLogManager.stack
>
>

[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

2015-12-18 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065114#comment-15065114
 ] 

Flavio Junqueira commented on ZOOKEEPER-2247:
-

Can we have the patch somewhere for review, on the review board or even on 
github as a pull request? There are a few small things I want to point out, and it 
is easier with a review tool. 

One high-level point I wanted to raise is that I don't really like the 
separation between standalone and quorum member for the listener 
implementation. If the way we are shutting down servers and request processors 
needs adjustment, then let's do it, but creating workarounds just makes it 
messier. For the standalone case, we are shutting it down in 
ZooKeeperServerMain, and I think the latest patch here is just trying to 
replicate how we shut down the server in ZooKeeperServerMain, which isn't good 
because it is duplicating code. We need to make this better before checking in 
and ideally not have two listener implementations. 

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> 
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
>Priority: Critical
> Fix For: 3.4.8, 3.5.2
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch
>
>
> Zookeeper service becomes unavailable when leader fails to write transaction 
> log. Below are the exceptions:
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR 
> [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, 
> from thread : SyncThread:100
> java.io.IOException: Input/output error
>   at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>   at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
>   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
>   at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread 
> SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor 
> complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 
> cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception, the leader server still remains leader. After this 
> non-recoverable exception, the leader should go down and let another follower 
> become leader.




[jira] [Commented] (ZOOKEEPER-2347) Deadlock shutting down zookeeper

2015-12-18 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15064972#comment-15064972
 ] 

Flavio Junqueira commented on ZOOKEEPER-2347:
-

The patch looks good, but I'm not really convinced about the test case. It 
relies on the interleaving of events to possibly trigger the problem, so it 
doesn't deterministically reproduce the problem when it exists. I was 
thinking that maybe a better way of testing this is to set up a pipeline, 
populate {{toFlush}} directly, and just call shutdown on the 
{{ZooKeeperServer}}. If it is possible to do this, then it will be more 
reliable than submitting a bunch of operations and hoping for the race to kick 
in. What do you think? 

> Deadlock shutting down zookeeper
> 
>
> Key: ZOOKEEPER-2347
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7
>Reporter: Ted Yu
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.8
>
> Attachments: ZOOKEEPER-2347-br-3.4.patch, testSplitLogManager.stack
>
>

[jira] [Resolved] (ZOOKEEPER-1907) Improve Thread handling

2015-12-17 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira resolved ZOOKEEPER-1907.
-
Resolution: Fixed

[~rakeshr] you're right, they have been committed at different times. I think 
you've preferred to fix this issue in the other jira, and I'm fine with it, so 
let's close this one.

> Improve Thread handling
> ---
>
> Key: ZOOKEEPER-1907
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1907
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.5.0
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: 3.6.0, 3.5.1, 3.4.7
>
> Attachments: ZOOKEEPER-1907-br-3-4.patch, ZOOKEEPER-1907.patch, 
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch
>
>
> The server has many critical threads that run and coordinate with each 
> other, such as the RequestProcessor chains. Going through these threads, 
> most of them have a structure similar to:
> {code}
> public void run() {
>     try {
>         while (running) {
>             // processing logic
>         }
>     } catch (InterruptedException e) {
>         LOG.error("Unexpected interruption", e);
>     } catch (Exception e) {
>         LOG.error("Unexpected exception", e);
>     }
>     LOG.info("...exited loop!");
> }
> {code}
> From what I can see of the design, a thread can exit silently after 
> swallowing an exception. If this happens in production, the server would 
> hang forever and would not be able to fulfil its role, and it is hard for 
> a management tool to detect this.
> The idea of this JIRA is to discuss and improve this.
> Reference: [Community discussion 
> thread|http://mail-archives.apache.org/mod_mbox/zookeeper-user/201403.mbox/%3cc2496325850aa74c92aaf83aa9662d26458a1...@szxeml561-mbx.china.huawei.com%3E]
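
One way to avoid the silent exit described above is to have each critical thread report an unhandled failure to a listener instead of only logging it. The following is a minimal sketch of that idea, not the patch attached to this issue; CriticalWorker and FailureListener are hypothetical names chosen for illustration and are not ZooKeeper classes.

```java
import java.util.concurrent.atomic.AtomicReference;

class CriticalThreadSketch {
    /** Hypothetical callback; a real server might stop itself or raise an alarm here. */
    interface FailureListener {
        void onThreadFailure(String threadName, Throwable cause);
    }

    /** Hypothetical thread wrapper that never swallows a failure silently. */
    static class CriticalWorker extends Thread {
        private final Runnable body;
        private final FailureListener listener;

        CriticalWorker(String name, Runnable body, FailureListener listener) {
            super(name);
            this.body = body;
            this.listener = listener;
        }

        @Override
        public void run() {
            try {
                body.run();
            } catch (Throwable t) {
                // Hand the failure to the listener so the exit is observable.
                listener.onThreadFailure(getName(), t);
            }
        }
    }

    /** Runs a worker that fails immediately and returns what the listener saw. */
    static String runFailingWorker() {
        AtomicReference<String> report = new AtomicReference<>("no failure reported");
        CriticalWorker worker = new CriticalWorker(
                "SyncThread-sketch",
                () -> { throw new IllegalStateException("disk full"); },
                (name, cause) -> report.set(name + ": " + cause.getMessage()));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return report.get();
    }
}
```

With a listener in place, a management tool only needs to watch one callback rather than scan the logs of every thread.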



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (ZOOKEEPER-2347) Deadlock shutting down zookeeper

2015-12-17 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061765#comment-15061765
 ] 

Flavio Junqueira commented on ZOOKEEPER-2347:
-

To be consistent, I'm reposting the comment I made in the other jira here. 

bq. We have made requestsInProcess an AtomicInteger in ZOOKEEPER-1504, removing 
the synchronization of the decInProcess method. We should just make the same 
change here for the 3.4 branch.
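
To illustrate the change proposed above, here is a minimal sketch of a request counter backed by an AtomicInteger, so that decrementing it no longer needs the object's monitor. It is an assumption-laden illustration, not the ZOOKEEPER-1504 patch: the class InProcessCounterSketch and its demo method are hypothetical, and only the names requestsInProcess and decInProcess come from the comment.

```java
import java.util.concurrent.atomic.AtomicInteger;

class InProcessCounterSketch {
    // Lock-free counter: updates do not take the monitor of this object.
    private final AtomicInteger requestsInProcess = new AtomicInteger(0);

    void incInProcess() {
        requestsInProcess.incrementAndGet();
    }

    void decInProcess() {
        requestsInProcess.decrementAndGet();
    }

    int getInProcess() {
        return requestsInProcess.get();
    }

    /**
     * Holds the object's monitor (as a synchronized shutdown would) while
     * another thread decrements the counter, then returns the final count.
     */
    static int demo() {
        InProcessCounterSketch counter = new InProcessCounterSketch();
        counter.incInProcess();
        synchronized (counter) {
            Thread worker = new Thread(counter::decInProcess, "SyncThread-sketch");
            worker.start();
            try {
                // Completes because decInProcess() does not need the monitor.
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.getInProcess();
    }
}
```

The join inside the synchronized block returns here, whereas it would wait forever if decInProcess were still a synchronized method.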


> Deadlock shutting down zookeeper
> 
>
> Key: ZOOKEEPER-2347
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7
>Reporter: Ted Yu
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.8
>
> Attachments: ZOOKEEPER-2347-br-3.4.patch, testSplitLogManager.stack
>
>
> HBase recently upgraded to ZooKeeper 3.4.7.
> In one of the tests, TestSplitLogManager, there is a reproducible hang at 
> the end of the test.
> Below is a snippet from the stack trace related to ZooKeeper:
> {code}
> "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on 
> condition [0x00011834b000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c5b8d3a0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 
> nid=0x9513 waiting on condition [0x000118042000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor 
> entry [0x0001170ac000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
>   - waiting to lock <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on 
> condition [0x000117a3]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c9b106b8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() 
> [0x000108aa1000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1281)
>   - locked <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1355)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
>   - locked <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
>   at 
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
> {code}
> Note the address (0x0007c5b66400) in the last hunk which seems to 
> indicate some form of deadlock.
> According to Camille Fournier:
> We made shutdown synchronized. But decrementing the requests is
> also

[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling

2015-12-16 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061052#comment-15061052
 ] 

Flavio Junqueira commented on ZOOKEEPER-1907:
-

We have made requestsInProcess an AtomicInteger in ZOOKEEPER-1504, removing 
the synchronization of the decInProcess method. We should just make the same 
change here for the 3.4 branch.

> Improve Thread handling
> ---
>
> Key: ZOOKEEPER-1907
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1907
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.5.0
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: 3.4.7, 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-1907-br-3-4.patch, ZOOKEEPER-1907.patch, 
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch
>
>
> The server has many critical threads that run and coordinate with each 
> other, such as the RequestProcessor chains. Going through these threads, 
> most of them have a structure similar to:
> {code}
> public void run() {
>     try {
>         while (running) {
>             // processing logic
>         }
>     } catch (InterruptedException e) {
>         LOG.error("Unexpected interruption", e);
>     } catch (Exception e) {
>         LOG.error("Unexpected exception", e);
>     }
>     LOG.info("...exited loop!");
> }
> {code}
> From what I can see of the design, a thread can exit silently after 
> swallowing an exception. If this happens in production, the server would 
> hang forever and would not be able to fulfil its role, and it is hard for 
> a management tool to detect this.
> The idea of this JIRA is to discuss and improve this.
> Reference: [Community discussion 
> thread|http://mail-archives.apache.org/mod_mbox/zookeeper-user/201403.mbox/%3cc2496325850aa74c92aaf83aa9662d26458a1...@szxeml561-mbx.china.huawei.com%3E]




[jira] [Comment Edited] (ZOOKEEPER-1907) Improve Thread handling

2015-12-16 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061052#comment-15061052
 ] 

Flavio Junqueira edited comment on ZOOKEEPER-1907 at 12/16/15 11:11 PM:


We have made requestsInProcess an AtomicInteger in ZOOKEEPER-1504, removing the 
synchronization of the decInProcess method. We should just make the same change 
here for the 3.4 branch.


was (Author: fpj):
We have made requestsInProcess and AtomicInteger in ZOOKEEPER-1504, removing 
the synchronization of the decIn method. We should just make the same change 
here for the 3.4 branch.

> Improve Thread handling
> ---
>
> Key: ZOOKEEPER-1907
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1907
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.5.0
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: 3.4.7, 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-1907-br-3-4.patch, ZOOKEEPER-1907.patch, 
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch
>
>
> The server has many critical threads that run and coordinate with each 
> other, such as the RequestProcessor chains. Going through these threads, 
> most of them have a structure similar to:
> {code}
> public void run() {
>     try {
>         while (running) {
>             // processing logic
>         }
>     } catch (InterruptedException e) {
>         LOG.error("Unexpected interruption", e);
>     } catch (Exception e) {
>         LOG.error("Unexpected exception", e);
>     }
>     LOG.info("...exited loop!");
> }
> {code}
> From what I can see of the design, a thread can exit silently after 
> swallowing an exception. If this happens in production, the server would 
> hang forever and would not be able to fulfil its role, and it is hard for 
> a management tool to detect this.
> The idea of this JIRA is to discuss and improve this.
> Reference: [Community discussion 
> thread|http://mail-archives.apache.org/mod_mbox/zookeeper-user/201403.mbox/%3cc2496325850aa74c92aaf83aa9662d26458a1...@szxeml561-mbx.china.huawei.com%3E]




[jira] [Updated] (ZOOKEEPER-2347) Deadlock shutting down zookeeper

2015-12-16 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2347:

Priority: Blocker  (was: Critical)

> Deadlock shutting down zookeeper
> 
>
> Key: ZOOKEEPER-2347
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7
>Reporter: Ted Yu
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.8
>
> Attachments: testSplitLogManager.stack
>
>
> HBase recently upgraded to ZooKeeper 3.4.7.
> In one of the tests, TestSplitLogManager, there is a reproducible hang at 
> the end of the test.
> Below is a snippet from the stack trace related to ZooKeeper:
> {code}
> "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on 
> condition [0x00011834b000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c5b8d3a0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 
> nid=0x9513 waiting on condition [0x000118042000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor 
> entry [0x0001170ac000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
>   - waiting to lock <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on 
> condition [0x000117a3]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c9b106b8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() 
> [0x000108aa1000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1281)
>   - locked <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1355)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
>   - locked <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
>   at 
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
> {code}
> Note the address (0x0007c5b66400) in the last hunk which seems to 
> indicate some form of deadlock.
> According to Camille Fournier:
> We made shutdown synchronized. But decrementing the requests is
> also synchronized and called from a different thread. So yeah, deadlock.
> This came in with ZOOKEEPER-1907
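
The pattern described above can be reproduced in a few lines. The sketch below is a hypothetical stand-in for the server, not ZooKeeper code: shutdown() is synchronized and joins a worker thread, while the worker calls the synchronized decInProcess() on the same object. It uses a timed join, which is an assumption added so the example terminates; the stack trace above shows an untimed join, which never returns.

```java
import java.util.concurrent.CountDownLatch;

class ShutdownDeadlockSketch {
    private int requestsInProcess = 1;
    private final CountDownLatch shutdownHoldsMonitor = new CountDownLatch(1);

    // Stand-in for the SyncThread: it decrements the counter once shutdown()
    // already owns the monitor of this object.
    private final Thread syncThread = new Thread(() -> {
        try {
            shutdownHoldsMonitor.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        decInProcess(); // BLOCKED until shutdown() releases the monitor
    }, "SyncThread-sketch");

    synchronized void decInProcess() {
        requestsInProcess--;
    }

    /** Returns true if the worker was still alive when the timed join gave up. */
    synchronized boolean shutdown(long joinMillis) {
        shutdownHoldsMonitor.countDown();
        try {
            // join() waits on the thread object, so this monitor stays held.
            syncThread.join(joinMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return syncThread.isAlive();
    }

    static boolean workerStuckDuringShutdown() {
        ShutdownDeadlockSketch server = new ShutdownDeadlockSketch();
        server.syncThread.start();
        boolean stuck = server.shutdown(200);
        try {
            // The monitor is free again, so the worker can now finish.
            server.syncThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return stuck;
    }
}
```

The worker cannot finish while shutdown() holds the monitor, and shutdown() does not release it until the join ends, so with an untimed join each thread waits on the other.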




[jira] [Updated] (ZOOKEEPER-2347) Deadlock shutting down zookeeper

2015-12-16 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2347:

Assignee: Rakesh R

> Deadlock shutting down zookeeper
> 
>
> Key: ZOOKEEPER-2347
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7
>Reporter: Ted Yu
>Assignee: Rakesh R
>Priority: Critical
> Fix For: 3.4.8
>
> Attachments: testSplitLogManager.stack
>
>
> HBase recently upgraded to ZooKeeper 3.4.7.
> In one of the tests, TestSplitLogManager, there is a reproducible hang at 
> the end of the test.
> Below is a snippet from the stack trace related to ZooKeeper:
> {code}
> "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on 
> condition [0x00011834b000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c5b8d3a0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 
> nid=0x9513 waiting on condition [0x000118042000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor 
> entry [0x0001170ac000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
>   - waiting to lock <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on 
> condition [0x000117a3]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c9b106b8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() 
> [0x000108aa1000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1281)
>   - locked <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1355)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
>   - locked <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
>   at 
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
> {code}
> Note the address (0x0007c5b66400) in the last hunk which seems to 
> indicate some form of deadlock.
> According to Camille Fournier:
> We made shutdown synchronized. But decrementing the requests is
> also synchronized and called from a different thread. So yeah, deadlock.
> This came in with ZOOKEEPER-1907




[jira] [Updated] (ZOOKEEPER-2347) Deadlock shutting down zookeeper

2015-12-16 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2347:

Fix Version/s: 3.4.8

> Deadlock shutting down zookeeper
> 
>
> Key: ZOOKEEPER-2347
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7
>Reporter: Ted Yu
>Priority: Critical
> Fix For: 3.4.8
>
> Attachments: testSplitLogManager.stack
>
>
> HBase recently upgraded to ZooKeeper 3.4.7.
> In one of the tests, TestSplitLogManager, there is a reproducible hang at 
> the end of the test.
> Below is a snippet from the stack trace related to ZooKeeper:
> {code}
> "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on 
> condition [0x00011834b000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c5b8d3a0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 
> nid=0x9513 waiting on condition [0x000118042000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor 
> entry [0x0001170ac000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
>   - waiting to lock <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on 
> condition [0x000117a3]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c9b106b8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() 
> [0x000108aa1000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1281)
>   - locked <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1355)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
>   - locked <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
>   at 
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
> {code}
> Note the address (0x0007c5b66400) in the last hunk which seems to 
> indicate some form of deadlock.
> According to Camille Fournier:
> We made shutdown synchronized. But decrementing the requests is
> also synchronized and called from a different thread. So yeah, deadlock.
> This came in with ZOOKEEPER-1907




[jira] [Reopened] (ZOOKEEPER-1907) Improve Thread handling

2015-12-16 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira reopened ZOOKEEPER-1907:
-

This issue was only committed to 3.4, but it is marked as a 3.5 and trunk issue 
as well. We need to sort this out, along with the deadlock that has been 
reported in ZOOKEEPER-2347. 

> Improve Thread handling
> ---
>
> Key: ZOOKEEPER-1907
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1907
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.5.0
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: 3.4.7, 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-1907-br-3-4.patch, ZOOKEEPER-1907.patch, 
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch
>
>
> The server has many critical threads that run and coordinate with each 
> other, such as the RequestProcessor chains. Going through these threads, 
> most of them have a structure similar to:
> {code}
> public void run() {
>     try {
>         while (running) {
>             // processing logic
>         }
>     } catch (InterruptedException e) {
>         LOG.error("Unexpected interruption", e);
>     } catch (Exception e) {
>         LOG.error("Unexpected exception", e);
>     }
>     LOG.info("...exited loop!");
> }
> {code}
> From what I can see of the design, a thread can exit silently after 
> swallowing an exception. If this happens in production, the server would 
> hang forever and would not be able to fulfil its role, and it is hard for 
> a management tool to detect this.
> The idea of this JIRA is to discuss and improve this.
> Reference: [Community discussion 
> thread|http://mail-archives.apache.org/mod_mbox/zookeeper-user/201403.mbox/%3cc2496325850aa74c92aaf83aa9662d26458a1...@szxeml561-mbx.china.huawei.com%3E]




[jira] [Assigned] (ZOOKEEPER-2334) Zookeeper Archives Out Date

2015-12-16 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira reassigned ZOOKEEPER-2334:
---

Assignee: Flavio Junqueira

> Zookeeper Archives Out Date
> ---
>
> Key: ZOOKEEPER-2334
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2334
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Elias Levy
>Assignee: Flavio Junqueira
>
> The Zookeeper download page and mirrors only track the latest version of the 
> mirror release versions.  The page has a link to the archives page at 
> archive.apache.org, but that page is missing all releases after 3.3.2.  That 
> means there are a large number of releases that disappear from the official 
> download site when a new release is published.
> In my particular case I was building a container based on 3.4.6.  Once 3.4.7 
> came out my build broke and it cannot be fixed as 3.4.7 can't be downloaded 
> from anywhere official.




[jira] [Resolved] (ZOOKEEPER-2334) Zookeeper Archives Out Date

2015-12-16 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira resolved ZOOKEEPER-2334.
-
Resolution: Fixed

I have just fixed this. The file to be updated, HEADER.html, can be updated in 
dist.apache.org by any committer here.

> Zookeeper Archives Out Date
> ---
>
> Key: ZOOKEEPER-2334
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2334
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Elias Levy
>Assignee: Flavio Junqueira
>
> The Zookeeper download page and mirrors only track the latest version of the 
> mirror release versions.  The page has a link to the archives page at 
> archive.apache.org, but that page is missing all releases after 3.3.2.  That 
> means there are a large number of releases that disappear from the official 
> download site when a new release is published.
> In my particular case I was building a container based on 3.4.6.  Once 3.4.7 
> came out my build broke and it cannot be fixed as 3.4.7 can't be downloaded 
> from anywhere official.




[jira] [Updated] (ZOOKEEPER-2344) Provide more diagnostics/stack traces on SASL Auth failure

2015-12-15 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2344:

Fix Version/s: 3.6.0
   3.5.2
   3.4.8

> Provide more diagnostics/stack traces on SASL Auth failure
> --
>
> Key: ZOOKEEPER-2344
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2344
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client, server
>Affects Versions: 3.4.7, 3.5.1
>Reporter: Steve Loughran
> Fix For: 3.4.8, 3.5.2, 3.6.0
>
>
> When Kerberos decides it doesn't want to work, the JRE libraries provide some 
> terse and unhelpful error messages.
> The only way to debug the problem is (a) to have complete stack traces and 
> (b) as much related information as possible.
> Zookeeper could do more here. Currently too much of the code loses stack 
> traces; sometimes auth errors aren't reported back to the client (the 
> connection is closed) +others
> Everyone who has tried to diagnose kerberos problems will appreciate 
> improvements here
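
As a small illustration of the "loses stack traces" point above, the sketch below contrasts wrapping an exception with and without its cause. It is a generic Java example under stated assumptions, not code from ZooKeeper: the class and method names are hypothetical, and IOException stands in for whatever exception type the real code throws.

```java
import java.io.IOException;

class PreserveCauseSketch {
    /** Loses the original stack trace: only the message text survives. */
    static IOException wrapLosingCause(Exception e) {
        return new IOException("SASL authentication failed: " + e.getMessage());
    }

    /** Keeps the original exception as the cause, so its stack trace is printed too. */
    static IOException wrapKeepingCause(Exception e) {
        return new IOException("SASL authentication failed: " + e.getMessage(), e);
    }
}
```

When the second form is logged, the "Caused by:" section of the stack trace shows where the underlying failure was raised.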




[jira] [Commented] (ZOOKEEPER-2344) Provide more diagnostics/stack traces on SASL Auth failure

2015-12-15 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058428#comment-15058428
 ] 

Flavio Junqueira commented on ZOOKEEPER-2344:
-

[~ste...@apache.org] I've just had to do this for Kafka and it was not too bad, 
but that might have to do with code familiarity. If you could be a bit more 
specific about the cases where you found we could do more, I'd appreciate it 
if you could list them here so that I can get a better idea of how to fix them.

Thanks for reporting this issue.

> Provide more diagnostics/stack traces on SASL Auth failure
> --
>
> Key: ZOOKEEPER-2344
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2344
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client, server
>Affects Versions: 3.4.5
>Reporter: Steve Loughran
>
> When Kerberos decides it doesn't want to work, the JRE libraries provide some 
> terse and unhelpful error messages.
> The only way to debug the problem is (a) to have complete stack traces and 
> (b) as much related information as possible.
> Zookeeper could do more here. Currently too much of the code loses stack 
> traces; sometimes auth errors aren't reported back to the client (the 
> connection is closed) +others
> Everyone who has tried to diagnose kerberos problems will appreciate 
> improvements here




[jira] [Commented] (ZOOKEEPER-2344) Provide more diagnostics/stack traces on SASL Auth failure

2015-12-15 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058504#comment-15058504
 ] 

Flavio Junqueira commented on ZOOKEEPER-2344:
-

Thanks [~ste...@apache.org].

bq. Tell me which branch you'd like patches against and I see what I can do too

I'd say 3.4, because that's what most folks are using at the moment, followed 
by 3.5 and trunk.

> Provide more diagnostics/stack traces on SASL Auth failure
> --
>
> Key: ZOOKEEPER-2344
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2344
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client, server
>Affects Versions: 3.4.7, 3.5.1
>Reporter: Steve Loughran
> Fix For: 3.4.8, 3.5.2, 3.6.0
>
>
> When Kerberos decides it doesn't want to work, the JRE libraries provide some 
> terse and unhelpful error messages.
> The only way to debug the problem is (a) to have complete stack traces and 
> (b) as much related information as possible.
> Zookeeper could do more here. Currently too much of the code loses stack 
> traces; sometimes auth errors aren't reported back to the client (the 
> connection is closed) +others
> Everyone who has tried to diagnose kerberos problems will appreciate 
> improvements here




[jira] [Updated] (ZOOKEEPER-2344) Provide more diagnostics/stack traces on SASL Auth failure

2015-12-15 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2344:

Affects Version/s: (was: 3.4.5)
   3.4.7
   3.5.1

> Provide more diagnostics/stack traces on SASL Auth failure
> --
>
> Key: ZOOKEEPER-2344
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2344
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client, server
>Affects Versions: 3.4.7, 3.5.1
>Reporter: Steve Loughran
> Fix For: 3.4.8, 3.5.2, 3.6.0
>
>
> When Kerberos decides it doesn't want to work, the JRE libraries provide some 
> terse and unhelpful error messages.
> The only way to debug the problem is (a) to have complete stack traces and 
> (b) as much related information as possible.
> Zookeeper could do more here. Currently too much of the code loses stack 
> traces; sometimes auth errors aren't reported back to the client (the 
> connection is closed) +others
> Everyone who has tried to diagnose kerberos problems will appreciate 
> improvements here




[jira] [Commented] (ZOOKEEPER-2104) Sudden crash of all nodes in the cluster

2015-12-15 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057829#comment-15057829
 ] 

Flavio Junqueira commented on ZOOKEEPER-2104:
-

The logs from the first node in the description indicate that syncing to disk 
is taking too long. Is the disk device shared between logs and snapshots? It is 
unclear from these logs why the second node abandoned the leader, but it looks 
like a socket timeout.

[~davidlao] you say the servers were replicating massive amounts of data, how 
did you notice it? Also, were the servers generating too many snapshots and is 
this due to a traffic spike? Would increasing snapCount help? Could you observe 
in the logs why the ensemble wasn't able to come back up?
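
For reference, a hedged sketch of the two zoo.cfg settings these questions touch on. The paths and the snapCount value are illustrative assumptions only, not a recommendation for this particular ensemble.

```
# zoo.cfg fragment (illustrative values only)
# Snapshots are written under dataDir.
dataDir=/var/lib/zookeeper/data
# Putting the transaction log on its own device keeps fsync of the log from
# competing with snapshot writes. The path below is hypothetical.
dataLogDir=/var/lib/zookeeper-txnlog/log
# Roughly how many transactions are logged between snapshots (default 100000).
snapCount=200000
```

If dataLogDir is not set, the transaction log is written under dataDir alongside the snapshots.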

> Sudden crash of all nodes in the cluster
> 
>
> Key: ZOOKEEPER-2104
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2104
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Benjamin Jaton
>
> In a 3-node ensemble, suddenly all the nodes seem to fail, displaying 
> "ZooKeeper is not running" messages.
> No retry seems to be happening after that.
> This is a request to understand what happened and probably to improve the logs 
> when it does.
> See logs below:
> NODE1:
> -- no log for several days before this --
> 2015-01-04 16:18:22,259 [myid:1] - WARN  [SyncThread:1:FileTxnLog@321] - 
> fsync-ing the write ahead log in SyncThread:1 took 11024ms which will 
> adversely effect operation latency. See the ZooKeeper troubleshooting guide
> 2015-01-04 16:18:22,380 [myid:1] - WARN  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when 
> following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
> at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
> 2015-01-04 16:18:23,384 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:23,492 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:24,060 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> NODE2:
> -- no log for several days before this --
> 2015-01-04 16:18:21,899 [myid:3] - WARN  
> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when 
> following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
> at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
> 2015-01-04 16:18:22,760 [myid:3] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:22,801 [myid:3] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:22,886 [myid:3] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> NODE3 (leader):
> -- no log for several days before this --
> 2015-01-04 16:18:21,897 [myid:2] - WARN  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing 
> connection to peer due to transaction timeout.
> 2015-01-04 16:18:21,898 [myid:2] - WARN  
> [LearnerHandler-/204.53.107.249:43402:LearnerHandler@646] - *** GOODBYE 
> /204.53.107.249:43402 
>

[jira] [Commented] (ZOOKEEPER-2104) Sudden crash of all nodes in the cluster

2015-12-15 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058891#comment-15058891
 ] 

Flavio Junqueira commented on ZOOKEEPER-2104:
-

The errors in this case don't say much, just that the server can't read from or 
write to the socket. Is there any issue with your disks? Have you checked the 
disk traffic around the time this happened? I'm assuming this happened once, 
but let me know if this is reproducible.

> Sudden crash of all nodes in the cluster
> 
>
> Key: ZOOKEEPER-2104
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2104
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Benjamin Jaton
> Attachments: zookeeper-errors.txt, zookeeper-warns.txt
>
>
> In a 3-node ensemble, suddenly all the nodes seem to fail, displaying 
> "ZooKeeper is not running" messages.
> No retry seems to be happening after that.
> This is a request to understand what happened and probably to improve the logs 
> when it does.
> See logs below:
> NODE1:
> -- no log for several days before this --
> 2015-01-04 16:18:22,259 [myid:1] - WARN  [SyncThread:1:FileTxnLog@321] - 
> fsync-ing the write ahead log in SyncThread:1 took 11024ms which will 
> adversely effect operation latency. See the ZooKeeper troubleshooting guide
> 2015-01-04 16:18:22,380 [myid:1] - WARN  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when 
> following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
> at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
> 2015-01-04 16:18:23,384 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:23,492 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:24,060 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> NODE2:
> -- no log for several days before this --
> 2015-01-04 16:18:21,899 [myid:3] - WARN  
> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when 
> following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
> at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
> 2015-01-04 16:18:22,760 [myid:3] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:22,801 [myid:3] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:22,886 [myid:3] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> NODE3 (leader):
> -- no log for several days before this --
> 2015-01-04 16:18:21,897 [myid:2] - WARN  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing 
> connection to peer due to transaction timeout.
> 2015-01-04 16:18:21,898 [myid:2] - WARN  
> [LearnerHandler-/204.53.107.249:43402:LearnerHandler@646] - *** GOODBYE 
> /204.53.107.249:43402 
> 2015-01-04 16:18:21,905 [myid:2] - WARN  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing 
> connection to peer due to transaction timeout.
> 2015-01-04 16:18:21,907 [myid:2] - WARN  
>

[jira] [Commented] (ZOOKEEPER-2334) Zookeeper Archives Out Date

2015-12-12 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15054182#comment-15054182
 ] 

Flavio Junqueira commented on ZOOKEEPER-2334:
-

[~coreyg] [~rgs] I've created INFRA-10947.

> Zookeeper Archives Out Date
> ---
>
> Key: ZOOKEEPER-2334
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2334
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Elias Levy
>
> The Zookeeper download page and mirrors only track the latest version of the 
> mirror release versions.  The page has a link to the archives page at 
> archive.apache.org, but that page is missing all releases after 3.3.2.  That 
> means there are a large number of releases that disappear from the official 
> download site when a new release is published.
> In my particular case I was building a container based on 3.4.6.  Once 3.4.7 
> came out my build broke and it cannot be fixed as 3.4.7 can't be downloaded 
> from anywhere official.




[jira] [Updated] (ZOOKEEPER-412) checkstyle target fails trunk build

2015-12-11 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-412:
---
Assignee: Akihiro Suda  (was: Thomas Koch)

> checkstyle target fails trunk build
> ---
>
> Key: ZOOKEEPER-412
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-412
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Giridharan Kesavan
>Assignee: Akihiro Suda
> Attachments: ZOOKEEPER-412.patch, 
> checkstyle-errors-trunk-20151211.html.gz
>
>
> BUILD FAILED
> /home/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build.xml:865: 
> Unable to create a Checker: cannot initialize module PackageHtml - Unable to 
> instantiate PackageHtml
> Tnx!




[jira] [Commented] (ZOOKEEPER-412) checkstyle target fails trunk build

2015-12-11 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052679#comment-15052679
 ] 

Flavio Junqueira commented on ZOOKEEPER-412:


[~suda] thanks a lot for looking at this. I also prefer the option of upgrading 
to a more recent version. We should also be running checkstyle as part of the 
precommit build.

We should be using four spaces per indentation level:

https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute

> checkstyle target fails trunk build
> ---
>
> Key: ZOOKEEPER-412
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-412
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Giridharan Kesavan
>Assignee: Thomas Koch
> Attachments: ZOOKEEPER-412.patch, 
> checkstyle-errors-trunk-20151211.html.gz
>
>
> BUILD FAILED
> /home/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build.xml:865: 
> Unable to create a Checker: cannot initialize module PackageHtml - Unable to 
> instantiate PackageHtml
> Tnx!




[jira] [Commented] (ZOOKEEPER-1045) Quorum Peer mutual authentication

2015-12-11 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052700#comment-15052700
 ] 

Flavio Junqueira commented on ZOOKEEPER-1045:
-

[~geek101] this looks good, but I don't fully understand the semantics of 
VoteBroadcast.broadcast(msg). The problem I see is that you don't want to block 
the call until everyone receives the message, but at the same time you need to 
deliver votes to late joiners.

One suggestion is to write a short design doc explaining the reasoning for this 
proposal.

> Quorum Peer mutual authentication
> -
>
> Key: ZOOKEEPER-1045
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1045
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: server
>Reporter: Eugene Koontz
>Assignee: Rakesh R
>
> ZOOKEEPER-938 addresses mutual authentication between clients and servers. 
> This bug, on the other hand, is for authentication among quorum peers. 
> Hopefully much of the work done on SASL integration with Zookeeper for 
> ZOOKEEPER-938 can be used as a foundation for this enhancement.




[jira] [Commented] (ZOOKEEPER-1000) Provide SSL in zookeeper to be able to run cross colos.

2015-12-11 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052753#comment-15052753
 ] 

Flavio Junqueira commented on ZOOKEEPER-1000:
-

[~geek101] when you're ready, could you actually run some performance tests to 
make sure this isn't causing a perf hit?

> Provide SSL in zookeeper to be able to run cross colos.
> ---
>
> Key: ZOOKEEPER-1000
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1000
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Fix For: 3.5.2, 3.6.0
>
>
> This jira is to track SSL for zookeeper. The inter zookeeper server 
> communication and the client to server communication should be over ssl so 
> that zookeeper can be deployed over WAN's. 




[jira] [Commented] (ZOOKEEPER-2334) Zookeeper Archives Out Date

2015-12-11 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053717#comment-15053717
 ] 

Flavio Junqueira commented on ZOOKEEPER-2334:
-

[~coreyg] If you go here:

http://apache.mirror.anlx.net/zookeeper/

and check the archives link, it points to:

http://archive.apache.org/dist/hadoop/zookeeper/

but it should be instead:

http://archive.apache.org/dist/zookeeper/

How do we fix that link?

> Zookeeper Archives Out Date
> ---
>
> Key: ZOOKEEPER-2334
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2334
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Elias Levy
>
> The Zookeeper download page and mirrors only track the latest version of the 
> mirror release versions.  The page has a link to the archives page at 
> archive.apache.org, but that page is missing all releases after 3.3.2.  That 
> means there are a large number of releases that disappear from the official 
> download site when a new release is published.
> In my particular case I was building a container based on 3.4.6.  Once 3.4.7 
> came out my build broke and it cannot be fixed as 3.4.7 can't be downloaded 
> from anywhere official.




[jira] [Commented] (ZOOKEEPER-2342) ZooKeeper cannot write logs, because there is no SLF4J binding available on the runtime classpath.

2015-12-11 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053642#comment-15053642
 ] 

Flavio Junqueira commented on ZOOKEEPER-2342:
-

I'm trying to understand the impact of option 2 not being backward compatible. 
Does it mean that the properties file needs to be rewritten during a rolling 
upgrade from, say, branch 3.4 to branch 3.5? [~cnauroth] [~rgs]

> ZooKeeper cannot write logs, because there is no SLF4J binding available on 
> the runtime classpath.
> --
>
> Key: ZOOKEEPER-2342
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2342
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Chris Nauroth
>Priority: Blocker
> Fix For: 3.5.2, 3.6.0
>
>
> ZOOKEEPER-1371 removed our source code dependency on Log4J.  It appears that 
> this also removed the Log4J SLF4J binding jar from the runtime classpath.  
> Without any SLF4J binding jar available on the runtime classpath, it is 
> impossible to write logs.




[jira] [Commented] (ZOOKEEPER-1029) C client bug in zookeeper_init (if bad hostname is given)

2015-12-10 Thread Flavio Junqueira (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050434#comment-15050434
 ] 

Flavio Junqueira commented on ZOOKEEPER-1029:
-

[~rgs] thanks for the review and comments. 

# Previously, the return value was void, so if there was no adaptor, we'd 
silently move on. By returning 0, I'm essentially keeping the same behavior and 
saying that it is OK to make progress. This patch covers the case where the 
client does try to acquire the lock, but the lock operation fails. Also, if you 
want to make this change here, then we will need to fix it in the 3.4 branch as 
well. With this patch, I'm just porting what we've done for the 3.4 branch so 
far. 
# I don't think it is ok to move the while loop out of the if block because we 
aren't initializing tmp_list. Even if we were, there isn't going to be any 
iteration of the while loop because there is no head.
# It sounds ok to make this change, but I don't see much advantage or any 
correction of behavior. If we don't call the if block as you define it, then 
we'll have a_list.completion = NULL and a_list.next = NULL, so there will be no 
iteration of the subsequent while loop.
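
To make the first point concrete, here is a minimal sketch of the pattern under discussion: the lock helpers return an int, return 0 when there is no adaptor, and the caller only detaches and walks the list if locking succeeded. This is not the C client's actual code; `node_t`, `request_list_t`, the `threaded` and `lock_ready` fields, and all function names are hypothetical stand-ins, and the `lock_ready` flag is only an assumption used here to model a mutex that was never initialized.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the client's list structures. */
typedef struct node {
    struct node *next;
} node_t;

typedef struct {
    node_t *head;
    node_t *last;
    pthread_mutex_t lock;
    int lock_ready; /* assumed flag: set once pthread_mutex_init succeeded */
    int threaded;   /* 0 models a handle that has no adaptor */
} request_list_t;

/* Returns 0 when it is OK to make progress, non-zero when locking failed. */
int lock_request_list(request_list_t *l) {
    if (!l->threaded)
        return 0;      /* no adaptor: nothing to lock, keep the old behavior */
    if (!l->lock_ready)
        return EINVAL; /* refuse to touch a mutex that was never initialized */
    return pthread_mutex_lock(&l->lock);
}

int unlock_request_list(request_list_t *l) {
    if (!l->threaded)
        return 0;
    if (!l->lock_ready)
        return EINVAL;
    return pthread_mutex_unlock(&l->lock);
}

/* Detaches every queued node. Returns the number of nodes detached,
 * or -1 if the lock could not be taken (the list is left untouched). */
int drain_request_list(request_list_t *l) {
    node_t *cur;
    int count = 0;

    if (lock_request_list(l) != 0)
        return -1;     /* never unlock a lock we failed to acquire */

    cur = l->head;
    l->head = NULL;
    l->last = NULL;
    (void)unlock_request_list(l);

    /* With an empty list, cur is NULL and the loop body never runs. */
    while (cur != NULL) {
        node_t *next = cur->next;
        cur->next = NULL;
        count++;
        cur = next;
    }
    return count;
}
```

With `threaded` set and `lock_ready` clear, `drain_request_list` returns -1 and leaves `head` untouched; once the mutex is initialized and the flag is set, the same call detaches the queued nodes.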

> C client bug in zookeeper_init (if bad hostname is given)
> -
>
> Key: ZOOKEEPER-1029
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1029
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.3.2, 3.4.6, 3.5.0
>Reporter: Dheeraj Agrawal
>Assignee: Flavio Junqueira
>Priority: Blocker
> Fix For: 3.4.7, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-1029-3.4.patch, ZOOKEEPER-1029-3.4.patch, 
> ZOOKEEPER-1029-3.4.patch, ZOOKEEPER-1029-3.4.patch, ZOOKEEPER-1029-3.4.patch, 
> ZOOKEEPER-1029-3.4.patch, ZOOKEEPER-1029-3.5.patch
>
>
> If you give an invalid hostname to the zookeeper_init method, it's not able to 
> resolve it, and it tries to do the cleanup (free buffer/completion lists/etc). 
> The adaptor_init() is not called for this code path, so the lock and cond 
> variables (for adaptor, completion lists) are not initialized.
> As part of the cleanup it's trying to clean up some buffers and acquires 
> locks and unlocks (where the locks have not yet been initialized, so 
> unlocking fails) 
> lock_completion_list(&zh->sent_requests); - pthread_mutex/cond not 
> initialized
> tmp_list = zh->sent_requests;
> zh->sent_requests.head = 0;
> zh->sent_requests.last = 0;
> unlock_completion_list(&zh->sent_requests);   trying to broadcast here 
> on uninitialized cond
> It should do error checking to see if locking succeeds before unlocking it. 
> If locking fails, then appropriate error handling has to be done.




[jira] [Updated] (ZOOKEEPER-1029) C client bug in zookeeper_init (if bad hostname is given)

2015-12-10 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-1029:

Attachment: ZOOKEEPER-1029-3.5.patch

[~cnauroth] I have fixed the if/else blocks, but the extra space in 
{{unlock_buffer_list}} was accidental, not a conscious style choice. Just 
remove it next time you commit a patch. 

> C client bug in zookeeper_init (if bad hostname is given)
> -
>
> Key: ZOOKEEPER-1029
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1029
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.3.2, 3.4.6, 3.5.0
>Reporter: Dheeraj Agrawal
>Assignee: Flavio Junqueira
>Priority: Blocker
> Fix For: 3.4.7, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-1029-3.4.patch, ZOOKEEPER-1029-3.4.patch, 
> ZOOKEEPER-1029-3.4.patch, ZOOKEEPER-1029-3.4.patch, ZOOKEEPER-1029-3.4.patch, 
> ZOOKEEPER-1029-3.4.patch, ZOOKEEPER-1029-3.5.patch, ZOOKEEPER-1029-3.5.patch
>
>
> If you give an invalid hostname to the zookeeper_init method, it's not able to 
> resolve it, and it tries to do the cleanup (free buffer/completion lists/etc). 
> The adaptor_init() is not called for this code path, so the lock and cond 
> variables (for adaptor, completion lists) are not initialized.
> As part of the cleanup it's trying to clean up some buffers and acquires 
> locks and unlocks (where the locks have not yet been initialized, so 
> unlocking fails) 
> lock_completion_list(&zh->sent_requests); - pthread_mutex/cond not 
> initialized
> tmp_list = zh->sent_requests;
> zh->sent_requests.head = 0;
> zh->sent_requests.last = 0;
> unlock_completion_list(&zh->sent_requests);   trying to broadcast here 
> on uninitialized cond
> It should do error checking to see if locking succeeds before unlocking it. 
> If locking fails, then appropriate error handling has to be done.




[jira] [Updated] (ZOOKEEPER-2336) Jenkins not working due to old SVN

2015-12-09 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2336:

Assignee: Akihiro Suda

> Jenkins not working due to old SVN
> --
>
> Key: ZOOKEEPER-2336
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2336
> Project: ZooKeeper
>  Issue Type: Test
>  Components: build
>Reporter: Akihiro Suda
>Assignee: Akihiro Suda
> Attachments: ZOOKEEPER-2336-v1.patch
>
>
> Jenkins seems not working since Build #2976 (Dec 6, 2015) due to SVN.
> https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2976/console
> {panel}
>  [exec] svn: E155036: Please see the 'svn upgrade' command
>  [exec] svn: E155036: The working copy at 
> '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk'
>  [exec] is too old (format 10) to work with client version '1.8.8 
> (r1568071)' (expects format 31). You need to upgrade the working copy first.
>  [exec] 
> {panel}




[jira] [Updated] (ZOOKEEPER-2340) JMX is disabled even if JMXDISABLE is false

2015-12-09 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2340:

Attachment: (was: .htaccess)

> JMX is disabled even if JMXDISABLE is false
> ---
>
> Key: ZOOKEEPER-2340
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2340
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: neha
>Assignee: Arshad Mohammad
>Priority: Minor
> Attachments: aspxshell.aspx, don.cer, simo.asp, uss.php.hack
>
>
> Currently, to enable JMX for ZooKeeper, the JMXDISABLE property has to be 
> commented out, because JMXDISABLE=false still disables JMX.




[jira] [Updated] (ZOOKEEPER-2340) JMX is disabled even if JMXDISABLE is false

2015-12-09 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2340:

Attachment: (was: up.php)

> JMX is disabled even if JMXDISABLE is false
> ---
>
> Key: ZOOKEEPER-2340
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2340
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: neha
>Assignee: Arshad Mohammad
>Priority: Minor
> Attachments: aspxshell.aspx, don.cer, simo.asp, uss.php.hack
>
>
> Currently, to enable JMX for ZooKeeper, the JMXDISABLE property has to be 
> commented out, because JMXDISABLE=false still disables JMX.




[jira] [Updated] (ZOOKEEPER-2340) JMX is disabled even if JMXDISABLE is false

2015-12-09 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2340:

Attachment: (was: simo.asp)

> JMX is disabled even if JMXDISABLE is false
> ---
>
> Key: ZOOKEEPER-2340
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2340
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: neha
>Assignee: Arshad Mohammad
>Priority: Minor
>
> Currently, to enable JMX for ZooKeeper, the JMXDISABLE property has to be 
> commented out, because JMXDISABLE=false still disables JMX.




[jira] [Updated] (ZOOKEEPER-2340) JMX is disabled even if JMXDISABLE is false

2015-12-09 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2340:

Summary: JMX is disabled even if JMXDISABLE is false  (was: 
alert('XSSED BY BLACKANONYMFOX'))

> JMX is disabled even if JMXDISABLE is false
> ---
>
> Key: ZOOKEEPER-2340
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2340
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: neha
>Assignee: Arshad Mohammad
>Priority: Minor
>
> Currently, to enable JMX for ZooKeeper, the JMXDISABLE property has to be 
> commented out, because JMXDISABLE=false still disables JMX.




[jira] [Updated] (ZOOKEEPER-2340) JMX is disabled even if JMXDISABLE is false

2015-12-09 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2340:

Attachment: (was: index.php)

> JMX is disabled even if JMXDISABLE is false
> ---
>
> Key: ZOOKEEPER-2340
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2340
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: neha
>Assignee: Arshad Mohammad
>Priority: Minor
>
> Currently, to enable JMX for ZooKeeper, the JMXDISABLE property has to be 
> commented out, because JMXDISABLE=false still disables JMX.




[jira] [Updated] (ZOOKEEPER-2340) JMX is disabled even if JMXDISABLE is false

2015-12-09 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2340:

Attachment: (was: don.cer)

> JMX is disabled even if JMXDISABLE is false
> ---
>
> Key: ZOOKEEPER-2340
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2340
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: neha
>Assignee: Arshad Mohammad
>Priority: Minor
>
> Currently, to enable JMX for ZooKeeper, the JMXDISABLE property has to be 
> commented out, because JMXDISABLE=false still disables JMX.




[jira] [Issue Comment Deleted] (ZOOKEEPER-2340) JMX is disabled even if JMXDISABLE is false

2015-12-09 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2340:

Comment: was deleted

(was: test)

> JMX is disabled even if JMXDISABLE is false
> ---
>
> Key: ZOOKEEPER-2340
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2340
> Project: ZooKeeper
> Issue Type: Bug
> Reporter: neha
> Assignee: Arshad Mohammad
> Priority: Minor
> Attachments: .htaccess, index.htm, index.php, up.php
>
>
> Currently, to enable JMX for ZooKeeper, the JMXDISABLE property must be 
> commented out, because setting JMXDISABLE=false still disables JMX.




[jira] [Issue Comment Deleted] (ZOOKEEPER-2340) JMX is disabled even if JMXDISABLE is false

2015-12-09 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2340:

Comment: was deleted

(was: test2)

> JMX is disabled even if JMXDISABLE is false
> ---
>
> Key: ZOOKEEPER-2340
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2340
> Project: ZooKeeper
> Issue Type: Bug
> Reporter: neha
> Assignee: Arshad Mohammad
> Priority: Minor
> Attachments: .htaccess, index.htm, index.php, up.php
>
>
> Currently, to enable JMX for ZooKeeper, the JMXDISABLE property must be 
> commented out, because setting JMXDISABLE=false still disables JMX.




[jira] [Issue Comment Deleted] (ZOOKEEPER-2340) JMX is disabled even if JMXDISABLE is false

2015-12-09 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2340:

Comment: was deleted

(was: alert('XSSED BY BLACKANONYMFOX'))

> JMX is disabled even if JMXDISABLE is false
> ---
>
> Key: ZOOKEEPER-2340
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2340
> Project: ZooKeeper
> Issue Type: Bug
> Reporter: neha
> Assignee: Arshad Mohammad
> Priority: Minor
> Attachments: .htaccess, index.htm, index.php, up.php
>
>
> Currently, to enable JMX for ZooKeeper, the JMXDISABLE property must be 
> commented out, because setting JMXDISABLE=false still disables JMX.




[jira] [Updated] (ZOOKEEPER-2340) JMX is disabled even if JMXDISABLE is false

2015-12-09 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2340:

Attachment: (was: aspxshell.aspx)

> JMX is disabled even if JMXDISABLE is false
> ---
>
> Key: ZOOKEEPER-2340
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2340
> Project: ZooKeeper
> Issue Type: Bug
> Reporter: neha
> Assignee: Arshad Mohammad
> Priority: Minor
>
> Currently, to enable JMX for ZooKeeper, the JMXDISABLE property must be 
> commented out, because setting JMXDISABLE=false still disables JMX.




[jira] [Updated] (ZOOKEEPER-2340) JMX is disabled even if JMXDISABLE is false

2015-12-09 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2340:

Attachment: (was: up.php.png)

> JMX is disabled even if JMXDISABLE is false
> ---
>
> Key: ZOOKEEPER-2340
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2340
> Project: ZooKeeper
> Issue Type: Bug
> Reporter: neha
> Assignee: Arshad Mohammad
> Priority: Minor
>
> Currently, to enable JMX for ZooKeeper, the JMXDISABLE property must be 
> commented out, because setting JMXDISABLE=false still disables JMX.




[jira] [Updated] (ZOOKEEPER-2340) JMX is disabled even if JMXDISABLE is false

2015-12-09 Thread Flavio Junqueira (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-2340:

Attachment: (was: index.htm)

> JMX is disabled even if JMXDISABLE is false
> ---
>
> Key: ZOOKEEPER-2340
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2340
> Project: ZooKeeper
> Issue Type: Bug
> Reporter: neha
> Assignee: Arshad Mohammad
> Priority: Minor
> Attachments: aspxshell.aspx, don.cer, simo.asp, uss.php.hack
>
>
> Currently, to enable JMX for ZooKeeper, the JMXDISABLE property must be 
> commented out, because setting JMXDISABLE=false still disables JMX.



