[jira] [Commented] (ZOOKEEPER-2315) Change client connect zk service timeout log level from Info to Warn level

2015-11-05 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993209#comment-14993209
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2315:


Thanks for the patch [~linyiqun]. Could you define string variables for these 
messages so that you can use them for both logging and exception?
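For illustration, a minimal sketch of that pattern; the constant, logger, and exception names are hypothetical, not taken from the patch:

{code}
// Define the message once, then reuse it for both the log entry and the
// exception so the two can never drift apart. Names here are illustrative.
private static final String SESSION_EXPIRED_MSG =
    "Unable to reconnect to ZooKeeper service, session has expired";

void onSessionExpired() throws IOException {
    LOG.warn(SESSION_EXPIRED_MSG);              // logged at WARN, as proposed
    throw new IOException(SESSION_EXPIRED_MSG); // same text in the exception
}
{code}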

> Change client connect zk service timeout log level from Info to Warn level
> --
>
> Key: ZOOKEEPER-2315
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2315
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: java client
>Affects Versions: 3.4.6
>Reporter: Lin Yiqun
>Priority: Minor
> Attachments: ZOOKEEPER-2315.001.patch
>
>
> Recently the ResourceManager of my Hadoop cluster failed suddenly, so I 
> looked into the ResourceManager log. But the log was not helpful for finding 
> the reason directly until I found the ZooKeeper timeout record logged at INFO.
> {code}
> 2015-11-06 06:34:11,257 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: 
> Assigned container container_1446016482901_292094_01_000140 of capacity 
>  on host mofa2089:41361, which has 30 containers, 
>  used and  available after 
> allocation
> 2015-11-06 06:34:11,266 INFO org.apache.zookeeper.ClientCnxn: Unable to 
> reconnect to ZooKeeper service, session 0x24f4fd5118e5c6e has expired, 
> closing socket connection
> 2015-11-06 06:34:11,271 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_1446016482901_292094_01_000105 Container Transitioned from RUNNING 
> to COMPLETED
> 2015-11-06 06:34:11,271 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Completed container: container_1446016482901_292094_01_000105 in state: 
> COMPLETED event:FINISHED
> 2015-11-06 06:34:11,271 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dongwei  
> OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  
> APPID=application_1446016482901_292094  
> CONTAINERID=container_1446016482901_292094_01_000105
> 2015-11-06 06:34:11,271 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: 
> Released container container_1446016482901_292094_01_000105 of capacity 
>  on host mofa010079:50991, which currently has 29 
> containers,  used and  
> available, release resources=true
> 2015-11-06 06:34:11,271 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Application attempt appattempt_1446016482901_292094_01 released container 
> container_1446016482901_292094_01_000105 on node: host: mofa010079:50991 
> #containers=29 available= used= vCores:29> with event: FINISHED
> 2015-11-06 06:34:11,272 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_1446016482901_292094_01_000141 Container Transitioned from NEW to 
> ALLOCATED
> 2015-11-06 06:34:11,272 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dongwei  
> OPERATION=AM Allocated Container TARGET=SchedulerApp 
> RESULT=SUCCESS  APPID=application_1446016482901_292094  
> CONTAINERID=container_1446016482901_292094_01_000141
> 2015-11-06 06:34:11,272 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: 
> Assigned container container_1446016482901_292094_01_000141 of capacity 
>  on host mofa010079:50991, which has 30 containers, 
>  used and  available after 
> allocation
> 2015-11-06 06:34:11,295 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher:
>  
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread
>  interrupted. Returning.
> 2015-11-06 06:34:11,296 INFO org.apache.hadoop.ipc.Server: Stopping server on 
> 8032
> 2015-11-06 06:34:11,297 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server Responder
> 2015-11-06 06:34:11,297 INFO org.apache.hadoop.ipc.Server: Stopping server on 
> 8030
> 2015-11-06 06:34:11,297 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server listener on 8032
> 2015-11-06 06:34:11,298 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server Responder
> 2015-11-06 06:34:11,298 INFO org.apache.hadoop.ipc.Server: Stopping server on 
> 8031
> 2015-11-06 06:34:11,298 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server listener on 8030
> 2015-11-06 06:34:11,300 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server listener on 8031
> 2015-11-06 06:34:11,300 INFO 
> 

[jira] [Updated] (ZOOKEEPER-2315) Change client connect zk service timeout log level from Info to Warn level

2015-11-05 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2315:
---
Fix Version/s: 3.6.0
   3.5.2
   3.4.7

> Change client connect zk service timeout log level from Info to Warn level
> --
>
> Key: ZOOKEEPER-2315
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2315
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: java client
>Affects Versions: 3.4.6
>Reporter: Lin Yiqun
>Priority: Minor
> Fix For: 3.4.7, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2315.001.patch
>
>
> Recently the ResourceManager of my Hadoop cluster failed suddenly, so I 
> looked into the ResourceManager log. But the log was not helpful for finding 
> the reason directly until I found the ZooKeeper timeout record logged at INFO.
> {code}
> 2015-11-06 06:34:11,257 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: 
> Assigned container container_1446016482901_292094_01_000140 of capacity 
>  on host mofa2089:41361, which has 30 containers, 
>  used and  available after 
> allocation
> 2015-11-06 06:34:11,266 INFO org.apache.zookeeper.ClientCnxn: Unable to 
> reconnect to ZooKeeper service, session 0x24f4fd5118e5c6e has expired, 
> closing socket connection
> 2015-11-06 06:34:11,271 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_1446016482901_292094_01_000105 Container Transitioned from RUNNING 
> to COMPLETED
> 2015-11-06 06:34:11,271 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Completed container: container_1446016482901_292094_01_000105 in state: 
> COMPLETED event:FINISHED
> 2015-11-06 06:34:11,271 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dongwei  
> OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  
> APPID=application_1446016482901_292094  
> CONTAINERID=container_1446016482901_292094_01_000105
> 2015-11-06 06:34:11,271 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: 
> Released container container_1446016482901_292094_01_000105 of capacity 
>  on host mofa010079:50991, which currently has 29 
> containers,  used and  
> available, release resources=true
> 2015-11-06 06:34:11,271 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Application attempt appattempt_1446016482901_292094_01 released container 
> container_1446016482901_292094_01_000105 on node: host: mofa010079:50991 
> #containers=29 available= used= vCores:29> with event: FINISHED
> 2015-11-06 06:34:11,272 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_1446016482901_292094_01_000141 Container Transitioned from NEW to 
> ALLOCATED
> 2015-11-06 06:34:11,272 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dongwei  
> OPERATION=AM Allocated Container TARGET=SchedulerApp 
> RESULT=SUCCESS  APPID=application_1446016482901_292094  
> CONTAINERID=container_1446016482901_292094_01_000141
> 2015-11-06 06:34:11,272 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: 
> Assigned container container_1446016482901_292094_01_000141 of capacity 
>  on host mofa010079:50991, which has 30 containers, 
>  used and  available after 
> allocation
> 2015-11-06 06:34:11,295 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher:
>  
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread
>  interrupted. Returning.
> 2015-11-06 06:34:11,296 INFO org.apache.hadoop.ipc.Server: Stopping server on 
> 8032
> 2015-11-06 06:34:11,297 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server Responder
> 2015-11-06 06:34:11,297 INFO org.apache.hadoop.ipc.Server: Stopping server on 
> 8030
> 2015-11-06 06:34:11,297 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server listener on 8032
> 2015-11-06 06:34:11,298 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server Responder
> 2015-11-06 06:34:11,298 INFO org.apache.hadoop.ipc.Server: Stopping server on 
> 8031
> 2015-11-06 06:34:11,298 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server listener on 8030
> 2015-11-06 06:34:11,300 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server listener on 8031
> 2015-11-06 06:34:11,300 INFO 
> org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
> {code}
> The 

[jira] [Commented] (ZOOKEEPER-2315) Change client connect zk service timeout log level from Info to Warn level

2015-11-05 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993215#comment-14993215
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2315:


Also, you need to use --no-prefix if you are generating the patch using git 
diff.
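For example, the regenerated patch might look like this (the file name is hypothetical):

{noformat}
git diff --no-prefix > ZOOKEEPER-2315.002.patch
{noformat}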

> Change client connect zk service timeout log level from Info to Warn level
> --
>
> Key: ZOOKEEPER-2315
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2315
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: java client
>Affects Versions: 3.4.6
>Reporter: Lin Yiqun
>Priority: Minor
> Fix For: 3.4.7, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2315.001.patch
>
>
> Recently the ResourceManager of my Hadoop cluster failed suddenly, so I 
> looked into the ResourceManager log. But the log was not helpful for finding 
> the reason directly until I found the ZooKeeper timeout record logged at INFO.
> {code}
> 2015-11-06 06:34:11,257 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: 
> Assigned container container_1446016482901_292094_01_000140 of capacity 
>  on host mofa2089:41361, which has 30 containers, 
>  used and  available after 
> allocation
> 2015-11-06 06:34:11,266 INFO org.apache.zookeeper.ClientCnxn: Unable to 
> reconnect to ZooKeeper service, session 0x24f4fd5118e5c6e has expired, 
> closing socket connection
> 2015-11-06 06:34:11,271 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_1446016482901_292094_01_000105 Container Transitioned from RUNNING 
> to COMPLETED
> 2015-11-06 06:34:11,271 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: 
> Completed container: container_1446016482901_292094_01_000105 in state: 
> COMPLETED event:FINISHED
> 2015-11-06 06:34:11,271 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dongwei  
> OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  
> APPID=application_1446016482901_292094  
> CONTAINERID=container_1446016482901_292094_01_000105
> 2015-11-06 06:34:11,271 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: 
> Released container container_1446016482901_292094_01_000105 of capacity 
>  on host mofa010079:50991, which currently has 29 
> containers,  used and  
> available, release resources=true
> 2015-11-06 06:34:11,271 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Application attempt appattempt_1446016482901_292094_01 released container 
> container_1446016482901_292094_01_000105 on node: host: mofa010079:50991 
> #containers=29 available= used= vCores:29> with event: FINISHED
> 2015-11-06 06:34:11,272 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_1446016482901_292094_01_000141 Container Transitioned from NEW to 
> ALLOCATED
> 2015-11-06 06:34:11,272 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dongwei  
> OPERATION=AM Allocated Container TARGET=SchedulerApp 
> RESULT=SUCCESS  APPID=application_1446016482901_292094  
> CONTAINERID=container_1446016482901_292094_01_000141
> 2015-11-06 06:34:11,272 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: 
> Assigned container container_1446016482901_292094_01_000141 of capacity 
>  on host mofa010079:50991, which has 30 containers, 
>  used and  available after 
> allocation
> 2015-11-06 06:34:11,295 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher:
>  
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread
>  interrupted. Returning.
> 2015-11-06 06:34:11,296 INFO org.apache.hadoop.ipc.Server: Stopping server on 
> 8032
> 2015-11-06 06:34:11,297 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server Responder
> 2015-11-06 06:34:11,297 INFO org.apache.hadoop.ipc.Server: Stopping server on 
> 8030
> 2015-11-06 06:34:11,297 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server listener on 8032
> 2015-11-06 06:34:11,298 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server Responder
> 2015-11-06 06:34:11,298 INFO org.apache.hadoop.ipc.Server: Stopping server on 
> 8031
> 2015-11-06 06:34:11,298 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server listener on 8030
> 2015-11-06 06:34:11,300 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server listener on 8031
> 2015-11-06 06:34:11,300 INFO 
> org.apache.hadoop.ipc.Server: 

[jira] [Updated] (ZOOKEEPER-2312) Fix arrow direction in the 2-phase commit diagram in Zookeeper internal docs

2015-11-03 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2312:
---
Assignee: Raju Bairishetti

> Fix arrow direction in the 2-phase commit diagram in Zookeeper internal docs
> 
>
> Key: ZOOKEEPER-2312
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2312
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: documentation
>Reporter: Raju Bairishetti
>Assignee: Raju Bairishetti
>Priority: Minor
>
> https://zookeeper.apache.org/doc/r3.3.3/zookeeperInternals.html
> Leader issues *commit request* to followers once the ack received from the 
> followers. But the 2-phase commit diagram shows the direction of commit from 
> Follower to Leader.
> [2-phase-commit-image|https://github.com/apache/zookeeper/blob/trunk/src/docs/src/documentation/resources/images/2pc.jpg]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2142) JMX ObjectName is incorrect for observers

2015-10-31 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984193#comment-14984193
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2142:


trunk: http://svn.apache.org/viewvc?view=revision&revision=1711694
branch-3.5: http://svn.apache.org/viewvc?view=revision&revision=1711695
branch-3.4: http://svn.apache.org/viewvc?view=revision&revision=1711696

> JMX ObjectName is incorrect for observers
> -
>
> Key: ZOOKEEPER-2142
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2142
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6, 3.5.1
>Reporter: Karol Dudzinski
>Assignee: Edward Ribeiro
>Priority: Trivial
> Fix For: 3.4.7, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2142, ZOOKEEPER-2142.2.patch
>
>
> Observers show up in JMX as StandaloneServer rather than Observer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (ZOOKEEPER-2142) JMX ObjectName is incorrect for observers

2015-10-31 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki resolved ZOOKEEPER-2142.

Resolution: Fixed

> JMX ObjectName is incorrect for observers
> -
>
> Key: ZOOKEEPER-2142
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2142
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6, 3.5.1
>Reporter: Karol Dudzinski
>Assignee: Edward Ribeiro
>Priority: Trivial
> Fix For: 3.4.7, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2142, ZOOKEEPER-2142.2.patch
>
>
> Observers show up in JMX as StandaloneServer rather than Observer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1872) QuorumPeer is not shutdown in few cases

2015-10-31 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984197#comment-14984197
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1872:


+1 I'll re-trigger the build.

> QuorumPeer is not shutdown in few cases
> ---
>
> Key: ZOOKEEPER-1872
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1872
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>  Labels: test
> Fix For: 3.4.7, 3.5.2, 3.6.0
>
> Attachments: LeaderSessionTrackerTest-output.txt, 
> ZOOKEEPER-1872.patch, ZOOKEEPER-1872.patch, ZOOKEEPER-1872.patch, 
> ZOOKEEPER-1872.patch, ZOOKEEPER-1872.patch, ZOOKEEPER-1872.patch, 
> ZOOKEEPER-1872.patch, ZOOKEEPER-1872.patch, ZOOKEEPER-1872.patch, 
> ZOOKEEPER-1872.patch, ZOOKEEPER-1872.patch, ZOOKEEPER-1872_br3_4.patch, 
> ZOOKEEPER-1872_br3_4.patch, stack-trace.txt
>
>
> A few test cases leave the QuorumPeer running after the test execution. 
> These need proper teardown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2142) JMX ObjectName is incorrect for observers

2015-10-31 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984179#comment-14984179
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2142:


Sorry I'm jumping in late. The original patch ZOOKEEPER-2142 looks fine to me 
because it adds a new constructor to pass the name as a parameter, so classes 
that extend ZooKeeperServerBean shouldn't need to implement the getName() 
method. I'd refactor the other classes (LeaderBean, FollowerBean) to use the 
new constructor and pass their names as well ("Leader" and "Follower"), and 
get rid of the getName() method from these classes.
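To make that concrete, here is a minimal sketch of the constructor-based approach; the class shapes are simplified assumptions, not the actual patch:

{code}
// Simplified sketch: the JMX name is fixed at construction time.
public class ZooKeeperServerBean {
    private final String name;

    public ZooKeeperServerBean() {
        this("StandaloneServer"); // existing default behavior
    }

    public ZooKeeperServerBean(String name) { // new constructor from the patch
        this.name = name;
    }

    public String getName() {
        return name; // subclasses no longer need to override this
    }
}

class LeaderBean extends ZooKeeperServerBean {
    LeaderBean() { super("Leader"); }
}

class ObserverBean extends ZooKeeperServerBean {
    ObserverBean() { super("Observer"); }
}
{code}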

> JMX ObjectName is incorrect for observers
> -
>
> Key: ZOOKEEPER-2142
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2142
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6, 3.5.1
>Reporter: Karol Dudzinski
>Assignee: Edward Ribeiro
>Priority: Trivial
> Fix For: 3.4.7, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2142, ZOOKEEPER-2142.2.patch
>
>
> Observers show up in JMX as StandaloneServer rather than Observer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2142) JMX ObjectName is incorrect for observers

2015-10-31 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984183#comment-14984183
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2142:


Ok fair enough. Then I'm +1 on ZOOKEEPER-2142.2.patch. I'll check this in if 
you are ok with this patch as well.

> JMX ObjectName is incorrect for observers
> -
>
> Key: ZOOKEEPER-2142
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2142
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6, 3.5.1
>Reporter: Karol Dudzinski
>Assignee: Edward Ribeiro
>Priority: Trivial
> Fix For: 3.4.7, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2142, ZOOKEEPER-2142.2.patch
>
>
> Observers show up in JMX as StandaloneServer rather than Observer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2263) ZooKeeper server should not start when neither clientPort nor secureClientPort is configured

2015-09-02 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2263:
---
Fix Version/s: (was: 3.5.1)
   3.6.0
   3.5.2

> ZooKeeper server should not start when neither clientPort nor 
> secureClientPort is configured
> ---
>
> Key: ZOOKEEPER-2263
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2263
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
>Priority: Minor
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2263-01.patch, ZOOKEEPER-2263-02.patch
>
>
> ZooKeeper server should not start when neither clientPort nor 
> secureClientPort is configured.
> Without any client port the ZooKeeper server cannot serve any purpose. It 
> should simply exit with a proper error message.
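A minimal sketch of the kind of startup check being requested; the surrounding field names are assumptions, though QuorumPeerConfig.ConfigException is a real configuration exception type:

{code}
// Sketch only: refuse to start when no client port of either kind is set.
if (clientPortAddress == null && secureClientPortAddress == null) {
    throw new QuorumPeerConfig.ConfigException(
        "Neither clientPort nor secureClientPort is configured");
}
{code}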



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2261) When only secureClientPort is configured, the connections, configuration, connection_stat_reset, and stats admin commands throw NullPointerException

2015-09-02 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2261:
---
Fix Version/s: (was: 3.5.1)
   3.6.0
   3.5.2

> When only secureClientPort is configured, the connections, configuration, 
> connection_stat_reset, and stats admin commands throw NullPointerException
> ---
>
> Key: ZOOKEEPER-2261
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2261
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
> Fix For: 3.5.2, 3.6.0
>
>
> When only secureClientPort is configured, the connections, configuration, 
> connection_stat_reset, and stats admin commands throw NullPointerException. 
> Here is the stack trace of one of the connections commands.
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.zookeeper.server.admin.Commands$ConsCommand.run(Commands.java:177)
>   at 
> org.apache.zookeeper.server.admin.Commands.runCommand(Commands.java:92)
>   at 
> org.apache.zookeeper.server.admin.JettyAdminServer$CommandServlet.doGet(JettyAdminServer.java:166)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2252) Random test case failure in org.apache.zookeeper.test.StaticHostProviderTest

2015-09-02 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2252:
---
Fix Version/s: (was: 3.5.1)
   3.5.2

> Random test case failure in org.apache.zookeeper.test.StaticHostProviderTest
> 
>
> Key: ZOOKEEPER-2252
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2252
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Arshad Mohammad
>Priority: Minor
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZK-2252-try-repro-on-jenkins.patch, 
> ZOOKEEPER-2252-02.patch, ZOOKEEPER-2252-03.patch, ZOOKEEPER-2252-04.patch
>
>
> Test 
> {{org.apache.zookeeper.test.StaticHostProviderTest.testTwoInvalidHostAddresses()}}
>  fails randomly.
> Refer to the failing CI builds below:
> https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2827/testReport/
> https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2828/testReport/
> https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2830/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2246) quorum connection manager takes a long time to shut down

2015-08-12 Thread Michi Mutsuzaki (JIRA)
Michi Mutsuzaki created ZOOKEEPER-2246:
--

 Summary: quorum connection manager takes a long time to shut down
 Key: ZOOKEEPER-2246
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2246
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Reporter: Michi Mutsuzaki
 Fix For: 3.5.2, 3.6.0


The receive worker can take a long time to shut down because the socket 
timeout is set to zero: http://s.apache.org/TfI

There was a discussion on the mailing list a while back: http://s.apache.org/cYG
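As an illustration of why a zero timeout blocks shutdown, a hedged Java sketch; the field and handler names are hypothetical, not the actual QuorumCnxManager code:

{code}
// With setSoTimeout(0), readInt() can block forever, so the worker never
// re-checks its shutdown flag. A bounded timeout lets it wake up periodically.
socket.setSoTimeout(5000);
while (running) {
    try {
        int length = din.readInt();  // returns or times out within 5 seconds
        handleMessage(length, din);  // hypothetical message handler
    } catch (SocketTimeoutException e) {
        // timed out: loop around and re-check 'running'
    }
}
{code}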



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2140) NettyServerCnxn and NIOServerCnxn code should be improved

2015-07-28 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2140:
---
Fix Version/s: (was: 3.5.2)
   3.5.1

 NettyServerCnxn and NIOServerCnxn code should be improved
 -

 Key: ZOOKEEPER-2140
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2140
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Arshad Mohammad
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2140-1.patch, ZOOKEEPER-2140-2.patch, 
 ZOOKEEPER-2140-3.patch, ZOOKEEPER-2140-4.patch


 Classes org.apache.zookeeper.server.NIOServerCnxn and 
 org.apache.zookeeper.server.NettyServerCnxn have the following needs and 
 scope for improvement:
 1) Duplicate code.
   These two classes share around 250 lines of duplicate code. All the 
 command code is duplicated.
 2) Many improvements/bug fixes are done in one class but not in the other. 
 These changes should be kept in sync.
 For example,
 In NettyServerCnxn
 {code}
 // clone should be faster than iteration
 // ie give up the cnxns lock faster
 AbstractSet<ServerCnxn> cnxns;
 synchronized (factory.cnxns) {
     cnxns = new HashSet<ServerCnxn>(factory.cnxns);
 }
 for (ServerCnxn c : cnxns) {
     c.dumpConnectionInfo(pw, false);
     pw.println();
 }
 {code}
 In NIOServerCnxn
 {code}
 for (ServerCnxn c : factory.cnxns) {
     c.dumpConnectionInfo(pw, false);
     pw.println();
 }
 {code}
 3) The NettyServerCnxn and NIOServerCnxn classes are unnecessarily bulky. 
 The command classes have altogether different functionality and should go in 
 separate class files.
 If this is done, it will be easy to add a new command with minimal change to 
 the existing classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2224) Four letter command hangs when network is slow

2015-07-28 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2224:
---
Fix Version/s: (was: 3.5.2)
   3.5.1

 Four letter command hangs when network is slow
 --

 Key: ZOOKEEPER-2224
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2224
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Reporter: Arshad Mohammad
Assignee: Arshad Mohammad
Priority: Minor
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2224-01.patch, ZOOKEEPER-2224-02.patch, 
 ZOOKEEPER-2224-03.patch, ZOOKEEPER-2224-04.patch, 
 ZOOKEEPER-2224_br_3_4-04.patch


 A four-letter command hangs when the network is slow or goes down in the 
 middle of the operation, and the application calling the four-letter 
 command hangs as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2223) support method-level JUnit testcase

2015-07-28 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2223:
---
Fix Version/s: (was: 3.5.2)
   3.5.1

 support method-level JUnit testcase
 ---

 Key: ZOOKEEPER-2223
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2223
 Project: ZooKeeper
  Issue Type: Improvement
  Components: tests
Reporter: Akihiro Suda
Assignee: Akihiro Suda
Priority: Minor
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2223-v2.patch, ZOOKEEPER-2223-v3.patch, 
 ZOOKEEPER-2223-v4.patch, ZOOKEEPER-2223.patch


 Currently, a user can execute a single test class, but cannot execute a 
 single test method.
 This patch adds support for running a single test method, to make it easier 
 to debug failing tests (like ZOOKEEPER-2080).
 Class-level test (exists in current version)
 {panel}
 $ ant -Dtestcase=ReconfigRecoveryTest test-core-java
 {panel}
 Method-level test (proposal)
 {panel}
 $ ant -Dtestcase=ReconfigRecoveryTest 
 -Dtest.method=testCurrentObserverIsParticipantInNewConfig test-core-java
 {panel}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2193) reconfig command completes even if parameter is wrong obviously

2015-07-28 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2193:
---
Fix Version/s: 3.6.0
   3.5.1

 reconfig command completes even if parameter is wrong obviously
 ---

 Key: ZOOKEEPER-2193
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2193
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection, server
Affects Versions: 3.5.0
 Environment: CentOS7 + Java7
Reporter: Yasuhito Fukuda
Assignee: Yasuhito Fukuda
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2193-v2.patch, ZOOKEEPER-2193-v3.patch, 
 ZOOKEEPER-2193-v4.patch, ZOOKEEPER-2193-v5.patch, ZOOKEEPER-2193-v6.patch, 
 ZOOKEEPER-2193-v7.patch, ZOOKEEPER-2193-v8.patch, ZOOKEEPER-2193.patch


 Even when a reconfig parameter is obviously wrong, the command completes.
 Refer to the following.
 - Ensemble consists of four nodes
 {noformat}
 [zk: vm-101:2181(CONNECTED) 0] config
 server.1=192.168.100.101:2888:3888:participant
 server.2=192.168.100.102:2888:3888:participant
 server.3=192.168.100.103:2888:3888:participant
 server.4=192.168.100.104:2888:3888:participant
 version=1
 {noformat}
 - add node by reconfig command
 {noformat}
 [zk: vm-101:2181(CONNECTED) 9] reconfig -add 
 server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181
 Committed new configuration:
 server.1=192.168.100.101:2888:3888:participant
 server.2=192.168.100.102:2888:3888:participant
 server.3=192.168.100.103:2888:3888:participant
 server.4=192.168.100.104:2888:3888:participant
 server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181
 version=30007
 {noformat}
 server.4 and server.5 have duplicate IP addresses.
 In this state, leader election will not work properly.
 Besides, the ensemble is likely to end up in an undesirable state.
 I think reconfig needs parameter validation, as sketched below.
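For illustration, a sketch of the kind of duplicate-address validation being asked for; the actual fix may differ, and the collection and exception usage here are assumptions:

{code}
// Reject a proposed configuration that reuses a server address.
Set<InetSocketAddress> seen = new HashSet<>();
for (QuorumServer server : proposedServers) {
    if (!seen.add(server.addr)) {
        throw new KeeperException.BadArgumentsException(
            "Duplicate server address in reconfig: " + server.addr);
    }
}
{code}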



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2221) Zookeeper JettyAdminServer server should start on configured IP.

2015-07-28 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2221:
---
Fix Version/s: (was: 3.5.2)
   3.5.1

 Zookeeper JettyAdminServer server should start on configured IP.
 

 Key: ZOOKEEPER-2221
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2221
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.5.0
Reporter: Surendra Singh Lilhore
Assignee: Surendra Singh Lilhore
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2221.patch, ZOOKEEPER-2221.patch, 
 ZOOKEEPER-2221.patch, ZOOKEEPER-2221.patch, ZOOKEEPER-2221_1.patch


 Currently JettyAdminServer starts on the 0.0.0.0 IP. 0.0.0.0 means all IP 
 addresses on the local machine. So, if your webserver machine has two IP 
 addresses, 192.168.1.1 (private) and 10.1.2.1 (public), and you allow a 
 webserver daemon like Apache to listen on 0.0.0.0, it will be reachable at 
 both of those IPs.
 This is a security issue; the webserver should be accessible only from the 
 configured IP.
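A hedged sketch of binding the admin server to a configured address instead of 0.0.0.0; Jetty connector classes vary by version, so the connector and variable names here are assumptions:

{code}
// Sketch: bind the Jetty connector only to the configured address.
connector.setHost(configuredAddress); // e.g. "192.168.1.1" rather than "0.0.0.0"
connector.setPort(adminPort);
server.addConnector(connector);
{code}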



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2235) License update

2015-07-25 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641794#comment-14641794
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2235:


Thank you Flavio and Ivan. I'll check this in and create another release 
candidate.

 License update
 --

 Key: ZOOKEEPER-2235
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2235
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.6, 3.5.0
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
Priority: Blocker
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2235.patch, ZOOKEEPER-2235.patch, 
 ZOOKEEPER-2235.patch, ZOOKEEPER-2235.patch, ZOOKEEPER-2235.patch, 
 notice-dependencies.txt


 Updating license files and notice.txt as needed. Here is a list of the jars 
 we are currently bundling with the release artifact with the corresponding 
 license:
 # commons-cli-1.2.jar -- ASF
 # javacc.jar -- BSD license
 # jline-2.11.jar -- BSD license
 # servlet-api-2.5-20081211.jar - CDDL
 # jackson-core-asl-1.9.11.jar -- ALv2 
 # jetty-6.1.26.jar -- ALv2   
 # log4j-1.2.16.jar -- ALv2   
 # jackson-mapper-asl-1.9.11.jar -- ALv2
 # jetty-util-6.1.26.jar -- ALv2
 # netty-3.7.0.Final.jar -- ALv2
 # slf4j-log4j12-1.7.5.jar -- MIT 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2235) License update

2015-07-17 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631821#comment-14631821
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2235:


Thanks for the patch Flavio. I'll review the patch this weekend.

 License update
 --

 Key: ZOOKEEPER-2235
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2235
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.6, 3.5.0
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
 Fix For: 3.4.7, 3.5.1

 Attachments: ZOOKEEPER-2235.patch


 Updating license files and notice.txt as needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2223) support method-level JUnit testcase

2015-07-07 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616231#comment-14616231
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2223:


Sure please go ahead and merge this to 3.5. Thanks!

 support method-level JUnit testcase
 ---

 Key: ZOOKEEPER-2223
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2223
 Project: ZooKeeper
  Issue Type: Improvement
  Components: tests
Reporter: Akihiro Suda
Assignee: Akihiro Suda
Priority: Minor
 Fix For: 3.6.0

 Attachments: ZOOKEEPER-2223-v2.patch, ZOOKEEPER-2223-v3.patch, 
 ZOOKEEPER-2223-v4.patch, ZOOKEEPER-2223.patch


 Currently, a user can execute a single test class, but cannot execute a 
 single test method.
 This patch adds support for running a single test method, to make it easier 
 to debug failing tests (like ZOOKEEPER-2080).
 Class-level test (exists in current version)
 {panel}
 $ ant -Dtestcase=ReconfigRecoveryTest test-core-java
 {panel}
 Method-level test (proposal)
 {panel}
 $ ant -Dtestcase=ReconfigRecoveryTest 
 -Dtest.method=testCurrentObserverIsParticipantInNewConfig test-core-java
 {panel}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1846) Cached InetSocketAddresses prevent proper dynamic DNS resolution

2015-07-06 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616128#comment-14616128
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1846:


Is this a duplicate of ZOOKEEPER-1506?

 Cached InetSocketAddresses prevent proper dynamic DNS resolution
 

 Key: ZOOKEEPER-1846
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1846
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.4.6
Reporter: Benjamin Jaton
Priority: Minor
  Labels: patch
 Attachments: DynamicIP.java.patch, Learner.java, 
 QuorumCnxManager.java, QuorumPeer.java


 The class QuorumPeer maintains a Map<Long, QuorumServer> quorumPeers.
 Each QuorumServer is created with an instance of InetSocketAddress 
 electionAddr, and holds it forever.
 I believe this is why the ZooKeeper servers can't resolve each other 
 dynamically: if a ZooKeeper in the ensemble cannot be resolved at startup, it 
 will never be resolved (until restart of the JVM), constantly failing with an 
 UnknownHostException, even when the node is back up and reachable.
 I would suggest recreating the InetSocketAddress every time we retry the 
 connection.
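A minimal sketch of that suggestion; the helper is hypothetical, and the key point is that the InetSocketAddress constructor performs a fresh DNS lookup each time it runs:

{code}
// Build a new InetSocketAddress on every retry so a hostname that failed to
// resolve at startup can recover once DNS works again.
InetSocketAddress resolveFresh(String hostname, int port) {
    return new InetSocketAddress(hostname, port); // resolves now, not at startup
}
{code}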



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2140) NettyServerCnxn and NIOServerCnxn code should be improved

2015-06-29 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14607517#comment-14607517
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2140:


You need to edit the Contributors list in 
https://issues.apache.org/jira/plugins/servlet/project-config/ZOOKEEPER/roles .

 NettyServerCnxn and NIOServerCnxn code should be improved
 -

 Key: ZOOKEEPER-2140
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2140
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Arshad Mohammad
 Fix For: 3.5.2, 3.6.0

 Attachments: ZOOKEEPER-2140-1.patch, ZOOKEEPER-2140-2.patch, 
 ZOOKEEPER-2140-3.patch, ZOOKEEPER-2140-4.patch


 Classes org.apache.zookeeper.server.NIOServerCnxn and 
 org.apache.zookeeper.server.NettyServerCnxn have the following needs and 
 scope for improvement:
 1) Duplicate code.
   These two classes share around 250 lines of duplicate code. All the 
 command code is duplicated.
 2) Many improvements/bug fixes are done in one class but not in the other. 
 These changes should be kept in sync.
 For example,
 In NettyServerCnxn
 {code}
 // clone should be faster than iteration
 // ie give up the cnxns lock faster
 AbstractSet<ServerCnxn> cnxns;
 synchronized (factory.cnxns) {
     cnxns = new HashSet<ServerCnxn>(factory.cnxns);
 }
 for (ServerCnxn c : cnxns) {
     c.dumpConnectionInfo(pw, false);
     pw.println();
 }
 {code}
 In NIOServerCnxn
 {code}
 for (ServerCnxn c : factory.cnxns) {
     c.dumpConnectionInfo(pw, false);
     pw.println();
 }
 {code}
 3) The NettyServerCnxn and NIOServerCnxn classes are unnecessarily bulky. 
 The command classes have altogether different functionality and should go in 
 separate class files.
 If this is done, it will be easy to add a new command with minimal change to 
 the existing classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2170) Zookeeper is not logging as per the configuration in log4j.properties

2015-06-29 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2170:
---
Assignee: Arshad Mohammad

 Zookeeper is not logging as per the configuration in log4j.properties
 -

 Key: ZOOKEEPER-2170
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2170
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Arshad Mohammad
Assignee: Arshad Mohammad
 Fix For: 3.6.0

 Attachments: ZOOKEEPER-2170-002.patch, ZOOKEEPER-2170.001.patch


 In conf/log4j.properties the default root logger is 
 {code}
 zookeeper.root.logger=INFO, CONSOLE
 {code}
 Changing the root logger to the value below, or to any other value, has no 
 effect on logging:
 {code}
 zookeeper.root.logger=DEBUG, ROLLINGFILE
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2218) Close IO Streams in finally block

2015-06-29 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2218:
---
Fix Version/s: 3.6.0
   3.5.2

 Close IO Streams in finally block
 -

 Key: ZOOKEEPER-2218
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2218
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Tang Xinye
Assignee: Bill Havanki
Priority: Critical
 Fix For: 3.5.2, 3.6.0

 Attachments: ZOOKEEPER-2218.patch, ZOOKEEPER-2218.patch


 The problem here is that if an exception is thrown during the read process, 
 the method will exit without closing the stream, and hence without releasing 
 the file system resources; the process may eventually run out of resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2218) Close IO Streams in finally block

2015-06-29 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14607546#comment-14607546
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2218:


Thank you for the patch Tang! I have 2 comments:

- Please replace tabs with spaces.
- It's probably cleaner to use try-with-resources: 
https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html
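For example, a minimal try-with-resources sketch; the stream source and consumer are hypothetical:

{code}
try (FileInputStream in = new FileInputStream(path)) {
    byte[] buf = new byte[4096];
    int n;
    while ((n = in.read(buf)) != -1) {
        process(buf, n); // hypothetical consumer of the bytes read
    }
} // in.close() runs here whether or not read() threw
{code}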

 Close IO Streams in finally block
 -

 Key: ZOOKEEPER-2218
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2218
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Tang Xinye
Assignee: Bill Havanki
Priority: Critical
 Fix For: 3.5.2, 3.6.0

 Attachments: ZOOKEEPER-2218.patch, ZOOKEEPER-2218.patch


 The problem here is that if an exception is thrown during the read process, 
 the method will exit without closing the stream, and hence without releasing 
 the file system resources; the process may eventually run out of resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2210) clock_gettime is not available in os x

2015-06-20 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14594766#comment-14594766
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2210:


Thank you for the review Chris and Raul. I'll update the patch to address 
Chris' comments.

 clock_gettime is not available in os x
 --

 Key: ZOOKEEPER-2210
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2210
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2210.patch


 {noformat}
 src/zookeeper.c:286:9: warning: implicit declaration of function 
 'clock_gettime' is invalid in C99 [-Wimplicit-function-declaration]
   ret = clock_gettime(CLOCK_MONOTONIC, ts);
 ^
 src/zookeeper.c:286:23: error: use of undeclared identifier 'CLOCK_MONOTONIC'
   ret = clock_gettime(CLOCK_MONOTONIC, ts);
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (ZOOKEEPER-1626) Zookeeper C client should be tolerant of clock adjustments

2015-06-20 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki resolved ZOOKEEPER-1626.

Resolution: Fixed

The issue on os x is fixed in ZOOKEEPER-2210.

 Zookeeper C client should be tolerant of clock adjustments 
 ---

 Key: ZOOKEEPER-1626
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1626
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: c client
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1366.001.patch, ZOOKEEPER-1366.002.patch, 
 ZOOKEEPER-1366.003.patch, ZOOKEEPER-1366.004.patch, ZOOKEEPER-1366.006.patch, 
 ZOOKEEPER-1366.007.patch, ZOOKEEPER-1626.patch


 The Zookeeper C client should use monotonic time when available, in order to 
 be more tolerant of time adjustments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1029) C client bug in zookeeper_init (if bad hostname is given)

2015-06-20 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-1029:
---
Fix Version/s: (was: 3.5.1)
   3.5.2

 C client bug in zookeeper_init (if bad hostname is given)
 -

 Key: ZOOKEEPER-1029
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1029
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.2, 3.4.6, 3.5.0
Reporter: Dheeraj Agrawal
 Fix For: 3.4.7, 3.5.2


 If you give an invalid hostname to the zookeeper_init method, it is not able 
 to resolve it, and it tries to do the cleanup (free buffers/completion 
 lists/etc.). adaptor_init() is not called for this code path, so the 
 lock/cond variables (for the adaptor and completion lists) are not 
 initialized.
 As part of the cleanup it tries to clean up some buffers and acquires and 
 releases locks (where the locks have not yet been initialized, so unlocking 
 fails):
 {noformat}
 lock_completion_list(&zh->sent_requests);   <- pthread mutex/cond not initialized
 tmp_list = zh->sent_requests;
 zh->sent_requests.head = 0;
 zh->sent_requests.last = 0;
 unlock_completion_list(&zh->sent_requests); <- trying to broadcast here 
                                                on uninitialized cond
 {noformat}
 It should do error checking to see if locking succeeds before unlocking. 
 If locking fails, then appropriate error handling has to be done.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2163) Introduce new ZNode type: container

2015-06-20 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2163:
---
Fix Version/s: 3.6.0

 Introduce new ZNode type: container
 ---

 Key: ZOOKEEPER-2163
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2163
 Project: ZooKeeper
  Issue Type: New Feature
  Components: c client, java client, server
Affects Versions: 3.5.0
Reporter: Jordan Zimmerman
Assignee: Jordan Zimmerman
 Fix For: 3.5.1, 3.6.0

 Attachments: zookeeper-2163.10.patch, zookeeper-2163.11.patch, 
 zookeeper-2163.12.patch, zookeeper-2163.13.patch, zookeeper-2163.14.patch, 
 zookeeper-2163.15.patch, zookeeper-2163.3.patch, zookeeper-2163.5.patch, 
 zookeeper-2163.6.patch, zookeeper-2163.7.patch, zookeeper-2163.8.patch, 
 zookeeper-2163.9.patch


 BACKGROUND
 
 A recurring problem for ZooKeeper users is garbage collection of parent 
 nodes. Many recipes (e.g. locks, leaders, etc.) call for the creation of a 
 parent node under which participants create sequential nodes. When the 
 participant is done, it deletes its node. In practice, the ZooKeeper tree 
 begins to fill up with orphaned parent nodes that are no longer needed. The 
 ZooKeeper APIs don’t provide a way to clean these. Over time, ZooKeeper can 
 become unstable due to the number of these nodes.
 CURRENT SOLUTIONS
 ===
 Apache Curator has a workaround solution for this by providing the Reaper 
 class which runs in the background looking for orphaned parent nodes and 
 deleting them. This isn’t ideal and it would be better if ZooKeeper supported 
 this directly.
 PROPOSAL
 =
 ZOOKEEPER-723 and ZOOKEEPER-834 have been proposed to allow EPHEMERAL nodes 
 to contain child nodes. This is not optimum as EPHEMERALs are tied to a 
 session and the general use case of parent nodes is for PERSISTENT nodes. 
 This proposal adds a new node type, CONTAINER. A CONTAINER node is the same 
 as a PERSISTENT node with the additional property that when its last child is 
 deleted, it is deleted (and CONTAINER nodes recursively up the tree are 
 deleted if empty).
 CANONICAL USAGE
 
 {code}
 while (true) { // or some reasonable limit
     try {
         zk.create(path, ...);
         break;
     } catch (KeeperException.NoNodeException e) {
         try {
             zk.createContainer(containerPath, ...);
         } catch (KeeperException.NodeExistsException ignore) {
         }
     }
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2210) clock_gettime is not available in os x

2015-06-20 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2210:
---
Attachment: ZOOKEEPER-2210.patch

 clock_gettime is not available in os x
 --

 Key: ZOOKEEPER-2210
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2210
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2210.patch, ZOOKEEPER-2210.patch


 {noformat}
 src/zookeeper.c:286:9: warning: implicit declaration of function 
 'clock_gettime' is invalid in C99 [-Wimplicit-function-declaration]
   ret = clock_gettime(CLOCK_MONOTONIC, ts);
 ^
 src/zookeeper.c:286:23: error: use of undeclared identifier 'CLOCK_MONOTONIC'
   ret = clock_gettime(CLOCK_MONOTONIC, ts);
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2211) PurgeTxnLog does not correctly purge when snapshots and logs are at different locations

2015-06-20 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2211:
---
Fix Version/s: (was: 3.5.1)
   (was: 3.4.6)
   3.5.2
   3.4.7

 PurgeTxnLog does not correctly purge when snapshots and logs are at different 
 locations
 ---

 Key: ZOOKEEPER-2211
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2211
 Project: ZooKeeper
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.4.6, 3.5.0
 Environment: Ubuntu 12.04, Java 1.7.
Reporter: Wesley Chow
Assignee: Wesley Chow
Priority: Minor
 Fix For: 3.4.7, 3.5.2, 3.6.0

 Attachments: ZOOKEEPER-2211.patch


 PurgeTxnLog does not work when snapshots and transaction logs are at 
 different file paths. The argument handling is buggy and only works when 
 both the snap and datalog dirs are given and the datalog dir contains both 
 logs and snaps (the snap dir is ignored).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2211) PurgeTxnLog does not correctly purge when snapshots and logs are at different locations

2015-06-20 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2211:
---
Assignee: Wesley Chow

 PurgeTxnLog does not correctly purge when snapshots and logs are at different 
 locations
 ---

 Key: ZOOKEEPER-2211
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2211
 Project: ZooKeeper
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.4.6, 3.5.0
 Environment: Ubuntu 12.04, Java 1.7.
Reporter: Wesley Chow
Assignee: Wesley Chow
Priority: Minor
 Fix For: 3.4.7, 3.5.2, 3.6.0

 Attachments: ZOOKEEPER-2211.patch


 PurgeTxnLog does not work when snapshots and transaction logs are at 
 different file paths. The argument handling is buggy and only works when 
 both the snap and datalog dirs are given and the datalog dir contains both 
 logs and snaps (the snap dir is ignored).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2210) clock_gettime is not available in os x

2015-06-13 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2210:
---
Attachment: ZOOKEEPER-2210.patch

 clock_gettime is not available in os x
 --

 Key: ZOOKEEPER-2210
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2210
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2210.patch


 {noformat}
 src/zookeeper.c:286:9: warning: implicit declaration of function 
 'clock_gettime' is invalid in C99 [-Wimplicit-function-declaration]
   ret = clock_gettime(CLOCK_MONOTONIC, ts);
 ^
 src/zookeeper.c:286:23: error: use of undeclared identifier 'CLOCK_MONOTONIC'
   ret = clock_gettime(CLOCK_MONOTONIC, ts);
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart

2015-06-10 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2213:
---
Fix Version/s: 3.6.0
   3.5.1
   3.4.7

 Empty path in Set crashes server and prevents restart
 -

 Key: ZOOKEEPER-2213
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2213
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Brian Brazil
Priority: Blocker
 Fix For: 3.4.7, 3.5.1, 3.6.0


 See https://github.com/samuel/go-zookeeper/issues/62
 I've reproduced this on 3.4.5 with the code:
 {code}
 c, _, _ := zk.Connect([]string{"127.0.0.1"}, time.Second)
 c.Set("", []byte{}, 0)
 {code}
 This crashes a local zookeeper 3.4.5 server:
 2015-06-10 16:21:10,862 [myid:] - ERROR 
 [SyncThread:0:SyncRequestProcessor@151] - Severe unrecoverable error, exiting 
  
 java.lang.IllegalArgumentException: Invalid path
 at 
 org.apache.zookeeper.common.PathTrie.findMaxPrefix(PathTrie.java:259)
 at 
 org.apache.zookeeper.server.DataTree.getMaxPrefixWithQuota(DataTree.java:634)
 at org.apache.zookeeper.server.DataTree.setData(DataTree.java:616)
 at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:807)
 at 
 org.apache.zookeeper.server.ZKDatabase.processTxn(ZKDatabase.java:329)
 at 
 org.apache.zookeeper.server.ZooKeeperServer.processTxn(ZooKeeperServer.java:965)
 at 
 org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:116)
 at 
 org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:167)
 at 
 org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
 On restart the zookeeper server crashes out:
 2015-06-10 16:22:21,352 [myid:] - ERROR [main:ZooKeeperServerMain@54] - 
 Invalid arguments, exiting abnormally
 java.lang.IllegalArgumentException: Invalid path
 at 
 org.apache.zookeeper.common.PathTrie.findMaxPrefix(PathTrie.java:259)
 at 
 org.apache.zookeeper.server.DataTree.getMaxPrefixWithQuota(DataTree.java:634)
 at org.apache.zookeeper.server.DataTree.setData(DataTree.java:616)
 at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:807)
 at 
 org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:198)
 at 
 org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
 at 
 org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
 at 
 org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:250)
 at 
 org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:377)
 at 
 org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122)
 at 
 org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
 at 
 org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
 at 
 org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
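For illustration, a hedged sketch of a server-side guard that would reject the empty path before it reaches the transaction log; PathUtils.validatePath is ZooKeeper's real path validator, used here illustratively rather than as the actual fix:

{code}
// Sketch: validate the path (rejects "") before applying the set-data txn.
try {
    PathUtils.validatePath(path); // throws IllegalArgumentException for ""
} catch (IllegalArgumentException e) {
    throw new KeeperException.BadArgumentsException(path);
}
{code}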



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (ZOOKEEPER-2163) Introduce new ZNode type: container

2015-06-09 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki reopened ZOOKEEPER-2163:


[~rgs] could you address Rakesh's comment?

 Introduce new ZNode type: container
 ---

 Key: ZOOKEEPER-2163
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2163
 Project: ZooKeeper
  Issue Type: New Feature
  Components: c client, java client, server
Affects Versions: 3.5.0
Reporter: Jordan Zimmerman
Assignee: Jordan Zimmerman
 Fix For: 3.5.1, 3.6.0

 Attachments: zookeeper-2163.10.patch, zookeeper-2163.11.patch, 
 zookeeper-2163.12.patch, zookeeper-2163.13.patch, zookeeper-2163.14.patch, 
 zookeeper-2163.3.patch, zookeeper-2163.5.patch, zookeeper-2163.6.patch, 
 zookeeper-2163.7.patch, zookeeper-2163.8.patch, zookeeper-2163.9.patch


 BACKGROUND
 
 A recurring problem for ZooKeeper users is garbage collection of parent 
 nodes. Many recipes (e.g. locks, leaders, etc.) call for the creation of a 
 parent node under which participants create sequential nodes. When the 
 participant is done, it deletes its node. In practice, the ZooKeeper tree 
 begins to fill up with orphaned parent nodes that are no longer needed. The 
 ZooKeeper APIs don’t provide a way to clean these. Over time, ZooKeeper can 
 become unstable due to the number of these nodes.
 CURRENT SOLUTIONS
 ===
 Apache Curator has a workaround solution for this by providing the Reaper 
 class which runs in the background looking for orphaned parent nodes and 
 deleting them. This isn’t ideal and it would be better if ZooKeeper supported 
 this directly.
 PROPOSAL
 =
 ZOOKEEPER-723 and ZOOKEEPER-834 have been proposed to allow EPHEMERAL nodes 
 to contain child nodes. This is not optimal, as EPHEMERALs are tied to a 
 session and the general use case of parent nodes is for PERSISTENT nodes. 
 This proposal adds a new node type, CONTAINER. A CONTAINER node is the same 
 as a PERSISTENT node with the additional property that when its last child is 
 deleted, it is deleted (and CONTAINER nodes recursively up the tree are 
 deleted if empty).
 CANONICAL USAGE
 
 {code}
 while (true) { // or some reasonable limit
     try {
         zk.create(path, ...);
         break;
     } catch (KeeperException.NoNodeException e) {
         try {
             zk.createContainer(containerPath, ...);
         } catch (KeeperException.NodeExistsException ignore) {
         }
     }
 }
 {code}
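 To make the recipe concrete, here is a self-contained sketch of the same loop against the proposed API; createContainer's exact signature is assumed from this proposal, and the paths, data, and retry limit are placeholders:
 {code}
 import org.apache.zookeeper.CreateMode;
 import org.apache.zookeeper.KeeperException;
 import org.apache.zookeeper.ZooDefs;
 import org.apache.zookeeper.ZooKeeper;

 public class ContainerRecipe {
     // Create a sequential member under a container parent, creating the
     // parent on demand. If the parent is garbage-collected between our
     // NoNodeException and the createContainer() call, the loop retries.
     static String join(ZooKeeper zk, String containerPath) throws Exception {
         for (int i = 0; i < 10; i++) { // some reasonable limit
             try {
                 return zk.create(containerPath + "/member-", new byte[0],
                         ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
             } catch (KeeperException.NoNodeException e) {
                 try {
                     // Proposed API: PERSISTENT semantics, plus automatic
                     // deletion once the last child is removed.
                     zk.createContainer(containerPath, new byte[0],
                             ZooDefs.Ids.OPEN_ACL_UNSAFE);
                 } catch (KeeperException.NodeExistsException ignore) {
                     // Another participant created the parent first.
                 }
             }
         }
         throw new KeeperException.NoNodeException(containerPath);
     }
 }
 {code}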





[jira] [Created] (ZOOKEEPER-2210) clock_gettime is not available in os x

2015-06-08 Thread Michi Mutsuzaki (JIRA)
Michi Mutsuzaki created ZOOKEEPER-2210:
--

 Summary: clock_gettime is not available in os x
 Key: ZOOKEEPER-2210
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2210
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
 Fix For: 3.5.1, 3.6.0


{noformat}
src/zookeeper.c:286:9: warning: implicit declaration of function 
'clock_gettime' is invalid in C99 [-Wimplicit-function-declaration]
  ret = clock_gettime(CLOCK_MONOTONIC, ts);
^
src/zookeeper.c:286:23: error: use of undeclared identifier 'CLOCK_MONOTONIC'
  ret = clock_gettime(CLOCK_MONOTONIC, ts);
{noformat}





[jira] [Updated] (ZOOKEEPER-2202) Cluster crashes when reconfig adds an unreachable observer

2015-06-08 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2202:
---
Fix Version/s: (was: 3.5.1)
   3.5.2

 Cluster crashes when reconfig adds an unreachable observer
 --

 Key: ZOOKEEPER-2202
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2202
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.5.0, 3.6.0
Reporter: Raul Gutierrez Segales
 Fix For: 3.5.2, 3.6.0


 While adding support for reconfig() in Kazoo 
 (https://github.com/python-zk/kazoo/pull/333) I found that the cluster can be 
 crashed if you add an observer whose election port isn't reachable (i.e.: 
 packets for that destination are dropped, not rejected). This will raise a 
 SocketTimeoutException which will bring down the PrepRequestProcessor:
 {code}
 2015-06-02 14:37:16,473 [myid:3] - WARN  [ProcessThread(sid:3 cport:-1)::QuorumCnxManager@384] - Cannot open channel to 100 at election address /8.8.8.8:38703
 java.net.SocketTimeoutException: connect timed out
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
 at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
 at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
 at java.net.Socket.connect(Socket.java:589)
 at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:369)
 at org.apache.zookeeper.server.quorum.QuorumPeer.connectNewPeers(QuorumPeer.java:1288)
 at org.apache.zookeeper.server.quorum.QuorumPeer.setLastSeenQuorumVerifier(QuorumPeer.java:1315)
 at org.apache.zookeeper.server.quorum.Leader.propose(Leader.java:1056)
 at org.apache.zookeeper.server.quorum.ProposalRequestProcessor.processRequest(ProposalRequestProcessor.java:78)
 at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:877)
 at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:143)
 {code}
 A simple repro can be obtained by using the code in the referenced pull 
 request above with 8.8.8.8:3888 (for example) instead of a free (but 
 closed) port on the loopback. 
 I think adding an Observer (or a Participant) that isn't currently 
 reachable is a valid use case (i.e., you are provisioning the machine and 
 it's not needed yet), so we could perhaps handle this with lower connect 
 timeouts; I'm not sure. 
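 For reference, a minimal sketch of the triggering call, assuming the 3.5 client's reconfig overload on the ZooKeeper handle; the connect string, session setup, and server line are placeholders:
 {code}
 import java.util.Arrays;
 import java.util.List;
 import org.apache.zookeeper.ZooKeeper;
 import org.apache.zookeeper.data.Stat;

 public class ReconfigRepro {
     public static void main(String[] args) throws Exception {
         ZooKeeper zk = new ZooKeeper("vm-101:2181", 30000, event -> { });
         // Election port 3888 on 8.8.8.8 drops packets instead of rejecting
         // them, so the leader's connect attempt blocks until it times out
         // inside PrepRequestProcessor, as in the stack trace above.
         List<String> joining = Arrays.asList(
                 "server.100=8.8.8.8:2888:3888:observer;0.0.0.0:2181");
         zk.reconfig(joining, null, null, -1, new Stat());
     }
 }
 {code}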





[jira] [Updated] (ZOOKEEPER-2163) Introduce new ZNode type: container

2015-06-08 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2163:
---
Fix Version/s: (was: 3.5.0)
   3.6.0
   3.5.1

 Introduce new ZNode type: container
 ---

 Key: ZOOKEEPER-2163
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2163
 Project: ZooKeeper
  Issue Type: New Feature
  Components: c client, java client, server
Affects Versions: 3.5.0
Reporter: Jordan Zimmerman
Assignee: Jordan Zimmerman
 Fix For: 3.5.1, 3.6.0

 Attachments: zookeeper-2163.10.patch, zookeeper-2163.11.patch, 
 zookeeper-2163.12.patch, zookeeper-2163.13.patch, zookeeper-2163.14.patch, 
 zookeeper-2163.3.patch, zookeeper-2163.5.patch, zookeeper-2163.6.patch, 
 zookeeper-2163.7.patch, zookeeper-2163.8.patch, zookeeper-2163.9.patch


 BACKGROUND
 
 A recurring problem for ZooKeeper users is garbage collection of parent 
 nodes. Many recipes (e.g. locks, leaders, etc.) call for the creation of a 
 parent node under which participants create sequential nodes. When the 
 participant is done, it deletes its node. In practice, the ZooKeeper tree 
 begins to fill up with orphaned parent nodes that are no longer needed. The 
 ZooKeeper APIs don’t provide a way to clean these. Over time, ZooKeeper can 
 become unstable due to the number of these nodes.
 CURRENT SOLUTIONS
 ===
 Apache Curator has a workaround solution for this by providing the Reaper 
 class which runs in the background looking for orphaned parent nodes and 
 deleting them. This isn’t ideal and it would be better if ZooKeeper supported 
 this directly.
 PROPOSAL
 =
 ZOOKEEPER-723 and ZOOKEEPER-834 have been proposed to allow EPHEMERAL nodes 
 to contain child nodes. This is not optimal, as EPHEMERALs are tied to a 
 session and the general use case of parent nodes is for PERSISTENT nodes. 
 This proposal adds a new node type, CONTAINER. A CONTAINER node is the same 
 as a PERSISTENT node with the additional property that when its last child is 
 deleted, it is deleted (and CONTAINER nodes recursively up the tree are 
 deleted if empty).
 CANONICAL USAGE
 
 {code}
 while (true) { // or some reasonable limit
     try {
         zk.create(path, ...);
         break;
     } catch (KeeperException.NoNodeException e) {
         try {
             zk.createContainer(containerPath, ...);
         } catch (KeeperException.NodeExistsException ignore) {
         }
     }
 }
 {code}





[jira] [Commented] (ZOOKEEPER-2199) Don't include unistd.h in windows

2015-06-01 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566982#comment-14566982
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2199:


Oops sorry I totally missed that.

 Don't include unistd.h in windows
 -

 Key: ZOOKEEPER-2199
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2199
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2199.patch


 Windows doesn't have unistd.h.
 https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-WinVS2008/





[jira] [Updated] (ZOOKEEPER-2178) Native client fails compilation on Windows.

2015-06-01 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2178:
---
Fix Version/s: 3.6.0
   3.5.1

 Native client fails compilation on Windows.
 ---

 Key: ZOOKEEPER-2178
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2178
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.5.0
 Environment: Windows
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2178.001.patch


 Due to several recent changes, the native client fails to compile on Windows:
 # ZOOKEEPER-827 (read-only mode) mismatched a function return type between 
 the declaration and definition.
 # ZOOKEEPER-1626 (monotonic clock for tolerance to time adjustments) added an 
 include of unistd.h, which does not exist on Windows.
 # Additionally, ZOOKEEPER-1626 did not implement a code path for accessing 
 the Windows monotonic clock.





[jira] [Commented] (ZOOKEEPER-2178) Native client fails compilation on Windows.

2015-06-01 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566985#comment-14566985
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2178:


+1 good catch. Thanks Chris!

 Native client fails compilation on Windows.
 ---

 Key: ZOOKEEPER-2178
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2178
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.5.0
 Environment: Windows
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2178.001.patch


 Due to several recent changes, the native client fails to compile on Windows:
 # ZOOKEEPER-827 (read-only mode) mismatched a function return type between 
 the declaration and definition.
 # ZOOKEEPER-1626 (monotonic clock for tolerance to time adjustments) added an 
 include of unistd.h, which does not exist on Windows.
 # Additionally, ZOOKEEPER-1626 did not implement a code path for accessing 
 the Windows monotonic clock.





[jira] [Updated] (ZOOKEEPER-2197) non-ascii character in FinalRequestProcessor.java

2015-06-01 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2197:
---
Attachment: ZOOKEEPER-2197.patch

Addressed Chris's comment.

 non-ascii character in FinalRequestProcessor.java
 -

 Key: ZOOKEEPER-2197
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2197
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
Priority: Minor
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2197.patch, ZOOKEEPER-2197.patch, 
 ZOOKEEPER-2197.patch, ZOOKEEPER-2197.patch


 src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java:134: 
 error: unmappable character for encoding ASCII
 [javac] // was not being queued ??? ZOOKEEPER-558) properly. This 
 happens, for example,





[jira] [Commented] (ZOOKEEPER-2178) Native client fails compilation on Windows.

2015-06-01 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567002#comment-14567002
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2178:


https://builds.apache.org/job/ZooKeeper-trunk-WinVS2008/

 Native client fails compilation on Windows.
 ---

 Key: ZOOKEEPER-2178
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2178
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.5.0
 Environment: Windows
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2178.001.patch


 Due to several recent changes, the native client fails to compile on Windows:
 # ZOOKEEPER-827 (read-only mode) mismatched a function return type between 
 the declaration and definition.
 # ZOOKEEPER-1626 (monotonic clock for tolerance to time adjustments) added an 
 include of unistd.h, which does not exist on Windows.
 # Additionally, ZOOKEEPER-1626 did not implement a code path for accessing 
 the Windows monotonic clock.





[jira] [Updated] (ZOOKEEPER-2197) non-ascii character in FinalRequestProcessor.java

2015-06-01 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2197:
---
Attachment: ZOOKEEPER-2197.patch

Addressed Raul's comment.

 non-ascii character in FinalRequestProcessor.java
 -

 Key: ZOOKEEPER-2197
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2197
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
Priority: Minor
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2197.patch, ZOOKEEPER-2197.patch, 
 ZOOKEEPER-2197.patch, ZOOKEEPER-2197.patch, ZOOKEEPER-2197.patch


 src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java:134: 
 error: unmappable character for encoding ASCII
 [javac] // was not being queued ??? ZOOKEEPER-558) properly. This 
 happens, for example,





[jira] [Created] (ZOOKEEPER-2199) Don't include unistd.h in windows

2015-05-31 Thread Michi Mutsuzaki (JIRA)
Michi Mutsuzaki created ZOOKEEPER-2199:
--

 Summary: Don't include unistd.h in windows
 Key: ZOOKEEPER-2199
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2199
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
 Fix For: 3.5.1, 3.6.0


Windows doesn't have unistd.h.

https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-WinVS2008/





[jira] [Updated] (ZOOKEEPER-2199) Don't include unistd.h in windows

2015-05-31 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2199:
---
Attachment: ZOOKEEPER-2199.patch

 Don't include unistd.h in windows
 -

 Key: ZOOKEEPER-2199
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2199
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2199.patch


 Windows doesn't have unistd.h.
 https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-WinVS2008/





[jira] [Updated] (ZOOKEEPER-2197) non-ascii character in FinalRequestProcessor.java

2015-05-31 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2197:
---
Attachment: ZOOKEEPER-2197.patch

 non-ascii character in FinalRequestProcessor.java
 -

 Key: ZOOKEEPER-2197
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2197
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
Priority: Minor
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2197.patch, ZOOKEEPER-2197.patch, 
 ZOOKEEPER-2197.patch


 src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java:134: 
 error: unmappable character for encoding ASCII
 [javac] // was not being queued ??? ZOOKEEPER-558) properly. This 
 happens, for example,





[jira] [Updated] (ZOOKEEPER-2164) fast leader election keeps failing

2015-05-31 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2164:
---
Fix Version/s: 3.6.0
   3.5.2

 fast leader election keeps failing
 --

 Key: ZOOKEEPER-2164
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2164
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.4.5
Reporter: Michi Mutsuzaki
Assignee: Hongchao Deng
 Fix For: 3.5.2, 3.6.0


 I have a 3-node cluster with sids 1, 2 and 3. Originally 2 is the leader. 
 When I shut down 2, 1 and 3 keep going back to leader election. Here is what 
 seems to be happening.
 - Both 1 and 3 elect 3 as the leader.
 - 1 receives votes from 3 and itself, and starts trying to connect to 3 as a 
 follower.
 - 3 doesn't receive votes for 5 seconds because connectOne() to 2 doesn't 
 timeout for 5 seconds: 
 https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L346
 - By the time 3 receives votes, 1 has given up trying to connect to 3: 
 https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/Learner.java#L247
 I'm using 3.4.5, but it looks like this part of the code hasn't changed for a 
 while, so I'm guessing later versions have the same issue.
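 If that's what's happening, one knob to experiment with is the QuorumCnxManager connect timeout; in the versions I checked it can be overridden via the zookeeper.cnxTimeout system property (in milliseconds), e.g.:
 {noformat}
 SERVER_JVMFLAGS="-Dzookeeper.cnxTimeout=1000" bin/zkServer.sh restart
 {noformat}
 Whether a lower timeout actually avoids the race above is untested; treat it as a hypothesis, not a fix.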





[jira] [Updated] (ZOOKEEPER-2197) non-ascii character in FinalRequestProcessor.java

2015-05-31 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2197:
---
Attachment: ZOOKEEPER-2197.patch

 non-ascii character in FinalRequestProcessor.java
 -

 Key: ZOOKEEPER-2197
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2197
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
Priority: Minor
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2197.patch, ZOOKEEPER-2197.patch


 src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java:134: 
 error: unmappable character for encoding ASCII
 [javac] // was not being queued ??? ZOOKEEPER-558) properly. This 
 happens, for example,





[jira] [Commented] (ZOOKEEPER-2198) Set default test.junit.threads to 1.

2015-05-31 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566474#comment-14566474
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2198:


+1 Thanks Chris!

 Set default test.junit.threads to 1.
 

 Key: ZOOKEEPER-2198
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2198
 Project: ZooKeeper
  Issue Type: Bug
  Components: build
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2198.001.patch


 Some systems are seeing test failures under concurrent execution.  This issue 
 proposes to change the default {{test.junit.threads}} to 1 so that those 
 environments continue to get consistent test runs.  Jenkins and individual 
 developer environments can set multiple threads with a command line argument, 
 so most environments will still get the benefit of faster test runs.
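 For anyone looking for the knob, a parallel run can still be requested explicitly (assuming the ant wiring stays as it is today):
 {noformat}
 ant -Dtest.junit.threads=8 test
 {noformat}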





[jira] [Updated] (ZOOKEEPER-2197) non-ascii character in FinalRequestProcessor.java

2015-05-30 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2197:
---
Attachment: ZOOKEEPER-2197.patch

 non-ascii character in FinalRequestProcessor.java
 -

 Key: ZOOKEEPER-2197
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2197
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
Priority: Minor
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2197.patch


 src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java:134: 
 error: unmappable character for encoding ASCII
 [javac] // was not being queued ??? ZOOKEEPER-558) properly. This 
 happens, for example,





[jira] [Created] (ZOOKEEPER-2197) non-ascii character in FinalRequestProcessor.java

2015-05-30 Thread Michi Mutsuzaki (JIRA)
Michi Mutsuzaki created ZOOKEEPER-2197:
--

 Summary: non-ascii character in FinalRequestProcessor.java
 Key: ZOOKEEPER-2197
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2197
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
Priority: Minor
 Fix For: 3.5.1, 3.6.0


src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java:134: 
error: unmappable character for encoding ASCII
[javac] // was not being queued ??? ZOOKEEPER-558) properly. This 
happens, for example,





[jira] [Commented] (ZOOKEEPER-2197) non-ascii character in FinalRequestProcessor.java

2015-05-30 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566313#comment-14566313
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2197:


that sounds fine. i guess we can set the encoding in build.xml?
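 If we go that route, a minimal sketch of the build.xml change; the surrounding attributes are illustrative, the relevant part is the encoding attribute on the standard javac task:
 {code}
 <javac srcdir="${src.dir}" destdir="${build.classes}" encoding="UTF-8">
     ...
 </javac>
 {code}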

 non-ascii character in FinalRequestProcessor.java
 -

 Key: ZOOKEEPER-2197
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2197
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
Priority: Minor
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2197.patch


 src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java:134: 
 error: unmappable character for encoding ASCII
 [javac] // was not being queued ??? ZOOKEEPER-558) properly. This 
 happens, for example,





[jira] [Commented] (ZOOKEEPER-2193) reconfig command completes even if parameter is wrong obviously

2015-05-30 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566302#comment-14566302
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2193:


Thank you for the patch 


 reconfig command completes even if parameter is wrong obviously
 ---

 Key: ZOOKEEPER-2193
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2193
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection, server
Affects Versions: 3.5.0
 Environment: CentOS7 + Java7
Reporter: Yasuhito Fukuda
Assignee: Yasuhito Fukuda
 Attachments: ZOOKEEPER-2193-v2.patch, ZOOKEEPER-2193-v3.patch, 
 ZOOKEEPER-2193.patch


 Even if a reconfig parameter is obviously wrong, the command was confirmed to complete.
 Refer to the following.
 - Ensemble consists of four nodes
 {noformat}
 [zk: vm-101:2181(CONNECTED) 0] config
 server.1=192.168.100.101:2888:3888:participant
 server.2=192.168.100.102:2888:3888:participant
 server.3=192.168.100.103:2888:3888:participant
 server.4=192.168.100.104:2888:3888:participant
 version=1
 {noformat}
 - add node by reconfig command
 {noformat}
 [zk: vm-101:2181(CONNECTED) 9] reconfig -add 
 server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181
 Committed new configuration:
 server.1=192.168.100.101:2888:3888:participant
 server.2=192.168.100.102:2888:3888:participant
 server.3=192.168.100.103:2888:3888:participant
 server.4=192.168.100.104:2888:3888:participant
 server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181
 version=30007
 {noformat}
 server.4 and server.5 share the same IP address.
 In this state, leader election will not work properly, and the ensemble is likely to end up in an undesirable state.
 I think reconfig needs parameter validation, e.g. along the lines sketched below.
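 As a sketch of what that validation could look like (names and placement hypothetical, not the actual patch): reject a proposed configuration in which two servers resolve to the same address.
 {code}
 import java.net.InetSocketAddress;
 import java.util.HashSet;
 import java.util.Map;
 import java.util.Set;

 public class ReconfigValidationSketch {
     // Hypothetical check: fail fast if two servers in the proposed
     // configuration share the same election address (host:port).
     static void checkNoDuplicateAddresses(Map<Long, InetSocketAddress> electionAddrs) {
         Set<String> seen = new HashSet<>();
         for (Map.Entry<Long, InetSocketAddress> e : electionAddrs.entrySet()) {
             String key = e.getValue().getHostString() + ":" + e.getValue().getPort();
             if (!seen.add(key)) {
                 throw new IllegalArgumentException(
                         "duplicate address " + key + " for server." + e.getKey());
             }
         }
     }
 }
 {code}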





[jira] [Issue Comment Deleted] (ZOOKEEPER-2193) reconfig command completes even if parameter is wrong obviously

2015-05-30 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2193:
---
Comment: was deleted

(was: Thank you for the patch 
)

 reconfig command completes even if parameter is wrong obviously
 ---

 Key: ZOOKEEPER-2193
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2193
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection, server
Affects Versions: 3.5.0
 Environment: CentOS7 + Java7
Reporter: Yasuhito Fukuda
Assignee: Yasuhito Fukuda
 Attachments: ZOOKEEPER-2193-v2.patch, ZOOKEEPER-2193-v3.patch, 
 ZOOKEEPER-2193.patch


 Even if a reconfig parameter is obviously wrong, the command was confirmed to complete.
 Refer to the following.
 - Ensemble consists of four nodes
 {noformat}
 [zk: vm-101:2181(CONNECTED) 0] config
 server.1=192.168.100.101:2888:3888:participant
 server.2=192.168.100.102:2888:3888:participant
 server.3=192.168.100.103:2888:3888:participant
 server.4=192.168.100.104:2888:3888:participant
 version=1
 {noformat}
 - add node by reconfig command
 {noformat}
 [zk: vm-101:2181(CONNECTED) 9] reconfig -add 
 server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181
 Committed new configuration:
 server.1=192.168.100.101:2888:3888:participant
 server.2=192.168.100.102:2888:3888:participant
 server.3=192.168.100.103:2888:3888:participant
 server.4=192.168.100.104:2888:3888:participant
 server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181
 version=30007
 {noformat}
 server.4 and server.5 share the same IP address.
 In this state, leader election will not work properly, and the ensemble is likely to end up in an undesirable state.
 I think reconfig needs parameter validation.





[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant

2015-05-30 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566340#comment-14566340
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2172:


I'm guessing node1 is hitting this case?

https://github.com/apache/zookeeper/blob/76bb6747c8250f28157636cf4011b78e7569727a/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L332

In this case we don't log the message that gets sent out.

 Cluster crashes when reconfig a new node as a participant
 -

 Key: ZOOKEEPER-2172
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2172
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection, quorum, server
Affects Versions: 3.5.0
 Environment: Ubuntu 12.04 + java 7
Reporter: Ziyou Wang
Priority: Critical
 Attachments: node-1.log, node-2.log, node-3.log, 
 zoo.cfg.dynamic.1005d, zoo.cfg.dynamic.next, zookeeper-1.log, 
 zookeeper-2.log, zookeeper-3.log


 The operations are quite simple: start three zk servers one by one, then 
 reconfig the cluster to add the new one as a participant. When I add the  
 third one, the zk cluster may enter a weird state and cannot recover.
  
   I found “2015-04-20 12:53:48,236 [myid:1] - INFO  [ProcessThread(sid:1 
 cport:-1)::PrepRequestProcessor@547] - Incremental reconfig” in node-1 log. 
 So the first node received the reconfig cmd at 12:53:48. Later, it logged 
 “2015-04-20  12:53:52,230 [myid:1] - ERROR 
 [LearnerHandler-/10.0.0.2:55890:LearnerHandler@580] - Unexpected exception 
 causing shutdown while sock still open” and “2015-04-20 12:53:52,231 [myid:1] 
 - WARN  [LearnerHandler-/10.0.0.2:55890:LearnerHandler@595] - *** GOODBYE 
  /10.0.0.2:55890 ”. From then on, the first node and second node 
 rejected all client connections and the third node didn’t join the cluster as 
 a participant. The whole cluster was down.
  
  When the problem happened, all three nodes just used the same dynamic 
 config file zoo.cfg.dynamic.1005d which only contained the first two 
 nodes. But there was another unused dynamic config file in node-1 directory 
 zoo.cfg.dynamic.next  which already contained three nodes.
  
  When I extended the waiting time between starting the third node and 
 reconfiguring the cluster, the problem didn’t show again. So it should be a 
 race condition problem.





[jira] [Commented] (ZOOKEEPER-2116) zkCli.sh doesn't honor host:port parameter

2015-05-28 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563410#comment-14563410
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2116:


[~surendrasingh] which jira fixes this issue?

 zkCli.sh doesn't honor host:port parameter
 --

 Key: ZOOKEEPER-2116
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2116
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client, scripts
Affects Versions: 3.4.6
 Environment: Ubuntu 12
Reporter: Maxim Novikov
Assignee: surendra singh lilhore
Priority: Critical
 Fix For: 3.6.0


 This doc http://zookeeper.apache.org/doc/r3.1.2/zookeeperStarted.html 
 (Connecting to ZooKeeper section) says:
 Once ZooKeeper is running, you have several options for connection to it:
 Java: Use
 bin/zkCli.sh 127.0.0.1:2181
 In fact, it doesn't work that way. I am running ZooKeeper with a different 
 port to listen to client connections (2888), and this command
 {code}
 bin/zkCli.sh 127.0.0.1:2888
 {code}
 is still trying to connect to 2181.
 {code:title=output|borderStyle=solid}
 Connecting to localhost:2181
 2015-02-11 15:38:14,415 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
 2015-02-11 15:38:14,421 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=localhost
 2015-02-11 15:38:14,421 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.7.0_17
 2015-02-11 15:38:14,424 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
 2015-02-11 15:38:14,424 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/usr/java/jdk1.7.0_17/jre
 2015-02-11 15:38:14,424 [myid:] - INFO  [main:Environment@100] - Client environment:java.class.path=/opt/zookeeper-3.4.6/bin/../build/classes:/opt/zookeeper-3.4.6/bin/../build/lib/*.jar:/opt/zookeeper-3.4.6/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/zookeeper-3.4.6/bin/../lib/slf4j-api-1.6.1.jar:/opt/zookeeper-3.4.6/bin/../lib/netty-3.7.0.Final.jar:/opt/zookeeper-3.4.6/bin/../lib/log4j-1.2.16.jar:/opt/zookeeper-3.4.6/bin/../lib/jline-0.9.94.jar:/opt/zookeeper-3.4.6/bin/../zookeeper-3.4.6.jar:/opt/zookeeper-3.4.6/bin/../src/java/lib/*.jar:../conf::/usr/share/antlr3/lib/antlr-3.5-complete-no-st3.jar
 2015-02-11 15:38:14,425 [myid:] - INFO  [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
 2015-02-11 15:38:14,425 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
 2015-02-11 15:38:14,425 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=NA
 2015-02-11 15:38:14,425 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
 2015-02-11 15:38:14,425 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=amd64
 2015-02-11 15:38:14,426 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=3.8.0-41-generic
 2015-02-11 15:38:14,426 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=mnovikov
 2015-02-11 15:38:14,426 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/home/mnovikov
 2015-02-11 15:38:14,426 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/opt/zookeeper-3.4.6/bin
 2015-02-11 15:38:14,428 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=3 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@3107eafc
 Welcome to ZooKeeper!
 2015-02-11 15:38:14,471 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@975] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
 2015-02-11 15:38:14,479 [myid:] - WARN  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1102] - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
 java.net.ConnectException: Connection refused
   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
   at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
   at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
 {code}
 PS1 I can connect to ZK at 2888 using ZK Java client from code specifying the 
 correct port with no issues. But CLI seems just to ignore the provided 
 host:port parameter.
 PS2 Tried to run it with the pre-defined ZOOCFGDIR environment variable (to 
 point to the path with the config file where the client port is set to 2888). 
 No luck, same results as 

[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant

2015-05-25 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14558059#comment-14558059
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2172:


Thanks Ziyou.

[~fpj] [~shralex] it looks like node1 and node2 are not forming a quorum 
because node2 has seen zxid 0x10059 but node1 keeps sending 0x0 as its 
 zxid. Isn't node1 supposed to send the highest zxid it has seen?

From zookeeper-1.log:

{noformat}
 2015-05-25 12:34:36,920 [myid:1] - DEBUG [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@423] - Sending new notification. My id =1 recipient=2 zxid=0x0 leader=1 config version = 10049
 2015-05-25 12:34:39,090 [myid:1] - DEBUG [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@423] - Sending new notification. My id =1 recipient=3 zxid=0x0 leader=1 config version = 10049
 2015-05-25 12:35:28,128 [myid:1] - DEBUG [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@423] - Sending new notification. My id =1 recipient=2 zxid=0x0 leader=1 config version = 10049
 2015-05-25 12:35:30,301 [myid:1] - DEBUG [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@423] - Sending new notification. My id =1 recipient=3 zxid=0x0 leader=1 config version = 10049
{noformat}

From zookeeper-2.log:

{noformat}
 2015-05-25 12:34:36,918 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@698] - Notification: 2 (message format version), 2 (n.leader), 0x10059 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)10049 (n.config version)
 2015-05-25 12:34:36,923 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@698] - Notification: 2 (message format version), 1 (n.leader), 0x0 (n.zxid), 0x (n.round), LEADING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)10049 (n.config version)
 2015-05-25 12:35:28,124 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.0.0.2:1300:FastLeaderElection@688] - Sending Notification: 2 (n.leader), 0x10059 (n.zxid), 0x1 (n.round), 1 (recipient), 2 (myid), 0x1 (n.peerEpoch)
 2015-05-25 12:35:28,125 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.0.0.2:1300:FastLeaderElection@688] - Sending Notification: 2 (n.leader), 0x10059 (n.zxid), 0x1 (n.round), 2 (recipient), 2 (myid), 0x1 (n.peerEpoch)
{noformat}


 Cluster crashes when reconfig a new node as a participant
 -

 Key: ZOOKEEPER-2172
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2172
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection, quorum, server
Affects Versions: 3.5.0
 Environment: Ubuntu 12.04 + java 7
Reporter: Ziyou Wang
Priority: Critical
 Attachments: node-1.log, node-2.log, node-3.log, 
 zoo.cfg.dynamic.1005d, zoo.cfg.dynamic.next, zookeeper-1.log, 
 zookeeper-2.log, zookeeper-3.log


 The operations are quite simple: start three zk servers one by one, then 
 reconfig the cluster to add the new one as a participant. When I add the  
 third one, the zk cluster may enter a weird state and cannot recover.
  
   I found “2015-04-20 12:53:48,236 [myid:1] - INFO  [ProcessThread(sid:1 
 cport:-1)::PrepRequestProcessor@547] - Incremental reconfig” in node-1 log. 
 So the first node received the reconfig cmd at 12:53:48. Later, it logged 
 “2015-04-20  12:53:52,230 [myid:1] - ERROR 
 [LearnerHandler-/10.0.0.2:55890:LearnerHandler@580] - Unexpected exception 
 causing shutdown while sock still open” and “2015-04-20 12:53:52,231 [myid:1] 
 - WARN  [LearnerHandler-/10.0.0.2:55890:LearnerHandler@595] - *** GOODBYE 
  /10.0.0.2:55890 ”. From then on, the first node and second node 
 rejected all client connections and the third node didn’t join the cluster as 
 a participant. The whole cluster was down.
  
  When the problem happened, all three nodes just used the same dynamic 
 config file zoo.cfg.dynamic.1005d which only contained the first two 
 nodes. But there was another unused dynamic config file in node-1 directory 
 zoo.cfg.dynamic.next  which already contained three nodes.
  
  When I extended the waiting time between starting the third node and 
 reconfiguring the cluster, the problem didn’t show again. So it should be a 
 race condition problem.





[jira] [Updated] (ZOOKEEPER-1506) Re-try DNS hostname - IP resolution if node connection fails

2015-05-24 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-1506:
---
Assignee: Raul Gutierrez Segales  (was: Michi Mutsuzaki)

 Re-try DNS hostname - IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Raul Gutierrez Segales
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 zk-dns-caching-refresh.patch


In our zoo.cfg we use hostnames to identify the ZK servers that are part 
 of an ensemble. These hostnames are configured with a low (= 60s) TTL and 
 the IP address they map to can and does change. Our procedure for 
 replacing/upgrading a ZK node is to boot an entirely new instance and remap 
 the hostname to the new instance's IP address. Our expectation is that when 
 the original ZK node is terminated/shutdown, the remaining nodes in the 
 ensemble would reconnect to the new instance.
 However, what we are noticing is that the remaining ZK nodes do not attempt 
 to re-resolve the hostname-IP mapping for the new server. Once the original 
 ZK node is terminated, the existing servers continue to attempt contacting it 
 at the old IP address. It would be great if the ZK servers could try to 
 re-resolve the hostname when attempting to connect to a lost ZK server, 
 instead of caching the lookup indefinitely. Currently we must do a rolling 
 restart of the ZK ensemble after swapping a node -- which at three nodes 
 means we periodically lose quorum.
 The exact method we are following is to boot new instances in EC2 and attach 
 one, of a set of three, Elastic IP address. External to EC2 this IP address 
 remains the same and maps to whatever instance it is attached to. Internal to 
 EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped 
 to the internal (10.x.y.z) address of the instance it is attached to. 
 Therefore, in our case we would like ZK to pickup the new 10.x.y.z address 
 that the elastic IP hostname gets mapped to and reconnect appropriately.
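 A sketch of the requested behavior (illustrative, not the committed patch): construct a fresh InetSocketAddress from the configured hostname on every connection attempt, which forces a new DNS lookup, instead of reusing an address resolved once at startup.
 {code}
 import java.net.InetSocketAddress;
 import java.net.Socket;
 import java.net.UnknownHostException;

 public class ReResolvingConnector {
     // Re-resolve the peer's hostname on every attempt. A cached, already
     // resolved InetSocketAddress pins the old IP forever; building a new
     // one from the host string performs a fresh lookup.
     static Socket connect(String host, int port, int timeoutMs) throws Exception {
         InetSocketAddress addr = new InetSocketAddress(host, port);
         if (addr.isUnresolved()) {
             throw new UnknownHostException(host);
         }
         Socket sock = new Socket();
         sock.connect(addr, timeoutMs);
         return sock;
     }
 }
 {code}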





[jira] [Updated] (ZOOKEEPER-1927) zkServer.sh fails to read dataDir (and others) from zoo.cfg on Solaris 10 (grep issue, manifests as FAILED TO WRITE PID).

2015-05-24 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-1927:
---
Fix Version/s: 3.6.0
   3.5.2

 zkServer.sh fails to read dataDir (and others) from zoo.cfg on Solaris 10 
 (grep issue, manifests as FAILED TO WRITE PID).  
 ---

 Key: ZOOKEEPER-1927
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1927
 Project: ZooKeeper
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.4.6
 Environment: Solaris 5.10 
Reporter: Ed Schmed
Assignee: Chris Nauroth
 Fix For: 3.4.7, 3.5.2, 3.6.0

 Attachments: ZOOKEEPER-1927.001.patch


 Fails to write PID file with a permissions error, because the startup script 
 fails to read the dataDir variable from zoo.cfg, and then tries to use the 
 drive root ( / ) as the data dir.
 Tracked the problem down to line 84 of zkServer.sh:
 ZOO_DATADIR=$(grep ^[[:space:]]*dataDir $ZOOCFG | sed -e 's/.*=//')
 If i run just that line and point it right at the config file, ZOO_DATADIR is 
 empty.
 If I remove [[:space:]]* from the grep:
 ZOO_DATADIR=$(grep ^dataDir $ZOOCFG | sed -e 's/.*=//')
 Then it works fine. (If I also make the same change on line 164 and 169)
 My regex skills are pretty bad, so I'm afraid to comment on why [[:space:]]* 
 needs to be in there.





[jira] [Commented] (ZOOKEEPER-1868) Server not coming back up in QuorumZxidSyncTest

2015-05-24 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557887#comment-14557887
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1868:


[~fpj] let us know if you are still seeing this issue.

 Server not coming back up in QuorumZxidSyncTest
 ---

 Key: ZOOKEEPER-1868
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1868
 Project: ZooKeeper
  Issue Type: Sub-task
Reporter: Flavio Junqueira
 Fix For: 3.4.7

 Attachments: QuorumZxidSyncTest-output.txt


 We got this stack trace:
 {noformat}
 [junit] 2014-01-27 09:14:08,481 [myid:] - INFO  [main:ZKTestCase$1@65] - 
 FAILED testLateLogs
 [junit] java.lang.AssertionError: waiting for server up
 [junit]   at org.junit.Assert.fail(Assert.java:91)
 [junit]   at org.junit.Assert.assertTrue(Assert.java:43)
 [junit]   at 
 org.apache.zookeeper.test.QuorumBase.startServers(QuorumBase.java:188)
 [junit]   at 
 org.apache.zookeeper.test.QuorumBase.startServers(QuorumBase.java:113)
 [junit]   at 
 org.apache.zookeeper.test.QuorumZxidSyncTest.testLateLogs(QuorumZxidSyncTest.java:116)
 {noformat}
 which occurs here, when we stop the servers and restart them.
 {noformat}
 qb.shutdownServers();
 qb.startServers();
 {noformat}





[jira] [Commented] (ZOOKEEPER-1872) QuorumPeer is not shutdown in few cases

2015-05-24 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557885#comment-14557885
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1872:


[~rakeshr] what's the status of this issue?

 QuorumPeer is not shutdown in few cases
 ---

 Key: ZOOKEEPER-1872
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1872
 Project: ZooKeeper
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R
  Labels: test
 Fix For: 3.5.2, 3.6.0

 Attachments: LeaderSessionTrackerTest-output.txt, 
 ZOOKEEPER-1872.patch, ZOOKEEPER-1872.patch, ZOOKEEPER-1872.patch, 
 ZOOKEEPER-1872.patch, ZOOKEEPER-1872.patch, ZOOKEEPER-1872.patch, 
 ZOOKEEPER-1872.patch, ZOOKEEPER-1872.patch, ZOOKEEPER-1872.patch, 
 ZOOKEEPER-1872.patch, ZOOKEEPER-1872_br3_4.patch, ZOOKEEPER-1872_br3_4.patch, 
 stack-trace.txt


 A few test cases leave the QuorumPeer running after execution. These need 
 proper teardown, along the lines sketched below.
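 The shape of the fix is presumably a teardown along these lines (class and field names hypothetical; QuorumPeer is a Thread, so join() applies):
 {code}
 import org.apache.zookeeper.server.quorum.QuorumPeer;
 import org.junit.After;

 public class QuorumPeerTeardownExample {
     private QuorumPeer peer; // started in the test's setup

     @After
     public void tearDown() throws Exception {
         // Always stop the peer, even if the test body failed, so no
         // quorum threads leak into subsequent test cases.
         if (peer != null) {
             peer.shutdown();
             peer.join(30000); // wait for the peer thread to exit
         }
     }
 }
 {code}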





[jira] [Commented] (ZOOKEEPER-2124) Allow Zookeeper version string to have underscore '_'

2015-05-24 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557656#comment-14557656
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2124:


+1 for the 3.4 patch. Thanks Chris!

 Allow Zookeeper version string to have underscore '_'
 -

 Key: ZOOKEEPER-2124
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2124
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.6
Reporter: Jerry He
Assignee: Chris Nauroth
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2124-branch-3.4.001.patch, 
 ZOOKEEPER-2124.001.patch, zookeeper-rpm-spec-files.patch


 Using Bigtop or other RPM build for Zookeeper, there is a problem with using 
 the hyphen '-' character in the version string:
 {noformat}
 [bigdata@bdvs1166 bigtop]$ gradle zookeeper-rpm
 :buildSrc:compileJava UP-TO-DATE
 :buildSrc:compileGroovy UP-TO-DATE
 :buildSrc:processResources UP-TO-DATE
 :buildSrc:classes UP-TO-DATE
 :buildSrc:jar UP-TO-DATE
 :buildSrc:assemble UP-TO-DATE
 :buildSrc:compileTestJava UP-TO-DATE
 :buildSrc:compileTestGroovy UP-TO-DATE
 :buildSrc:processTestResources UP-TO-DATE
 :buildSrc:testClasses UP-TO-DATE
 :buildSrc:test UP-TO-DATE
 :buildSrc:check UP-TO-DATE
 :buildSrc:build UP-TO-DATE
 :zookeeper_vardefines
 :zookeeper-download
 :zookeeper-tar
 Copy /home/bigdata/bigtop/dl/zookeeper-3.4.6-IBM-1.tar.gz to 
 /home/bigdata/bigtop/build/zookeeper/tar/zookeeper-3.4.6-IBM-1.tar.gz
 :zookeeper-srpm
 error: line 64: Illegal char '-' in: Version: 3.4.6-IBM-1
 :zookeeper-srpm FAILED
 FAILURE: Build failed with an exception.
 * Where:
 Script '/home/bigdata/bigtop/packages.gradle' line: 462
 * What went wrong:
 Execution failed for task ':zookeeper-srpm'.
  Process 'command 'rpmbuild'' finished with non-zero exit value 1
 * Try:
 Run with --stacktrace option to get the stack trace. Run with --info or 
 --debug option to get more log output.
 BUILD FAILED
 {noformat}
 Also, according to the 
 [rpm-maven-plugin|http://mojo.codehaus.org/rpm-maven-plugin/ident-params.html]
  documentation:
 {noformat}
 version
 The version number to use for the RPM package. By default, this is the 
 project version. This value cannot contain a dash (-) due to constraints in 
 the RPM file naming convention. Any specified value will be truncated at the 
 first dash
 release
 The release number of the RPM.
 Beginning with release 2.0-beta-2, this is an optional parameter. By default, 
 the release will be generated from the modifier portion of the project 
 version using the following rules:
 If no modifier exists, the release will be 1.
 If the modifier ends with SNAPSHOT, the timestamp (in UTC) of the build will 
 be appended to end.
 All instances of '-' in the modifier will be replaced with '_'.
 If a modifier exists and does not end with SNAPSHOT, _1 will be appended to 
 end.
 {noformat}
 We should allow underscore '_' as part of the version string. e.g. 
 3.4.6_abc_1





[jira] [Commented] (ZOOKEEPER-2186) QuorumCnxManager#receiveConnection may crash with random input

2015-05-24 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557652#comment-14557652
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2186:


+1 for the 3.4 patch. Thanks Raul!

 QuorumCnxManager#receiveConnection may crash with random input
 --

 Key: ZOOKEEPER-2186
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2186
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.6, 3.5.0
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2186-v3.4.patch, ZOOKEEPER-2186.patch, 
 ZOOKEEPER-2186.patch, ZOOKEEPER-2186.patch


 This will allocate an arbitrarily large byte buffer (and try to read it!):
 {code}
 public boolean receiveConnection(Socket sock) {
     Long sid = null;
     ...
     sid = din.readLong();
     // next comes the #bytes in the remainder of the message
     int num_remaining_bytes = din.readInt();
     byte[] b = new byte[num_remaining_bytes];
     // remove the remainder of the message from din
     int num_read = din.read(b);
 {code}
 This will crash the QuorumCnxManager thread, so the cluster will keep going 
 but future elections might fail to converge (ditto for leaving/joining 
 members). 
 Patch coming up in a bit.
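 Pending the patch, the obvious shape of the fix is to sanity-check the announced length before allocating; a self-contained sketch, with a hypothetical cap:
 {code}
 import java.io.DataInputStream;
 import java.io.IOException;

 public class SafeLengthPrefixedRead {
     // Hypothetical cap, large enough for any legitimate election message.
     static final int MAX_CONNECT_MSG_BYTES = 512 * 1024;

     // Reject absurd lengths instead of allocating an attacker-controlled
     // buffer. readFully() also avoids the short-read bug in the snippet
     // above: din.read(b) may legally return before filling b.
     static byte[] readLengthPrefixed(DataInputStream din) throws IOException {
         int len = din.readInt();
         if (len < 0 || len > MAX_CONNECT_MSG_BYTES) {
             throw new IOException("bad payload length: " + len);
         }
         byte[] b = new byte[len];
         din.readFully(b);
         return b;
     }
 }
 {code}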





[jira] [Commented] (ZOOKEEPER-1927) zkServer.sh fails to read dataDir (and others) from zoo.cfg on Solaris 10 (grep issue, manifests as FAILED TO WRITE PID).

2015-05-24 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557659#comment-14557659
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1927:


The patch looks good to me. Do we need this fix for trunk/3.5, or is it 
specific to 3.4?

 zkServer.sh fails to read dataDir (and others) from zoo.cfg on Solaris 10 
 (grep issue, manifests as FAILED TO WRITE PID).  
 ---

 Key: ZOOKEEPER-1927
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1927
 Project: ZooKeeper
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.4.6
 Environment: Solaris 5.10 
Reporter: Ed Schmed
Assignee: Chris Nauroth
 Fix For: 3.4.7

 Attachments: ZOOKEEPER-1927.001.patch


 Fails to write PID file with a permissions error, because the startup script 
 fails to read the dataDir variable from zoo.cfg, and then tries to use the 
 drive root ( / ) as the data dir.
 Tracked the problem down to line 84 of zkServer.sh:
 ZOO_DATADIR=$(grep ^[[:space:]]*dataDir $ZOOCFG | sed -e 's/.*=//')
 If i run just that line and point it right at the config file, ZOO_DATADIR is 
 empty.
 If I remove [[:space:]]* from the grep:
 ZOO_DATADIR=$(grep ^dataDir $ZOOCFG | sed -e 's/.*=//')
 Then it works fine. (If I also make the same change on line 164 and 169)
 My regex skills are pretty bad, so I'm afraid to comment on why [[:space:]]* 
 needs to be in there.





[jira] [Updated] (ZOOKEEPER-1853) zkCli.sh can't issue a CREATE command containing spaces in the data

2015-05-24 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-1853:
---
Fix Version/s: 3.6.0

 zkCli.sh can't issue a CREATE command containing spaces in the data
 ---

 Key: ZOOKEEPER-1853
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1853
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.6, 3.5.0
Reporter: sekine coulibaly
Assignee: Ryan Lamore
Priority: Minor
  Labels: patch
 Fix For: 3.4.7, 3.5.2, 3.6.0

 Attachments: ZOOKEEPER-1853.patch, ZOOKEEPER-1853.patch, 
 ZOOKEEPER-1853.patch, ZkSpaceMan.java


 Execute the following command in zkCli.sh :
 create /contacts/1  {country:CA,name:De La Salle}
 The result is that only {id:1,fullname:De is stored.
 The expected result is to have the full JSON payload stored.
 The CREATE command seems to be cropped after the first space of the data 
 payload. When issuing a create command, all arguments that are neither -s nor -e 
 should be treated as the actual data, as sketched below.
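 A sketch of that parsing rule (hypothetical, over the already-tokenized command line): skip the flags and the path, then rejoin everything else with spaces.
 {code}
 import java.util.Arrays;

 public class CreateDataArgsSketch {
     // For "create [-s] [-e] path data...", everything after the path that
     // is not a flag is treated as data, rejoined with single spaces.
     static String dataFrom(String[] args) {
         int i = 1; // args[0] == "create"
         while (i < args.length && (args[i].equals("-s") || args[i].equals("-e"))) {
             i++; // skip flags
         }
         i++; // skip the path
         if (i >= args.length) {
             return null; // no data argument given
         }
         return String.join(" ", Arrays.copyOfRange(args, i, args.length));
     }
 }
 {code}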





[jira] [Commented] (ZOOKEEPER-1402) Upload Zookeeper package to Maven Central

2015-05-24 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557662#comment-14557662
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1402:


As Giri said, artifacts get synced automatically from apache nexus to maven 
central. 

https://repository.apache.org/content/groups/staging/org/apache/zookeeper/zookeeper/3.5.0-alpha/
http://search.maven.org/#artifactdetails|org.apache.zookeeper|zookeeper|3.5.0-alpha|pom

[~phunt] can we close this issue?

 Upload Zookeeper package to Maven Central
 -

 Key: ZOOKEEPER-1402
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1402
 Project: ZooKeeper
  Issue Type: Improvement
Affects Versions: 3.3.4
Reporter: Igor Lazebny
Assignee: Flavio Junqueira
Priority: Minor
 Fix For: 3.4.7


 It would be great to make Zookeeper package available in Maven Central as 
 other Apache projects do (Camel, CXF, ActiveMQ, Karaf, etc).
 That would simplify usage of this package in maven builds.





[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant

2015-05-18 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549742#comment-14549742
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2172:


[~ziyouw] would it be possible to reproduce it without zk-2031? If not, could 
you try reproducing this with debug log enabled? Thanks!

 Cluster crashes when reconfig a new node as a participant
 -

 Key: ZOOKEEPER-2172
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2172
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection, quorum, server
Affects Versions: 3.5.0
 Environment: Ubuntu 12.04 + java 7
Reporter: Ziyou Wang
Priority: Critical
 Attachments: node-1.log, node-2.log, node-3.log, 
 zoo.cfg.dynamic.1005d, zoo.cfg.dynamic.next


 The operations are quite simple: start three zk servers one by one, then 
 reconfig the cluster to add the new one as a participant. When I add the  
 third one, the zk cluster may enter a weird state and cannot recover.
  
   I found “2015-04-20 12:53:48,236 [myid:1] - INFO  [ProcessThread(sid:1 
 cport:-1)::PrepRequestProcessor@547] - Incremental reconfig” in node-1 log. 
 So the first node received the reconfig cmd at 12:53:48. Later, it logged 
 “2015-04-20  12:53:52,230 [myid:1] - ERROR 
 [LearnerHandler-/10.0.0.2:55890:LearnerHandler@580] - Unexpected exception 
 causing shutdown while sock still open” and “2015-04-20 12:53:52,231 [myid:1] 
 - WARN  [LearnerHandler-/10.0.0.2:55890:LearnerHandler@595] - *** GOODBYE 
  /10.0.0.2:55890 ”. From then on, the first node and second node 
 rejected all client connections and the third node didn’t join the cluster as 
 a participant. The whole cluster was down.
  
  When the problem happened, all three nodes just used the same dynamic 
 config file zoo.cfg.dynamic.1005d which only contained the first two 
 nodes. But there was another unused dynamic config file in node-1 directory 
 zoo.cfg.dynamic.next  which already contained three nodes.
  
  When I extended the waiting time between starting the third node and 
 reconfiguring the cluster, the problem didn’t show again. So it should be a 
 race condition problem.





[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant

2015-05-16 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547023#comment-14547023
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2172:


node1 doesn't seem to receive the vote from itself; it receives votes from 
node2 and node3:

{noformat}
node-1.log:2015-04-20 12:55:03,358 [myid:1] - INFO  
[WorkerReceiver[myid=1]:FastLeaderElection@698] - Notification: 2 (message 
format version), 2 (n.leader), 0x10084 (n.zxid), 0x1 (n.round), LOOKING 
(n.state), 2 (n.sid), 0x1 (n.peerEPoch), LEADING (my state)1005d (n.config 
version)
node-1.log:2015-04-20 12:55:51,547 [myid:1] - INFO  
[WorkerReceiver[myid=1]:FastLeaderElection@698] - Notification: 2 (message 
format version), 1 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 
3 (n.sid), 0x1 (n.peerEPoch), LEADING (my state)1005d (n.config version)
{noformat}

node2 receives votes from node1 and itself:

{noformat}
node-2.log:2015-04-20 12:55:03,361 [myid:2] - INFO  
[WorkerReceiver[myid=2]:FastLeaderElection@698] - Notification: 2 (message 
format version), 1 (n.leader), 0x0 (n.zxid), 0x (n.round), 
LEADING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1005d 
(n.config version)
node-2.log:2015-04-20 12:55:54,564 [myid:2] - INFO  
[WorkerReceiver[myid=2]:FastLeaderElection@698] - Notification: 2 (message 
format version), 2 (n.leader), 0x10084 (n.zxid), 0x1 (n.round), LOOKING 
(n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1005d (n.config 
version)
{noformat}

Is node3's vote somehow confusing node1?

Yes, I think this cluster is using the patch from ZOOKEEPER-2031. Do you think 
that might be related to this issue?

 Cluster crashes when reconfig a new node as a participant
 -

 Key: ZOOKEEPER-2172
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2172
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection, quorum, server
Affects Versions: 3.5.0
 Environment: Ubuntu 12.04 + java 7
Reporter: Ziyou Wang
Priority: Critical
 Attachments: node-1.log, node-2.log, node-3.log, 
 zoo.cfg.dynamic.1005d, zoo.cfg.dynamic.next


 The operations are quite simple: start three zk servers one by one, then 
 reconfig the cluster to add the new one as a participant. When I add the 
 third one, the zk cluster may enter a weird state and cannot recover.
 
 I found “2015-04-20 12:53:48,236 [myid:1] - INFO  [ProcessThread(sid:1 
 cport:-1)::PrepRequestProcessor@547] - Incremental reconfig” in the node-1 
 log, so the first node received the reconfig command at 12:53:48. Later, it 
 logged “2015-04-20 12:53:52,230 [myid:1] - ERROR 
 [LearnerHandler-/10.0.0.2:55890:LearnerHandler@580] - Unexpected exception 
 causing shutdown while sock still open” and “2015-04-20 12:53:52,231 [myid:1] 
 - WARN  [LearnerHandler-/10.0.0.2:55890:LearnerHandler@595] - *** GOODBYE 
 /10.0.0.2:55890 ”. From then on, the first and second nodes rejected all 
 client connections, and the third node didn’t join the cluster as a 
 participant. The whole cluster was effectively down.
 
 When the problem happened, all three nodes were still using the same dynamic 
 config file, zoo.cfg.dynamic.1005d, which only contained the first two nodes. 
 But there was another, unused dynamic config file in the node-1 directory, 
 zoo.cfg.dynamic.next, which already contained all three nodes.
 
 When I extended the waiting time between starting the third node and 
 reconfiguring the cluster, the problem didn’t show up again, so it is likely 
 a race condition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant

2015-05-16 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546965#comment-14546965
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2172:


[~ziyouw] could you also post the initial configuration files for all the 
nodes? [~shralex] could you take a look at these log files when you get a 
chance?

 Cluster crashes when reconfig a new node as a participant
 -

 Key: ZOOKEEPER-2172
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2172
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection, quorum, server
Affects Versions: 3.5.0
 Environment: Ubuntu 12.04 + java 7
Reporter: Ziyou Wang
Priority: Critical
 Attachments: node-1.log, node-2.log, node-3.log, 
 zoo.cfg.dynamic.1005d, zoo.cfg.dynamic.next


 The operations are quite simple: start three zk servers one by one, then 
 reconfig the cluster to add the new one as a participant. When I add the 
 third one, the zk cluster may enter a weird state and cannot recover.
 
 I found “2015-04-20 12:53:48,236 [myid:1] - INFO  [ProcessThread(sid:1 
 cport:-1)::PrepRequestProcessor@547] - Incremental reconfig” in the node-1 
 log, so the first node received the reconfig command at 12:53:48. Later, it 
 logged “2015-04-20 12:53:52,230 [myid:1] - ERROR 
 [LearnerHandler-/10.0.0.2:55890:LearnerHandler@580] - Unexpected exception 
 causing shutdown while sock still open” and “2015-04-20 12:53:52,231 [myid:1] 
 - WARN  [LearnerHandler-/10.0.0.2:55890:LearnerHandler@595] - *** GOODBYE 
 /10.0.0.2:55890 ”. From then on, the first and second nodes rejected all 
 client connections, and the third node didn’t join the cluster as a 
 participant. The whole cluster was effectively down.
 
 When the problem happened, all three nodes were still using the same dynamic 
 config file, zoo.cfg.dynamic.1005d, which only contained the first two nodes. 
 But there was another, unused dynamic config file in the node-1 directory, 
 zoo.cfg.dynamic.next, which already contained all three nodes.
 
 When I extended the waiting time between starting the third node and 
 reconfiguring the cluster, the problem didn’t show up again, so it is likely 
 a race condition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant

2015-05-16 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546972#comment-14546972
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2172:


Would it be possible that this is hitting the case described in 
http://zookeeper.apache.org/doc/trunk/zookeeperReconfig.html#sc_reconfig_general
 :

bq. Finally, note that once connected to the leader, a joiner adopts the last 
committed configuration, in which it is absent (the initial config of the 
joiner is backed up before being rewritten). If the joiner restarts in this 
state, it will not be able to boot since it is absent from its configuration 
file. In order to start it you’ll once again have to specify an initial 
configuration.

 Cluster crashes when reconfig a new node as a participant
 -

 Key: ZOOKEEPER-2172
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2172
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection, quorum, server
Affects Versions: 3.5.0
 Environment: Ubuntu 12.04 + java 7
Reporter: Ziyou Wang
Priority: Critical
 Attachments: node-1.log, node-2.log, node-3.log, 
 zoo.cfg.dynamic.1005d, zoo.cfg.dynamic.next


 The operations are quite simple: start three zk servers one by one, then 
 reconfig the cluster to add the new one as a participant. When I add the 
 third one, the zk cluster may enter a weird state and cannot recover.
 
 I found “2015-04-20 12:53:48,236 [myid:1] - INFO  [ProcessThread(sid:1 
 cport:-1)::PrepRequestProcessor@547] - Incremental reconfig” in the node-1 
 log, so the first node received the reconfig command at 12:53:48. Later, it 
 logged “2015-04-20 12:53:52,230 [myid:1] - ERROR 
 [LearnerHandler-/10.0.0.2:55890:LearnerHandler@580] - Unexpected exception 
 causing shutdown while sock still open” and “2015-04-20 12:53:52,231 [myid:1] 
 - WARN  [LearnerHandler-/10.0.0.2:55890:LearnerHandler@595] - *** GOODBYE 
 /10.0.0.2:55890 ”. From then on, the first and second nodes rejected all 
 client connections, and the third node didn’t join the cluster as a 
 participant. The whole cluster was effectively down.
 
 When the problem happened, all three nodes were still using the same dynamic 
 config file, zoo.cfg.dynamic.1005d, which only contained the first two nodes. 
 But there was another, unused dynamic config file in the node-1 directory, 
 zoo.cfg.dynamic.next, which already contained all three nodes.
 
 When I extended the waiting time between starting the third node and 
 reconfiguring the cluster, the problem didn’t show up again, so it is likely 
 a race condition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2190) In StandaloneDisabledTest, testReconfig() shouldn't take leaving servers as joining servers

2015-05-15 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2190:
---
Fix Version/s: (was: 3.5.2)
   3.5.1

 In StandaloneDisabledTest, testReconfig() shouldn't take leaving servers as 
 joining servers
 ---

 Key: ZOOKEEPER-2190
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2190
 Project: ZooKeeper
  Issue Type: Bug
  Components: tests
Reporter: Hongchao Deng
Assignee: Hongchao Deng
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2190.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable

2015-05-15 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2101:
---
Fix Version/s: (was: 3.5.1)
   3.6.0
   3.5.2

 Transaction larger than max buffer of jute makes zookeeper unavailable
 --

 Key: ZOOKEEPER-2101
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
 Project: ZooKeeper
  Issue Type: Bug
  Components: jute
Affects Versions: 3.4.4
Reporter: Liu Shaohui
 Fix For: 3.5.2, 3.6.0

 Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, 
 ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, test.diff


 *Problem*
 For a multi operation, PrepRequestProcessor may produce a large transaction 
 whose size is larger than the max buffer size of jute. There is a check of 
 the buffer size in the readBuffer method of BinaryInputArchive, but no check 
 in the writeBuffer method of BinaryOutputArchive, which causes the following:
 1. The leader can sync the transaction to its txn log and send the large 
 transaction to the followers, but the followers fail to read the transaction 
 and can't sync with the leader.
 {code}
 2015-01-04,12:42:26,474 WARN org.apache.zookeeper.server.quorum.Learner: 
 [myid:2] Exception when following the leader
 java.io.IOException: Unreasonable length = 2054758
 at 
 org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
 at 
 org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
 at 
 org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
 at 
 org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
 at 
 org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
 2015-01-04,12:42:26,475 INFO org.apache.zookeeper.server.quorum.Learner: 
 [myid:2] shutdown called
 java.lang.Exception: shutdown Follower
 at 
 org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
 {code}
 2. The leader loses all followers, which triggers leader election. The old 
 leader becomes leader again because it has the most up-to-date data.
 {code}
 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: 
 [myid:3] Shutting down
 2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: 
 [myid:3] Shutdown called
 java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 2
 at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496)
 at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
 {code}
 3. The leader cannot load the transaction from the txn log because the 
 length of the data is larger than the max buffer of jute.
 {code}
 2015-01-04,12:42:31,282 ERROR org.apache.zookeeper.server.quorum.QuorumPeer: 
 [myid:3] Unable to load database on disk
 java.io.IOException: Unreasonable length = 2054758
 at 
 org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
 at 
 org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
 at 
 org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602)
 at 
 org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157)
 at 
 org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546)
 at 
 org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690)
 at 
 org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:737)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
 {code}
 The ZooKeeper service will be unavailable until we enlarge jute.maxbuffer 
 and restart the ZooKeeper and HBase clusters.
 *Solution*
 Add a buffer size check in BinaryOutputArchive to prevent large transactions 
 from being written to the log and sent to followers; a sketch is below.
 But I am not sure whether there are side effects of throwing an IOException 
 in BinaryOutputArchive and the RequestProcessors.
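 For illustration only, a minimal sketch of such a guard, assuming a 
 maxBuffer field initialized from the jute.maxbuffer property (the field 
 name and placement are assumptions, not the actual patch):
 {code}
 // Hypothetical sketch: mirror the read-side length check on the write side.
 // "maxBuffer" and "out" are assumed fields of BinaryOutputArchive.
 public void writeBuffer(byte[] barr, String tag) throws IOException {
     if (barr != null && barr.length > maxBuffer) {
         throw new IOException("Buffer too large when writing " + tag
                 + ": " + barr.length + " > " + maxBuffer);
     }
     if (barr == null) {
         out.writeInt(-1);
         return;
     }
     out.writeInt(barr.length);
     out.write(barr);
 }
 {code}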



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1506) Re-try DNS hostname - IP resolution if node connection fails

2015-05-15 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-1506:
---
Description: 
   In our zoo.cfg we use hostnames to identify the ZK servers that are part of 
an ensemble. These hostnames are configured with a low (<= 60s) TTL and the IP 
address they map to can and does change. Our procedure for replacing/upgrading 
a ZK node is to boot an entirely new instance and remap the hostname to the new 
instance's IP address. Our expectation is that when the original ZK node is 
terminated/shutdown, the remaining nodes in the ensemble would reconnect to the 
new instance.

However, what we are noticing is that the remaining ZK nodes do not attempt to 
re-resolve the hostname-IP mapping for the new server. Once the original ZK 
node is terminated, the existing servers continue to attempt contacting it at 
the old IP address. It would be great if the ZK servers could try to re-resolve 
the hostname when attempting to connect to a lost ZK server, instead of caching 
the lookup indefinitely. Currently we must do a rolling restart of the ZK 
ensemble after swapping a node -- which at three nodes means we periodically 
lose quorum.

The exact method we are following is to boot new instances in EC2 and attach 
one, of a set of three, Elastic IP address. External to EC2 this IP address 
remains the same and maps to whatever instance it is attached to. Internal to 
EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped 
to the internal (10.x.y.z) address of the instance it is attached to. 
Therefore, in our case we would like ZK to pickup the new 10.x.y.z address that 
the elastic IP hostname gets mapped to and reconnect appropriately.

  was:
In our zoo.cfg we use hostnames to identify the ZK servers that are part of an 
ensemble. These hostnames are configured with a low (<= 60s) TTL and the IP 
address they map to can and does change. Our procedure for replacing/upgrading 
a ZK node is to boot an entirely new instance and remap the hostname to the new 
instance's IP address. Our expectation is that when the original ZK node is 
terminated/shutdown, the remaining nodes in the ensemble would reconnect to the 
new instance.

However, what we are noticing is that the remaining ZK nodes do not attempt to 
re-resolve the hostname-IP mapping for the new server. Once the original ZK 
node is terminated, the existing servers continue to attempt contacting it at 
the old IP address. It would be great if the ZK servers could try to re-resolve 
the hostname when attempting to connect to a lost ZK server, instead of caching 
the lookup indefinitely. Currently we must do a rolling restart of the ZK 
ensemble after swapping a node -- which at three nodes means we periodically 
lose quorum.

The exact method we are following is to boot new instances in EC2 and attach 
one, of a set of three, Elastic IP address. External to EC2 this IP address 
remains the same and maps to whatever instance it is attached to. Internal to 
EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped 
to the internal (10.x.y.z) address of the instance it is attached to. 
Therefore, in our case we would like ZK to pickup the new 10.x.y.z address that 
the elastic IP hostname gets mapped to and reconnect appropriately.
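For context, re-resolution can be forced by rebuilding the socket address 
before each connection attempt; a sketch, not the actual patch:

{code}
import java.net.InetSocketAddress;

public class Resolver {
    // Constructing a fresh InetSocketAddress re-resolves the hostname
    // instead of reusing a previously cached hostname-to-IP mapping.
    static InetSocketAddress resolve(InetSocketAddress stale) {
        return new InetSocketAddress(stale.getHostString(), stale.getPort());
    }
}
{code}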


 Re-try DNS hostname - IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michi Mutsuzaki
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 zk-dns-caching-refresh.patch


In our zoo.cfg we use hostnames to identify the ZK servers that are part 
 of an ensemble. These hostnames are configured with a low (<= 60s) TTL and 
 the IP address they map to can and does change. Our procedure for 
 replacing/upgrading a ZK node is to boot an entirely new instance and remap 
 the hostname to the new instance's IP address. Our expectation is that when 
 the original ZK node is terminated/shutdown, the remaining nodes in the 
 ensemble would reconnect to the new instance.
 However, what we are noticing is that the remaining ZK nodes do not attempt 
 to re-resolve the hostname-IP mapping for the new server. Once the original 
 ZK node is terminated, the existing servers continue to attempt contacting it 
 at the old IP address. It 

[jira] [Commented] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.

2015-05-14 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543288#comment-14543288
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2183:


+1 Thanks Chris, this is a great patch!

 Change test port assignments to improve uniqueness of ports for multiple 
 concurrent test processes on the same host.
 

 Key: ZOOKEEPER-2183
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2183
 Project: ZooKeeper
  Issue Type: Improvement
  Components: tests
Affects Versions: 3.5.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2183.001.patch, ZOOKEEPER-2183.002.patch, 
 ZOOKEEPER-2183.003.patch, ZOOKEEPER-2183.004.patch, ZOOKEEPER-2183.005.patch, 
 threads-change.patch


 Tests use {{PortAssignment#unique}} for assignment of the ports to bind 
 during tests.  Currently, this method works by using a monotonically 
 increasing counter from a static starting point.  Generally, this is 
 sufficient to achieve uniqueness within a single JVM process, but it does not 
 achieve uniqueness across multiple processes on the same host.  This can 
 cause tests to get bind errors if there are multiple pre-commit jobs running 
 concurrently on the same Jenkins host.  This also prevents running tests in 
 parallel to improve the speed of pre-commit runs.
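 One possible shape of a fix, with a per-process offset (the base port, range 
 size, and property name are illustrative assumptions; see the attached patch 
 for the real scheme):
 {code}
 import java.util.concurrent.atomic.AtomicInteger;

 // Sketch: partition the port space per test process so concurrent JVMs
 // on one host draw from disjoint ranges.
 public final class PortAssignment {
     private static final AtomicInteger NEXT = new AtomicInteger(0);
     private static final int BASE = 11221;  // assumed base port
     private static final int RANGE = 2000;  // assumed ports per process
     // Assumed system property identifying this test process.
     private static final int OFFSET =
             Integer.getInteger("zookeeper.test.processId", 0) * RANGE;

     public static int unique() {
         return BASE + OFFSET + (NEXT.getAndIncrement() % RANGE);
     }
 }
 {code}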



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2186) QuorumCnxManager#receiveConnection may crash with random input

2015-05-13 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543196#comment-14543196
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2186:


+1 I'm checking this in.

 QuorumCnxManager#receiveConnection may crash with random input
 --

 Key: ZOOKEEPER-2186
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2186
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.6, 3.5.0
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2186.patch, ZOOKEEPER-2186.patch, 
 ZOOKEEPER-2186.patch


 This will allocate an arbitrarily large byte buffer (and try to read it!):
 {code}
 public boolean receiveConnection(Socket sock) {
     Long sid = null;
     ...
     sid = din.readLong();
     // next comes the #bytes in the remainder of the message
     int num_remaining_bytes = din.readInt();
     byte[] b = new byte[num_remaining_bytes];
     // remove the remainder of the message from din
     int num_read = din.read(b);
 {code}
 This will crash the QuorumCnxManager thread, so the cluster will keep going 
 but future elections might fail to converge (ditto for leaving/joining 
 members). 
 Patch coming up in a bit.
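 A bounds check before the allocation would avoid this; a sketch (the cap and 
 the helper's shape are assumptions, not necessarily what the patch does):
 {code}
 import java.io.DataInputStream;
 import java.io.IOException;
 import java.net.Socket;

 class SafeReceive {
     static final int MAX_MSG_BYTES = 512 * 1024; // assumed cap

     boolean receiveConnection(Socket sock, DataInputStream din)
             throws IOException {
         long sid = din.readLong();
         int num_remaining_bytes = din.readInt();
         if (num_remaining_bytes < 0 || num_remaining_bytes > MAX_MSG_BYTES) {
             sock.close(); // reject absurd lengths before allocating
             throw new IOException("Unreasonable length = " + num_remaining_bytes);
         }
         byte[] b = new byte[num_remaining_bytes];
         din.readFully(b); // read exactly num_remaining_bytes, unlike read(b)
         return true;
     }
 }
 {code}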



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2186) QuorumCnxManager#receiveConnection may crash with random input

2015-05-13 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543203#comment-14543203
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2186:


[~rgs] this patch does not apply to branch-3.4. Could you upload a separate 
patch for 3.4? Thanks!

trunk: http://svn.apache.org/viewvc?view=revisionrevision=1679313
branch-3.5: http://svn.apache.org/viewvc?view=revisionrevision=1679314

 QuorumCnxManager#receiveConnection may crash with random input
 --

 Key: ZOOKEEPER-2186
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2186
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.6, 3.5.0
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2186.patch, ZOOKEEPER-2186.patch, 
 ZOOKEEPER-2186.patch


 This will allocate an arbitrarily large byte buffer (and try to read it!):
 {code}
 public boolean receiveConnection(Socket sock) {
     Long sid = null;
     ...
     sid = din.readLong();
     // next comes the #bytes in the remainder of the message
     int num_remaining_bytes = din.readInt();
     byte[] b = new byte[num_remaining_bytes];
     // remove the remainder of the message from din
     int num_read = din.read(b);
 {code}
 This will crash the QuorumCnxManager thread, so the cluster will keep going 
 but future elections might fail to converge (ditto for leaving/joining 
 members). 
 Patch coming up in a bit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2186) QuorumCnxManager#receiveConnection may crash with random input

2015-05-09 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537018#comment-14537018
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2186:


https://reviews.apache.org/r/34023

 QuorumCnxManager#receiveConnection may crash with random input
 --

 Key: ZOOKEEPER-2186
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2186
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.6, 3.5.0
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2186.patch


 This will allocate an arbitrarily large byte buffer (and try to read it!):
 {code}
 public boolean receiveConnection(Socket sock) {
     Long sid = null;
     ...
     sid = din.readLong();
     // next comes the #bytes in the remainder of the message
     int num_remaining_bytes = din.readInt();
     byte[] b = new byte[num_remaining_bytes];
     // remove the remainder of the message from din
     int num_read = din.read(b);
 {code}
 This will crash the QuorumCnxManager thread, so the cluster will keep going 
 but future elections might fail to converge (ditto for leaving/joining 
 members). 
 Patch coming up in a bit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2171) avoid reverse lookups in QuorumCnxManager

2015-05-09 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536930#comment-14536930
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2171:


+1 Thanks Raul!

 avoid reverse lookups in QuorumCnxManager
 -

 Key: ZOOKEEPER-2171
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2171
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2171.patch, ZOOKEEPER-2171.patch


 Apparently, ZOOKEEPER-107 (via a quick git-blame look) introduced a bunch of 
 getHostName() calls in QCM. Besides the overhead, these can cause problems 
 when mixed with failing/mis-configured DNS servers.
 It would be nice to reduce them, if that doesn't affect operational 
 correctness. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2186) QuorumCnxManager#receiveConnection may crash with random input

2015-05-09 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536947#comment-14536947
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2186:


do we need this for 3.5.1?

 QuorumCnxManager#receiveConnection may crash with random input
 --

 Key: ZOOKEEPER-2186
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2186
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.6, 3.5.0
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2186.patch


 This will allocate an arbitrarily large byte buffer (and try to read it!):
 {code}
 public boolean receiveConnection(Socket sock) {
     Long sid = null;
     ...
     sid = din.readLong();
     // next comes the #bytes in the remainder of the message
     int num_remaining_bytes = din.readInt();
     byte[] b = new byte[num_remaining_bytes];
     // remove the remainder of the message from din
     int num_read = din.read(b);
 {code}
 This will crash the QuorumCnxManager thread, so the cluster will keep going 
 but future elections might fail to converge (ditto for leaving/joining 
 members). 
 Patch coming up in a bit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2186) QuorumCnxManager#receiveConnection may crash with random input

2015-05-09 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537035#comment-14537035
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2186:


The patch looks good to me overall. I just have a couple of questions. Also, it 
would be great if you could add a test case for this change.

 QuorumCnxManager#receiveConnection may crash with random input
 --

 Key: ZOOKEEPER-2186
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2186
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.6, 3.5.0
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2186.patch


 This will allocate an arbitrarily large byte buffer (and try to read it!):
 {code}
 public boolean receiveConnection(Socket sock) {
     Long sid = null;
     ...
     sid = din.readLong();
     // next comes the #bytes in the remainder of the message
     int num_remaining_bytes = din.readInt();
     byte[] b = new byte[num_remaining_bytes];
     // remove the remainder of the message from din
     int num_read = din.read(b);
 {code}
 This will crash the QuorumCnxManager thread, so the cluster will keep going 
 but future elections might fail to converge (ditto for leaving/joining 
 members). 
 Patch coming up in a bit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2177) point to md5/sha1/asc files in releases.html

2015-04-30 Thread Michi Mutsuzaki (JIRA)
Michi Mutsuzaki created ZOOKEEPER-2177:
--

 Summary: point to md5/sha1/asc files in releases.html
 Key: ZOOKEEPER-2177
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2177
 Project: ZooKeeper
  Issue Type: Task
Reporter: Michi Mutsuzaki
Priority: Minor


These files are not mirrored. We should link to them in 
http://zookeeper.apache.org/releases.html 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2171) avoid reverse lookups in QuorumCnxManager

2015-04-29 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518753#comment-14518753
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2171:


Thanks Raul for the quick turnaround! We dropped support for Java 6 in 
ZOOKEEPER-1963. I'd prefer to use getHostString() consistently throughout the 
code base to avoid future confusion/issues. What do you think?

 avoid reverse lookups in QuorumCnxManager
 -

 Key: ZOOKEEPER-2171
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2171
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-2171.patch


 Apparently, ZOOKEEPER-107 (via a quick git-blame look) introduced a bunch of 
 getHostName() calls in QCM. Besides the overhead, these can cause problems 
 when mixed with failing/mis-configured DNS servers.
 It would be nice to reduce them, if that doesn't affect operational 
 correctness. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname - IP resolution if node connection fails

2015-04-20 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503871#comment-14503871
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1506:


Thanks Raul for testing this. I'd try replacing calls to getHostName() with 
getHostString(). For example, I found another one in QuorumCnxManager.java:

org/apache/zookeeper/server/quorum/QuorumCnxManager.java:    String 
addr = self.getElectionAddress().getHostName() + ":" + 
self.getElectionAddress().getPort();
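
For reference, a minimal illustration of the difference (getHostString() has 
been available since Java 7 and never triggers a reverse DNS lookup):

{code}
import java.net.InetSocketAddress;

public class HostStringDemo {
    public static void main(String[] args) {
        InetSocketAddress addr = new InetSocketAddress("10.0.0.2", 3888);
        // getHostString() returns the hostname or literal IP as given;
        // getHostName() may perform a reverse DNS lookup to find a name.
        System.out.println(addr.getHostString() + ":" + addr.getPort());
    }
}
{code}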


 Re-try DNS hostname - IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michi Mutsuzaki
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 zk-dns-caching-refresh.patch


 In our zoo.cfg we use hostnames to identify the ZK servers that are part of 
 an ensemble. These hostnames are configured with a low (<= 60s) TTL and the 
 IP address they map to can and does change. Our procedure for 
 replacing/upgrading a ZK node is to boot an entirely new instance and remap 
 the hostname to the new instance's IP address. Our expectation is that when 
 the original ZK node is terminated/shutdown, the remaining nodes in the 
 ensemble would reconnect to the new instance.
 However, what we are noticing is that the remaining ZK nodes do not attempt 
 to re-resolve the hostname-IP mapping for the new server. Once the original 
 ZK node is terminated, the existing servers continue to attempt contacting it 
 at the old IP address. It would be great if the ZK servers could try to 
 re-resolve the hostname when attempting to connect to a lost ZK server, 
 instead of caching the lookup indefinitely. Currently we must do a rolling 
 restart of the ZK ensemble after swapping a node -- which at three nodes 
 means we periodically lose quorum.
 The exact method we are following is to boot new instances in EC2 and attach 
 one, of a set of three, Elastic IP address. External to EC2 this IP address 
 remains the same and maps to whatever instance it is attached to. Internal to 
 EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped 
 to the internal (10.x.y.z) address of the instance it is attached to. 
 Therefore, in our case we would like ZK to pickup the new 10.x.y.z address 
 that the elastic IP hostname gets mapped to and reconnect appropriately.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2171) avoid reverse lookups in QuorumCnxManager

2015-04-20 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504028#comment-14504028
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2171:


Thanks Raul. We should replace getHostName() with getHostString(), and also 
remove src/java/main/org/apache/zookeeper/common/HostNameUtils.java. I don't 
think the code relies on getHostName() performing a reverse dns lookup, so 
replacing it with getHostString() shouldn't cause any correctness issues.

 avoid reverse lookups in QuorumCnxManager
 -

 Key: ZOOKEEPER-2171
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2171
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.1, 3.6.0


 Apparently, ZOOKEEPER-107 (via a quick git-blame look) introduced a bunch of 
 getHostName() calls in QCM. Besides the overhead, these can cause problems 
 when mixed with failing/mis-configured DNS servers.
 It would be nice to reduce them, if that doesn't affect operational 
 correctness. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname - IP resolution if node connection fails

2015-04-20 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503671#comment-14503671
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1506:


The patch uses HostNameUtils.getHostString(), which supposedly avoids reverse 
lookups. Maybe there is a bug in HostNameUtils.getHostString()?

We can replace HostNameUtils with InetSocketAddress.getHostString() since we 
now require Java 7.

 Re-try DNS hostname - IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michi Mutsuzaki
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch


 In our zoo.cfg we use hostnames to identify the ZK servers that are part of 
 an ensemble. These hostnames are configured with a low (<= 60s) TTL and the 
 IP address they map to can and does change. Our procedure for 
 replacing/upgrading a ZK node is to boot an entirely new instance and remap 
 the hostname to the new instance's IP address. Our expectation is that when 
 the original ZK node is terminated/shutdown, the remaining nodes in the 
 ensemble would reconnect to the new instance.
 However, what we are noticing is that the remaining ZK nodes do not attempt 
 to re-resolve the hostname-IP mapping for the new server. Once the original 
 ZK node is terminated, the existing servers continue to attempt contacting it 
 at the old IP address. It would be great if the ZK servers could try to 
 re-resolve the hostname when attempting to connect to a lost ZK server, 
 instead of caching the lookup indefinitely. Currently we must do a rolling 
 restart of the ZK ensemble after swapping a node -- which at three nodes 
 means we periodically lose quorum.
 The exact method we are following is to boot new instances in EC2 and attach 
 one, of a set of three, Elastic IP address. External to EC2 this IP address 
 remains the same and maps to whatever instance it is attached to. Internal to 
 EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped 
 to the internal (10.x.y.z) address of the instance it is attached to. 
 Therefore, in our case we would like ZK to pickup the new 10.x.y.z address 
 that the elastic IP hostname gets mapped to and reconnect appropriately.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1506) Re-try DNS hostname - IP resolution if node connection fails

2015-04-20 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-1506:
---
Attachment: ZOOKEEPER-1506-fix.patch

Raul, could you try ZOOKEEPER-1506-fix.patch and see if it fixes the problem? 
Thanks!

 Re-try DNS hostname - IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michi Mutsuzaki
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 zk-dns-caching-refresh.patch


 In our zoo.cfg we use hostnames to identify the ZK servers that are part of 
 an ensemble. These hostnames are configured with a low (<= 60s) TTL and the 
 IP address they map to can and does change. Our procedure for 
 replacing/upgrading a ZK node is to boot an entirely new instance and remap 
 the hostname to the new instance's IP address. Our expectation is that when 
 the original ZK node is terminated/shutdown, the remaining nodes in the 
 ensemble would reconnect to the new instance.
 However, what we are noticing is that the remaining ZK nodes do not attempt 
 to re-resolve the hostname-IP mapping for the new server. Once the original 
 ZK node is terminated, the existing servers continue to attempt contacting it 
 at the old IP address. It would be great if the ZK servers could try to 
 re-resolve the hostname when attempting to connect to a lost ZK server, 
 instead of caching the lookup indefinitely. Currently we must do a rolling 
 restart of the ZK ensemble after swapping a node -- which at three nodes 
 means we periodically lose quorum.
 The exact method we are following is to boot new instances in EC2 and attach 
 one, of a set of three, Elastic IP address. External to EC2 this IP address 
 remains the same and maps to whatever instance it is attached to. Internal to 
 EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped 
 to the internal (10.x.y.z) address of the instance it is attached to. 
 Therefore, in our case we would like ZK to pickup the new 10.x.y.z address 
 that the elastic IP hostname gets mapped to and reconnect appropriately.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2171) avoid reverse lookups in QuorumCnxManager

2015-04-20 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2171:
---
Fix Version/s: 3.6.0
   3.5.1

 avoid reverse lookups in QuorumCnxManager
 -

 Key: ZOOKEEPER-2171
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2171
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.1, 3.6.0


 Apparently, ZOOKEEPER-107 (via a quick git-blame look) introduced a bunch of 
 getHostName() calls in QCM. Besides the overhead, these can cause problems 
 when mixed with failing/mis-configured DNS servers.
 It would be nice to reduce them, if that doesn't affect operational 
 correctness. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2163) Introduce new ZNode type: container

2015-04-15 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2163:
---
Assignee: Jordan Zimmerman

 Introduce new ZNode type: container
 ---

 Key: ZOOKEEPER-2163
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2163
 Project: ZooKeeper
  Issue Type: New Feature
  Components: c client, java client, server
Affects Versions: 3.5.0
Reporter: Jordan Zimmerman
Assignee: Jordan Zimmerman

 BACKGROUND
 
 A recurring problem for ZooKeeper users is garbage collection of parent 
 nodes. Many recipes (e.g. locks, leaders, etc.) call for the creation of a 
 parent node under which participants create sequential nodes. When the 
 participant is done, it deletes its node. In practice, the ZooKeeper tree 
 begins to fill up with orphaned parent nodes that are no longer needed. The 
 ZooKeeper APIs don’t provide a way to clean these. Over time, ZooKeeper can 
 become unstable due to the number of these nodes.
 CURRENT SOLUTIONS
 ===
 Apache Curator has a workaround solution for this by providing the Reaper 
 class which runs in the background looking for orphaned parent nodes and 
 deleting them. This isn’t ideal and it would be better if ZooKeeper supported 
 this directly.
 PROPOSAL
 =
 ZOOKEEPER-723 and ZOOKEEPER-834 have been proposed to allow EPHEMERAL nodes 
 to contain child nodes. This is not optimum as EPHEMERALs are tied to a 
 session and the general use case of parent nodes is for PERSISTENT nodes. 
 This proposal adds a new node type, CONTAINER. A CONTAINER node is the same 
 as a PERSISTENT node with the additional property that when its last child is 
 deleted, it is deleted (and CONTAINER nodes recursively up the tree are 
 deleted if empty).
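 As a sketch of how the proposal could look from the Java client (the create 
 mode name is an assumption; the final API may differ):
 {code}
 import org.apache.zookeeper.CreateMode;
 import org.apache.zookeeper.ZooDefs;
 import org.apache.zookeeper.ZooKeeper;

 public class ContainerDemo {
     public static void main(String[] args) throws Exception {
         ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, null);
         // Assumed new mode: a persistent node the server may delete
         // once its last child is gone.
         zk.create("/locks", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
                 CreateMode.CONTAINER);
         zk.close();
     }
 }
 {code}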



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2164) fast leader election keeps failing

2015-04-15 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-2164:
---
Assignee: Hongchao Deng

 fast leader election keeps failing
 --

 Key: ZOOKEEPER-2164
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2164
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.4.5
Reporter: Michi Mutsuzaki
Assignee: Hongchao Deng

 I have a 3-node cluster with sids 1, 2 and 3. Originally 2 is the leader. 
 When I shut down 2, 1 and 3 keep going back to leader election. Here is what 
 seems to be happening.
 - Both 1 and 3 elect 3 as the leader.
 - 1 receives votes from 3 and itself, and starts trying to connect to 3 as a 
 follower.
 - 3 doesn't receive votes for 5 seconds because connectOne() to 2 doesn't 
 timeout for 5 seconds: 
 https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L346
 - By the time 3 receives votes, 1 has given up trying to connect to 3: 
 https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/Learner.java#L247
 I'm using 3.4.5, but it looks like this part of the code hasn't changed for a 
 while, so I'm guessing later versions have the same issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2163) Introduce new ZNode type: container

2015-04-15 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495725#comment-14495725
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2163:


[~randgalt] could you open a sub-task for the C APIs? I think I can find some 
time to pick it up.

 Introduce new ZNode type: container
 ---

 Key: ZOOKEEPER-2163
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2163
 Project: ZooKeeper
  Issue Type: New Feature
  Components: c client, java client, server
Affects Versions: 3.5.0
Reporter: Jordan Zimmerman
Assignee: Jordan Zimmerman

 BACKGROUND
 
 A recurring problem for ZooKeeper users is garbage collection of parent 
 nodes. Many recipes (e.g. locks, leaders, etc.) call for the creation of a 
 parent node under which participants create sequential nodes. When the 
 participant is done, it deletes its node. In practice, the ZooKeeper tree 
 begins to fill up with orphaned parent nodes that are no longer needed. The 
 ZooKeeper APIs don’t provide a way to clean these. Over time, ZooKeeper can 
 become unstable due to the number of these nodes.
 CURRENT SOLUTIONS
 ===
 Apache Curator has a workaround solution for this by providing the Reaper 
 class which runs in the background looking for orphaned parent nodes and 
 deleting them. This isn’t ideal and it would be better if ZooKeeper supported 
 this directly.
 PROPOSAL
 =
 ZOOKEEPER-723 and ZOOKEEPER-834 have been proposed to allow EPHEMERAL nodes 
 to contain child nodes. This is not optimum as EPHEMERALs are tied to a 
 session and the general use case of parent nodes is for PERSISTENT nodes. 
 This proposal adds a new node type, CONTAINER. A CONTAINER node is the same 
 as a PERSISTENT node with the additional property that when its last child is 
 deleted, it is deleted (and CONTAINER nodes recursively up the tree are 
 deleted if empty).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1853) zkCli.sh can't issue a CREATE command containing spaces in the data

2015-04-15 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-1853:
---
Assignee: Ryan Lamore

 zkCli.sh can't issue a CREATE command containing spaces in the data
 ---

 Key: ZOOKEEPER-1853
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1853
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.6, 3.5.0
Reporter: sekine coulibaly
Assignee: Ryan Lamore
Priority: Minor
  Labels: patch
 Fix For: 3.4.7, 3.5.2

 Attachments: ZOOKEEPER-1853.patch, ZOOKEEPER-1853.patch, 
 ZOOKEEPER-1853.patch, ZkSpaceMan.java


 Execute the following command in zkCli.sh:
 create /contacts/1 {"country":"CA","name":"De La Salle"}
 The result is that only {"id":"1","fullname":"De is stored.
 The expected result is to have the full JSON payload stored.
 The CREATE command seems to be cropped after the first space of the data 
 payload. When issuing a create command, all arguments not being -s nor -e 
 should be treated as the actual data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1853) zkCli.sh can't issue a CREATE command containing spaces in the data

2015-04-15 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-1853:
---
Fix Version/s: 3.5.2
   3.4.7

 zkCli.sh can't issue a CREATE command containing spaces in the data
 ---

 Key: ZOOKEEPER-1853
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1853
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.6, 3.5.0
Reporter: sekine coulibaly
Priority: Minor
  Labels: patch
 Fix For: 3.4.7, 3.5.2

 Attachments: ZOOKEEPER-1853.patch, ZOOKEEPER-1853.patch, 
 ZOOKEEPER-1853.patch, ZkSpaceMan.java


 Execute the following command in zkCli.sh:
 create /contacts/1 {"country":"CA","name":"De La Salle"}
 The result is that only {"id":"1","fullname":"De is stored.
 The expected result is to have the full JSON payload stored.
 The CREATE command seems to be cropped after the first space of the data 
 payload. When issuing a create command, all arguments not being -s nor -e 
 should be treated as the actual data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2164) fast leader election keeps failing

2015-04-15 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495709#comment-14495709
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2164:


Sure sounds good. Thank you for driving this Hongchao!

 fast leader election keeps failing
 --

 Key: ZOOKEEPER-2164
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2164
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.4.5
Reporter: Michi Mutsuzaki

 I have a 3-node cluster with sids 1, 2 and 3. Originally 2 is the leader. 
 When I shut down 2, 1 and 3 keep going back to leader election. Here is what 
 seems to be happening.
 - Both 1 and 3 elect 3 as the leader.
 - 1 receives votes from 3 and itself, and starts trying to connect to 3 as a 
 follower.
 - 3 doesn't receive votes for 5 seconds because connectOne() to 2 doesn't 
 timeout for 5 seconds: 
 https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L346
 - By the time 3 receives votes, 1 has given up trying to connect to 3: 
 https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/Learner.java#L247
 I'm using 3.4.5, but it looks like this part of the code hasn't changed for a 
 while, so I'm guessing later versions have the same issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2165) OSGi requires package server.quorom.flexible be exported

2015-04-15 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495717#comment-14495717
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2165:


Thank you for the report [~skitching]. Would you like to submit a patch? Also, 
is there a way to verify that we are OSGi compliant as a part of the build? I 
remember having similar issues before.

 OSGi requires package server.quorom.flexible be exported
 --

 Key: ZOOKEEPER-2165
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2165
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Reporter: Simon Kitching
Priority: Minor

 Class QuorumPeer has a constructor which takes a QuorumVerifier value as a 
 parameter. This class is defined in the package 
 org.apache.zookeeper.server.quorum.flexible, but that package is not 
 exported.
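 For reference, the fix would amount to exporting the package in the bundle 
 manifest's Export-Package header (illustrative; the exact build change may 
 differ):
 {code}
 Export-Package: org.apache.zookeeper.server.quorum.flexible
 {code}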



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2167) Restarting current leader node sometimes results in a permanent loss of quorum

2015-04-15 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495779#comment-14495779
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-2167:


Hi Mike, I took a quick look at the log file. What time did node 2 get removed 
from the cluster? I see that node 3 and 5 keep trying to connect to node 2:

{noformat}
Apr 14 00:09:49.579516 10.20.0.20 warning zookeeper: Cannot open channel to 2 
at election address /10.20.0.18:3888 java.net.SocketTimeoutException: connect 
timed out
Apr 14 00:09:49.644749 10.20.0.22 warning zookeeper: Cannot open channel to 2 
at election address /10.20.0.18:3888 java.net.SocketTimeoutException: connect 
timed out
{noformat}

Would it be possible that the configuration didn't get updated on these 2 
nodes? You might be hitting the same issue as ZOOKEEPER-2164.

 Restarting current leader node sometimes results in a permanent loss of quorum
 --

 Key: ZOOKEEPER-2167
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2167
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.6
Reporter: Mike Lundy
 Attachments: fails-to-rejoin-quorum.gz


 I'm seeing an issue where a restart of the current leader node results in a 
 long-term / permanent loss of quorum (I've only waited 30 minutes, but it 
 doesn't look like it's making any progress). Restarting the same instance 
 _again_ seems to resolve the problem.
 To me, this looks a lot like the issue described in 
 https://issues.apache.org/jira/browse/ZOOKEEPER-1026, but I'm filing this 
 separately for the moment in case I am wrong.
 Notes on the attached log:
 1) If you search for XXX in the log, you'll see where I've annotated it to 
 include where the process was told to terminate, when it is reported to have 
 completed that, and then the same for the start
 2) To save you the trouble of figuring it out, here's the zkid-to-IP mapping:
 zid=1, ip=10.20.0.19
 zid=2, ip=10.20.0.18
 zid=3, ip=10.20.0.20
 zid=4, ip=10.20.0.21
 zid=5, ip=10.20.0.22
 3) It's important to note that this log was captured during a rolling 
 service restart to remove an instance; in this case, zid #2 / 10.20.0.18 is 
 the one being removed, so if you see a conspicuous silence from that service, 
 that's why. 
 4) I've been unable to reproduce this problem _except_ during cluster size 
 changes, so I suspect that may be related; it's also important to note that 
 this test is going from 5 to 4 (which means, since we remove one and then do 
 a rolling restart, we are actually temporarily dropping to 3). I know this is 
 not a recommended thing (this is more of a stress test). We have seen this 
 same problem on larger cluster sizes, it just seems easier to reproduce it on 
 smaller sizes.
 5) The log starts roughly at the point 10.20.0.21 / zid=4 wins the election 
 during the final quorum; zid=4 is the one whose shutdown triggers the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2164) fast leader election keeps failing

2015-04-14 Thread Michi Mutsuzaki (JIRA)
Michi Mutsuzaki created ZOOKEEPER-2164:
--

 Summary: fast leader election keeps failing
 Key: ZOOKEEPER-2164
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2164
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.4.5
Reporter: Michi Mutsuzaki


I have a 3-node cluster with sids 1, 2 and 3. Originally 2 is the leader. When 
I shut down 2, 1 and 3 keep going back to leader election. Here is what seems 
to be happening.

- Both 1 and 3 elect 3 as the leader.
- 1 receives votes from 3 and itself, and starts trying to connect to 3 as a 
follower.
- 3 doesn't receive votes for 5 seconds because connectOne() to 2 doesn't 
timeout for 5 seconds: 
https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L346
- By the time 3 receives votes, 1 has given up trying to connect to 3: 
https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/Learner.java#L247

I'm using 3.4.5, but it looks like this part of the code hasn't changed for a 
while, so I'm guessing later versions have the same issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (ZOOKEEPER-1616) time calculations should use a monotonic clock

2015-04-11 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki resolved ZOOKEEPER-1616.

Resolution: Duplicate

This is fixed in ZOOKEEPER-1366.

 time calculations should use a monotonic clock
 --

 Key: ZOOKEEPER-1616
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1616
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Todd Lipcon

 We recently had an issue with ZooKeeper sessions acting strangely due to a 
 bad NTP setup on a set of hosts. Looking at the code, ZK seems to use 
 System.currentTimeMillis to measure durations or intervals in many places. 
 This is bad since that time can move backwards or skip ahead by several 
 minutes. Instead, it should use System.nanoTime (or a wrapper such as Guava's 
 Stopwatch class)
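 For instance, an interval measured with the monotonic clock (a generic 
 sketch, not project code):
 {code}
 import java.util.concurrent.TimeUnit;

 public class ElapsedDemo {
     public static void main(String[] args) throws InterruptedException {
         long startNs = System.nanoTime(); // monotonic, unaffected by NTP steps
         Thread.sleep(100);                // stand-in for the work being timed
         long elapsedMs =
                 TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNs);
         System.out.println("elapsed ~" + elapsedMs + " ms");
     }
 }
 {code}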



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1626) Zookeeper C client should be tolerant of clock adjustments

2015-04-10 Thread Michi Mutsuzaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michi Mutsuzaki updated ZOOKEEPER-1626:
---
Attachment: ZOOKEEPER-1626.patch

rebased ZOOKEEPER-1366.007.patch

 Zookeeper C client should be tolerant of clock adjustments 
 ---

 Key: ZOOKEEPER-1626
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1626
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: c client
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1366.001.patch, ZOOKEEPER-1366.002.patch, 
 ZOOKEEPER-1366.003.patch, ZOOKEEPER-1366.004.patch, ZOOKEEPER-1366.006.patch, 
 ZOOKEEPER-1366.007.patch, ZOOKEEPER-1626.patch


 The Zookeeper C client should use monotonic time when available, in order to 
 be more tolerant of time adjustments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

