[jira] [Created] (ZOOKEEPER-3607) Potential data inconsistency due to the inconsistency between ZKDatabase.committedLog and dataTree in Trunc sync.

2019-11-05 Thread Jiafu Jiang (Jira)
Jiafu Jiang created ZOOKEEPER-3607:
--

 Summary: Potential data inconsistency due to the inconsistency 
between ZKDatabase.committedLog and dataTree in Trunc sync.
 Key: ZOOKEEPER-3607
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3607
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.4.14
Reporter: Jiafu Jiang


I will describe the problem with a detailed example.


1. Suppose we have three ZooKeeper servers: zk1, zk2, and zk3. zk1 and zk2 are online, 
zk3 is offline, and zk1 is the leader.


2. In TRUNC sync, zk1 sends a TRUNC request to zk2, then sends the remaining 
proposals from its committedLog. *When the follower zk2 receives these proposals, 
it applies them directly to its dataTree, but does not add them to its committedLog.* 


3. After the data sync phase, zk1 may continue to send zk2 more committed 
proposals, and these are applied to both the dataTree and the committedLog 
of zk2.

 

4. Then zk1 fails, zk3 restarts successfully, zk2 becomes the leader.

 

5. The leader zk2 sends a TRUNC request to zk3, then the remaining proposals 
from its committedLog. But the proposals that zk2 received from the old leader zk1 
during TRUNC sync (as described above) are not in its committedLog, so they will 
not be sent to zk3.

 

6. Now data inconsistency exists between zk2 and zk3, since some data is present 
in zk2's dataTree but not in zk3's dataTree.
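
To make the divergence concrete, here is a small self-contained sketch (my own toy model, not ZooKeeper's real classes; all names are invented for illustration) that replays steps 2-6 and shows the zxids that zk3 never receives:

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the scenario above; names are invented for illustration and
// do not refer to real ZooKeeper classes.
public class TruncSyncDivergenceDemo {

    static class Server {
        final String name;
        final Map<Long, String> dataTree = new LinkedHashMap<>();   // zxid -> data
        final List<Long> committedLog = new ArrayList<>();          // zxids kept for syncing learners

        Server(String name) { this.name = name; }

        // Step 2: proposals received during TRUNC sync are applied to the
        // dataTree only, not to the committedLog.
        void applySyncProposal(long zxid, String data) {
            dataTree.put(zxid, data);
        }

        // Step 3: proposals committed after sync go to both structures.
        void applyCommittedProposal(long zxid, String data) {
            dataTree.put(zxid, data);
            committedLog.add(zxid);
        }
    }

    public static void main(String[] args) {
        Server zk2 = new Server("zk2");
        Server zk3 = new Server("zk3");

        // Step 2: zk2 receives proposals 11..13 from zk1 during TRUNC sync.
        for (long zxid = 11; zxid <= 13; zxid++) {
            zk2.applySyncProposal(zxid, "v" + zxid);
        }
        // Step 3: zk1 commits 14..15 normally; zk2 records them in both places.
        for (long zxid = 14; zxid <= 15; zxid++) {
            zk2.applyCommittedProposal(zxid, "v" + zxid);
        }

        // Step 5: zk2 is now the leader and syncs zk3 from its committedLog only.
        for (long zxid : zk2.committedLog) {
            zk3.applyCommittedProposal(zxid, zk2.dataTree.get(zxid));
        }

        // Step 6: zxids 11..13 exist on zk2 but were never sent to zk3.
        System.out.println("zk2 dataTree zxids: " + zk2.dataTree.keySet()); // [11, 12, 13, 14, 15]
        System.out.println("zk3 dataTree zxids: " + zk3.dataTree.keySet()); // [14, 15]
    }
}
{code}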



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ZOOKEEPER-3393) Read-only file system may make the whole ZooKeeper cluster to be unavailable.

2019-05-13 Thread Jiafu Jiang (JIRA)
Jiafu Jiang created ZOOKEEPER-3393:
--

 Summary: Read-only file system may make the whole ZooKeeper 
cluster to be unavailable.
 Key: ZOOKEEPER-3393
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3393
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection, server
Affects Versions: 3.4.14, 3.4.12
Reporter: Jiafu Jiang


Say we have three nodes: zk1, zk2, and zk3, and zk3 is the leader.

If the file system holding the leader's ZooKeeper data directory becomes read-only 
due to a hardware error, the leader will exit and a new election will begin.

But the election can keep looping: the new leader may be zk3 again, and zk3 will 
fail to write its epoch to disk because of the read-only file system.

 

Since we have three nodes, shouldn't the ZooKeeper cluster stay available when only 
one of them has a problem? If the answer is yes, then we ought to fix this problem.
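
As a rough illustration of the kind of check that could break this loop (a sketch under my own assumptions, not an existing ZooKeeper option; class and method names are invented), a server could probe whether its data directory is still writable before taking part in, or after winning, an election:

{code:java}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Minimal probe for the scenario above: try to create, write, fsync and delete
// a small file in the data directory. A server whose file system went
// read-only could use a check like this to drop out of leader election
// instead of winning it again and failing on the epoch write.
public class DataDirWritableProbe {

    public static boolean isWritable(File dataDir) {
        File probe = new File(dataDir, ".writability-probe");
        try (FileOutputStream out = new FileOutputStream(probe)) {
            out.write("probe".getBytes(StandardCharsets.UTF_8));
            out.getFD().sync();              // force the data to disk, as the epoch write must
            return true;
        } catch (IOException e) {
            return false;                    // EROFS and similar errors land here
        } finally {
            probe.delete();
        }
    }

    public static void main(String[] args) {
        File dataDir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(dataDir + " writable: " + isWritable(dataDir));
    }
}
{code}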



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3266) ZooKeeper Java client blocks for a very long time.

2019-02-02 Thread Jiafu Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-3266:
---
Description: 
I found that the ZooKeeper Java client blocked, and the related call stack is 
shown below:

"Election thread-20" #20 prio=5 os_prio=0 tid=0x7f7deeadfd80 nid=0x5ec3 in 
Object.wait() [0x7f7ddd5d8000]
 java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 at java.lang.Object.wait(Object.java:502)
 at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1411)
 - locked <0xe04b63b0> (a org.apache.zookeeper.ClientCnxn$Packet)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1177)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1210)
 at 
com.sugon.parastor.zookeeper.ZooKeeperClient.exists(ZooKeeperClient.java:643)
 

 

I also found that the blocked process did not have a SendThread. A normal process 
using the ZooKeeper Java client should have a SendThread, like below:

"Thread-0-SendThread(ofs_zk1:2181)" #23 daemon prio=5 os_prio=0 
tid=0x7f8c540379c0 nid=0x739 runnable [0x7f8c5ad71000]
 java.lang.Thread.State: RUNNABLE
 at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
 at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
 at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
 - locked <0xe00287a8> (a sun.nio.ch.Util$3)
 - locked <0xe0028798> (a java.util.Collections$UnmodifiableSet)
 - locked <0xe0028750> (a sun.nio.ch.EPollSelectorImpl)
 at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
 at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:349)
 at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145)

 

So, could the missing SendThread cause the exists method to block? I'm not sure.
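
As a client-side workaround sketch (not a fix for the missing SendThread), the wait can at least be bounded by using the public asynchronous exists() call together with a latch; the helper name and the timeout handling below are my own:

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Workaround sketch: bound the wait by using the asynchronous exists() call
// plus a latch, instead of the synchronous exists() that blocks in
// ClientCnxn.submitRequest when the client is wedged.
public class BoundedExists {

    // Returns the Stat, or null if the node does not exist.
    // Throws TimeoutException if no answer arrives within timeoutMs.
    public static Stat existsWithTimeout(ZooKeeper zk, String path, long timeoutMs)
            throws InterruptedException, TimeoutException, KeeperException {
        CountDownLatch done = new CountDownLatch(1);
        AtomicInteger rcHolder = new AtomicInteger();
        AtomicReference<Stat> statHolder = new AtomicReference<>();

        zk.exists(path, false, (rc, p, ctx, stat) -> {
            rcHolder.set(rc);
            statHolder.set(stat);
            done.countDown();
        }, null);

        if (!done.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("exists(" + path + ") got no reply in " + timeoutMs + " ms");
        }
        KeeperException.Code code = KeeperException.Code.get(rcHolder.get());
        if (code == KeeperException.Code.OK) {
            return statHolder.get();
        }
        if (code == KeeperException.Code.NONODE) {
            return null;
        }
        throw KeeperException.create(code, path);
    }
}
{code}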

  was:
I found that ZooKeeper java client blocked, and the related call stack was 
showing below:

"Election thread-20" #20 prio=5 os_prio=0 tid=0x7f7deeadfd80 nid=0x5ec3 in 
Object.wait() [0x7f7ddd5d8000]
 java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 at java.lang.Object.wait(Object.java:502)
 at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1411)
 - locked <0xe04b63b0> (a org.apache.zookeeper.ClientCnxn$Packet)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1177)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1210)
 at 
com.sugon.parastor.zookeeper.ZooKeeperClient.exists(ZooKeeperClient.java:643)
 

 

And I also found that the block process did not have the SendThread. It seems 
like a normal process that have ZooKeeper java client should have a SendThread, 
like below:

"Thread-0-SendThread(ofs_zk1:2181)" #23 daemon prio=5 os_prio=0 
tid=0x7f8c540379c0 nid=0x739 runnable [0x7f8c5ad71000]
 java.lang.Thread.State: RUNNABLE
 at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
 at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
 at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
 - locked <0xe00287a8> (a sun.nio.ch.Util$3)
 - locked <0xe0028798> (a java.util.Collections$UnmodifiableSet)
 - locked <0xe0028750> (a sun.nio.ch.EPollSelectorImpl)
 at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
 at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:349)
 at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145)

 

So, will the missing of the SendThread cause the blocking of exist method?? I'm 
not sure.


> ZooKeeper Java client blocks for a very long time.
> --
>
> Key: ZOOKEEPER-3266
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3266
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.13
>Reporter: Jiafu Jiang
>Priority: Major
>
> I found that ZooKeeper java client blocked, and the related call stack was 
> shown below:
> "Election thread-20" #20 prio=5 os_prio=0 tid=0x7f7deeadfd80 nid=0x5ec3 
> in Object.wait() [0x7f7ddd5d8000]
>  java.lang.Thread.State: WAITING (on object monitor)
>  at java.lang.Object.wait(Native Method)
>  at java.lang.Object.wait(Object.java:502)
>  at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1411)
>  - locked <0xe04b63b0> (a org.apache.zookeeper.ClientCnxn$Packet)
>  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1177)
>  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1210)
>  at 
> com.sugon.parastor.zookeeper.ZooKeeperClient.exists(ZooKeeperClient.java:643)
>  

[jira] [Updated] (ZOOKEEPER-3266) ZooKeeper Java client blocks for a very long time.

2019-01-31 Thread Jiafu Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-3266:
---
Description: 
I found that the ZooKeeper Java client blocked, and the related call stack is 
shown below:

"Election thread-20" #20 prio=5 os_prio=0 tid=0x7f7deeadfd80 nid=0x5ec3 in 
Object.wait() [0x7f7ddd5d8000]
 java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 at java.lang.Object.wait(Object.java:502)
 at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1411)
 - locked <0xe04b63b0> (a org.apache.zookeeper.ClientCnxn$Packet)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1177)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1210)
 at 
com.sugon.parastor.zookeeper.ZooKeeperClient.exists(ZooKeeperClient.java:643)
 

 

I also found that the blocked process did not have a SendThread. A normal process 
using the ZooKeeper Java client should have a SendThread, like below:

"Thread-0-SendThread(ofs_zk1:2181)" #23 daemon prio=5 os_prio=0 
tid=0x7f8c540379c0 nid=0x739 runnable [0x7f8c5ad71000]
 java.lang.Thread.State: RUNNABLE
 at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
 at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
 at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
 - locked <0xe00287a8> (a sun.nio.ch.Util$3)
 - locked <0xe0028798> (a java.util.Collections$UnmodifiableSet)
 - locked <0xe0028750> (a sun.nio.ch.EPollSelectorImpl)
 at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
 at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:349)
 at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145)

 

So, could the missing SendThread cause the exists method to block? I'm not sure.

  was:
I found that ZooKeeper java client blocked, and the related call stack was 
showing below:

"Election thread-20" #20 prio=5 os_prio=0 tid=0x7f7deeadfd80 nid=0x5ec3 in 
Object.wait() [0x7f7ddd5d8000]
 java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 at java.lang.Object.wait(Object.java:502)
 at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1411)
 - locked <0xe04b63b0> (a org.apache.zookeeper.ClientCnxn$Packet)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1177)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1210)
 at 
com.sugon.parastor.zookeeper.ZooKeeperClient.exists(ZooKeeperClient.java:643)


 

And I also found that the block process did not have the SendThread. It seems 
like a normal process that have ZooKeeper java client should have a SendThread, 
like below:


"Thread-0-SendThread(ofs_zk1:2181)" #23 daemon prio=5 os_prio=0 
tid=0x7f8c540379c0 nid=0x739 runnable [0x7f8c5ad71000]
 java.lang.Thread.State: RUNNABLE
 at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
 at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
 at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
 - locked <0xe00287a8> (a sun.nio.ch.Util$3)
 - locked <0xe0028798> (a java.util.Collections$UnmodifiableSet)
 - locked <0xe0028750> (a sun.nio.ch.EPollSelectorImpl)
 at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
 at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:349)
 at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145)

 

So, will the missing of SendThread cause the blocking of exist method?? I'm not 
sure.


> ZooKeeper Java client blocks for a very long time.
> --
>
> Key: ZOOKEEPER-3266
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3266
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.13
>Reporter: Jiafu Jiang
>Priority: Major
>
> I found that ZooKeeper java client blocked, and the related call stack was 
> showing below:
> "Election thread-20" #20 prio=5 os_prio=0 tid=0x7f7deeadfd80 nid=0x5ec3 
> in Object.wait() [0x7f7ddd5d8000]
>  java.lang.Thread.State: WAITING (on object monitor)
>  at java.lang.Object.wait(Native Method)
>  at java.lang.Object.wait(Object.java:502)
>  at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1411)
>  - locked <0xe04b63b0> (a org.apache.zookeeper.ClientCnxn$Packet)
>  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1177)
>  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1210)
>  at 
> com.sugon.parastor.zookeeper.ZooKeeperClient.exists(ZooKeeperClient.java:643)
>  

[jira] [Created] (ZOOKEEPER-3266) ZooKeeper Java client blocks for a very long time.

2019-01-31 Thread Jiafu Jiang (JIRA)
Jiafu Jiang created ZOOKEEPER-3266:
--

 Summary: ZooKeeper Java client blocks for a very long time.
 Key: ZOOKEEPER-3266
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3266
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.13
Reporter: Jiafu Jiang


I found that the ZooKeeper Java client blocked, and the related call stack is 
shown below:

"Election thread-20" #20 prio=5 os_prio=0 tid=0x7f7deeadfd80 nid=0x5ec3 in 
Object.wait() [0x7f7ddd5d8000]
 java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 at java.lang.Object.wait(Object.java:502)
 at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1411)
 - locked <0xe04b63b0> (a org.apache.zookeeper.ClientCnxn$Packet)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1177)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1210)
 at 
com.sugon.parastor.zookeeper.ZooKeeperClient.exists(ZooKeeperClient.java:643)


 

I also found that the blocked process did not have a SendThread. A normal process 
using the ZooKeeper Java client should have a SendThread, like below:


"Thread-0-SendThread(ofs_zk1:2181)" #23 daemon prio=5 os_prio=0 
tid=0x7f8c540379c0 nid=0x739 runnable [0x7f8c5ad71000]
 java.lang.Thread.State: RUNNABLE
 at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
 at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
 at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
 - locked <0xe00287a8> (a sun.nio.ch.Util$3)
 - locked <0xe0028798> (a java.util.Collections$UnmodifiableSet)
 - locked <0xe0028750> (a sun.nio.ch.EPollSelectorImpl)
 at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
 at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:349)
 at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145)

 

So, could the missing SendThread cause the exists method to block? I'm not sure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3231) Purge task may lost data when we have many invalid snapshots.

2018-12-28 Thread Jiafu Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-3231:
---
Description: 
I read the ZooKeeper source code, and I found that the purge task uses 
FileTxnSnapLog#findNRecentSnapshots to find snapshots, but this method does not 
check whether the snapshots are valid.

Consider a worst case: a ZooKeeper server may have many invalid snapshots, and 
when a purge task begins, it will use the zxid in the last snapshot's name to 
purge old snapshots and transaction logs, so we may lose data. 

I think we should use FileSnap#findNValidSnapshots(int) instead of 
FileSnap#findNRecentSnapshots in FileTxnSnapLog#findNRecentSnapshots, but I am 
not sure.
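
To illustrate the idea (this is only a sketch, not the real FileSnap/FileTxnSnapLog code; the zero-length test is just a rough stand-in for a proper header/checksum validity check), the purge boundary could be chosen only from snapshots that pass a basic validity filter:

{code:java}
import java.io.File;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only (not the FileTxnSnapLog/FileSnap code): pick the
// purge boundary from snapshots that pass a basic validity check, so that a
// truncated 0-byte snapshot cannot become the retention cut-off.
public class ValidSnapshotPicker {

    static final String PREFIX = "snapshot.";

    // A very rough validity heuristic for this sketch: the file must be
    // non-empty. A real check would also verify the header and checksum.
    static boolean looksValid(File f) {
        return f.isFile() && f.getName().startsWith(PREFIX) && f.length() > 0;
    }

    static long zxidOf(File f) {
        return Long.parseLong(f.getName().substring(PREFIX.length()), 16);
    }

    // Returns the zxids of the n most recent snapshots that look valid,
    // newest first; old snapshots and logs would only be purged below the last of these.
    public static List<Long> nMostRecentValid(File snapDir, int n) {
        File[] files = snapDir.listFiles(ValidSnapshotPicker::looksValid);
        List<Long> zxids = new ArrayList<>();
        if (files == null) {
            return zxids;
        }
        for (File f : files) {
            zxids.add(zxidOf(f));
        }
        zxids.sort(Comparator.reverseOrder());
        return zxids.subList(0, Math.min(n, zxids.size()));
    }
}
{code}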

 

  was:
I read the ZooKeeper source code, and I find the purge task use 
FileTxnSnapLog#findNRecentSnapshots to find snapshots, but the method does not 
check whether the snapshots are valid.

Consider a worse case, a ZooKeeper server may have many invalid snapshots, and 
when a purge task begins, it will use the zxid in the last snapshot's name to 
purge old snapshots and transaction logs, then we may lost data. 

I think we should use FileSnap#findNValidSnapshots(int) instead of 
FileSnap#findNRecentSnapshots in FileTxnSnapLog#findNRecentSnapshots. I am not 
sure.

 


>  Purge task may lost data when we have many invalid snapshots.
> --
>
> Key: ZOOKEEPER-3231
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3231
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.4, 3.4.13
>Reporter: Jiafu Jiang
>Priority: Major
>
> I read the ZooKeeper source code, and I find the purge task use 
> FileTxnSnapLog#findNRecentSnapshots to find snapshots, but the method does 
> not check whether the snapshots are valid.
> Consider a worse case, a ZooKeeper server may have many invalid snapshots, 
> and when a purge task begins, it will use the zxid in the last snapshot's 
> name to purge old snapshots and transaction logs, then we may lost data. 
> I think we should use FileSnap#findNValidSnapshots(int) instead of 
> FileSnap#findNRecentSnapshots in FileTxnSnapLog#findNRecentSnapshots, but I 
> am not sure.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3231) Purge task may lost data when we have many invalid snapshots.

2018-12-28 Thread Jiafu Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-3231:
---
Summary:  Purge task may lost data when we have many invalid snapshots.  
(was:  Purge task may lost data when we have many invalid snapshot files.)

>  Purge task may lost data when we have many invalid snapshots.
> --
>
> Key: ZOOKEEPER-3231
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3231
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.4, 3.4.13
>Reporter: Jiafu Jiang
>Priority: Major
>
> I read the ZooKeeper source code, and I find the purge task use 
> FileTxnSnapLog#findNRecentSnapshots to find snapshots, but the method does 
> not check whether the snapshots are valid.
> Consider a worse case, a ZooKeeper server may have many invalid snapshots, 
> and when a purge task begins, it will use the zxid in the last snapshot's 
> name to purge old snapshots and transaction logs, then we may lost data. 
> I think we should use FileSnap#findNValidSnapshots(int) instead of 
> FileSnap#findNRecentSnapshots in FileTxnSnapLog#findNRecentSnapshots. I am 
> not sure.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3231) Purge task may lost data when we have many invalid snapshot files.

2018-12-28 Thread Jiafu Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-3231:
---
Description: 
I read the ZooKeeper source code, and I found that the purge task uses 
FileTxnSnapLog#findNRecentSnapshots to find snapshots, but this method does not 
check whether the snapshots are valid.

Consider a worst case: a ZooKeeper server may have many invalid snapshots, and 
when a purge task begins, it will use the zxid in the last snapshot file name 
to purge old snapshots or transaction logs, so we may lose data. 

I think we should use FileSnap#findNValidSnapshots(int) instead of 
FileSnap#findNRecentSnapshots in FileTxnSnapLog#findNRecentSnapshots. I am not 
sure.

 

  was:
I read the ZooKeeper source code, and I find the purge task use 
FileTxnSnapLog#findNRecentSnapshots to find snapshots, but the method does not 
check whether the snapshots are valid.

Consider a worse case, a ZooKeeper server may have many invalid snapshots, and 
when a purge task begins, is will use the zxid in the last snapshot file name 
to purge old snapshots or transaction logs, then we may lost data. 

I think we should use FileSnap#findNValidSnapshots(int) instead of 
FileSnap#findNRecentSnapshots in FileTxnSnapLog#findNRecentSnapshots. I am not 
sure.

 


>  Purge task may lost data when we have many invalid snapshot files.
> ---
>
> Key: ZOOKEEPER-3231
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3231
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.4, 3.4.13
>Reporter: Jiafu Jiang
>Priority: Major
>
> I read the ZooKeeper source code, and I find the purge task use 
> FileTxnSnapLog#findNRecentSnapshots to find snapshots, but the method does 
> not check whether the snapshots are valid.
> Consider a worse case, a ZooKeeper server may have many invalid snapshots, 
> and when a purge task begins, it will use the zxid in the last snapshot file 
> name to purge old snapshots or transaction logs, then we may lost data. 
> I think we should use FileSnap#findNValidSnapshots(int) instead of 
> FileSnap#findNRecentSnapshots in FileTxnSnapLog#findNRecentSnapshots. I am 
> not sure.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3231) Purge task may lost data when we have many invalid snapshot files.

2018-12-28 Thread Jiafu Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-3231:
---
Description: 
I read the ZooKeeper source code, and I found that the purge task uses 
FileTxnSnapLog#findNRecentSnapshots to find snapshots, but this method does not 
check whether the snapshots are valid.

Consider a worst case: a ZooKeeper server may have many invalid snapshots, and 
when a purge task begins, it will use the zxid in the last snapshot's name to 
purge old snapshots and transaction logs, so we may lose data. 

I think we should use FileSnap#findNValidSnapshots(int) instead of 
FileSnap#findNRecentSnapshots in FileTxnSnapLog#findNRecentSnapshots. I am not 
sure.

 

  was:
I read the ZooKeeper source code, and I find the purge task use 
FileTxnSnapLog#findNRecentSnapshots to find snapshots, but the method does not 
check whether the snapshots are valid.

Consider a worse case, a ZooKeeper server may have many invalid snapshots, and 
when a purge task begins, it will use the zxid in the last snapshot file name 
to purge old snapshots or transaction logs, then we may lost data. 

I think we should use FileSnap#findNValidSnapshots(int) instead of 
FileSnap#findNRecentSnapshots in FileTxnSnapLog#findNRecentSnapshots. I am not 
sure.

 


>  Purge task may lost data when we have many invalid snapshot files.
> ---
>
> Key: ZOOKEEPER-3231
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3231
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.4, 3.4.13
>Reporter: Jiafu Jiang
>Priority: Major
>
> I read the ZooKeeper source code, and I find the purge task use 
> FileTxnSnapLog#findNRecentSnapshots to find snapshots, but the method does 
> not check whether the snapshots are valid.
> Consider a worse case, a ZooKeeper server may have many invalid snapshots, 
> and when a purge task begins, it will use the zxid in the last snapshot's 
> name to purge old snapshots and transaction logs, then we may lost data. 
> I think we should use FileSnap#findNValidSnapshots(int) instead of 
> FileSnap#findNRecentSnapshots in FileTxnSnapLog#findNRecentSnapshots. I am 
> not sure.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3231) Purge task may lost data when we have many invalid snapshot files.

2018-12-28 Thread Jiafu Jiang (JIRA)
Jiafu Jiang created ZOOKEEPER-3231:
--

 Summary:  Purge task may lost data when we have many invalid 
snapshot files.
 Key: ZOOKEEPER-3231
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3231
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.13, 3.5.4
Reporter: Jiafu Jiang


I read the ZooKeeper source code, and I found that the purge task uses 
FileTxnSnapLog#findNRecentSnapshots to find snapshots, but this method does not 
check whether the snapshots are valid.

Consider a worst case: a ZooKeeper server may have many invalid snapshots, and 
when a purge task begins, it will use the zxid in the last snapshot file name 
to purge old snapshots or transaction logs, so we may lose data. 

I think we should use FileSnap#findNValidSnapshots(int) instead of 
FileSnap#findNRecentSnapshots in FileTxnSnapLog#findNRecentSnapshots. I am not 
sure.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3220) The snapshot is not saved to disk and may cause data inconsistency.

2018-12-26 Thread Jiafu Jiang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729312#comment-16729312
 ] 

Jiafu Jiang commented on ZOOKEEPER-3220:


[~nixon] Thanks very much!

> The snapshot is not saved to disk and may cause data inconsistency.
> ---
>
> Key: ZOOKEEPER-3220
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3220
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.12, 3.4.13
>Reporter: Jiafu Jiang
>Priority: Critical
>
> We known that ZooKeeper server will call fsync to make sure that log data has 
> been successfully saved to disk. But ZooKeeper server does not call fsync to 
> make sure that a snapshot has been successfully saved, which may cause 
> potential problems. Since a close to a file description does not make sure 
> that data is written to disk, see 
> [http://man7.org/linux/man-pages/man2/close.2.html#notes] for more details.
>  
> If the snapshot is not successfully  saved to disk, it may lead to data 
> inconsistency. Here is my example, which is also a real problem I have ever 
> met.
> 1. I deployed a 3-node ZooKeeper cluster: zk1, zk2, and zk3, zk2 was the 
> leader.
> 2. Both zk1 and zk2 had the log records from log1~logX, X was the zxid.
> 3. The machine of zk1 restarted, and during the reboot,  log(X+1) ~ log Y are 
> saved to log files of both zk2(leader) and zk3(follower).
> 4. After zk1 restarted successfully, it found itself to be a follower, and it 
> began to synchronize data with the leader. The leader sent a snapshot(records 
> from log 1 ~ log Y) to zk1, zk1 then saved the snapshot to local disk by 
> calling the method ZooKeeperServer.takeSnapshot. But unfortunately, when the 
> method returned, the snapshot data was not saved to disk yet. In fact the 
> snapshot file was created, but the size was 0.
> 5. zk1 finished the synchronization and began to accept new requests from the 
> leader. Say log records from log(Y + 1) ~ log Z were accepted by zk1 and  
> saved to log file. With fsync zk1 could make sure log data was not lost.
> 6. zk1 restarted again. Since the snapshot's size was 0, it would not be 
> used, therefore zk1 recovered using the log files. But the records from 
> log(X+1) ~ logY were lost ! 
>  
> Sorry for my poor English.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3220) The snapshot is not saved to disk and may cause data inconsistency.

2018-12-26 Thread Jiafu Jiang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729308#comment-16729308
 ] 

Jiafu Jiang commented on ZOOKEEPER-3220:


[~maoling]

 

Why did this situation happen? The disk is full? 

No, but the machine restarted.

Do you see some logs about *FileTxnSnapLog#save* at that time?

There was no error log at all. In fact, during the machine reboot some of the 
follower's logs were missing. But from the leader's log, the follower had received 
a snapshot and had begun to receive further transaction logs, so 
*FileTxnSnapLog#save on the follower must have succeeded, yet the data was not 
on disk!*

 

*2. Even this situation, where the size of the snapshot is 0, could not cause data 
inconsistency.*

Yes, I know. ZooKeeper recovers its data from both logs and snapshots.

But if a ZooKeeper follower believes a snapshot has been saved, it assumes that all 
the data in that snapshot is on disk (although in fact it may not be), and it begins 
to receive the logs that come after the snapshot. If the snapshot turns out to be 
invalid, the ZooKeeper server will later recover its data from the logs only, and 
some data will be missing, because that data was saved only in the snapshot.

 

> The snapshot is not saved to disk and may cause data inconsistency.
> ---
>
> Key: ZOOKEEPER-3220
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3220
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.12, 3.4.13
>Reporter: Jiafu Jiang
>Priority: Critical
>
> We known that ZooKeeper server will call fsync to make sure that log data has 
> been successfully saved to disk. But ZooKeeper server does not call fsync to 
> make sure that a snapshot has been successfully saved, which may cause 
> potential problems. Since a close to a file description does not make sure 
> that data is written to disk, see 
> [http://man7.org/linux/man-pages/man2/close.2.html#notes] for more details.
>  
> If the snapshot is not successfully  saved to disk, it may lead to data 
> inconsistency. Here is my example, which is also a real problem I have ever 
> met.
> 1. I deployed a 3-node ZooKeeper cluster: zk1, zk2, and zk3, zk2 was the 
> leader.
> 2. Both zk1 and zk2 had the log records from log1~logX, X was the zxid.
> 3. The machine of zk1 restarted, and during the reboot,  log(X+1) ~ log Y are 
> saved to log files of both zk2(leader) and zk3(follower).
> 4. After zk1 restarted successfully, it found itself to be a follower, and it 
> began to synchronize data with the leader. The leader sent a snapshot(records 
> from log 1 ~ log Y) to zk1, zk1 then saved the snapshot to local disk by 
> calling the method ZooKeeperServer.takeSnapshot. But unfortunately, when the 
> method returned, the snapshot data was not saved to disk yet. In fact the 
> snapshot file was created, but the size was 0.
> 5. zk1 finished the synchronization and began to accept new requests from the 
> leader. Say log records from log(Y + 1) ~ log Z were accepted by zk1 and  
> saved to log file. With fsync zk1 could make sure log data was not lost.
> 6. zk1 restarted again. Since the snapshot's size was 0, it would not be 
> used, therefore zk1 recovered using the log files. But the records from 
> log(X+1) ~ logY were lost ! 
>  
> Sorry for my poor English.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3220) The snapshot is not saved to disk and may cause data inconsistency.

2018-12-24 Thread Jiafu Jiang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728572#comment-16728572
 ] 

Jiafu Jiang commented on ZOOKEEPER-3220:


In my environment, the save method returned successfully, which means no 
exception had been thrown. But the data was not on disk! That's the problem I 
want to report!

 

And yes, the snapshot with size 0 was invalid, and it was skipped when the ZooKeeper 
server restarted again.

> The snapshot is not saved to disk and may cause data inconsistency.
> ---
>
> Key: ZOOKEEPER-3220
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3220
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.12, 3.4.13
>Reporter: Jiafu Jiang
>Priority: Critical
>
> We known that ZooKeeper server will call fsync to make sure that log data has 
> been successfully saved to disk. But ZooKeeper server does not call fsync to 
> make sure that a snapshot has been successfully saved, which may cause 
> potential problems. Since a close to a file description does not make sure 
> that data is written to disk, see 
> [http://man7.org/linux/man-pages/man2/close.2.html#notes] for more details.
>  
> If the snapshot is not successfully  saved to disk, it may lead to data 
> inconsistency. Here is my example, which is also a real problem I have ever 
> met.
> 1. I deployed a 3-node ZooKeeper cluster: zk1, zk2, and zk3, zk2 was the 
> leader.
> 2. Both zk1 and zk2 had the log records from log1~logX, X was the zxid.
> 3. The machine of zk1 restarted, and during the reboot,  log(X+1) ~ log Y are 
> saved to log files of both zk2(leader) and zk3(follower).
> 4. After zk1 restarted successfully, it found itself to be a follower, and it 
> began to synchronize data with the leader. The leader sent a snapshot(records 
> from log 1 ~ log Y) to zk1, zk1 then saved the snapshot to local disk by 
> calling the method ZooKeeperServer.takeSnapshot. But unfortunately, when the 
> method returned, the snapshot data was not saved to disk yet. In fact the 
> snapshot file was created, but the size was 0.
> 5. zk1 finished the synchronization and began to accept new requests from the 
> leader. Say log records from log(Y + 1) ~ log Z were accepted by zk1 and  
> saved to log file. With fsync zk1 could make sure log data was not lost.
> 6. zk1 restarted again. Since the snapshot's size was 0, it would not be 
> used, therefore zk1 recovered using the log files. But the records from 
> log(X+1) ~ logY were lost ! 
>  
> Sorry for my poor English.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3220) The snapshot is not saved to disk and may cause data inconsistency.

2018-12-18 Thread Jiafu Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-3220:
---
Description: 
We know that the ZooKeeper server calls fsync to make sure that log data has 
been successfully saved to disk. But the ZooKeeper server does not call fsync to 
make sure that a snapshot has been successfully saved, which may cause 
problems: closing a file descriptor does not guarantee that its data has been 
written to disk, see 
[http://man7.org/linux/man-pages/man2/close.2.html#notes] for details.

 

If the snapshot is not successfully saved to disk, it may lead to data 
inconsistency. Here is my example, which is also a real problem I have met.

1. I deployed a 3-node ZooKeeper cluster: zk1, zk2, and zk3; zk2 was the leader.

2. Both zk1 and zk2 had the log records from log1 ~ logX, where X is a zxid.

3. The machine hosting zk1 restarted, and during the reboot, log(X+1) ~ logY were 
saved to the log files of both zk2 (leader) and zk3 (follower).

4. After zk1 restarted successfully, it found itself to be a follower and began 
to synchronize data with the leader. The leader sent a snapshot (records 
from log1 ~ logY) to zk1, and zk1 saved the snapshot to local disk by 
calling the method ZooKeeperServer.takeSnapshot. Unfortunately, when the 
method returned, the snapshot data had not been saved to disk yet. In fact the 
snapshot file was created, but its size was 0.

5. zk1 finished the synchronization and began to accept new requests from the 
leader. Say log records from log(Y+1) ~ logZ were accepted by zk1 and saved 
to its log file. Thanks to fsync, zk1 could make sure this log data was not lost.

6. zk1 restarted again. Since the snapshot's size was 0, it was not used, 
so zk1 recovered using the log files. But the records from log(X+1) ~ 
logY were lost! 

 

Sorry for my poor English.
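
A sketch of the pattern this report argues for (my own illustration, not the actual ZooKeeperServer.takeSnapshot/FileSnap code): write the snapshot to a temporary file, force it to disk, and only then give it its final name, so a crash can never leave a 0-byte snapshot under a valid name. A fully robust version would also fsync the parent directory after the rename.

{code:java}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Sketch of the fix pattern (not the real FileSnap code): never let a snapshot
// become visible under its final name unless its bytes have been forced to
// disk, so a crash can only leave a temp file behind.
public class DurableSnapshotWriter {

    public static void writeSnapshot(File snapDir, String name, byte[] serializedDataTree)
            throws IOException {
        File tmp = new File(snapDir, name + ".tmp");
        File finalFile = new File(snapDir, name);

        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write(serializedDataTree);
            out.flush();
            out.getFD().sync();   // the missing step: fsync the snapshot data
        }

        // Atomic rename: readers either see the old state or the complete,
        // durable snapshot, never a half-written or 0-byte file.
        Files.move(tmp.toPath(), finalFile.toPath(), StandardCopyOption.ATOMIC_MOVE);
    }
}
{code}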

 

  was:
We known that ZooKeeper server will call fsync to make sure that log data has 
been successfully saved to disk. But ZooKeeper server does not call fsync to 
make sure that a snapshot has been successfully saved, which may cause 
potential problems. Since a close to a file description does not make sure that 
data is written to disk, see 
[http://man7.org/linux/man-pages/man2/close.2.html#notes] for more details.

 

If the snapshot is not successfully  saved to disk, it may lead to data 
inconsistency. Here is my example, which is also a real problem I have ever met.

1. I deployed a 3-node ZooKeeper cluster: zk1, zk2, and zk3, zk2 was the leader.

2. Both zk1 and zk2 had the log records from log1~logX, X is the zxid.

3. The machine of zk1 restarted, and during the reboot,  log(X+1) ~ log Y are 
saved to log files of both zk2(leader) and zk3(follower).

4. After zk1 restarted successfully, it found itself to be a follower, and it 
began to synchronize log with the leader. The leader sent a snapshot(records 
from log 1 ~ log Y) to zk1, zk1 saved the snapshot to local disk by calling the 
method ZooKeeperServer.takeSnapshot. But unfortunately, when the method 
returned, the snapshot data was not saved to disk yet. If fact the snapshot 
file was created, but the size was 0.

5. zk1 finished the synchronization and began to accept new request from the 
leader. Say log(Y + 1) ~ log Z was accepted by zk1 and  saved to log file. With 
fsync zk1 can make sure log data is not lost.

6. zk1 restarted again. Since the snapshot's size was 0, it would not be used, 
therefore zk1 recovered using the log files. But the records from log(X+1) ~ 
logY were lost ! 

 

Sorry for my poor English.

 


> The snapshot is not saved to disk and may cause data inconsistency.
> ---
>
> Key: ZOOKEEPER-3220
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3220
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.12, 3.4.13
>Reporter: Jiafu Jiang
>Priority: Critical
>
> We known that ZooKeeper server will call fsync to make sure that log data has 
> been successfully saved to disk. But ZooKeeper server does not call fsync to 
> make sure that a snapshot has been successfully saved, which may cause 
> potential problems. Since a close to a file description does not make sure 
> that data is written to disk, see 
> [http://man7.org/linux/man-pages/man2/close.2.html#notes] for more details.
>  
> If the snapshot is not successfully  saved to disk, it may lead to data 
> inconsistency. Here is my example, which is also a real problem I have ever 
> met.
> 1. I deployed a 3-node ZooKeeper cluster: zk1, zk2, and zk3, zk2 was the 
> leader.
> 2. Both zk1 and zk2 had the log records from log1~logX, X was the zxid.
> 3. The machine of zk1 restarted, and during the reboot,  log(X+1) ~ log Y are 
> saved to log files of both 

[jira] [Created] (ZOOKEEPER-3220) Snapshot is not written to disk and cause data inconsistency.

2018-12-18 Thread Jiafu Jiang (JIRA)
Jiafu Jiang created ZOOKEEPER-3220:
--

 Summary: Snapshot is not written to disk and cause data 
inconsistency.
 Key: ZOOKEEPER-3220
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3220
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.13, 3.4.12
Reporter: Jiafu Jiang


We know that the ZooKeeper server calls fsync to make sure that log data has 
been successfully saved to disk. But the ZooKeeper server does not call fsync to 
make sure that a snapshot has been successfully saved, which may cause 
problems: closing a file descriptor does not guarantee that its data has been 
written to disk, see 
[http://man7.org/linux/man-pages/man2/close.2.html#notes] for details.

 

If the snapshot is not successfully saved to disk, it may lead to data 
inconsistency. Here is my example, which is also a real problem I have met.

1. I deployed a 3-node ZooKeeper cluster: zk1, zk2, and zk3; zk2 was the leader.

2. Both zk1 and zk2 had the log records from log1 ~ logX, where X is a zxid.

3. The machine hosting zk1 restarted, and during the reboot, log(X+1) ~ logY were 
saved to the log files of both zk2 (leader) and zk3 (follower).

4. After zk1 restarted successfully, it found itself to be a follower and began 
to synchronize with the leader. The leader sent a snapshot (records 
from log1 ~ logY) to zk1, and zk1 saved the snapshot to local disk by calling the 
method ZooKeeperServer.takeSnapshot. Unfortunately, when the method 
returned, the snapshot data had not been saved to disk yet. In fact the snapshot 
file was created, but its size was 0.

5. zk1 finished the synchronization and began to accept new requests from the 
leader. Say log(Y+1) ~ logZ were accepted by zk1 and saved to its log file. Thanks 
to fsync, zk1 could make sure this log data was not lost.

6. zk1 restarted again. Since the snapshot's size was 0, it was not used, 
so zk1 recovered using the log files. But the records from log(X+1) ~ 
logY were lost! 

 

Sorry for my poor English.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3220) The snapshot is not saved to disk and may cause data inconsistency.

2018-12-18 Thread Jiafu Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-3220:
---
Summary: The snapshot is not saved to disk and may cause data 
inconsistency.  (was: Snapshot is not written to disk and cause data 
inconsistency.)

> The snapshot is not saved to disk and may cause data inconsistency.
> ---
>
> Key: ZOOKEEPER-3220
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3220
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.12, 3.4.13
>Reporter: Jiafu Jiang
>Priority: Critical
>
> We known that ZooKeeper server will call fsync to make sure that log data has 
> been successfully saved to disk. But ZooKeeper server does not call fsync to 
> make sure that a snapshot has been successfully saved, which may cause 
> potential problems. Since a close to a file description does not make sure 
> that data is written to disk, see 
> [http://man7.org/linux/man-pages/man2/close.2.html#notes] for more details.
>  
> If the snapshot is not successfully  saved to disk, it may lead to data 
> inconsistency. Here is my example, which is also a real problem I have ever 
> met.
> 1. I deployed a 3-node ZooKeeper cluster: zk1, zk2, and zk3, zk2 was the 
> leader.
> 2. Both zk1 and zk2 had the log records from log1~logX, X is the zxid.
> 3. The machine of zk1 restarted, and during the reboot,  log(X+1) ~ log Y are 
> saved to log files of both zk2(leader) and zk3(follower).
> 4. After zk1 restarted successfully, it found itself to be a follower, and it 
> began to synchronize log with the leader. The leader sent a snapshot(records 
> from log 1 ~ log Y) to zk1, zk1 saved the snapshot to local disk by calling 
> the method ZooKeeperServer.takeSnapshot. But unfortunately, when the method 
> returned, the snapshot data was not saved to disk yet. If fact the snapshot 
> file was created, but the size was 0.
> 5. zk1 finished the synchronization and began to accept new request from the 
> leader. Say log(Y + 1) ~ log Z was accepted by zk1 and  saved to log file. 
> With fsync zk1 can make sure log data is not lost.
> 6. zk1 restarted again. Since the snapshot's size was 0, it would not be 
> used, therefore zk1 recovered using the log files. But the records from 
> log(X+1) ~ logY were lost ! 
>  
> Sorry for my poor English.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3099) ZooKeeper cluster is unavailable for session_timeout time due to network partition in a three-node environment.  

2018-10-14 Thread Jiafu Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-3099:
---
Summary: ZooKeeper cluster is unavailable for session_timeout time due to 
network partition in a three-node environment.     (was: ZooKeeper cluster is 
unavailable for session_timeout time when the leader shutdown in a three-node 
environment.   )

> ZooKeeper cluster is unavailable for session_timeout time due to network 
> partition in a three-node environment.   
> --
>
> Key: ZOOKEEPER-3099
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3099
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client, java client
>Affects Versions: 3.4.11, 3.5.4, 3.4.12, 3.4.13
>Reporter: Jiafu Jiang
>Priority: Major
>
>  
> The default readTimeout timeout of ZooKeeper client is 2/3 * session_time, 
> the default connectTimeout is session_time/hostProvider.size(). If the 
> ZooKeeper cluster has 3 nodes, then connectTimeout is 1/3 * session_time.
>  
> Supports we have three ZooKeeper servers: zk1, zk2, zk3 deployed. And zk3 is 
> now the leader. Client c1 is now connected to zk2(follower). Then we shutdown 
> the network of zk3(leader), the same time, client c1 begin to write some data 
> to ZooKeeper. After a (syncLimit * tick) timeout, zk2 will disconnect with 
> leader and begin a new election, and zk2 becomes the leader.
>  
> The write operation will not succeed due to the leader is shutdown. It will 
> take at most readTimeout time for c1 to discover the failure, and client c1 
> will try to choose another ZooKeeper server. Unfortunately, c1 may choose 
> zk3, which is unreachable now, then c1 will spend connectTimeout to find out 
> that zk3 is unused. Notice that readTimeout + connectTimeout = 
> sesstion_timeout in my case(three-node cluster).
>  
> Therefore, in this case, the ZooKeeper cluster is unavailable for session 
> timeout time when only one ZooKeeper server is shutdown.
>  
> I have some suggestions:
>  # The HostProvider used by ZooKeeper can be specified by an argument.
>  # readTimeout can also be specified in any way.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3099) ZooKeeper cluster is unavailable for session_timeout time due to network partition in a three-node environment.  

2018-10-14 Thread Jiafu Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-3099:
---
Description: 
 

The default readTimeout of the ZooKeeper client is 2/3 * session_timeout, and the 
default connectTimeout is session_timeout / hostProvider.size(). If the ZooKeeper 
cluster has 3 nodes, then connectTimeout is 1/3 * session_timeout.

 

Suppose we have three ZooKeeper servers deployed: zk1, zk2, and zk3, and zk3 is 
currently the leader. Client c1 is connected to zk2 (a follower). Then we shut down 
the network of zk3 (the leader); at the same time, client c1 begins to write some data 
to ZooKeeper. After a (syncLimit * tick) timeout, zk2 will disconnect from the 
leader and begin a new election, and zk2 becomes the leader.

 

The write operation will not succeed because the old leader is unavailable. It will 
take at most readTimeout for c1 to discover the failure, and c1 will then try to 
choose another ZooKeeper server. Unfortunately, c1 may choose zk3, 
which is now unreachable, so c1 will spend connectTimeout finding out that 
zk3 is unusable. Notice that readTimeout + connectTimeout = session_timeout in 
my case (three-node cluster).
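
The arithmetic above, spelled out for a three-node ensemble (the formulas are the ones quoted in this description; the session timeout value is just an example):

{code:java}
// Reproduces the arithmetic above for a 3-node ensemble; the formulas are the
// ones quoted in the description, the numbers are just an example.
public class WorstCaseUnavailability {
    public static void main(String[] args) {
        int sessionTimeoutMs = 30_000;   // example session timeout
        int servers = 3;

        int readTimeoutMs = sessionTimeoutMs * 2 / 3;          // 20 000 ms
        int connectTimeoutMs = sessionTimeoutMs / servers;     // 10 000 ms

        // Worst case for the client: wait readTimeout to give up on the old
        // follower, then spend connectTimeout on the partitioned ex-leader.
        int worstCaseMs = readTimeoutMs + connectTimeoutMs;    // 30 000 ms == session timeout
        System.out.println("worst case before a working server is tried: " + worstCaseMs + " ms");
    }
}
{code}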

 

Therefore, in this case, the ZooKeeper cluster is unavailable to the client for the 
whole session timeout when only one ZooKeeper server is unreachable due to a 
network partition.

 

I have some suggestions:
 # The HostProvider used by ZooKeeper should be configurable via an argument.
 # readTimeout should also be configurable.

 

 

 

  was:
 

The default readTimeout timeout of ZooKeeper client is 2/3 * session_time, the 
default connectTimeout is session_time/hostProvider.size(). If the ZooKeeper 
cluster has 3 nodes, then connectTimeout is 1/3 * session_time.

 

Supports we have three ZooKeeper servers: zk1, zk2, zk3 deployed. And zk3 is 
now the leader. Client c1 is now connected to zk2(follower). Then we shutdown 
the network of zk3(leader), the same time, client c1 begin to write some data 
to ZooKeeper. After a (syncLimit * tick) timeout, zk2 will disconnect with 
leader and begin a new election, and zk2 becomes the leader.

 

The write operation will not succeed due to the leader is unavailable. It will 
take at most readTimeout time for c1 to discover the failure, and client c1 
will try to choose another ZooKeeper server. Unfortunately, c1 may choose zk3, 
which is unreachable now, then c1 will spend connectTimeout to find out that 
zk3 is unused. Notice that readTimeout + connectTimeout = sesstion_timeout in 
my case(three-node cluster).

 

Therefore, in this case, the ZooKeeper cluster is unavailable for session 
timeout time when only one ZooKeeper server is unreachable due to network .

 

I have some suggestions:
 # The HostProvider used by ZooKeeper can be specified by an argument.
 # readTimeout can also be specified in any way.

 

 

 


> ZooKeeper cluster is unavailable for session_timeout time due to network 
> partition in a three-node environment.   
> --
>
> Key: ZOOKEEPER-3099
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3099
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client, java client
>Affects Versions: 3.4.11, 3.5.4, 3.4.12, 3.4.13
>Reporter: Jiafu Jiang
>Priority: Major
>
>  
> The default readTimeout timeout of ZooKeeper client is 2/3 * session_time, 
> the default connectTimeout is session_time/hostProvider.size(). If the 
> ZooKeeper cluster has 3 nodes, then connectTimeout is 1/3 * session_time.
>  
> Supports we have three ZooKeeper servers: zk1, zk2, zk3 deployed. And zk3 is 
> now the leader. Client c1 is now connected to zk2(follower). Then we shutdown 
> the network of zk3(leader), the same time, client c1 begin to write some data 
> to ZooKeeper. After a (syncLimit * tick) timeout, zk2 will disconnect with 
> leader and begin a new election, and zk2 becomes the leader.
>  
> The write operation will not succeed due to the leader is unavailable. It 
> will take at most readTimeout time for c1 to discover the failure, and client 
> c1 will try to choose another ZooKeeper server. Unfortunately, c1 may choose 
> zk3, which is unreachable now, then c1 will spend connectTimeout to find out 
> that zk3 is unused. Notice that readTimeout + connectTimeout = 
> sesstion_timeout in my case(three-node cluster).
>  
> Therefore, in this case, the ZooKeeper cluster is unavailable for session 
> timeout time when only one ZooKeeper server is unreachable due to network 
> partition.
>  
> I have some suggestions:
>  # The HostProvider used by ZooKeeper can be specified by an argument.
>  # readTimeout can also be specified in any way.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3099) ZooKeeper cluster is unavailable for session_timeout time due to network partition in a three-node environment.  

2018-10-14 Thread Jiafu Jiang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649635#comment-16649635
 ] 

Jiafu Jiang commented on ZOOKEEPER-3099:


[~lvfangmin] thanks for your advice. I have changed the title and the 
description.

> ZooKeeper cluster is unavailable for session_timeout time due to network 
> partition in a three-node environment.   
> --
>
> Key: ZOOKEEPER-3099
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3099
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client, java client
>Affects Versions: 3.4.11, 3.5.4, 3.4.12, 3.4.13
>Reporter: Jiafu Jiang
>Priority: Major
>
>  
> The default readTimeout timeout of ZooKeeper client is 2/3 * session_time, 
> the default connectTimeout is session_time/hostProvider.size(). If the 
> ZooKeeper cluster has 3 nodes, then connectTimeout is 1/3 * session_time.
>  
> Supports we have three ZooKeeper servers: zk1, zk2, zk3 deployed. And zk3 is 
> now the leader. Client c1 is now connected to zk2(follower). Then we shutdown 
> the network of zk3(leader), the same time, client c1 begin to write some data 
> to ZooKeeper. After a (syncLimit * tick) timeout, zk2 will disconnect with 
> leader and begin a new election, and zk2 becomes the leader.
>  
> The write operation will not succeed due to the leader is unavailable. It 
> will take at most readTimeout time for c1 to discover the failure, and client 
> c1 will try to choose another ZooKeeper server. Unfortunately, c1 may choose 
> zk3, which is unreachable now, then c1 will spend connectTimeout to find out 
> that zk3 is unused. Notice that readTimeout + connectTimeout = 
> sesstion_timeout in my case(three-node cluster).
>  
> Therefore, in this case, the ZooKeeper cluster is unavailable for session 
> timeout time when only one ZooKeeper server is unreachable due to network 
> partition.
>  
> I have some suggestions:
>  # The HostProvider used by ZooKeeper can be specified by an argument.
>  # readTimeout can also be specified in any way.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3099) ZooKeeper cluster is unavailable for session_timeout time due to network partition in a three-node environment.  

2018-10-14 Thread Jiafu Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-3099:
---
Description: 
 

The default readTimeout of the ZooKeeper client is 2/3 * session_timeout, and the 
default connectTimeout is session_timeout / hostProvider.size(). If the ZooKeeper 
cluster has 3 nodes, then connectTimeout is 1/3 * session_timeout.

 

Suppose we have three ZooKeeper servers deployed: zk1, zk2, and zk3, and zk3 is 
currently the leader. Client c1 is connected to zk2 (a follower). Then we shut down 
the network of zk3 (the leader); at the same time, client c1 begins to write some data 
to ZooKeeper. After a (syncLimit * tick) timeout, zk2 will disconnect from the 
leader and begin a new election, and zk2 becomes the leader.

 

The write operation will not succeed because the old leader is unavailable. It will 
take at most readTimeout for c1 to discover the failure, and c1 will then try to 
choose another ZooKeeper server. Unfortunately, c1 may choose zk3, 
which is now unreachable, so c1 will spend connectTimeout finding out that 
zk3 is unusable. Notice that readTimeout + connectTimeout = session_timeout in 
my case (three-node cluster).

 

Therefore, in this case, the ZooKeeper cluster is unavailable to the client for the 
whole session timeout when only one ZooKeeper server is unreachable due to a 
network partition.

 

I have some suggestions:
 # The HostProvider used by ZooKeeper should be configurable via an argument.
 # readTimeout should also be configurable.

 

 

 

  was:
 

The default readTimeout timeout of ZooKeeper client is 2/3 * session_time, the 
default connectTimeout is session_time/hostProvider.size(). If the ZooKeeper 
cluster has 3 nodes, then connectTimeout is 1/3 * session_time.

 

Supports we have three ZooKeeper servers: zk1, zk2, zk3 deployed. And zk3 is 
now the leader. Client c1 is now connected to zk2(follower). Then we shutdown 
the network of zk3(leader), the same time, client c1 begin to write some data 
to ZooKeeper. After a (syncLimit * tick) timeout, zk2 will disconnect with 
leader and begin a new election, and zk2 becomes the leader.

 

The write operation will not succeed due to the leader is shutdown. It will 
take at most readTimeout time for c1 to discover the failure, and client c1 
will try to choose another ZooKeeper server. Unfortunately, c1 may choose zk3, 
which is unreachable now, then c1 will spend connectTimeout to find out that 
zk3 is unused. Notice that readTimeout + connectTimeout = sesstion_timeout in 
my case(three-node cluster).

 

Therefore, in this case, the ZooKeeper cluster is unavailable for session 
timeout time when only one ZooKeeper server is shutdown.

 

I have some suggestions:
 # The HostProvider used by ZooKeeper can be specified by an argument.
 # readTimeout can also be specified in any way.

 

 

 


> ZooKeeper cluster is unavailable for session_timeout time due to network 
> partition in a three-node environment.   
> --
>
> Key: ZOOKEEPER-3099
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3099
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client, java client
>Affects Versions: 3.4.11, 3.5.4, 3.4.12, 3.4.13
>Reporter: Jiafu Jiang
>Priority: Major
>
>  
> The default readTimeout of the ZooKeeper client is 2/3 * session_time, and 
> the default connectTimeout is session_time / hostProvider.size(). If the 
> ZooKeeper cluster has 3 nodes, then connectTimeout is 1/3 * session_time.
>  
> Suppose we have three ZooKeeper servers deployed: zk1, zk2, zk3, and zk3 is 
> now the leader. Client c1 is connected to zk2 (a follower). Then we shut down 
> the network of zk3 (the leader); at the same time, client c1 begins to write 
> some data to ZooKeeper. After a (syncLimit * tick) timeout, zk2 will 
> disconnect from the leader and begin a new election, and zk2 becomes the 
> leader.
>  
> The write operation will not succeed because the leader is unavailable. It 
> will take at most readTimeout for c1 to discover the failure, and client c1 
> will then try to choose another ZooKeeper server. Unfortunately, c1 may 
> choose zk3, which is unreachable now, so c1 will spend connectTimeout to find 
> out that zk3 is unusable. Notice that readTimeout + connectTimeout = 
> session_timeout in my case (three-node cluster).
>  
> Therefore, in this case, the ZooKeeper cluster is unavailable for up to the 
> session timeout when only one ZooKeeper server is unreachable due to a 
> network failure.
>  
> I have some suggestions:
>  # The HostProvider used by the ZooKeeper client should be configurable via 
> an argument.
>  # readTimeout should also be configurable.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3099) ZooKeeper cluster is unavailable for session_timeout time when the leader shutdown in a three-node environment.  

2018-07-22 Thread Jiafu Jiang (JIRA)
Jiafu Jiang created ZOOKEEPER-3099:
--

 Summary: ZooKeeper cluster is unavailable for session_timeout time 
when the leader shutdown in a three-node environment.   
 Key: ZOOKEEPER-3099
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3099
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client, java client
Affects Versions: 3.4.13, 3.4.12, 3.5.4, 3.4.11
Reporter: Jiafu Jiang


 

The default readTimeout of the ZooKeeper client is 2/3 * session_time, and the 
default connectTimeout is session_time / hostProvider.size(). If the ZooKeeper 
cluster has 3 nodes, then connectTimeout is 1/3 * session_time.

 

Suppose we have three ZooKeeper servers deployed: zk1, zk2, zk3, and zk3 is 
now the leader. Client c1 is connected to zk2 (a follower). Then we shut down 
the network of zk3 (the leader); at the same time, client c1 begins to write 
some data to ZooKeeper. After a (syncLimit * tick) timeout, zk2 will disconnect 
from the leader and begin a new election, and zk2 becomes the leader.

 

The write operation will not succeed because the leader is shut down. It will 
take at most readTimeout for c1 to discover the failure, and client c1 will 
then try to choose another ZooKeeper server. Unfortunately, c1 may choose zk3, 
which is unreachable now, so c1 will spend connectTimeout to find out that zk3 
is unusable. Notice that readTimeout + connectTimeout = session_timeout in my 
case (three-node cluster).

 

Therefore, in this case, the ZooKeeper cluster is unavailable for up to the 
session timeout when only one ZooKeeper server is shut down.

 

I have some suggestions:
 # The HostProvider used by the ZooKeeper client should be configurable via an 
argument.
 # readTimeout should also be configurable.

 

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2701) Timeout for RecvWorker is too long

2018-07-16 Thread Jiafu Jiang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544918#comment-16544918
 ] 

Jiafu Jiang commented on ZOOKEEPER-2701:


I read the source code of ZooKeeper 3.4.12, and I find that the SendWorker or 
RecvWorker is only finished when an IOException happens.

When network problems happen, the OS may or may not discover the dead 
connection in time, especially when the socket timeout is infinite. This can 
lead to a problem where ZooKeeper takes *several* minutes to elect a new leader.

 

 

> Timeout for RecvWorker is too long
> --
>
> Key: ZOOKEEPER-2701
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2701
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.8, 3.4.9, 3.4.10, 3.4.11
> Environment: Centos6.5
> ZooKeeper 3.4.8
>Reporter: Jiafu Jiang
>Priority: Major
>
> Environment:
>  I deploy ZooKeeper in a cluster of three nodes. Each node has three network 
> interfaces(eth0, eth1, eth2).
> Hostname is used instead of IP address in zoo.cfg, and 
> quorumListenOnAllIPs=true
> Problem:
>  I start three ZooKeeper servers (node A, node B, and node C) one by one, 
>  and when the leader election finishes, node B is the leader. 
>  Then I shut down one network interface of node A with the command "ifdown 
> eth0". The ZooKeeper server on node A will lose its connections to node B 
> and node C. In my test, it takes about 20 minutes for the ZooKeeper server 
> on node A to realize this and call QuorumServer.recreateSocketAddress to 
> re-resolve the hostname.
> I try to read the source code, and I find the code in
> {code:java|title=QuorumCnxManager.java:|borderStyle=solid}
> class RecvWorker extends ZooKeeperThread {
> Long sid;
> Socket sock;
> volatile boolean running = true;
> final DataInputStream din;
> final SendWorker sw;
> RecvWorker(Socket sock, DataInputStream din, Long sid, SendWorker sw) 
> {
> super("RecvWorker:" + sid);
> this.sid = sid;
> this.sock = sock;
> this.sw = sw;
> this.din = din;
> try {
> // OK to wait until socket disconnects while reading.
> sock.setSoTimeout(0);
> } catch (IOException e) {
> LOG.error("Error while accessing socket for " + sid, e);
> closeSocket(sock);
> running = false;
> }
> }
>...
>  }
> {code}
> I notice that the socket timeout is set to 0 in the RecvWorker constructor. 
> This is reasonable when the IP address of a ZooKeeper server never changes, 
> but considering that the IP address of each ZooKeeper server may change, we 
> should set a timeout here.
> I think this is a problem.
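
For illustration only, here is a minimal, self-contained sketch of the 
behaviour the report asks for: read from the peer socket with a finite 
SO_TIMEOUT instead of 0, and treat a timeout as a dead connection so the 
caller can reconnect and re-resolve the peer's hostname. The host name, 
election port and 30-second value are assumptions; this is not the actual 
ZooKeeper patch.

{code:java}
import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;
import java.net.SocketTimeoutException;

/**
 * Sketch only: a finite SO_TIMEOUT makes a silently dead peer detectable,
 * so the connection can be torn down and the hostname re-resolved.
 */
public class BoundedQuorumRead {
    static final int READ_TIMEOUT_MS = 30_000;   // assumed value, not from ZooKeeper

    public static void main(String[] args) throws IOException {
        try (Socket sock = new Socket("peer.example.com", 3888)) {
            sock.setSoTimeout(READ_TIMEOUT_MS);          // instead of setSoTimeout(0)
            DataInputStream din = new DataInputStream(sock.getInputStream());
            try {
                int length = din.readInt();              // blocks at most READ_TIMEOUT_MS
                System.out.println("next message length: " + length);
            } catch (SocketTimeoutException e) {
                // With SO_TIMEOUT == 0 this branch never runs and a silently dead
                // peer blocks the reader forever; here we can close the socket and
                // let the connection (and the DNS lookup) be redone.
                System.out.println("peer silent for " + READ_TIMEOUT_MS + " ms, reconnecting");
            }
        }
    }
}
{code}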



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-2701) Timeout for RecvWorker is too long

2018-07-16 Thread Jiafu Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-2701:
---
Priority: Trivial  (was: Minor)

> Timeout for RecvWorker is too long
> --
>
> Key: ZOOKEEPER-2701
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2701
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.8, 3.4.9, 3.4.10, 3.4.11
> Environment: Centos6.5
> ZooKeeper 3.4.8
>Reporter: Jiafu Jiang
>Priority: Trivial
>
> Environment:
>  I deploy ZooKeeper in a cluster of three nodes. Each node has three network 
> interfaces(eth0, eth1, eth2).
> Hostname is used instead of IP address in zoo.cfg, and 
> quorumListenOnAllIPs=true
> Problem:
>  I start three ZooKeeper servers (node A, node B, and node C) one by one, 
>  and when the leader election finishes, node B is the leader. 
>  Then I shut down one network interface of node A with the command "ifdown 
> eth0". The ZooKeeper server on node A will lose its connections to node B 
> and node C. In my test, it takes about 20 minutes for the ZooKeeper server 
> on node A to realize this and call QuorumServer.recreateSocketAddress to 
> re-resolve the hostname.
> I try to read the source code, and I find the code in
> {code:java|title=QuorumCnxManager.java:|borderStyle=solid}
> class RecvWorker extends ZooKeeperThread {
> Long sid;
> Socket sock;
> volatile boolean running = true;
> final DataInputStream din;
> final SendWorker sw;
> RecvWorker(Socket sock, DataInputStream din, Long sid, SendWorker sw) 
> {
> super("RecvWorker:" + sid);
> this.sid = sid;
> this.sock = sock;
> this.sw = sw;
> this.din = din;
> try {
> // OK to wait until socket disconnects while reading.
> sock.setSoTimeout(0);
> } catch (IOException e) {
> LOG.error("Error while accessing socket for " + sid, e);
> closeSocket(sock);
> running = false;
> }
> }
>...
>  }
> {code}
> I notice that the socket timeout is set to 0 in the RecvWorker constructor. 
> This is reasonable when the IP address of a ZooKeeper server never changes, 
> but considering that the IP address of each ZooKeeper server may change, we 
> should set a timeout here.
> I think this is a problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-2701) Timeout for RecvWorker is too long

2018-07-16 Thread Jiafu Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-2701:
---
Priority: Major  (was: Trivial)

> Timeout for RecvWorker is too long
> --
>
> Key: ZOOKEEPER-2701
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2701
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.8, 3.4.9, 3.4.10, 3.4.11
> Environment: Centos6.5
> ZooKeeper 3.4.8
>Reporter: Jiafu Jiang
>Priority: Major
>
> Environment:
>  I deploy ZooKeeper in a cluster of three nodes. Each node has three network 
> interfaces(eth0, eth1, eth2).
> Hostname is used instead of IP address in zoo.cfg, and 
> quorumListenOnAllIPs=true
> Problem:
>  I start three ZooKeeper servers (node A, node B, and node C) one by one, 
>  and when the leader election finishes, node B is the leader. 
>  Then I shut down one network interface of node A with the command "ifdown 
> eth0". The ZooKeeper server on node A will lose its connections to node B 
> and node C. In my test, it takes about 20 minutes for the ZooKeeper server 
> on node A to realize this and call QuorumServer.recreateSocketAddress to 
> re-resolve the hostname.
> I try to read the source code, and I find the code in
> {code:java|title=QuorumCnxManager.java:|borderStyle=solid}
> class RecvWorker extends ZooKeeperThread {
> Long sid;
> Socket sock;
> volatile boolean running = true;
> final DataInputStream din;
> final SendWorker sw;
> RecvWorker(Socket sock, DataInputStream din, Long sid, SendWorker sw) 
> {
> super("RecvWorker:" + sid);
> this.sid = sid;
> this.sock = sock;
> this.sw = sw;
> this.din = din;
> try {
> // OK to wait until socket disconnects while reading.
> sock.setSoTimeout(0);
> } catch (IOException e) {
> LOG.error("Error while accessing socket for " + sid, e);
> closeSocket(sock);
> running = false;
> }
> }
>...
>  }
> {code}
> I notice that the socket timeout is set to 0 in the RecvWorker constructor. 
> This is reasonable when the IP address of a ZooKeeper server never changes, 
> but considering that the IP address of each ZooKeeper server may change, we 
> should set a timeout here.
> I think this is a problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2701) Timeout for RecvWorker is too long

2018-07-16 Thread Jiafu Jiang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544897#comment-16544897
 ] 

Jiafu Jiang commented on ZOOKEEPER-2701:


I remove the following code:
try {
    // OK to wait until socket disconnects while reading.
    sock.setSoTimeout(0);
} catch (IOException e) {
    LOG.error("Error while accessing socket for " + sid, e);
    closeSocket(sock);
    running = false;
}
 

And I find it works fine in my test environment.

> Timeout for RecvWorker is too long
> --
>
> Key: ZOOKEEPER-2701
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2701
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.8, 3.4.9, 3.4.10, 3.4.11
> Environment: Centos6.5
> ZooKeeper 3.4.8
>Reporter: Jiafu Jiang
>Priority: Minor
>
> Environment:
>  I deploy ZooKeeper in a cluster of three nodes. Each node has three network 
> interfaces(eth0, eth1, eth2).
> Hostname is used instead of IP address in zoo.cfg, and 
> quorumListenOnAllIPs=true
> Problem:
>  I start three ZooKeeper servers (node A, node B, and node C) one by one, 
>  and when the leader election finishes, node B is the leader. 
>  Then I shut down one network interface of node A with the command "ifdown 
> eth0". The ZooKeeper server on node A will lose its connections to node B 
> and node C. In my test, it takes about 20 minutes for the ZooKeeper server 
> on node A to realize this and call QuorumServer.recreateSocketAddress to 
> re-resolve the hostname.
> I try to read the source code, and I find the code in
> {code:java|title=QuorumCnxManager.java:|borderStyle=solid}
> class RecvWorker extends ZooKeeperThread {
> Long sid;
> Socket sock;
> volatile boolean running = true;
> final DataInputStream din;
> final SendWorker sw;
> RecvWorker(Socket sock, DataInputStream din, Long sid, SendWorker sw) 
> {
> super("RecvWorker:" + sid);
> this.sid = sid;
> this.sock = sock;
> this.sw = sw;
> this.din = din;
> try {
> // OK to wait until socket disconnects while reading.
> sock.setSoTimeout(0);
> } catch (IOException e) {
> LOG.error("Error while accessing socket for " + sid, e);
> closeSocket(sock);
> running = false;
> }
> }
>...
>  }
> {code}
> I notice that the socket timeout is set to 0 in the RecvWorker constructor. 
> This is reasonable when the IP address of a ZooKeeper server never changes, 
> but considering that the IP address of each ZooKeeper server may change, we 
> should set a timeout here.
> I think this is a problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2701) Timeout for RecvWorker is too long

2018-07-16 Thread Jiafu Jiang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544893#comment-16544893
 ] 

Jiafu Jiang commented on ZOOKEEPER-2701:


I remove the following code:
try {
    // OK to wait until socket disconnects while reading.
    sock.setSoTimeout(0);
} catch (IOException e) {
    LOG.error("Error while accessing socket for " + sid, e);
    closeSocket(sock);
    running = false;
}

And I find it works fine in my test environment. 


> Timeout for RecvWorker is too long
> --
>
> Key: ZOOKEEPER-2701
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2701
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.8, 3.4.9, 3.4.10, 3.4.11
> Environment: Centos6.5
> ZooKeeper 3.4.8
>Reporter: Jiafu Jiang
>Priority: Minor
>
> Environment:
>  I deploy ZooKeeper in a cluster of three nodes. Each node has three network 
> interfaces(eth0, eth1, eth2).
> Hostname is used instead of IP address in zoo.cfg, and 
> quorumListenOnAllIPs=true
> Problem:
>  I start three ZooKeeper servers (node A, node B, and node C) one by one, 
>  and when the leader election finishes, node B is the leader. 
>  Then I shut down one network interface of node A with the command "ifdown 
> eth0". The ZooKeeper server on node A will lose its connections to node B 
> and node C. In my test, it takes about 20 minutes for the ZooKeeper server 
> on node A to realize this and call QuorumServer.recreateSocketAddress to 
> re-resolve the hostname.
> I try to read the source code, and I find the code in
> {code:java|title=QuorumCnxManager.java:|borderStyle=solid}
> class RecvWorker extends ZooKeeperThread {
> Long sid;
> Socket sock;
> volatile boolean running = true;
> final DataInputStream din;
> final SendWorker sw;
> RecvWorker(Socket sock, DataInputStream din, Long sid, SendWorker sw) 
> {
> super("RecvWorker:" + sid);
> this.sid = sid;
> this.sock = sock;
> this.sw = sw;
> this.din = din;
> try {
> // OK to wait until socket disconnects while reading.
> sock.setSoTimeout(0);
> } catch (IOException e) {
> LOG.error("Error while accessing socket for " + sid, e);
> closeSocket(sock);
> running = false;
> }
> }
>...
>  }
> {code}
> I notice that the socket timeout is set to 0 in the RecvWorker constructor. 
> This is reasonable when the IP address of a ZooKeeper server never changes, 
> but considering that the IP address of each ZooKeeper server may change, we 
> should set a timeout here.
> I think this is a problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Issue Comment Deleted] (ZOOKEEPER-2701) Timeout for RecvWorker is too long

2018-07-16 Thread Jiafu Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-2701:
---
Comment: was deleted

(was: I remove the following code:
try {
    // OK to wait until socket disconnects while reading.
    sock.setSoTimeout(0);
} catch (IOException e) {
    LOG.error("Error while accessing socket for " + sid, e);
    closeSocket(sock);
    running = false;
}
And I find it works fine in my test environment.)

> Timeout for RecvWorker is too long
> --
>
> Key: ZOOKEEPER-2701
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2701
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.8, 3.4.9, 3.4.10, 3.4.11
> Environment: Centos6.5
> ZooKeeper 3.4.8
>Reporter: Jiafu Jiang
>Priority: Minor
>
> Environment:
>  I deploy ZooKeeper in a cluster of three nodes. Each node has three network 
> interfaces(eth0, eth1, eth2).
> Hostname is used instead of IP address in zoo.cfg, and 
> quorumListenOnAllIPs=true
> Problem:
>  I start three ZooKeeper servers (node A, node B, and node C) one by one, 
>  and when the leader election finishes, node B is the leader. 
>  Then I shut down one network interface of node A with the command "ifdown 
> eth0". The ZooKeeper server on node A will lose its connections to node B 
> and node C. In my test, it takes about 20 minutes for the ZooKeeper server 
> on node A to realize this and call QuorumServer.recreateSocketAddress to 
> re-resolve the hostname.
> I try to read the source code, and I find the code in
> {code:java|title=QuorumCnxManager.java:|borderStyle=solid}
> class RecvWorker extends ZooKeeperThread {
> Long sid;
> Socket sock;
> volatile boolean running = true;
> final DataInputStream din;
> final SendWorker sw;
> RecvWorker(Socket sock, DataInputStream din, Long sid, SendWorker sw) 
> {
> super("RecvWorker:" + sid);
> this.sid = sid;
> this.sock = sock;
> this.sw = sw;
> this.din = din;
> try {
> // OK to wait until socket disconnects while reading.
> sock.setSoTimeout(0);
> } catch (IOException e) {
> LOG.error("Error while accessing socket for " + sid, e);
> closeSocket(sock);
> running = false;
> }
> }
>...
>  }
> {code}
> I notice that the socket timeout is set to 0 in the RecvWorker constructor. 
> This is reasonable when the IP address of a ZooKeeper server never changes, 
> but considering that the IP address of each ZooKeeper server may change, we 
> should set a timeout here.
> I think this is a problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ZOOKEEPER-2701) Timeout for RecvWorker is too long

2018-07-16 Thread Jiafu Jiang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544893#comment-16544893
 ] 

Jiafu Jiang edited comment on ZOOKEEPER-2701 at 7/16/18 7:32 AM:
-

I remove the following code:
try {
    // OK to wait until socket disconnects while reading.
    sock.setSoTimeout(0);
} catch (IOException e) {
    LOG.error("Error while accessing socket for " + sid, e);
    closeSocket(sock);
    running = false;
}
And I find it works fine in my test environment.


was (Author: jiangjiafu):
I remove the following code:
try {
    // OK to wait until socket disconnects while reading.
    sock.setSoTimeout(0);
} catch (IOException e) {
    LOG.error("Error while accessing socket for " + sid, e);
    closeSocket(sock);
    running = false;
}

And I find it works fine in my test environment. 


> Timeout for RecvWorker is too long
> --
>
> Key: ZOOKEEPER-2701
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2701
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.8, 3.4.9, 3.4.10, 3.4.11
> Environment: Centos6.5
> ZooKeeper 3.4.8
>Reporter: Jiafu Jiang
>Priority: Minor
>
> Environment:
>  I deploy ZooKeeper in a cluster of three nodes. Each node has three network 
> interfaces(eth0, eth1, eth2).
> Hostname is used instead of IP address in zoo.cfg, and 
> quorumListenOnAllIPs=true
> Problem:
>  I start three ZooKeeper servers (node A, node B, and node C) one by one, 
>  and when the leader election finishes, node B is the leader. 
>  Then I shut down one network interface of node A with the command "ifdown 
> eth0". The ZooKeeper server on node A will lose its connections to node B 
> and node C. In my test, it takes about 20 minutes for the ZooKeeper server 
> on node A to realize this and call QuorumServer.recreateSocketAddress to 
> re-resolve the hostname.
> I try to read the source code, and I find the code in
> {code:java|title=QuorumCnxManager.java:|borderStyle=solid}
> class RecvWorker extends ZooKeeperThread {
> Long sid;
> Socket sock;
> volatile boolean running = true;
> final DataInputStream din;
> final SendWorker sw;
> RecvWorker(Socket sock, DataInputStream din, Long sid, SendWorker sw) 
> {
> super("RecvWorker:" + sid);
> this.sid = sid;
> this.sock = sock;
> this.sw = sw;
> this.din = din;
> try {
> // OK to wait until socket disconnects while reading.
> sock.setSoTimeout(0);
> } catch (IOException e) {
> LOG.error("Error while accessing socket for " + sid, e);
> closeSocket(sock);
> running = false;
> }
> }
>...
>  }
> {code}
> I notice that the socket timeout is set to 0 in the RecvWorker constructor. 
> This is reasonable when the IP address of a ZooKeeper server never changes, 
> but considering that the IP address of each ZooKeeper server may change, we 
> should set a timeout here.
> I think this is a problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-2701) Timeout for RecvWorker is too long

2018-07-16 Thread Jiafu Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-2701:
---
Affects Version/s: 3.4.11

> Timeout for RecvWorker is too long
> --
>
> Key: ZOOKEEPER-2701
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2701
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.8, 3.4.9, 3.4.10, 3.4.11
> Environment: Centos6.5
> ZooKeeper 3.4.8
>Reporter: Jiafu Jiang
>Priority: Minor
>
> Environment:
>  I deploy ZooKeeper in a cluster of three nodes. Each node has three network 
> interfaces(eth0, eth1, eth2).
> Hostname is used instead of IP address in zoo.cfg, and 
> quorumListenOnAllIPs=true
> Problem:
>  I start three ZooKeeper servers (node A, node B, and node C) one by one, 
>  and when the leader election finishes, node B is the leader. 
>  Then I shut down one network interface of node A with the command "ifdown 
> eth0". The ZooKeeper server on node A will lose its connections to node B 
> and node C. In my test, it takes about 20 minutes for the ZooKeeper server 
> on node A to realize this and call QuorumServer.recreateSocketAddress to 
> re-resolve the hostname.
> I try to read the source code, and I find the code in
> {code:java|title=QuorumCnxManager.java:|borderStyle=solid}
> class RecvWorker extends ZooKeeperThread {
> Long sid;
> Socket sock;
> volatile boolean running = true;
> final DataInputStream din;
> final SendWorker sw;
> RecvWorker(Socket sock, DataInputStream din, Long sid, SendWorker sw) 
> {
> super("RecvWorker:" + sid);
> this.sid = sid;
> this.sock = sock;
> this.sw = sw;
> this.din = din;
> try {
> // OK to wait until socket disconnects while reading.
> sock.setSoTimeout(0);
> } catch (IOException e) {
> LOG.error("Error while accessing socket for " + sid, e);
> closeSocket(sock);
> running = false;
> }
> }
>...
>  }
> {code}
> I notice that the socket timeout is set to 0 in the RecvWorker constructor. 
> This is reasonable when the IP address of a ZooKeeper server never changes, 
> but considering that the IP address of each ZooKeeper server may change, we 
> should set a timeout here.
> I think this is a problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2930) Leader cannot be elected due to network timeout of some members.

2018-02-28 Thread Jiafu Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381347#comment-16381347
 ] 

Jiafu Jiang commented on ZOOKEEPER-2930:


I hope this problem can be fixed in version 3.4.X, since 3.4.X is the stable 
release line.

> Leader cannot be elected due to network timeout of some members.
> 
>
> Key: ZOOKEEPER-2930
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2930
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum, server
>Affects Versions: 3.4.10, 3.5.3, 3.4.11, 3.5.4, 3.4.12
> Environment: Java 8
> ZooKeeper 3.4.11(from github)
> Centos6.5
>Reporter: Jiafu Jiang
>Priority: Critical
> Attachments: zoo.cfg, zookeeper1.log, zookeeper2.log
>
>
> I deploy a cluster of ZooKeeper with three nodes:
> ofs_zk1:20.10.11.101, 30.10.11.101
> ofs_zk2:20.10.11.102, 30.10.11.102
> ofs_zk3:20.10.11.103, 30.10.11.103
> I shut down the network interfaces of ofs_zk2 using the "ifdown eth0 eth1" 
> command.
> It is supposed that the new Leader should be elected within a few seconds, 
> but in fact ofs_zk1 and ofs_zk3 just keep holding elections again and again, 
> and neither of them can become the new Leader.
> I change the log level to DEBUG (the default is INFO) and restart the 
> ZooKeeper servers on ofs_zk1 and ofs_zk2 again, but that does not fix the 
> problem.
> I read the log and the ZooKeeper source code, and I think I find the reason.
> When the potential leader (say ofs_zk3) begins the election 
> (FastLeaderElection.lookForLeader()), it will send notifications to 
> all the servers. 
> When it fails to receive any notification during a timeout, it will resend 
> the notifications and double the timeout. This process repeats until a 
> notification is received or the timeout reaches a maximum value.
> FastLeaderElection.sendNotifications() just puts the notification message 
> into a queue and returns; the WorkerSender is responsible for sending the 
> notifications.
> The WorkerSender processes the notifications one by one by passing them 
> to QuorumCnxManager. Here comes the problem: QuorumCnxManager.toSend() blocks 
> for a long time when the notification is sent to ofs_zk2 (whose network is 
> down), so some notifications (which belong to ofs_zk1) will also be blocked 
> for a long time. The repeated notifications from 
> FastLeaderElection.sendNotifications() just make things worse.
> Here is the related source code:
> {code:java}
> public void toSend(Long sid, ByteBuffer b) {
> /*
>  * If sending message to myself, then simply enqueue it (loopback).
>  */
> if (this.mySid == sid) {
>  b.position(0);
>  addToRecvQueue(new Message(b.duplicate(), sid));
> /*
>  * Otherwise send to the corresponding thread to send.
>  */
> } else {
>  /*
>   * Start a new connection if doesn't have one already.
>   */
>  ArrayBlockingQueue<ByteBuffer> bq = new 
> ArrayBlockingQueue<ByteBuffer>(SEND_CAPACITY);
>  ArrayBlockingQueue<ByteBuffer> bqExisting = 
> queueSendMap.putIfAbsent(sid, bq);
>  if (bqExisting != null) {
>  addToSendQueue(bqExisting, b);
>  } else {
>  addToSendQueue(bq, b);
>  }
>  
>  // This may block!!!
>  connectOne(sid);
> 
> }
> }
> {code}
> Therefore, when ofs_zk3 believes that it is the leader, it begins to wait 
> for the epoch ack, but ofs_zk1 never receives the notification (which says 
> the leader is ofs_zk3) because ofs_zk3 has not actually sent it (it may still 
> be sitting in the send queue of the WorkerSender). At last, the potential 
> leader ofs_zk3 fails to receive the epoch ack within the timeout, so it gives 
> up the leadership and begins a new election. 
> The log files of ofs_zk1 and ofs_zk3 are attached.
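
To make the blocking step concrete, here is a small self-contained sketch (not 
ZooKeeper code) contrasting an unbounded connect, which can hang for a long 
time on a host whose interfaces are down, with a connect bounded by an 
explicit timeout so a single sender thread can move on to notifications for 
healthy peers. The host name, port and 5-second bound are illustrative.

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

/**
 * Sketch only: an explicit connect timeout lets the caller fail fast on an
 * unreachable peer instead of blocking indefinitely.
 */
public class BoundedConnect {
    public static void main(String[] args) {
        InetSocketAddress deadPeer = new InetSocketAddress("ofs_zk2", 3888);
        try (Socket sock = new Socket()) {
            // Unbounded form: sock.connect(deadPeer);  // may hang until the OS gives up
            sock.connect(deadPeer, 5_000);               // give up after 5 seconds
            System.out.println("connected to " + deadPeer);
        } catch (SocketTimeoutException e) {
            System.out.println("peer unreachable within 5s, move on to the next notification");
        } catch (IOException e) {
            System.out.println("connect failed: " + e.getMessage());
        }
    }
}
{code}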



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-2930) Leader cannot be elected due to network timeout of some members.

2018-01-10 Thread Jiafu Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-2930:
---
Affects Version/s: 3.5.4
   3.5.3

> Leader cannot be elected due to network timeout of some members.
> 
>
> Key: ZOOKEEPER-2930
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2930
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum, server
>Affects Versions: 3.4.10, 3.5.3, 3.4.11, 3.5.4, 3.4.12
> Environment: Java 8
> ZooKeeper 3.4.11(from github)
> Centos6.5
>Reporter: Jiafu Jiang
>Priority: Critical
> Attachments: zoo.cfg, zookeeper1.log, zookeeper2.log
>
>
> I deploy a cluster of ZooKeeper with three nodes:
> ofs_zk1:20.10.11.101, 30.10.11.101
> ofs_zk2:20.10.11.102, 30.10.11.102
> ofs_zk3:20.10.11.103, 30.10.11.103
> I shutdown the network interfaces of ofs_zk2 using "ifdown eth0 eth1" command.
> It is supposed that the new Leader should be elected in some seconds, but the 
> fact is, ofs_zk1 and ofs_zk3 just keep electing again and again, but none of 
> them can become the new Leader.
> I change the log level to DEBUG (the default is INFO), and restart zookeeper 
> servers on ofs_zk1 and ofs_zk2 again, but it can not fix the problem.
> I read the log and the ZooKeeper source code, and I think I find the reason.
> When the potential leader(says ofs_zk3) begins the 
> election(FastLeaderElection.lookForLeader()), it will send notifications to 
> all the servers. 
> When it fails to receive any notification during a timeout, it will resend 
> the notifications, and double the timeout. This process will repeat until any 
> notification is received or the timeout reaches a max value.
> The FastLeaderElection.sendNotifications() just put the notification message 
> into a queue and return. The WorkerSender is responsable to send the 
> notifications.
> The WorkerSender just process the notifications one by one by passing the 
> notifications to QuorumCnxManager. Here comes the problem, the 
> QuorumCnxManager.toSend() blocks for a long time when the notification is 
> send to ofs_zk2(whose network is down) and some notifications (which belongs 
> to ofs_zk1) will thus be blocked for a long time. The repeated notifications 
> by FastLeaderElection.sendNotifications() just make things worse.
> Here is the related source code:
> {code:java}
> public void toSend(Long sid, ByteBuffer b) {
> /*
>  * If sending message to myself, then simply enqueue it (loopback).
>  */
> if (this.mySid == sid) {
>  b.position(0);
>  addToRecvQueue(new Message(b.duplicate(), sid));
> /*
>  * Otherwise send to the corresponding thread to send.
>  */
> } else {
>  /*
>   * Start a new connection if doesn't have one already.
>   */
>  ArrayBlockingQueue bq = new 
> ArrayBlockingQueue(SEND_CAPACITY);
>  ArrayBlockingQueue bqExisting = 
> queueSendMap.putIfAbsent(sid, bq);
>  if (bqExisting != null) {
>  addToSendQueue(bqExisting, b);
>  } else {
>  addToSendQueue(bq, b);
>  }
>  
>  // This may block!!!
>  connectOne(sid);
> 
> }
> }
> {code}
> Therefore, when ofs_zk3 believes that it is the leader, it begins to wait the 
> epoch ack, but in fact the ofs_zk1 does not receive the notification(which 
> says the leader is ofs_zk3) because the ofs_zk3 has not sent the 
> notification(which may still exist in the sendqueue of WorkerSender). At 
> last, the potential leader ofs_zk3 fails to receive the epoch ack in timeout, 
> so it quits the leader and begins a new election. 
> The log files of ofs_zk1 and ofs_zk3 are attached.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ZOOKEEPER-2930) Leader cannot be elected due to network timeout of some members.

2018-01-10 Thread Jiafu Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-2930:
---
Affects Version/s: 3.4.12

> Leader cannot be elected due to network timeout of some members.
> 
>
> Key: ZOOKEEPER-2930
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2930
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum, server
>Affects Versions: 3.4.10, 3.4.11, 3.4.12
> Environment: Java 8
> ZooKeeper 3.4.11(from github)
> Centos6.5
>Reporter: Jiafu Jiang
>Priority: Critical
> Attachments: zoo.cfg, zookeeper1.log, zookeeper2.log
>
>
> I deploy a cluster of ZooKeeper with three nodes:
> ofs_zk1:20.10.11.101, 30.10.11.101
> ofs_zk2:20.10.11.102, 30.10.11.102
> ofs_zk3:20.10.11.103, 30.10.11.103
> I shutdown the network interfaces of ofs_zk2 using "ifdown eth0 eth1" command.
> It is supposed that the new Leader should be elected in some seconds, but the 
> fact is, ofs_zk1 and ofs_zk3 just keep electing again and again, but none of 
> them can become the new Leader.
> I change the log level to DEBUG (the default is INFO), and restart zookeeper 
> servers on ofs_zk1 and ofs_zk2 again, but it can not fix the problem.
> I read the log and the ZooKeeper source code, and I think I find the reason.
> When the potential leader(says ofs_zk3) begins the 
> election(FastLeaderElection.lookForLeader()), it will send notifications to 
> all the servers. 
> When it fails to receive any notification during a timeout, it will resend 
> the notifications, and double the timeout. This process will repeat until any 
> notification is received or the timeout reaches a max value.
> The FastLeaderElection.sendNotifications() just put the notification message 
> into a queue and return. The WorkerSender is responsable to send the 
> notifications.
> The WorkerSender just process the notifications one by one by passing the 
> notifications to QuorumCnxManager. Here comes the problem, the 
> QuorumCnxManager.toSend() blocks for a long time when the notification is 
> send to ofs_zk2(whose network is down) and some notifications (which belongs 
> to ofs_zk1) will thus be blocked for a long time. The repeated notifications 
> by FastLeaderElection.sendNotifications() just make things worse.
> Here is the related source code:
> {code:java}
> public void toSend(Long sid, ByteBuffer b) {
> /*
>  * If sending message to myself, then simply enqueue it (loopback).
>  */
> if (this.mySid == sid) {
>  b.position(0);
>  addToRecvQueue(new Message(b.duplicate(), sid));
> /*
>  * Otherwise send to the corresponding thread to send.
>  */
> } else {
>  /*
>   * Start a new connection if doesn't have one already.
>   */
>  ArrayBlockingQueue bq = new 
> ArrayBlockingQueue(SEND_CAPACITY);
>  ArrayBlockingQueue bqExisting = 
> queueSendMap.putIfAbsent(sid, bq);
>  if (bqExisting != null) {
>  addToSendQueue(bqExisting, b);
>  } else {
>  addToSendQueue(bq, b);
>  }
>  
>  // This may block!!!
>  connectOne(sid);
> 
> }
> }
> {code}
> Therefore, when ofs_zk3 believes that it is the leader, it begins to wait the 
> epoch ack, but in fact the ofs_zk1 does not receive the notification(which 
> says the leader is ofs_zk3) because the ofs_zk3 has not sent the 
> notification(which may still exist in the sendqueue of WorkerSender). At 
> last, the potential leader ofs_zk3 fails to receive the epoch ack in timeout, 
> so it quits the leader and begins a new election. 
> The log files of ofs_zk1 and ofs_zk3 are attached.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ZOOKEEPER-2962) The function queueEmpty() in FastLeaderElection.Messenger is not used, should be removed.

2017-12-25 Thread Jiafu Jiang (JIRA)
Jiafu Jiang created ZOOKEEPER-2962:
--

 Summary: The function queueEmpty() in FastLeaderElection.Messenger 
is not used, should be removed.
 Key: ZOOKEEPER-2962
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2962
 Project: ZooKeeper
  Issue Type: Improvement
  Components: leaderElection
Affects Versions: 3.4.11
Reporter: Jiafu Jiang
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ZOOKEEPER-2930) Leader cannot be elected due to network timeout of some members.

2017-11-07 Thread Jiafu Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-2930:
---
Priority: Critical  (was: Major)

> Leader cannot be elected due to network timeout of some members.
> 
>
> Key: ZOOKEEPER-2930
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2930
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum, server
>Affects Versions: 3.4.10, 3.4.11
> Environment: Java 8
> ZooKeeper 3.4.11(from github)
> Centos6.5
>Reporter: Jiafu Jiang
>Priority: Critical
> Attachments: zoo.cfg, zookeeper1.log, zookeeper2.log
>
>
> I deploy a cluster of ZooKeeper with three nodes:
> ofs_zk1:20.10.11.101, 30.10.11.101
> ofs_zk2:20.10.11.102, 30.10.11.102
> ofs_zk3:20.10.11.103, 30.10.11.103
> I shutdown the network interfaces of ofs_zk2 using "ifdown eth0 eth1" command.
> It is supposed that the new Leader should be elected in some seconds, but the 
> fact is, ofs_zk1 and ofs_zk3 just keep electing again and again, but none of 
> them can become the new Leader.
> I change the log level to DEBUG (the default is INFO), and restart zookeeper 
> servers on ofs_zk1 and ofs_zk2 again, but it can not fix the problem.
> I read the log and the ZooKeeper source code, and I think I find the reason.
> When the potential leader(says ofs_zk3) begins the 
> election(FastLeaderElection.lookForLeader()), it will send notifications to 
> all the servers. 
> When it fails to receive any notification during a timeout, it will resend 
> the notifications, and double the timeout. This process will repeat until any 
> notification is received or the timeout reaches a max value.
> The FastLeaderElection.sendNotifications() just put the notification message 
> into a queue and return. The WorkerSender is responsable to send the 
> notifications.
> The WorkerSender just process the notifications one by one by passing the 
> notifications to QuorumCnxManager. Here comes the problem, the 
> QuorumCnxManager.toSend() blocks for a long time when the notification is 
> send to ofs_zk2(whose network is down) and some notifications (which belongs 
> to ofs_zk1) will thus be blocked for a long time. The repeated notifications 
> by FastLeaderElection.sendNotifications() just make things worse.
> Here is the related source code:
> {code:java}
> public void toSend(Long sid, ByteBuffer b) {
> /*
>  * If sending message to myself, then simply enqueue it (loopback).
>  */
> if (this.mySid == sid) {
>  b.position(0);
>  addToRecvQueue(new Message(b.duplicate(), sid));
> /*
>  * Otherwise send to the corresponding thread to send.
>  */
> } else {
>  /*
>   * Start a new connection if doesn't have one already.
>   */
>  ArrayBlockingQueue bq = new 
> ArrayBlockingQueue(SEND_CAPACITY);
>  ArrayBlockingQueue bqExisting = 
> queueSendMap.putIfAbsent(sid, bq);
>  if (bqExisting != null) {
>  addToSendQueue(bqExisting, b);
>  } else {
>  addToSendQueue(bq, b);
>  }
>  
>  // This may block!!!
>  connectOne(sid);
> 
> }
> }
> {code}
> Therefore, when ofs_zk3 believes that it is the leader, it begins to wait the 
> epoch ack, but in fact the ofs_zk1 does not receive the notification(which 
> says the leader is ofs_zk3) because the ofs_zk3 has not sent the 
> notification(which may still exist in the sendqueue of WorkerSender). At 
> last, the potential leader ofs_zk3 fails to receive the epoch ack in timeout, 
> so it quits the leader and begins a new election. 
> The log files of ofs_zk1 and ofs_zk3 are attached.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ZOOKEEPER-2930) Leader cannot be elected due to network timeout of some members.

2017-11-07 Thread Jiafu Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-2930:
---
Affects Version/s: 3.4.11

> Leader cannot be elected due to network timeout of some members.
> 
>
> Key: ZOOKEEPER-2930
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2930
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum, server
>Affects Versions: 3.4.10, 3.4.11
> Environment: Java 8
> ZooKeeper 3.4.11(from github)
> Centos6.5
>Reporter: Jiafu Jiang
> Attachments: zoo.cfg, zookeeper1.log, zookeeper2.log
>
>
> I deploy a cluster of ZooKeeper with three nodes:
> ofs_zk1:20.10.11.101, 30.10.11.101
> ofs_zk2:20.10.11.102, 30.10.11.102
> ofs_zk3:20.10.11.103, 30.10.11.103
> I shutdown the network interfaces of ofs_zk2 using "ifdown eth0 eth1" command.
> It is supposed that the new Leader should be elected in some seconds, but the 
> fact is, ofs_zk1 and ofs_zk3 just keep electing again and again, but none of 
> them can become the new Leader.
> I change the log level to DEBUG (the default is INFO), and restart zookeeper 
> servers on ofs_zk1 and ofs_zk2 again, but it can not fix the problem.
> I read the log and the ZooKeeper source code, and I think I find the reason.
> When the potential leader(says ofs_zk3) begins the 
> election(FastLeaderElection.lookForLeader()), it will send notifications to 
> all the servers. 
> When it fails to receive any notification during a timeout, it will resend 
> the notifications, and double the timeout. This process will repeat until any 
> notification is received or the timeout reaches a max value.
> The FastLeaderElection.sendNotifications() just put the notification message 
> into a queue and return. The WorkerSender is responsable to send the 
> notifications.
> The WorkerSender just process the notifications one by one by passing the 
> notifications to QuorumCnxManager. Here comes the problem, the 
> QuorumCnxManager.toSend() blocks for a long time when the notification is 
> send to ofs_zk2(whose network is down) and some notifications (which belongs 
> to ofs_zk1) will thus be blocked for a long time. The repeated notifications 
> by FastLeaderElection.sendNotifications() just make things worse.
> Here is the related source code:
> {code:java}
> public void toSend(Long sid, ByteBuffer b) {
> /*
>  * If sending message to myself, then simply enqueue it (loopback).
>  */
> if (this.mySid == sid) {
>  b.position(0);
>  addToRecvQueue(new Message(b.duplicate(), sid));
> /*
>  * Otherwise send to the corresponding thread to send.
>  */
> } else {
>  /*
>   * Start a new connection if doesn't have one already.
>   */
>  ArrayBlockingQueue bq = new 
> ArrayBlockingQueue(SEND_CAPACITY);
>  ArrayBlockingQueue bqExisting = 
> queueSendMap.putIfAbsent(sid, bq);
>  if (bqExisting != null) {
>  addToSendQueue(bqExisting, b);
>  } else {
>  addToSendQueue(bq, b);
>  }
>  
>  // This may block!!!
>  connectOne(sid);
> 
> }
> }
> {code}
> Therefore, when ofs_zk3 believes that it is the leader, it begins to wait the 
> epoch ack, but in fact the ofs_zk1 does not receive the notification(which 
> says the leader is ofs_zk3) because the ofs_zk3 has not sent the 
> notification(which may still exist in the sendqueue of WorkerSender). At 
> last, the potential leader ofs_zk3 fails to receive the epoch ack in timeout, 
> so it quits the leader and begins a new election. 
> The log files of ofs_zk1 and ofs_zk3 are attached.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ZOOKEEPER-2930) Leader cannot be elected due to network timeout of some members.

2017-11-06 Thread Jiafu Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-2930:
---
Component/s: server
 quorum

> Leader cannot be elected due to network timeout of some members.
> 
>
> Key: ZOOKEEPER-2930
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2930
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum, server
>Affects Versions: 3.4.10
> Environment: Java 8
> ZooKeeper 3.4.11(from github)
> Centos6.5
>Reporter: Jiafu Jiang
> Attachments: zoo.cfg, zookeeper1.log, zookeeper2.log
>
>
> I deploy a cluster of ZooKeeper with three nodes:
> ofs_zk1:20.10.11.101, 30.10.11.101
> ofs_zk2:20.10.11.102, 30.10.11.102
> ofs_zk3:20.10.11.103, 30.10.11.103
> I shutdown the network interfaces of ofs_zk2 using "ifdown eth0 eth1" command.
> It is supposed that the new Leader should be elected in some seconds, but the 
> fact is, ofs_zk1 and ofs_zk3 just keep electing again and again, but none of 
> them can become the new Leader.
> I change the log level to DEBUG (the default is INFO), and restart zookeeper 
> servers on ofs_zk1 and ofs_zk2 again, but it can not fix the problem.
> I read the log and the ZooKeeper source code, and I think I find the reason.
> When the potential leader(says ofs_zk3) begins the 
> election(FastLeaderElection.lookForLeader()), it will send notifications to 
> all the servers. 
> When it fails to receive any notification during a timeout, it will resend 
> the notifications, and double the timeout. This process will repeat until any 
> notification is received or the timeout reaches a max value.
> The FastLeaderElection.sendNotifications() just put the notification message 
> into a queue and return. The WorkerSender is responsable to send the 
> notifications.
> The WorkerSender just process the notifications one by one by passing the 
> notifications to QuorumCnxManager. Here comes the problem, the 
> QuorumCnxManager.toSend() blocks for a long time when the notification is 
> send to ofs_zk2(whose network is down) and some notifications (which belongs 
> to ofs_zk1) will thus be blocked for a long time. The repeated notifications 
> by FastLeaderElection.sendNotifications() just make things worse.
> Here is the related source code:
> {code:java}
> public void toSend(Long sid, ByteBuffer b) {
> /*
>  * If sending message to myself, then simply enqueue it (loopback).
>  */
> if (this.mySid == sid) {
>  b.position(0);
>  addToRecvQueue(new Message(b.duplicate(), sid));
> /*
>  * Otherwise send to the corresponding thread to send.
>  */
> } else {
>  /*
>   * Start a new connection if doesn't have one already.
>   */
>  ArrayBlockingQueue bq = new 
> ArrayBlockingQueue(SEND_CAPACITY);
>  ArrayBlockingQueue bqExisting = 
> queueSendMap.putIfAbsent(sid, bq);
>  if (bqExisting != null) {
>  addToSendQueue(bqExisting, b);
>  } else {
>  addToSendQueue(bq, b);
>  }
>  
>  // This may block!!!
>  connectOne(sid);
> 
> }
> }
> {code}
> Therefore, when ofs_zk3 believes that it is the leader, it begins to wait the 
> epoch ack, but in fact the ofs_zk1 does not receive the notification(which 
> says the leader is ofs_zk3) because the ofs_zk3 has not sent the 
> notification(which may still exist in the sendqueue of WorkerSender). At 
> last, the potential leader ofs_zk3 fails to receive the epoch ack in timeout, 
> so it quits the leader and begins a new election. 
> The log files of ofs_zk1 and ofs_zk3 are attached.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2930) Leader cannot be elected due to network timeout of some members.

2017-11-04 Thread Jiafu Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238828#comment-16238828
 ] 

Jiafu Jiang commented on ZOOKEEPER-2930:


I suggest that there could be more than one WorkerSender in FastLeaderElection, 
so that a network failure of some ZooKeeper servers will not delay the 
notifications to the others.
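
A minimal sketch of that suggestion (illustrative only, not a patch to 
FastLeaderElection): give each peer its own single-threaded sender so that a 
peer whose network is down can only delay messages addressed to itself. The 
sendTo() method and the peer ids are assumptions made for the example.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch: one sender per peer, so a dead peer cannot stall the others. */
public class PerPeerSenders {
    private final Map<Long, ExecutorService> senders = new ConcurrentHashMap<>();

    public void toSend(Long sid, byte[] payload) {
        // Enqueue on the peer's own worker and return immediately; a blocking
        // connect to one dead peer no longer stalls notifications to the others.
        senders.computeIfAbsent(sid, id -> Executors.newSingleThreadExecutor())
               .submit(() -> sendTo(sid, payload));
    }

    private void sendTo(Long sid, byte[] payload) {
        // A bounded connect (e.g. Socket.connect(addr, 5000)) would go here, so
        // even this per-peer worker cannot hang forever on an unreachable host.
        System.out.printf("sending %d bytes to sid %d%n", payload.length, sid);
    }

    public static void main(String[] args) {
        PerPeerSenders s = new PerPeerSenders();
        s.toSend(1L, new byte[16]);   // healthy peer
        s.toSend(2L, new byte[16]);   // unreachable peer: only its own worker is affected
        s.senders.values().forEach(ExecutorService::shutdown);
    }
}
{code}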

> Leader cannot be elected due to network timeout of some members.
> 
>
> Key: ZOOKEEPER-2930
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2930
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.4.10
> Environment: Java 8
> ZooKeeper 3.4.11(from github)
> Centos6.5
>Reporter: Jiafu Jiang
>Priority: Major
> Attachments: zoo.cfg, zookeeper1.log, zookeeper2.log
>
>
> I deploy a cluster of ZooKeeper with three nodes:
> ofs_zk1:20.10.11.101, 30.10.11.101
> ofs_zk2:20.10.11.102, 30.10.11.102
> ofs_zk3:20.10.11.103, 30.10.11.103
> I shutdown the network interfaces of ofs_zk2 using "ifdown eth0 eth1" command.
> It is supposed that the new Leader should be elected in some seconds, but the 
> fact is, ofs_zk1 and ofs_zk3 just keep electing again and again, but none of 
> them can become the new Leader.
> I change the log level to DEBUG (the default is INFO), and restart zookeeper 
> servers on ofs_zk1 and ofs_zk2 again, but it can not fix the problem.
> I read the log and the ZooKeeper source code, and I think I find the reason.
> When the potential leader(says ofs_zk3) begins the 
> election(FastLeaderElection.lookForLeader()), it will send notifications to 
> all the servers. 
> When it fails to receive any notification during a timeout, it will resend 
> the notifications, and double the timeout. This process will repeat until any 
> notification is received or the timeout reaches a max value.
> The FastLeaderElection.sendNotifications() just put the notification message 
> into a queue and return. The WorkerSender is responsable to send the 
> notifications.
> The WorkerSender just process the notifications one by one by passing the 
> notifications to QuorumCnxManager. Here comes the problem, the 
> QuorumCnxManager.toSend() blocks for a long time when the notification is 
> send to ofs_zk2(whose network is down) and some notifications (which belongs 
> to ofs_zk1) will thus be blocked for a long time. The repeated notifications 
> by FastLeaderElection.sendNotifications() just make things worse.
> Here is the related source code:
> {code:java}
> public void toSend(Long sid, ByteBuffer b) {
> /*
>  * If sending message to myself, then simply enqueue it (loopback).
>  */
> if (this.mySid == sid) {
>  b.position(0);
>  addToRecvQueue(new Message(b.duplicate(), sid));
> /*
>  * Otherwise send to the corresponding thread to send.
>  */
> } else {
>  /*
>   * Start a new connection if doesn't have one already.
>   */
>  ArrayBlockingQueue bq = new 
> ArrayBlockingQueue(SEND_CAPACITY);
>  ArrayBlockingQueue bqExisting = 
> queueSendMap.putIfAbsent(sid, bq);
>  if (bqExisting != null) {
>  addToSendQueue(bqExisting, b);
>  } else {
>  addToSendQueue(bq, b);
>  }
>  
>  // This may block!!!
>  connectOne(sid);
> 
> }
> }
> {code}
> Therefore, when ofs_zk3 believes that it is the leader, it begins to wait the 
> epoch ack, but in fact the ofs_zk1 does not receive the notification(which 
> says the leader is ofs_zk3) because the ofs_zk3 has not sent the 
> notification(which may still exist in the sendqueue of WorkerSender). At 
> last, the potential leader ofs_zk3 fails to receive the epoch ack in timeout, 
> so it quits the leader and begins a new election. 
> The log files of ofs_zk1 and ofs_zk3 are attached.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ZOOKEEPER-2930) Leader cannot be elected due to network timeout of some members.

2017-11-03 Thread Jiafu Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-2930:
---
Description: 
I deploy a cluster of ZooKeeper with three nodes:

ofs_zk1:20.10.11.101, 30.10.11.101
ofs_zk2:20.10.11.102, 30.10.11.102
ofs_zk3:20.10.11.103, 30.10.11.103

I shut down the network interfaces of ofs_zk2 using the "ifdown eth0 eth1" 
command.

It is supposed that the new Leader should be elected within a few seconds, but 
in fact ofs_zk1 and ofs_zk3 just keep holding elections again and again, and 
neither of them can become the new Leader.

I change the log level to DEBUG (the default is INFO) and restart the ZooKeeper 
servers on ofs_zk1 and ofs_zk2 again, but that does not fix the problem.

I read the log and the ZooKeeper source code, and I think I find the reason.

When the potential leader (say ofs_zk3) begins the election 
(FastLeaderElection.lookForLeader()), it will send notifications to all 
the servers. 
When it fails to receive any notification during a timeout, it will resend the 
notifications and double the timeout. This process repeats until a 
notification is received or the timeout reaches a maximum value.
FastLeaderElection.sendNotifications() just puts the notification message into 
a queue and returns; the WorkerSender is responsible for sending the 
notifications.

The WorkerSender processes the notifications one by one by passing them to 
QuorumCnxManager. Here comes the problem: QuorumCnxManager.toSend() blocks for 
a long time when the notification is sent to ofs_zk2 (whose network is down), 
so some notifications (which belong to ofs_zk1) will also be blocked for a 
long time. The repeated notifications from 
FastLeaderElection.sendNotifications() just make things worse.

Here is the related source code:

{code:java}
public void toSend(Long sid, ByteBuffer b) {
/*
 * If sending message to myself, then simply enqueue it (loopback).
 */
if (this.mySid == sid) {
 b.position(0);
 addToRecvQueue(new Message(b.duplicate(), sid));
/*
 * Otherwise send to the corresponding thread to send.
 */
} else {
 /*
  * Start a new connection if doesn't have one already.
  */
 ArrayBlockingQueue<ByteBuffer> bq = new 
ArrayBlockingQueue<ByteBuffer>(SEND_CAPACITY);
 ArrayBlockingQueue<ByteBuffer> bqExisting = 
queueSendMap.putIfAbsent(sid, bq);
 if (bqExisting != null) {
 addToSendQueue(bqExisting, b);
 } else {
 addToSendQueue(bq, b);
 }
 
 // This may block!!!
 connectOne(sid);

}
}
{code}

Therefore, when ofs_zk3 believes that it is the leader, it begins to wait for 
the epoch ack, but ofs_zk1 never receives the notification (which says the 
leader is ofs_zk3) because ofs_zk3 has not actually sent it (it may still be 
sitting in the send queue of the WorkerSender). At last, the potential leader 
ofs_zk3 fails to receive the epoch ack within the timeout, so it gives up the 
leadership and begins a new election. 

The log files of ofs_zk1 and ofs_zk3 are attached.

  was:
I deploy a cluster of ZooKeeper with three nodes:

ofs_zk1:20.10.11.101, 30.10.11.101
ofs_zk2:20.10.11.102, 30.10.11.102
ofs_zk3:20.10.11.103, 30.10.11.103

I shutdown the network interfaces of ofs_zk2 using "ifdown eth0 eth1" command.

It is supposed that the new Leader should be elected in some seconds, but the 
fact is, ofs_zk1 and ofs_zk3 just keep electing again and again, but none of 
them can become the new Leader.

I change the log level to DEBUG (the default is INFO), and restart zookeeper 
servers on ofs_zk1 and ofs_zk2 again, but it can not fix the problem.

I read the log and the ZooKeeper source code, and I think I find the reason.

When the potential leader(says ofs_zk3) begins the 
election(FastLeaderElection.lookForLeader()), it will send notifications to all 
the servers. 
When it fails to receive any notification during a timeout, it will resend the 
notifications, and double the timeout. This process will repeat until any 
notification is received or the timeout reaches a max value.
The FastLeaderElection.sendNotifications() just put the notification message 
into a queue and return. The WorkerSender is responsable to send the 
notifications.

The WorkerSender just process the notifications one by one by passing the 
notifications to QuorumCnxManager. Here comes the problem, the 
QuorumCnxManager.toSend() blocks for a long time when the notification is send 
to ofs_zk2(whose network is down) and some notifications (which belongs to 
ofs_zk1) will thus be blocked for a long time. The repeated notifications by 
FastLeaderElection.sendNotifications() just make things worse.

Here is the related source code:

{code:java}
public void toSend(Long sid, 

[jira] [Updated] (ZOOKEEPER-2930) Leader cannot be elected due to network timeout of some members.

2017-11-03 Thread Jiafu Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-2930:
---
Description: 
I deploy a cluster of ZooKeeper with three nodes:

ofs_zk1:20.10.11.101, 30.10.11.101
ofs_zk2:20.10.11.102, 30.10.11.102
ofs_zk3:20.10.11.103, 30.10.11.103

I shutdown the network interfaces of ofs_zk2 using "ifdown eth0 eth1" command.

It is supposed that the new Leader should be elected in some seconds, but the 
fact is, ofs_zk1 and ofs_zk3 just keep electing again and again, but none of 
them can become the new Leader.

I change the log level to DEBUG (the default is INFO), and restart zookeeper 
servers on ofs_zk1 and ofs_zk2 again, but it can not fix the problem.

I read the log and the ZooKeeper source code, and I think I find the reason.

When the potential leader(says ofs_zk3) begins the 
election(FastLeaderElection.lookForLeader()), it will send notifications to all 
the servers. 
When it fails to receive any notification during a timeout, it will resend the 
notifications, and double the timeout. This process will repeat until any 
notification is received or the timeout reaches a max value.
The FastLeaderElection.sendNotifications() just put the notification message 
into a queue and return. The WorkerSender is responsable to send the 
notifications.

The WorkerSender just process the notifications one by one by passing the 
notifications to QuorumCnxManager. Here comes the problem, the 
QuorumCnxManager.toSend() blocks for a long time when the notification is send 
to ofs_zk2(whose network is down) and some notifications (which belongs to 
ofs_zk1) will thus be blocked for a long time. The repeated notifications by 
FastLeaderElection.sendNotifications() just make things worse.

Here is the related source code:

{code:java}
public void toSend(Long sid, ByteBuffer b) {
    /*
     * If sending message to myself, then simply enqueue it (loopback).
     */
    if (this.mySid == sid) {
        b.position(0);
        addToRecvQueue(new Message(b.duplicate(), sid));
    /*
     * Otherwise send to the corresponding thread to send.
     */
    } else {
        /*
         * Start a new connection if doesn't have one already.
         */
        ArrayBlockingQueue<ByteBuffer> bq = new ArrayBlockingQueue<ByteBuffer>(SEND_CAPACITY);
        ArrayBlockingQueue<ByteBuffer> bqExisting = queueSendMap.putIfAbsent(sid, bq);
        if (bqExisting != null) {
            addToSendQueue(bqExisting, b);
        } else {
            addToSendQueue(bq, b);
        }

        // This may block!!!
        connectOne(sid);
    }
}
{code}
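
To illustrate why the report points at connectOne() specifically: if connection 
establishment were bounded and taken off the notification path, an unreachable 
peer could not hold up messages to live peers. The sketch below only illustrates 
that idea under assumed names (asyncConnect, the thread-pool size and the 
2-second timeout are hypothetical); it is not the actual ZooKeeper fix.

{code:java}
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: establish quorum connections on a small thread pool
// with a hard connect timeout, so the caller (e.g. a sender loop) never blocks
// on an unreachable peer. Names and timeouts are assumptions for illustration.
public class AsyncConnectSketch {

    private final ExecutorService connectExecutor = Executors.newFixedThreadPool(4);
    private final Map<Long, Socket> connections = new ConcurrentHashMap<>();
    private static final int CONNECT_TIMEOUT_MS = 2000; // assumed bound

    /** Kick off a connection attempt without blocking the caller. */
    public void asyncConnect(long sid, InetSocketAddress addr) {
        if (connections.containsKey(sid)) {
            return; // already connected
        }
        connectExecutor.submit(() -> {
            try {
                Socket sock = new Socket();
                sock.connect(addr, CONNECT_TIMEOUT_MS); // bounded, off the hot path
                connections.put(sid, sock);
            } catch (Exception e) {
                // Unreachable peer: give up quietly; a retry can be scheduled later.
            }
        });
    }

    public void shutdown() {
        connectExecutor.shutdownNow();
    }
}
{code}

With this shape, a sender loop would only enqueue payloads and trigger 
asyncConnect(); payloads are transmitted once a connection for that sid exists, 
so a dead peer costs a pool thread at most CONNECT_TIMEOUT_MS instead of 
stalling the whole queue.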

Therefore, when ofs_zk3 believes that it is the leader, it begins to wait for 
the epoch ACK, but ofs_zk1 never receives the notification (which says the 
leader is ofs_zk3) because ofs_zk3 has not actually sent it (the notification 
may still be in the send queue of the WorkerSender). In the end, the potential 
leader ofs_zk3 fails to receive the epoch ACK within the timeout, so it gives up 
leadership and begins a new election. 

The log files of ofs_zk1 and ofs_zk3 are attached.

  was:
I deploy a cluster of ZooKeeper with three nodes:

ofs_zk1:20.10.11.101, 30.10.11.101
ofs_zk2:20.10.11.102, 30.10.11.102
ofs_zk3:20.10.11.103, 30.10.11.103

I shut down the network interfaces of ofs_zk2 using the "ifdown eth0 eth1" 
command.

The new Leader is supposed to be elected within a few seconds, but in fact 
ofs_zk1 and ofs_zk3 just keep holding elections again and again, and neither of 
them can become the new Leader.

I change the log level to DEBUG (the default is INFO) and restart the ZooKeeper 
servers on ofs_zk1 and ofs_zk2 again, but this does not fix the problem.

I read the logs and the ZooKeeper source code, and I think I have found the 
reason.

When the potential leader (say, ofs_zk3) begins the election 
(FastLeaderElection.lookForLeader()), it sends notifications to all the servers. 
When it fails to receive any notification within a timeout, it resends the 
notifications and doubles the timeout. This process repeats until a 
notification is received or the timeout reaches a maximum value. 
FastLeaderElection.sendNotifications() just puts the notification messages into 
a queue and returns; the WorkerSender thread is responsible for actually 
sending them.

The WorkerSender processes the notifications one by one by passing them to 
QuorumCnxManager. Here comes the problem: QuorumCnxManager.toSend() blocks for a 
long time when a notification is sent to ofs_zk2 (whose network is down), so 
notifications destined for ofs_zk1 are also delayed for a long time. The 
repeated notifications from FastLeaderElection.sendNotifications() just make 
things worse.

Here is the related source code:

{code:java}
public void toSend(Long 

[jira] [Updated] (ZOOKEEPER-2930) Leader cannot be elected due to network timeout of some members.

2017-11-03 Thread Jiafu Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-2930:
---
Description: 
I deploy a cluster of ZooKeeper with three nodes:

ofs_zk1:20.10.11.101, 30.10.11.101
ofs_zk2:20.10.11.102, 30.10.11.102
ofs_zk3:20.10.11.103, 30.10.11.103

I shut down the network interfaces of ofs_zk2 using the "ifdown eth0 eth1" 
command.

The new Leader is supposed to be elected within a few seconds, but in fact 
ofs_zk1 and ofs_zk3 just keep holding elections again and again, and neither of 
them can become the new Leader.

I change the log level to DEBUG (the default is INFO) and restart the ZooKeeper 
servers on ofs_zk1 and ofs_zk2 again, but this does not fix the problem.

I read the logs and the ZooKeeper source code, and I think I have found the 
reason.

When the potential leader (say, ofs_zk3) begins the election 
(FastLeaderElection.lookForLeader()), it sends notifications to all the servers. 
When it fails to receive any notification within a timeout, it resends the 
notifications and doubles the timeout. This process repeats until a 
notification is received or the timeout reaches a maximum value. 
FastLeaderElection.sendNotifications() just puts the notification messages into 
a queue and returns; the WorkerSender thread is responsible for actually 
sending them.

The WorkerSender processes the notifications one by one by passing them to 
QuorumCnxManager. Here comes the problem: QuorumCnxManager.toSend() blocks for a 
long time when a notification is sent to ofs_zk2 (whose network is down), so 
notifications destined for ofs_zk1 are also delayed for a long time. The 
repeated notifications from FastLeaderElection.sendNotifications() just make 
things worse.

Here is the related source code:

{code:java}
public void toSend(Long sid, ByteBuffer b) {
    /*
     * If sending message to myself, then simply enqueue it (loopback).
     */
    if (this.mySid == sid) {
        b.position(0);
        addToRecvQueue(new Message(b.duplicate(), sid));
    /*
     * Otherwise send to the corresponding thread to send.
     */
    } else {
        /*
         * Start a new connection if doesn't have one already.
         */
        ArrayBlockingQueue<ByteBuffer> bq = new ArrayBlockingQueue<ByteBuffer>(SEND_CAPACITY);
        ArrayBlockingQueue<ByteBuffer> bqExisting = queueSendMap.putIfAbsent(sid, bq);
        if (bqExisting != null) {
            addToSendQueue(bqExisting, b);
        } else {
            addToSendQueue(bq, b);
        }

        // This may block!!!
        connectOne(sid);
    }
}
{code}

Therefore, when ofs_zk3 believes that it is the leader, it begins to wait for 
the epoch ACK, but ofs_zk1 never receives the notification (which says the 
leader is ofs_zk3) because ofs_zk3 has not actually sent it (the notification 
may still be in the send queue of the WorkerSender). In the end, the potential 
leader ofs_zk3 fails to receive the epoch ACK within the timeout, so it gives up 
leadership and begins a new election. 

The log files of ofs_zk1 and ofs_zk3 are attached.

  was:
I deploy a cluster of ZooKeeper with three nodes:

ofs_zk1:20.10.11.101, 30.10.11.101
ofs_zk2:20.10.11.102, 30.10.11.102
ofs_zk3:20.10.11.103, 30.10.11.103

I shut down the network interfaces of ofs_zk2 using the "ifdown eth0 eth1" 
command.

The new Leader is supposed to be elected within a few seconds, but in fact 
ofs_zk1 and ofs_zk3 just keep holding elections again and again, and neither of 
them can become the new Leader.

I change the log level to DEBUG (the default is INFO) and restart the ZooKeeper 
servers on ofs_zk1 and ofs_zk2 again, but this does not fix the problem.

I read the logs and the ZooKeeper source code, and I think I have found the 
reason.

When the potential leader (say, ofs_zk3) begins the election 
(FastLeaderElection.lookForLeader()), it sends notifications to all the servers. 
When it fails to receive any notification within a timeout, it resends the 
notifications and doubles the timeout. This process repeats until a 
notification is received or the timeout reaches a maximum value. 
FastLeaderElection.sendNotifications() just puts the notification messages into 
a queue and returns; the WorkerSender thread is responsible for actually 
sending them.

The WorkerSender processes the notifications one by one by passing them to 
QuorumCnxManager. Here comes the problem: QuorumCnxManager.toSend() blocks for a 
long time when a notification is sent to ofs_zk2 (whose network is down), so 
notifications destined for ofs_zk1 are also delayed for a long time. The 
repeated notifications from FastLeaderElection.sendNotifications() just make 
things worse.

Here is the related source code:

{code:java}
public void toSend(Long sid, 

[jira] [Updated] (ZOOKEEPER-2930) Leader cannot be elected due to network timeout of some members.

2017-11-03 Thread Jiafu Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-2930:
---
Summary: Leader cannot be elected due to network timeout of some members.  
(was: Leader cannot be elected due to network timeout of some member.)

> Leader cannot be elected due to network timeout of some members.
> 
>
> Key: ZOOKEEPER-2930
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2930
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.4.10
> Environment: Java 8
> ZooKeeper 3.4.11(from github)
> Centos6.5
>Reporter: Jiafu Jiang
>Priority: Major
> Attachments: zoo.cfg, zookeeper1.log, zookeeper2.log
>
>
> I deploy a cluster of ZooKeeper with three nodes:
> ofs_zk1:20.10.11.101, 30.10.11.101
> ofs_zk2:20.10.11.102, 30.10.11.102
> ofs_zk3:20.10.11.103, 30.10.11.103
> I shut down the network interfaces of ofs_zk2 using the "ifdown eth0 eth1" 
> command.
> The new Leader is supposed to be elected within a few seconds, but in fact 
> ofs_zk1 and ofs_zk3 just keep holding elections again and again, and neither 
> of them can become the new Leader.
> I change the log level to DEBUG (the default is INFO) and restart the 
> ZooKeeper servers on ofs_zk1 and ofs_zk2 again, but this does not fix the 
> problem.
> I read the logs and the ZooKeeper source code, and I think I have found the 
> reason.
> When the potential leader (say, ofs_zk3) begins the election 
> (FastLeaderElection.lookForLeader()), it sends notifications to all the 
> servers. When it fails to receive any notification within a timeout, it 
> resends the notifications and doubles the timeout. This process repeats until 
> a notification is received or the timeout reaches a maximum value.
> FastLeaderElection.sendNotifications() just puts the notification messages 
> into a queue and returns; the WorkerSender thread is responsible for actually 
> sending them.
> The WorkerSender processes the notifications one by one by passing them to 
> QuorumCnxManager. Here comes the problem: QuorumCnxManager.toSend() blocks 
> for a long time when a notification is sent to ofs_zk2 (whose network is 
> down), so notifications destined for ofs_zk1 are also delayed for a long 
> time. The repeated notifications from FastLeaderElection.sendNotifications() 
> just make things worse.
> Here is the related source code:
> {code:java}
> public void toSend(Long sid, ByteBuffer b) {
>     /*
>      * If sending message to myself, then simply enqueue it (loopback).
>      */
>     if (this.mySid == sid) {
>         b.position(0);
>         addToRecvQueue(new Message(b.duplicate(), sid));
>     /*
>      * Otherwise send to the corresponding thread to send.
>      */
>     } else {
>         /*
>          * Start a new connection if doesn't have one already.
>          */
>         ArrayBlockingQueue<ByteBuffer> bq = new ArrayBlockingQueue<ByteBuffer>(SEND_CAPACITY);
>         ArrayBlockingQueue<ByteBuffer> bqExisting = queueSendMap.putIfAbsent(sid, bq);
>         if (bqExisting != null) {
>             addToSendQueue(bqExisting, b);
>         } else {
>             addToSendQueue(bq, b);
>         }
> 
>         // This may block!!!
>         connectOne(sid);
>     }
> }
> {code}
> Therefore, when ofs_zk3 believes that it is the leader, it begins to wait for 
> the epoch ACK, but ofs_zk1 never receives the notification (which says the 
> leader is ofs_zk3) because ofs_zk3 has not actually sent it (the notification 
> may still be in the send queue of the WorkerSender). In the end, the 
> potential leader ofs_zk3 fails to receive the epoch ACK within the timeout, 
> so it gives up leadership and begins a new election. 
> The log files of ofs_zk1 and ofs_zk3 are attached.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ZOOKEEPER-2930) Leader cannot be elected due to network timeout of some members.

2017-11-03 Thread Jiafu Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-2930:
---
Description: 
I deploy a cluster of ZooKeeper with three nodes:

ofs_zk1:20.10.11.101, 30.10.11.101
ofs_zk2:20.10.11.102, 30.10.11.102
ofs_zk3:20.10.11.103, 30.10.11.103

I shut down the network interfaces of ofs_zk2 using the "ifdown eth0 eth1" 
command.

The new Leader is supposed to be elected within a few seconds, but in fact 
ofs_zk1 and ofs_zk3 just keep holding elections again and again, and neither of 
them can become the new Leader.

I change the log level to DEBUG (the default is INFO) and restart the ZooKeeper 
servers on ofs_zk1 and ofs_zk2 again, but this does not fix the problem.

I read the logs and the ZooKeeper source code, and I think I have found the 
reason.

When the potential leader (say, ofs_zk3) begins the election 
(FastLeaderElection.lookForLeader()), it sends notifications to all the servers. 
When it fails to receive any notification within a timeout, it resends the 
notifications and doubles the timeout. This process repeats until a 
notification is received or the timeout reaches a maximum value. 
FastLeaderElection.sendNotifications() just puts the notification messages into 
a queue and returns; the WorkerSender thread is responsible for actually 
sending them.

The WorkerSender processes the notifications one by one by passing them to 
QuorumCnxManager. Here comes the problem: QuorumCnxManager.toSend() blocks for a 
long time when a notification is sent to ofs_zk2 (whose network is down), so 
notifications destined for ofs_zk1 are also delayed for a long time. The 
repeated notifications from FastLeaderElection.sendNotifications() just make 
things worse.

Here is the related source code:

{code:java}
public void toSend(Long sid, ByteBuffer b) {
    /*
     * If sending message to myself, then simply enqueue it (loopback).
     */
    if (this.mySid == sid) {
        b.position(0);
        addToRecvQueue(new Message(b.duplicate(), sid));
    /*
     * Otherwise send to the corresponding thread to send.
     */
    } else {
        /*
         * Start a new connection if doesn't have one already.
         */
        ArrayBlockingQueue<ByteBuffer> bq = new ArrayBlockingQueue<ByteBuffer>(SEND_CAPACITY);
        ArrayBlockingQueue<ByteBuffer> bqExisting = queueSendMap.putIfAbsent(sid, bq);
        if (bqExisting != null) {
            addToSendQueue(bqExisting, b);
        } else {
            addToSendQueue(bq, b);
        }

        // This may block!!!
        connectOne(sid);
    }
}
{code}

Therefore, when ofs_zk3 believes that it is the leader, it begins to wait for 
the epoch ACK, but ofs_zk1 never receives the notification (which says the 
leader is ofs_zk3) because ofs_zk3 has not actually sent it (the notification 
may still be in the send queue of the WorkerSender). In the end, the potential 
leader ofs_zk3 fails to receive the epoch ACK within the timeout, so it gives up 
leadership and begins a new election. 

The log files of ofs_zk1 and ofs_zk3 are attached.

  was:
I deploy a cluster of ZooKeeper with three nodes:

ofs_zk1:20.10.11.101, 30.10.11.101
ofs_zk2:20.10.11.102, 30.10.11.102
ofs_zk3:20.10.11.103, 30.10.11.103

I shut down the network interfaces of ofs_zk2 using the "ifdown eth0 eth1" 
command.

The new Leader is supposed to be elected within a few seconds, but in fact 
ofs_zk1 and ofs_zk3 just keep holding elections again and again, and neither of 
them can become the new Leader.

I change the log level to DEBUG (the default is INFO) and restart the ZooKeeper 
servers on ofs_zk1 and ofs_zk2 again, but this does not fix the problem.

I read the logs and the ZooKeeper source code, and I think I have found the 
reason.

When the potential leader (say, ofs_zk3) begins the election 
(FastLeaderElection.lookForLeader()), it sends notifications to all the servers. 
When it fails to receive any notification within a timeout, it resends the 
notifications and doubles the timeout. This process repeats until a 
notification is received or the timeout reaches a maximum value. 
FastLeaderElection.sendNotifications() just puts the notification messages into 
a queue and returns; the WorkerSender thread is responsible for actually 
sending them.

The WorkerSender processes the notifications one by one by passing them to 
QuorumCnxManager. Here comes the problem: QuorumCnxManager.toSend() blocks for a 
long time when a notification is sent to ofs_zk2 (whose network is down), so 
notifications destined for ofs_zk1 are also delayed for a long time. The 
repeated notifications from FastLeaderElection.sendNotifications() just make 
things worse.

Here is the related source code:

{code:java}
public void toSend(Long sid, 

[jira] [Updated] (ZOOKEEPER-2930) Leader cannot be elected due to network timeout of some member.

2017-11-03 Thread Jiafu Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-2930:
---
Attachment: zookeeper1.log
zookeeper2.log
zoo.cfg

zookeeper1.log : ofs_zk1
zookeeper2.log : ofs_zk3

> Leader cannot be elected due to network timeout of some member.
> ---
>
> Key: ZOOKEEPER-2930
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2930
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.4.10
> Environment: Java 8
> ZooKeeper 3.4.11(from github)
> Centos6.5
>Reporter: Jiafu Jiang
>Priority: Major
> Attachments: zoo.cfg, zookeeper1.log, zookeeper2.log
>
>
> I deploy a cluster of ZooKeeper with three nodes:
> ofs_zk1:20.10.11.101, 30.10.11.101
> ofs_zk2:20.10.11.102, 30.10.11.102
> ofs_zk3:20.10.11.103, 30.10.11.103
> I shut down the network interfaces of ofs_zk2 using the "ifdown eth0 eth1" 
> command.
> The new Leader is supposed to be elected within a few seconds, but in fact 
> ofs_zk1 and ofs_zk3 just keep holding elections again and again, and neither 
> of them can become the new Leader.
> I change the log level to DEBUG (the default is INFO) and restart the 
> ZooKeeper servers on ofs_zk1 and ofs_zk2 again, but this does not fix the 
> problem.
> I read the logs and the ZooKeeper source code, and I think I have found the 
> reason.
> When the potential leader (say, ofs_zk3) begins the election 
> (FastLeaderElection.lookForLeader()), it sends notifications to all the 
> servers. When it fails to receive any notification within a timeout, it 
> resends the notifications and doubles the timeout. This process repeats until 
> a notification is received or the timeout reaches a maximum value.
> FastLeaderElection.sendNotifications() just puts the notification messages 
> into a queue and returns; the WorkerSender thread is responsible for actually 
> sending them.
> The WorkerSender processes the notifications one by one by passing them to 
> QuorumCnxManager. Here comes the problem: QuorumCnxManager.toSend() blocks 
> for a long time when a notification is sent to ofs_zk2 (whose network is 
> down), so notifications destined for ofs_zk1 are also delayed for a long 
> time. The repeated notifications from FastLeaderElection.sendNotifications() 
> just make things worse.
> Here is the related source code:
> {code:java}
> public void toSend(Long sid, ByteBuffer b) {
>     /*
>      * If sending message to myself, then simply enqueue it (loopback).
>      */
>     if (this.mySid == sid) {
>         b.position(0);
>         addToRecvQueue(new Message(b.duplicate(), sid));
>     /*
>      * Otherwise send to the corresponding thread to send.
>      */
>     } else {
>         /*
>          * Start a new connection if doesn't have one already.
>          */
>         ArrayBlockingQueue<ByteBuffer> bq = new ArrayBlockingQueue<ByteBuffer>(SEND_CAPACITY);
>         ArrayBlockingQueue<ByteBuffer> bqExisting = queueSendMap.putIfAbsent(sid, bq);
>         if (bqExisting != null) {
>             addToSendQueue(bqExisting, b);
>         } else {
>             addToSendQueue(bq, b);
>         }
> 
>         // This may block!!!
>         connectOne(sid);
>     }
> }
> {code}
> Therefore, when ofs_zk3 believes that it is the leader, it begins to wait for 
> the epoch ACK, but ofs_zk1 never receives the notification (which says the 
> leader is ofs_zk3) because ofs_zk3 has not actually sent it (the notification 
> may still be in the send queue of the WorkerSender). In the end, the 
> potential leader ofs_zk3 fails to receive the epoch ACK within the timeout, 
> so it gives up leadership and begins a new election. 
> The log files of ofs_zk1 and ofs_zk3 are attached.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ZOOKEEPER-2930) Leader cannot be elected due to network timeout of some member.

2017-11-03 Thread Jiafu Jiang (JIRA)
Jiafu Jiang created ZOOKEEPER-2930:
--

 Summary: Leader cannot be elected due to network timeout of some 
member.
 Key: ZOOKEEPER-2930
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2930
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.4.10
 Environment: Java 8
ZooKeeper 3.4.11(from github)
Centos6.5
Reporter: Jiafu Jiang
Priority: Major


I deploy a cluster of ZooKeeper with three nodes:

ofs_zk1:20.10.11.101, 30.10.11.101
ofs_zk2:20.10.11.102, 30.10.11.102
ofs_zk3:20.10.11.103, 30.10.11.103

I shut down the network interfaces of ofs_zk2 using the "ifdown eth0 eth1" 
command.

The new Leader is supposed to be elected within a few seconds, but in fact 
ofs_zk1 and ofs_zk3 just keep holding elections again and again, and neither of 
them can become the new Leader.

I change the log level to DEBUG (the default is INFO) and restart the ZooKeeper 
servers on ofs_zk1 and ofs_zk2 again, but this does not fix the problem.

I read the logs and the ZooKeeper source code, and I think I have found the 
reason.

When the potential leader (say, ofs_zk3) begins the election 
(FastLeaderElection.lookForLeader()), it sends notifications to all the servers. 
When it fails to receive any notification within a timeout, it resends the 
notifications and doubles the timeout. This process repeats until a 
notification is received or the timeout reaches a maximum value. 
FastLeaderElection.sendNotifications() just puts the notification messages into 
a queue and returns; the WorkerSender thread is responsible for actually 
sending them.

The WorkerSender processes the notifications one by one by passing them to 
QuorumCnxManager. Here comes the problem: QuorumCnxManager.toSend() blocks for a 
long time when a notification is sent to ofs_zk2 (whose network is down), so 
notifications destined for ofs_zk1 are also delayed for a long time. The 
repeated notifications from FastLeaderElection.sendNotifications() just make 
things worse.

Here is the related source code:

{code:java}
public void toSend(Long sid, ByteBuffer b) {
    /*
     * If sending message to myself, then simply enqueue it (loopback).
     */
    if (this.mySid == sid) {
        b.position(0);
        addToRecvQueue(new Message(b.duplicate(), sid));
    /*
     * Otherwise send to the corresponding thread to send.
     */
    } else {
        /*
         * Start a new connection if doesn't have one already.
         */
        ArrayBlockingQueue<ByteBuffer> bq = new ArrayBlockingQueue<ByteBuffer>(SEND_CAPACITY);
        ArrayBlockingQueue<ByteBuffer> bqExisting = queueSendMap.putIfAbsent(sid, bq);
        if (bqExisting != null) {
            addToSendQueue(bqExisting, b);
        } else {
            addToSendQueue(bq, b);
        }

        // This may block!!!
        connectOne(sid);
    }
}
{code}

Therefore, when ofs_zk3 believes that it is the leader, it begins to wait for 
the epoch ACK, but ofs_zk1 never receives the notification (which says the 
leader is ofs_zk3) because ofs_zk3 has not actually sent it (the notification 
may still be in the send queue of the WorkerSender). In the end, the potential 
leader ofs_zk3 fails to receive the epoch ACK within the timeout, so it gives up 
leadership and begins a new election. 

The log files of ofs_zk1 and ofs_zk3 are attached.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ZOOKEEPER-2923) The comment of the variable matchSyncs in class CommitProcessor has a mistake.

2017-10-22 Thread Jiafu Jiang (JIRA)
Jiafu Jiang created ZOOKEEPER-2923:
--

 Summary: The comment of the variable matchSyncs in class 
CommitProcessor has a mistake.
 Key: ZOOKEEPER-2923
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2923
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.5.3, 3.4.10
Reporter: Jiafu Jiang
Priority: Minor


The comment of the  variable matchSyncs in class CommitProcessor says:


{code:java}
/**
 * This flag indicates whether we need to wait for a response to come back from
 * the leader or we just let the sync operation flow through like a read. The
 * flag will be true if the CommitProcessor is in a Leader pipeline.
 */
boolean matchSyncs;
{code}

I searched the source code and found that matchSyncs will be false if the 
CommitProcessor is in a Leader pipeline, and it will be true if the 
CommitProcessor is in a Follower pipeline.
Therefore I think the comment should be modified to match the code.
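
For reference, one possible rewording that matches the behavior described above 
(a suggestion only, not a committed patch):

{code:java}
/**
 * This flag indicates whether we need to wait for a response to come back from
 * the leader or we just let the sync operation flow through like a read. The
 * flag will be false if the CommitProcessor is in a Leader pipeline and true if
 * it is in a Follower pipeline.
 */
boolean matchSyncs;
{code}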



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-1626) Zookeeper C client should be tolerant of clock adjustments

2017-08-28 Thread Jiafu Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143671#comment-16143671
 ] 

Jiafu Jiang commented on ZOOKEEPER-1626:


Has this problem been fixed in any 3.4.x version?

> Zookeeper C client should be tolerant of clock adjustments 
> ---
>
> Key: ZOOKEEPER-1626
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1626
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: c client
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
> Fix For: 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-1366.001.patch, ZOOKEEPER-1366.002.patch, 
> ZOOKEEPER-1366.003.patch, ZOOKEEPER-1366.004.patch, ZOOKEEPER-1366.006.patch, 
> ZOOKEEPER-1366.007.patch, ZOOKEEPER-1626.patch
>
>
> The Zookeeper C client should use monotonic time when available, in order to 
> be more tolerant of time adjustments.
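
The issue above concerns the C client, but the technique is language-independent: 
compute deadlines from a monotonic clock instead of wall-clock time, so timeouts 
are unaffected by clock adjustments. Below is a small Java analogue of the same 
idea (System.nanoTime() is Java's monotonic source; the class and method names 
are only for this sketch), not the actual C-client patch.

{code:java}
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a deadline based on a monotonic clock keeps ticking
// correctly even if the system wall clock jumps forward or backward.
public class MonotonicDeadline {
    private final long deadlineNanos;

    public MonotonicDeadline(long timeoutMillis) {
        this.deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    }

    public boolean expired() {
        // Comparing differences (not absolute values) is the documented-safe
        // way to use System.nanoTime().
        return System.nanoTime() - deadlineNanos >= 0;
    }

    public long remainingMillis() {
        long remaining = TimeUnit.NANOSECONDS.toMillis(deadlineNanos - System.nanoTime());
        return Math.max(remaining, 0);
    }

    public static void main(String[] args) throws InterruptedException {
        MonotonicDeadline d = new MonotonicDeadline(500);
        while (!d.expired()) {
            Thread.sleep(100);
            System.out.println("remaining ms: " + d.remainingMillis());
        }
        System.out.println("deadline reached");
    }
}
{code}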



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2802) Zookeeper C client hang @wait_sync_completion

2017-08-28 Thread Jiafu Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143651#comment-16143651
 ] 

Jiafu Jiang commented on ZOOKEEPER-2802:


[~yihao] I have the same problem. Have you found a solution?

> Zookeeper C client hang @wait_sync_completion
> -
>
> Key: ZOOKEEPER-2802
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2802
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.4.6
> Environment: DISTRIB_DESCRIPTION="Ubuntu 14.04.2 LTS"
>Reporter: yihao yang
>Priority: Critical
> Attachments: zookeeper.out.2017.05.31-10.06.23
>
>
> I was using the ZooKeeper 3.4.6 C client to access one ZooKeeper server in a 
> VM. The VM environment is not stable and I get a lot of EXPIRED_SESSION_STATE 
> events. I will create another session to ZK when I get an expired event. I 
> also have a read/write lock to protect session reads (get/list/... on ZK) and 
> writes (connect, close, reconnect zhandle).
> The problem is that the session got an EXPIRED_SESSION_STATE event, and when 
> it tried to take the write lock and reconnect the session, it found that 
> another thread was holding the read lock (performing a sync list operation on 
> ZK). See the stack below:
> GDBStack:
> Thread 7 (Thread 0x7f838a43a700 (LWP 62845)):
> #0 pthread_cond_wait@@GLIBC_2.3.2 () at 
> ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> #1 0x00636033 in  wait_sync_completion (sc=sc@entry=0x7f8344000af0) 
> at src/mt_adaptor.c:85
> #2 0x00633248 in zoo_wget_children2_ (zh=, 
> path=0x7f83440677a8 "/dict/objects/__services/RLS-GSE/_static_nodes", 
> watcher=0x0, watcherCtx=0x13e6310, strings=0x7f838a4397b0, 
> stat=0x7f838a4398d0) at src/zookeeper.c:3630
> #3 0x0045e6ff in ZooKeeperContext::getChildren (this=0x13e6310, 
> path=..., children=children@entry=0x7f838a439890, 
> stat=stat@entry=0x7f838a4398d0) at zookeeper_context.cpp:xxx
> This sync list call did not return ZINVALIDSTATE but hung. Does anyone know 
> the cause?
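
For readers less familiar with the locking pattern in this report, the hazard 
is: the reconnect path needs the write lock while a reader thread is parked 
inside a blocking client call, so the writer can wait indefinitely. The Java 
sketch below reproduces only that shape (ReentrantReadWriteLock plus a fake 
blocking call); it is an analogue of the C scenario, not the reporter's actual 
code.

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical analogue of the reported hang: a reader holds the read lock
// while stuck in a blocking client call, so the event thread that wants the
// write lock (to reconnect the session) cannot make progress.
public class ReconnectDeadlockSketch {
    private static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private static final CountDownLatch readerStarted = new CountDownLatch(1);

    // Stand-in for a sync getChildren() call that never returns.
    static void blockingListCall() {
        try {
            TimeUnit.HOURS.sleep(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        Thread reader = new Thread(() -> {
            lock.readLock().lock();
            try {
                readerStarted.countDown();
                blockingListCall(); // hangs while holding the read lock
            } finally {
                lock.readLock().unlock();
            }
        }, "reader");
        reader.setDaemon(true);
        reader.start();

        readerStarted.await();
        System.out.println("session expired; trying to reconnect under the write lock...");
        // tryLock with a timeout shows the writer cannot acquire the lock; a plain
        // writeLock().lock() here would block forever, which is the reported hang.
        boolean acquired = lock.writeLock().tryLock(2, TimeUnit.SECONDS);
        System.out.println("write lock acquired: " + acquired);
        if (acquired) {
            lock.writeLock().unlock();
        }
    }
}
{code}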



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)