[jira] [Updated] (ZOOKEEPER-3220) The snapshot is not saved to disk and may cause data inconsistency.

2018-12-18 Thread Jiafu Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-3220:
---
Description: 
We known that ZooKeeper server will call fsync to make sure that log data has 
been successfully saved to disk. But ZooKeeper server does not call fsync to 
make sure that a snapshot has been successfully saved, which may cause 
potential problems. Since a close to a file description does not make sure that 
data is written to disk, see 
[http://man7.org/linux/man-pages/man2/close.2.html#notes] for more details.

 

If the snapshot is not successfully  saved to disk, it may lead to data 
inconsistency. Here is my example, which is also a real problem I have ever met.

1. I deployed a 3-node ZooKeeper cluster: zk1, zk2, and zk3, zk2 was the leader.

2. Both zk1 and zk2 had the log records from log1~logX, X was the zxid.

3. The machine of zk1 restarted, and during the reboot,  log(X+1) ~ log Y are 
saved to log files of both zk2(leader) and zk3(follower).

4. After zk1 restarted successfully, it found itself to be a follower, and it 
began to synchronize data with the leader. The leader sent a snapshot(records 
from log 1 ~ log Y) to zk1, zk1 then saved the snapshot to local disk by 
calling the method ZooKeeperServer.takeSnapshot. But unfortunately, when the 
method returned, the snapshot data was not saved to disk yet. In fact the 
snapshot file was created, but the size was 0.

5. zk1 finished the synchronization and began to accept new requests from the 
leader. Say log records from log(Y + 1) ~ log Z were accepted by zk1 and  saved 
to log file. With fsync zk1 could make sure log data was not lost.

6. zk1 restarted again. Since the snapshot's size was 0, it would not be used, 
therefore zk1 recovered using the log files. But the records from log(X+1) ~ 
logY were lost ! 

 

Sorry for my poor English.

 

  was:
We known that ZooKeeper server will call fsync to make sure that log data has 
been successfully saved to disk. But ZooKeeper server does not call fsync to 
make sure that a snapshot has been successfully saved, which may cause 
potential problems. Since a close to a file description does not make sure that 
data is written to disk, see 
[http://man7.org/linux/man-pages/man2/close.2.html#notes] for more details.

 

If the snapshot is not successfully  saved to disk, it may lead to data 
inconsistency. Here is my example, which is also a real problem I have ever met.

1. I deployed a 3-node ZooKeeper cluster: zk1, zk2, and zk3, zk2 was the leader.

2. Both zk1 and zk2 had the log records from log1~logX, X is the zxid.

3. The machine of zk1 restarted, and during the reboot,  log(X+1) ~ log Y are 
saved to log files of both zk2(leader) and zk3(follower).

4. After zk1 restarted successfully, it found itself to be a follower, and it 
began to synchronize log with the leader. The leader sent a snapshot(records 
from log 1 ~ log Y) to zk1, zk1 saved the snapshot to local disk by calling the 
method ZooKeeperServer.takeSnapshot. But unfortunately, when the method 
returned, the snapshot data was not saved to disk yet. If fact the snapshot 
file was created, but the size was 0.

5. zk1 finished the synchronization and began to accept new request from the 
leader. Say log(Y + 1) ~ log Z was accepted by zk1 and  saved to log file. With 
fsync zk1 can make sure log data is not lost.

6. zk1 restarted again. Since the snapshot's size was 0, it would not be used, 
therefore zk1 recovered using the log files. But the records from log(X+1) ~ 
logY were lost ! 

 

Sorry for my poor English.

 


> The snapshot is not saved to disk and may cause data inconsistency.
> ---
>
> Key: ZOOKEEPER-3220
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3220
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.12, 3.4.13
>Reporter: Jiafu Jiang
>Priority: Critical
>
> We known that ZooKeeper server will call fsync to make sure that log data has 
> been successfully saved to disk. But ZooKeeper server does not call fsync to 
> make sure that a snapshot has been successfully saved, which may cause 
> potential problems. Since a close to a file description does not make sure 
> that data is written to disk, see 
> [http://man7.org/linux/man-pages/man2/close.2.html#notes] for more details.
>  
> If the snapshot is not successfully  saved to disk, it may lead to data 
> inconsistency. Here is my example, which is also a real problem I have ever 
> met.
> 1. I deployed a 3-node ZooKeeper cluster: zk1, zk2, and zk3, zk2 was the 
> leader.
> 2. Both zk1 and zk2 had the log records from log1~logX, X was the zxid.
> 3. The machine of zk1 restarted, and during the reboot,  log(X+1) ~ log Y are 
> saved to log files of both 

[jira] [Updated] (ZOOKEEPER-3220) The snapshot is not saved to disk and may cause data inconsistency.

2018-12-18 Thread Jiafu Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiafu Jiang updated ZOOKEEPER-3220:
---
Summary: The snapshot is not saved to disk and may cause data 
inconsistency.  (was: Snapshot is not written to disk and cause data 
inconsistency.)

> The snapshot is not saved to disk and may cause data inconsistency.
> ---
>
> Key: ZOOKEEPER-3220
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3220
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.12, 3.4.13
>Reporter: Jiafu Jiang
>Priority: Critical
>
> We known that ZooKeeper server will call fsync to make sure that log data has 
> been successfully saved to disk. But ZooKeeper server does not call fsync to 
> make sure that a snapshot has been successfully saved, which may cause 
> potential problems. Since a close to a file description does not make sure 
> that data is written to disk, see 
> [http://man7.org/linux/man-pages/man2/close.2.html#notes] for more details.
>  
> If the snapshot is not successfully  saved to disk, it may lead to data 
> inconsistency. Here is my example, which is also a real problem I have ever 
> met.
> 1. I deployed a 3-node ZooKeeper cluster: zk1, zk2, and zk3, zk2 was the 
> leader.
> 2. Both zk1 and zk2 had the log records from log1~logX, X is the zxid.
> 3. The machine of zk1 restarted, and during the reboot,  log(X+1) ~ log Y are 
> saved to log files of both zk2(leader) and zk3(follower).
> 4. After zk1 restarted successfully, it found itself to be a follower, and it 
> began to synchronize log with the leader. The leader sent a snapshot(records 
> from log 1 ~ log Y) to zk1, zk1 saved the snapshot to local disk by calling 
> the method ZooKeeperServer.takeSnapshot. But unfortunately, when the method 
> returned, the snapshot data was not saved to disk yet. If fact the snapshot 
> file was created, but the size was 0.
> 5. zk1 finished the synchronization and began to accept new request from the 
> leader. Say log(Y + 1) ~ log Z was accepted by zk1 and  saved to log file. 
> With fsync zk1 can make sure log data is not lost.
> 6. zk1 restarted again. Since the snapshot's size was 0, it would not be 
> used, therefore zk1 recovered using the log files. But the records from 
> log(X+1) ~ logY were lost ! 
>  
> Sorry for my poor English.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)