[jira] [Commented] (ZOOKEEPER-3220) The snapshot is not saved to disk and may cause data inconsistency.

2018-12-26 Thread Jiafu Jiang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729312#comment-16729312
 ] 

Jiafu Jiang commented on ZOOKEEPER-3220:


[~nixon] Thanks very much!

> The snapshot is not saved to disk and may cause data inconsistency.
> ---
>
> Key: ZOOKEEPER-3220
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3220
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.12, 3.4.13
> Reporter: Jiafu Jiang
> Priority: Critical
>
> We know that the ZooKeeper server calls fsync to make sure that log data has 
> been successfully saved to disk. But the ZooKeeper server does not call fsync 
> to make sure that a snapshot has been successfully saved, which may cause 
> problems, because closing a file descriptor does not guarantee that the data 
> has been written to disk; see 
> [http://man7.org/linux/man-pages/man2/close.2.html#notes] for more details.
>
> If the snapshot is not successfully saved to disk, it may lead to data 
> inconsistency. Here is an example, which is also a real problem I have 
> encountered.
> 1. I deployed a 3-node ZooKeeper cluster: zk1, zk2, and zk3. zk2 was the 
> leader.
> 2. Both zk1 and zk2 had the log records log 1 ~ log X, where X was the zxid.
> 3. The machine hosting zk1 restarted, and during the reboot, log (X+1) ~ log Y 
> were saved to the log files of both zk2 (leader) and zk3 (follower).
> 4. After zk1 restarted successfully, it found itself to be a follower and 
> began to synchronize data with the leader. The leader sent a snapshot (records 
> log 1 ~ log Y) to zk1, and zk1 saved the snapshot to local disk by calling 
> the method ZooKeeperServer.takeSnapshot. Unfortunately, when the method 
> returned, the snapshot data had not yet been saved to disk. In fact, the 
> snapshot file had been created, but its size was 0.
> 5. zk1 finished the synchronization and began to accept new requests from the 
> leader. Say the log records log (Y+1) ~ log Z were accepted by zk1 and saved 
> to its log file. With fsync, zk1 could make sure this log data was not lost.
> 6. zk1 restarted again. Since the snapshot's size was 0, it was not used, so 
> zk1 recovered using only the log files. But the records log (X+1) ~ log Y 
> were lost!
>
> Sorry for my poor English.
>
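
To illustrate the fsync point in the description above, here is a minimal Java sketch of writing a file and forcing it to the storage device before the write is treated as complete. The class and method names (DurableWriteSketch, writeAndSync) are hypothetical and are not ZooKeeper code; only the standard java.io calls are real. The sketch shows the general technique, not the actual fix.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; this is not ZooKeeper code.
final class DurableWriteSketch {

    /** Writes data to the file and forces it to the storage device before returning. */
    static void writeAndSync(File file, byte[] data) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(data);
            // close() alone does not guarantee that the bytes have reached the disk.
            // FileDescriptor.sync() blocks until the OS reports the buffers as written.
            out.getFD().sync();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("snapshot-sketch", ".bin");
        f.deleteOnExit();
        writeAndSync(f, "example snapshot bytes".getBytes(StandardCharsets.UTF_8));
        System.out.println("wrote " + f.length() + " bytes to " + f);
    }
}
```

FileChannel.force(true) is an alternative way to request the same flush when the file is written through NIO channels.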



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3220) The snapshot is not saved to disk and may cause data inconsistency.

2018-12-26 Thread Jiafu Jiang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729308#comment-16729308
 ] 

Jiafu Jiang commented on ZOOKEEPER-3220:


[~maoling]

 

Why did this situation happen? Was the disk full? 

No, but the machine restarted.

Do you see any logs about *FileTxnSnapLog#save* from that time?

There was no error log. In fact, some of the follower's log was lost during 
the machine reboot. But according to the leader's log, the follower had 
received a snapshot and had begun to receive subsequent transaction logs, so 
*FileTxnSnapLog#save* on the follower must have succeeded, yet the data was 
not on disk!

 

*2. Even if the size of the snapshot is 0, this could not cause data 
inconsistency.*

Yes, I know. ZooKeeper recovers its data from both logs and snapshots.

If a ZooKeeper follower believes a snapshot has been saved, it believes that 
all the data in the snapshot is on disk (when in fact it may not be), and it 
begins to receive the logs that come after the snapshot. If the snapshot is 
invalid, the ZooKeeper server recovers data from the logs only, but some data 
is missing because that data was saved only in the snapshot.
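
The gap described above can be reproduced with a toy model. The following Java sketch is only an illustration under simplifying assumptions: zxids are plain consecutive longs, the values X = 3, Y = 6, and Z = 8 are made up, and the class and method names (RecoveryGapSketch, missingAfterRecovery, range) are hypothetical rather than ZooKeeper code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model for illustration only; none of this is ZooKeeper code.
final class RecoveryGapSketch {

    /**
     * Returns the zxids in 1..lastZxid that cannot be recovered locally.
     * snapshotZxid is the last zxid covered by a usable snapshot (0 if none);
     * loggedZxids are the zxids present in the local transaction logs.
     */
    static List<Long> missingAfterRecovery(long snapshotZxid, List<Long> loggedZxids, long lastZxid) {
        TreeSet<Long> recovered = new TreeSet<>();
        for (long z = 1; z <= snapshotZxid; z++) {
            recovered.add(z);
        }
        recovered.addAll(loggedZxids);
        List<Long> missing = new ArrayList<>();
        for (long z = 1; z <= lastZxid; z++) {
            if (!recovered.contains(z)) {
                missing.add(z);
            }
        }
        return missing;
    }

    /** Inclusive range of zxids. */
    static List<Long> range(long from, long to) {
        List<Long> out = new ArrayList<>();
        for (long z = from; z <= to; z++) {
            out.add(z);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed values: X = 3, Y = 6, Z = 8.
        // zk1 logged 1..X before its reboot and Y+1..Z after the snapshot sync.
        List<Long> zk1Logs = new ArrayList<>(range(1, 3));
        zk1Logs.addAll(range(7, 8));

        // Snapshot up to Y really on disk: nothing is missing.
        System.out.println(missingAfterRecovery(6, zk1Logs, 8)); // prints []
        // Snapshot file empty (size 0): X+1..Y exist nowhere on zk1.
        System.out.println(missingAfterRecovery(0, zk1Logs, 8)); // prints [4, 5, 6]
    }
}
```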

 



[jira] [Commented] (ZOOKEEPER-3220) The snapshot is not saved to disk and may cause data inconsistency.

2018-12-26 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729134#comment-16729134
 ] 

Brian Nixon commented on ZOOKEEPER-3220:


I believe ZOOKEEPER-2872 addressed the fsyncing part of this issue and 
ZOOKEEPER-3082 added some nice cleanup around zero-size snapshot files. Neither 
of these changes was backported to 3.4, so that suggests one potential path 
forward. Note that backporting ZOOKEEPER-2872 also requires backporting 
ZOOKEEPER-2870.



[jira] [Commented] (ZOOKEEPER-3220) The snapshot is not saved to disk and may cause data inconsistency.

2018-12-26 Thread maoling (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729095#comment-16729095
 ] 

maoling commented on ZOOKEEPER-3220:


[~jiangjiafu]

--->"*In my environment, the save method returned successfully, which means no 
exception had been thrown. But the data was not on disk! That's the problem I 
want to report!*"

1. Why did this situation happen? Was the disk full? 
 The snapshot not calling *fsync* may be the answer.
 Do you see any logs about *FileTxnSnapLog#save* from that time?
2. Even if the size of the snapshot is 0, this could not cause data 
inconsistency,
 because when the ZooKeeper server restarts, invalid snapshots are skipped, 
and if no valid snapshot remains,
 the leader can do a *SNAP* to sync with the follower.
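
A minimal Java sketch of the skipping behavior described in point 2, under stated assumptions: snapshot files are named "snapshot." followed by the zxid in hex (as produced by Util.makeSnapshotName), and the only validity test used here is a non-zero length. The server's real validity check may examine more than the file size. SnapshotPickerSketch and its methods are hypothetical names, not ZooKeeper code.

```java
import java.io.File;
import java.util.Optional;

// Hypothetical helper for illustration only; this is not ZooKeeper code.
final class SnapshotPickerSketch {

    /** Parses the zxid from a name like "snapshot.1a"; returns -1 if the name does not match. */
    static long zxidOf(String name) {
        String prefix = "snapshot.";
        if (!name.startsWith(prefix)) {
            return -1;
        }
        try {
            return Long.parseLong(name.substring(prefix.length()), 16);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Returns the non-empty snapshot file with the highest zxid, skipping zero-length files. */
    static Optional<File> newestNonEmpty(File dir) {
        File[] files = dir.listFiles();
        File best = null;
        long bestZxid = -1;
        if (files != null) {
            for (File f : files) {
                long zxid = zxidOf(f.getName());
                // A zero-length snapshot is treated as invalid and skipped.
                if (zxid >= 0 && f.length() > 0 && zxid > bestZxid) {
                    best = f;
                    bestZxid = zxid;
                }
            }
        }
        return Optional.ofNullable(best);
    }
}
```

If an empty snapshot.Y is skipped this way, recovery falls back to an older snapshot, or to none at all, plus whatever transaction logs exist locally.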



[jira] [Commented] (ZOOKEEPER-3220) The snapshot is not saved to disk and may cause data inconsistency.

2018-12-24 Thread Jiafu Jiang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728572#comment-16728572
 ] 

Jiafu Jiang commented on ZOOKEEPER-3220:


In my environment, the save method returned successfully, which means no 
exception had been thrown. But the data was not on disk! That's the problem I 
want to report!

 

And yes, the snapshot with size 0 was invalid, and it was skipped when the 
ZooKeeper server restarted again.



[jira] [Commented] (ZOOKEEPER-3220) The snapshot is not saved to disk and may cause data inconsistency.

2018-12-19 Thread maoling (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725617#comment-16725617
 ] 

maoling commented on ZOOKEEPER-3220:


[~jiangjiafu]
FileTxnSnapLog#save

{code:java}
/**
 * save the datatree and the sessions into a snapshot
 * @param dataTree the datatree to be serialized onto disk
 * @param sessionsWithTimeouts the session timeouts to be
 * serialized onto disk
 * @param syncSnap sync the snapshot immediately after write
 * @throws IOException
 */
public void save(DataTree dataTree,
                 ConcurrentHashMap<Long, Integer> sessionsWithTimeouts,
                 boolean syncSnap)
    throws IOException {
    long lastZxid = dataTree.lastProcessedZxid;
    File snapshotFile = new File(snapDir, Util.makeSnapshotName(lastZxid));
    LOG.info("Snapshotting: 0x{} to {}", Long.toHexString(lastZxid),
            snapshotFile);
    try {
        snapLog.serialize(dataTree, sessionsWithTimeouts, snapshotFile,
                syncSnap);
    } catch (IOException e) {
        if (snapshotFile.length() == 0) {
            /* This may be caused by a full disk. In such a case, the server
             * will get stuck in a loop where it tries to write a snapshot
             * out to disk, and ends up creating an empty file instead.
             * Doing so will eventually result in valid snapshots being
             * removed during cleanup. */
            if (snapshotFile.delete()) {
                LOG.info("Deleted empty snapshot file: " +
                         snapshotFile.getAbsolutePath());
            } else {
                LOG.warn("Could not delete empty snapshot file: " +
                         snapshotFile.getAbsolutePath());
            }
        } else {
            /* Something else went wrong when writing the snapshot out to
             * disk. If this snapshot file is invalid, when restarting,
             * ZooKeeper will skip it, and find the last known good snapshot
             * instead. */
        }
        throw e;
    }
}
{code}

