[GitHub] zookeeper pull request #560: ZOOKEEPER-3082: Fix server snapshot behavior wh...
Github user enixon commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/560#discussion_r208781421 --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java --- @@ -399,8 +399,30 @@ public void save(DataTree dataTree, File snapshotFile = new File(snapDir, Util.makeSnapshotName(lastZxid)); LOG.info("Snapshotting: 0x{} to {}", Long.toHexString(lastZxid), snapshotFile); -snapLog.serialize(dataTree, sessionsWithTimeouts, snapshotFile, syncSnap); - +try { +snapLog.serialize(dataTree, sessionsWithTimeouts, snapshotFile, syncSnap); +} catch (IOException e) { +if (snapshotFile.length() == 0) { --- End diff -- Created https://issues.apache.org/jira/browse/ZOOKEEPER-3115 to record this idea of deleting the snapshot file on general IOException's and we can continue the conversation there. ---
[GitHub] zookeeper pull request #560: ZOOKEEPER-3082: Fix server snapshot behavior wh...
Github user asfgit closed the pull request at: https://github.com/apache/zookeeper/pull/560 ---
[GitHub] zookeeper pull request #560: ZOOKEEPER-3082: Fix server snapshot behavior wh...
Github user hanm commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/560#discussion_r205659448 --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java --- @@ -399,8 +399,30 @@ public void save(DataTree dataTree, File snapshotFile = new File(snapDir, Util.makeSnapshotName(lastZxid)); LOG.info("Snapshotting: 0x{} to {}", Long.toHexString(lastZxid), snapshotFile); -snapLog.serialize(dataTree, sessionsWithTimeouts, snapshotFile, syncSnap); - +try { +snapLog.serialize(dataTree, sessionsWithTimeouts, snapshotFile, syncSnap); +} catch (IOException e) { +if (snapshotFile.length() == 0) { --- End diff -- >> If that sounds reasonable, I'd propose that we commit on this patch and close the hole created from this particular issue SGTM. @breed - ok with you? If so I'll commit this. ---
[GitHub] zookeeper pull request #560: ZOOKEEPER-3082: Fix server snapshot behavior wh...
Github user enixon commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/560#discussion_r203474532 --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java --- @@ -399,8 +399,30 @@ public void save(DataTree dataTree, File snapshotFile = new File(snapDir, Util.makeSnapshotName(lastZxid)); LOG.info("Snapshotting: 0x{} to {}", Long.toHexString(lastZxid), snapshotFile); -snapLog.serialize(dataTree, sessionsWithTimeouts, snapshotFile, syncSnap); - +try { +snapLog.serialize(dataTree, sessionsWithTimeouts, snapshotFile, syncSnap); +} catch (IOException e) { +if (snapshotFile.length() == 0) { --- End diff -- I really like the idea! Clearly this patch is aimed at closing one very particular kind of error case around snapshot writing and the idea generalizes to many more errors. My only concern with "always delete" is that I do not know how wide a net that casts and whether we might end up deleting a legitimately useful snapshot under some circumstances. Unfortunately, I don't have time to do the testing required to validate the safety of the more general approach. If that sounds reasonable, I'd propose that we commit on this patch and close the hole created from this particular issue (snapshot problems on no disk space) and I'll open a separate JIRA with details about extending the solution so someone can pick up the issue and do the necessary testing. ---
[GitHub] zookeeper pull request #560: ZOOKEEPER-3082: Fix server snapshot behavior wh...
Github user breed commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/560#discussion_r200716591 --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java --- @@ -399,8 +399,30 @@ public void save(DataTree dataTree, File snapshotFile = new File(snapDir, Util.makeSnapshotName(lastZxid)); LOG.info("Snapshotting: 0x{} to {}", Long.toHexString(lastZxid), snapshotFile); -snapLog.serialize(dataTree, sessionsWithTimeouts, snapshotFile, syncSnap); - +try { +snapLog.serialize(dataTree, sessionsWithTimeouts, snapshotFile, syncSnap); +} catch (IOException e) { +if (snapshotFile.length() == 0) { --- End diff -- this seems extra safe. i'm wondering if perhaps we should always delete the snapshot file on an exception. the snapshot file probably will not be valid if an exception occurred. right? ---
[GitHub] zookeeper pull request #560: ZOOKEEPER-3082: Fix server snapshot behavior wh...
GitHub user enixon opened a pull request: https://github.com/apache/zookeeper/pull/560 ZOOKEEPER-3082: Fix server snapshot behavior when out of disk space You can merge this pull request into a Git repository by running: $ git pull https://github.com/enixon/zookeeper ZOOKEEPER-3082 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/560.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #560 commit 7b6d575c92a076f8f2cb48510f82e3ab840d7c35 Author: Brian Nixon Date: 2018-07-05T18:55:42Z ZOOKEEPER-3082: Fix server snapshot behavior when out of disk space ---