[GitHub] zookeeper pull request #560: ZOOKEEPER-3082: Fix server snapshot behavior wh...

2018-08-08 Thread enixon
Github user enixon commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/560#discussion_r208781421
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -399,8 +399,30 @@ public void save(DataTree dataTree,
 File snapshotFile = new File(snapDir, 
Util.makeSnapshotName(lastZxid));
 LOG.info("Snapshotting: 0x{} to {}", Long.toHexString(lastZxid),
 snapshotFile);
-snapLog.serialize(dataTree, sessionsWithTimeouts, snapshotFile, 
syncSnap);
-
+try {
+snapLog.serialize(dataTree, sessionsWithTimeouts, 
snapshotFile, syncSnap);
+} catch (IOException e) {
+if (snapshotFile.length() == 0) {
--- End diff --

Created https://issues.apache.org/jira/browse/ZOOKEEPER-3115 to record this 
idea of deleting the snapshot file on general IOException's and we can continue 
the conversation there.


---


[GitHub] zookeeper pull request #560: ZOOKEEPER-3082: Fix server snapshot behavior wh...

2018-07-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/560


---


[GitHub] zookeeper pull request #560: ZOOKEEPER-3082: Fix server snapshot behavior wh...

2018-07-26 Thread hanm
Github user hanm commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/560#discussion_r205659448
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -399,8 +399,30 @@ public void save(DataTree dataTree,
 File snapshotFile = new File(snapDir, 
Util.makeSnapshotName(lastZxid));
 LOG.info("Snapshotting: 0x{} to {}", Long.toHexString(lastZxid),
 snapshotFile);
-snapLog.serialize(dataTree, sessionsWithTimeouts, snapshotFile, 
syncSnap);
-
+try {
+snapLog.serialize(dataTree, sessionsWithTimeouts, 
snapshotFile, syncSnap);
+} catch (IOException e) {
+if (snapshotFile.length() == 0) {
--- End diff --

>> If that sounds reasonable, I'd propose that we commit on this patch and 
close the hole created from this particular issue

SGTM. @breed - ok with you? If so I'll commit this.


---


[GitHub] zookeeper pull request #560: ZOOKEEPER-3082: Fix server snapshot behavior wh...

2018-07-18 Thread enixon
Github user enixon commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/560#discussion_r203474532
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -399,8 +399,30 @@ public void save(DataTree dataTree,
 File snapshotFile = new File(snapDir, 
Util.makeSnapshotName(lastZxid));
 LOG.info("Snapshotting: 0x{} to {}", Long.toHexString(lastZxid),
 snapshotFile);
-snapLog.serialize(dataTree, sessionsWithTimeouts, snapshotFile, 
syncSnap);
-
+try {
+snapLog.serialize(dataTree, sessionsWithTimeouts, 
snapshotFile, syncSnap);
+} catch (IOException e) {
+if (snapshotFile.length() == 0) {
--- End diff --

I really like the idea! 

Clearly this patch is aimed at closing one very particular kind of error 
case around snapshot writing and the idea generalizes to many more errors. My 
only concern with "always delete" is that I do not know how wide a net that 
casts and whether we might end up deleting a legitimately useful snapshot under 
some circumstances. Unfortunately, I don't have time to do the testing required 
to validate the safety of the more general approach.

If that sounds reasonable, I'd propose that we commit on this patch and 
close the hole created from this particular issue (snapshot problems on no disk 
space) and I'll open a separate JIRA with details about extending the solution 
so someone can pick up the issue and do the necessary testing.


---


[GitHub] zookeeper pull request #560: ZOOKEEPER-3082: Fix server snapshot behavior wh...

2018-07-06 Thread breed
Github user breed commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/560#discussion_r200716591
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -399,8 +399,30 @@ public void save(DataTree dataTree,
 File snapshotFile = new File(snapDir, 
Util.makeSnapshotName(lastZxid));
 LOG.info("Snapshotting: 0x{} to {}", Long.toHexString(lastZxid),
 snapshotFile);
-snapLog.serialize(dataTree, sessionsWithTimeouts, snapshotFile, 
syncSnap);
-
+try {
+snapLog.serialize(dataTree, sessionsWithTimeouts, 
snapshotFile, syncSnap);
+} catch (IOException e) {
+if (snapshotFile.length() == 0) {
--- End diff --

this seems extra safe. i'm wondering if perhaps we should always delete the 
snapshot file on an exception. the snapshot file probably will not be valid if 
an exception occurred. right?


---


[GitHub] zookeeper pull request #560: ZOOKEEPER-3082: Fix server snapshot behavior wh...

2018-07-05 Thread enixon
GitHub user enixon opened a pull request:

https://github.com/apache/zookeeper/pull/560

ZOOKEEPER-3082: Fix server snapshot behavior when out of disk space



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/enixon/zookeeper ZOOKEEPER-3082

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/560.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #560


commit 7b6d575c92a076f8f2cb48510f82e3ab840d7c35
Author: Brian Nixon 
Date:   2018-07-05T18:55:42Z

ZOOKEEPER-3082: Fix server snapshot behavior when out of disk space




---