[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2018-10-26 Thread Andor Molnar (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664967#comment-16664967
 ] 

Andor Molnar commented on ZOOKEEPER-2325:
-

This issue was originally targeted at 3.5 and 3.6, and the fix has been 
committed to those branches.
The port to 3.4 is still open, but it hasn't been updated for a while.

In the process of cleaning up outstanding 3.5 tickets, I'll close this one and 
consider it fixed in 3.5.

> Data inconsistency if all snapshots empty or missing
> 
>
>                 Key: ZOOKEEPER-2325
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6
>            Reporter: Andrew Grasso
>            Priority: Critical
>             Fix For: 3.6.0, 3.5.5
>
>         Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, 
> zk.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When loading state from snapshots on startup, FileTxnSnapLog.java ignores the 
> result of FileSnap.deserialize, which is -1L if no valid snapshots are found. 
> Recovery proceeds with dt.lastProcessed == 0, its initial value.
> The result is that ZooKeeper will process the transaction logs and then begin 
> serving requests with a different state than the rest of the ensemble.
> To reproduce:
> 1. In a healthy ZooKeeper cluster of size >= 3, shut down one node.
> 2. Either delete all snapshots for this node or replace them all with empty 
> files.
> 3. Restart the node.
> We believe this can happen organically if a node runs out of disk space.
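
For anyone skimming the thread, the failure mode in the description can be 
sketched as a toy model in Java. This is not ZooKeeper's code: the class 
IgnoredSnapshotResultSketch, its Tree type, and the idea of representing a 
snapshot and a log as arrays of zxids are all made up for illustration. The 
sketch also assumes that log entries already covered by the snapshot have been 
purged, which is what makes the missing snapshot matter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the behaviour described in the report.
// None of these names are ZooKeeper's real classes or methods.
class IgnoredSnapshotResultSketch {

    /** Toy data tree: records which zxids have been applied. */
    static class Tree {
        long lastProcessedZxid = 0; // initial value, as in the report
        final List<Long> applied = new ArrayList<>();
    }

    /** Loads the snapshot; returns its last zxid, or -1 if none is valid. */
    static long deserialize(Tree tree, long[] snapshotZxids) {
        if (snapshotZxids.length == 0) {
            return -1L;
        }
        for (long zxid : snapshotZxids) {
            tree.applied.add(zxid);
            tree.lastProcessedZxid = zxid;
        }
        return tree.lastProcessedZxid;
    }

    /** Restores a tree while ignoring the result of deserialize. */
    static Tree restoreIgnoringResult(long[] snapshotZxids, long[] logZxids) {
        Tree tree = new Tree();
        deserialize(tree, snapshotZxids); // return value dropped
        for (long zxid : logZxids) {
            if (zxid > tree.lastProcessedZxid) {
                tree.applied.add(zxid);
                tree.lastProcessedZxid = zxid;
            }
        }
        return tree;
    }

    public static void main(String[] args) {
        long[] log = {4L, 5L}; // older log entries assumed already purged
        Tree healthy = restoreIgnoringResult(new long[] {1L, 2L, 3L}, log);
        Tree emptied = restoreIgnoringResult(new long[0], log);
        System.out.println("healthy peer applied: " + healthy.applied);
        System.out.println("emptied node applied: " + emptied.applied);
    }
}
```

In this model both trees end at the same last zxid, yet the node that lost its 
snapshots never applied the earlier transactions, which is the kind of silent 
divergence the report describes.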



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2017-04-13 Thread Brian Nixon (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968437#comment-15968437
 ] 

Brian Nixon commented on ZOOKEEPER-2325:


When a server starts up, it should always capture the state of the loaded 
database with a fresh snapshot. I don't believe it is a valid state to have a 
log file without a snapshot file.


> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Priority: Critical
> Fix For: 3.5.4, 3.6.0
>
> Attachments: zk.patch, ZOOKEEPER-2325.001.patch, 
> ZOOKEEPER-2325-test.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2017-04-10 Thread Abhay Bothra (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963604#comment-15963604
 ] 

Abhay Bothra commented on ZOOKEEPER-2325:
-

We saw a scenario where the ZooKeeper cluster had a log.1 file but no 
snapshot.0. Is this a possible state of the data dir? If so, this change 
prevents ZooKeeper from restoring from just a txn log.
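
One way to spot this state before a restart is to scan the data dir for it. 
The following is only a rough sketch: the class and method names are invented, 
it assumes snapshots and txn logs sit in the same directory and follow the 
"snapshot.*" / "log.*" naming used above, and it treats any non-empty snapshot 
file as usable, which is weaker than a real validity check.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of ZooKeeper.
class DataDirCheckSketch {

    /**
     * Returns true if dir holds any "log.*" file but no non-empty
     * "snapshot.*" file. Empty snapshot files count as missing.
     */
    static boolean hasLogsWithoutSnapshot(Path dir) throws IOException {
        boolean sawLog = false;
        boolean sawSnapshot = false;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                if (name.startsWith("log.")) {
                    sawLog = true;
                } else if (name.startsWith("snapshot.") && Files.size(file) > 0) {
                    sawSnapshot = true;
                }
            }
        }
        return sawLog && !sawSnapshot;
    }

    public static void main(String[] args) throws IOException {
        // Directory to inspect; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("logs without snapshot: " + hasLogsWithoutSnapshot(dir));
    }
}
```

A directory containing log.1 and nothing else (or log.1 plus a zero-length 
snapshot.0) would be flagged by this check.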

> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Priority: Critical
> Fix For: 3.5.4, 3.6.0
>
> Attachments: zk.patch, ZOOKEEPER-2325.001.patch, 
> ZOOKEEPER-2325-test.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2017-01-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827431#comment-15827431
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2325:
---

Github user rakeshadr commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/144#discussion_r96568373
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java 
---
@@ -37,6 +37,8 @@
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.Map;
--- End diff --

Good catch, @afine. This test is related to ZOOKEEPER-1558. Perhaps we 
need to revisit that fix to see whether there is any chance of creating a 
snapshot with uncommitted state.


> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Priority: Critical
> Fix For: 3.5.3, 3.6.0
>
> Attachments: zk.patch, ZOOKEEPER-2325.001.patch, 
> ZOOKEEPER-2325-test.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2017-01-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827051#comment-15827051
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2325:
---

Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/144#discussion_r96525078
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java 
---
@@ -37,6 +37,8 @@
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.Map;
--- End diff --

nit: I don't think this import is needed


> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Priority: Critical
> Fix For: 3.5.3, 3.6.0
>
> Attachments: zk.patch, ZOOKEEPER-2325.001.patch, 
> ZOOKEEPER-2325-test.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2017-01-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827050#comment-15827050
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2325:
---

Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/144#discussion_r96534854
  
--- Diff: 
src/java/test/org/apache/zookeeper/test/EmptiedSnapshotRecoveryTest.java ---
@@ -0,0 +1,133 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.test;
+
+import java.io.IOException;
+import java.io.File;
+import java.io.PrintWriter;
+import java.util.List;
+import java.util.LinkedList;
--- End diff --

a couple unneeded imports here as well


> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Priority: Critical
> Fix For: 3.5.3, 3.6.0
>
> Attachments: zk.patch, ZOOKEEPER-2325.001.patch, 
> ZOOKEEPER-2325-test.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2017-01-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827048#comment-15827048
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2325:
---

Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/144#discussion_r96526559
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -135,8 +135,22 @@ public File getSnapDir() {
      */
     public long restore(DataTree dt, Map<Long, Integer> sessions, 
             PlayBackListener listener) throws IOException {
-        snapLog.deserialize(dt, sessions);
+        long deserializeResult = snapLog.deserialize(dt, sessions);
         FileTxnLog txnLog = new FileTxnLog(dataDir);
+        if (-1L == deserializeResult) {
+            /* this means that we couldn't find any snapshot, so we need to
+             * initialize an empty database (reported in ZOOKEEPER-2325) */
+            if (txnLog.getLastLoggedZxid() != -1) {

would it be worth adding the -1 case to the javadoc for deserialize?
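
To make the suggestion concrete, here is a rough sketch of how the -1 case 
could be spelled out in javadoc next to the guard from the diff. It is a 
standalone illustration with invented names (RestoreGuardSketch, 
startingZxid), not the patch itself. The quoted diff is cut off inside the 
inner if, so the choice to throw there and to return 0 for a brand-new server 
is an assumption about what follows.

```java
import java.io.IOException;

// Hypothetical sketch; the names echo the diff but this is not ZooKeeper code.
class RestoreGuardSketch {

    /** Sentinel: no valid snapshot was found. */
    static final long NO_SNAPSHOT = -1L;

    /** Sentinel: the transaction log holds no entries. */
    static final long NO_LOGGED_ZXID = -1L;

    /**
     * Decides where recovery starts.
     *
     * @param deserializeResult zxid of the last transaction in the loaded
     *        snapshot, or {@code -1} if no valid snapshot was found
     * @param lastLoggedZxid highest zxid in the transaction log, or
     *        {@code -1} if the log is empty
     * @return the zxid from which log replay should continue
     * @throws IOException if log entries exist but no snapshot does
     */
    static long startingZxid(long deserializeResult, long lastLoggedZxid)
            throws IOException {
        if (deserializeResult == NO_SNAPSHOT) {
            if (lastLoggedZxid != NO_LOGGED_ZXID) {
                // Assumed behaviour: refuse to rebuild state from logs alone.
                throw new IOException(
                        "No snapshot found, but the transaction log has entries");
            }
            return 0L; // assumed: nothing on disk means an empty database
        }
        return deserializeResult;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(startingZxid(NO_SNAPSHOT, NO_LOGGED_ZXID));
    }
}
```

Naming the sentinel and documenting it on the return value keeps callers from 
silently dropping the result, which is how the original bug slipped in.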


> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Priority: Critical
> Fix For: 3.5.3, 3.6.0
>
> Attachments: zk.patch, ZOOKEEPER-2325.001.patch, 
> ZOOKEEPER-2325-test.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2017-01-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827049#comment-15827049
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2325:
---

Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/144#discussion_r96534436
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java 
---
@@ -37,6 +37,8 @@
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.Map;
--- End diff --

this change appears to break `testDirtySnapshot`


> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Priority: Critical
> Fix For: 3.5.3, 3.6.0
>
> Attachments: zk.patch, ZOOKEEPER-2325.001.patch, 
> ZOOKEEPER-2325-test.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2017-01-08 Thread Brian Nixon (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15810600#comment-15810600
 ] 

Brian Nixon commented on ZOOKEEPER-2325:


Thanks [~hanm]!

Glad we kept the changes for this task and 261 separate. I'll make sure that 
https://github.com/apache/zookeeper/pull/120 still commits cleanly and update 
that PR as necessary.


> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Priority: Critical
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, 
> zk.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2017-01-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806362#comment-15806362
 ] 

Hudson commented on ZOOKEEPER-2325:
---

SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #3227 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/3227/])
ZOOKEEPER-2325: Data inconsistency if all snapshots empty or missing (hanm: rev 
7c51b01e89acb38165553366f7e3b2a46c00aa27)
* (edit) 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java
* (edit) src/java/test/org/apache/zookeeper/test/TruncateTest.java
* (add) src/java/test/org/apache/zookeeper/test/EmptiedSnapshotRecoveryTest.java
* (edit) src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java


> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Priority: Critical
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, 
> zk.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2017-01-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806330#comment-15806330
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2325:
---

GitHub user hanm opened a pull request:

https://github.com/apache/zookeeper/pull/144

ZOOKEEPER-2325 for branch-3.4.

There was a merge conflict in Zab1_0Test.java when cherry-picking 
commit 7c51b01e89acb38165553366f7e3b2a46c00aa27 to branch-3.4. This PR resolves 
the merge conflict (it is trivial, just two conflicting imports).

@breed @rgs1 @rakeshadr Please take a look and help commit this. Thanks.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hanm/zookeeper ZOOKEEPER-2325

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/144.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #144


commit afd34f286c860591f1e26b1c29539b322cef7bcc
Author: Benjamin Reed 
Date:   2017-01-07T00:09:54Z

ZOOKEEPER-2325 (Data inconsistency if all snapshots empty or missing) for 
branch-3.4.
Resolved merge conflict on Zab1_0Test.java when cherry-picking 
7c51b01e89acb38165553366f7e3b2a46c00aa27.




> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Assignee: Andrew Grasso
>Priority: Critical
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, 
> zk.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2017-01-06 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806250#comment-15806250
 ] 

Michael Han commented on ZOOKEEPER-2325:


[~nixon] [~breed] merged in master and 3.5. 

I think we also want this in 3.4, but there is a merge conflict, so that merge 
will be handled separately. 

> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Assignee: Andrew Grasso
>Priority: Critical
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, 
> zk.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2017-01-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806240#comment-15806240
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2325:
---

Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/117


> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Assignee: Andrew Grasso
>Priority: Critical
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, 
> zk.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2017-01-06 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805924#comment-15805924
 ] 

Michael Han commented on ZOOKEEPER-2325:


I can help commit this one, as the patch has been reviewed. [~nixon] I thought 
you only needed the PR associated with ZOOKEEPER-261, whose change is a superset 
of this JIRA (and committing both at the same time would probably save you some 
rebase effort, if any), but separation of concerns is probably better.

> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Assignee: Andrew Grasso
>Priority: Critical
> Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, 
> zk.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2017-01-06 Thread Brian Nixon (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805547#comment-15805547
 ] 

Brian Nixon commented on ZOOKEEPER-2325:


Any word on committing this patch? I'd love to unblock ZOOKEEPER-261.

[~fpj] ?


> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Assignee: Andrew Grasso
>Priority: Critical
> Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, 
> zk.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-12-15 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752213#comment-15752213
 ] 

Michael Han commented on ZOOKEEPER-2325:


I noticed that the pull request for ZOOKEEPER-261 already contains the changes 
from ZOOKEEPER-2325: https://github.com/apache/zookeeper/pull/120/commits


> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Assignee: Andrew Grasso
>Priority: Critical
> Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, 
> zk.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-12-15 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752196#comment-15752196
 ] 

Benjamin Reed commented on ZOOKEEPER-2325:
--

[~rgs] can you commit this? we need it to get ZOOKEEPER-261 in.

> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Assignee: Andrew Grasso
>Priority: Critical
> Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, 
> zk.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-12-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15720874#comment-15720874
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2325:
---

Github user breed commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/117#discussion_r90791441
  
--- Diff: 
src/java/test/org/apache/zookeeper/test/EmptiedSnapshotRecoveryTest.java ---
@@ -0,0 +1,134 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.test;
+
+import java.io.IOException;
+import java.io.File;
+import java.io.PrintWriter;
+import java.util.List;
+import java.util.LinkedList;
+
+import org.apache.log4j.Logger;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.PortAssignment;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZKTestCase;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.ZooDefs.Ids;
+import org.apache.zookeeper.server.quorum.Leader.Proposal;
+import org.apache.zookeeper.server.ServerCnxnFactory;
+import org.apache.zookeeper.server.SyncRequestProcessor;
+import org.apache.zookeeper.server.ZooKeeperServer;
+import org.apache.zookeeper.server.persistence.FileTxnSnapLog;
+import org.junit.Assert;
+import org.junit.Test;
+
+/** If snapshots are corrupted to the empty file or deleted, Zookeeper should
+ *  not proceed to read its transactiong log files
+ *  Test that zxid == -1 in the presence of emptied/deleted snapshots
+ */
+public class EmptiedSnapshotRecoveryTest extends ZKTestCase implements Watcher {
+private static final Logger LOG = Logger.getLogger(RestoreCommittedLogTest.class);
+private static String HOSTPORT = "127.0.0.1:" + PortAssignment.unique();
+private static final int CONNECTION_TIMEOUT = 3000;
+private static final int N_TRANSACTIONS = 150;
+private static final int SNAP_COUNT = 100;
+
+public void runTest(boolean leaveEmptyFile) throws Exception {
--- End diff --

i think we should skip that test. we have a fix for ZOOKEEPER-261 that 
fixes that specific scenario.




[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-12-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15718424#comment-15718424
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2325:
---

Github user rgs1 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/117#discussion_r90761121
  
--- Diff: 
src/java/test/org/apache/zookeeper/test/EmptiedSnapshotRecoveryTest.java ---
@@ -0,0 +1,134 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.test;
+
+import java.io.IOException;
+import java.io.File;
+import java.io.PrintWriter;
+import java.util.List;
+import java.util.LinkedList;
+
+import org.apache.log4j.Logger;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.PortAssignment;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZKTestCase;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.ZooDefs.Ids;
+import org.apache.zookeeper.server.quorum.Leader.Proposal;
+import org.apache.zookeeper.server.ServerCnxnFactory;
+import org.apache.zookeeper.server.SyncRequestProcessor;
+import org.apache.zookeeper.server.ZooKeeperServer;
+import org.apache.zookeeper.server.persistence.FileTxnSnapLog;
+import org.junit.Assert;
+import org.junit.Test;
+
+/** If snapshots are corrupted to the empty file or deleted, Zookeeper should
+ *  not proceed to read its transactiong log files
+ *  Test that zxid == -1 in the presence of emptied/deleted snapshots
+ */
+public class EmptiedSnapshotRecoveryTest extends ZKTestCase implements Watcher {
+private static final Logger LOG = Logger.getLogger(RestoreCommittedLogTest.class);
+private static String HOSTPORT = "127.0.0.1:" + PortAssignment.unique();
+private static final int CONNECTION_TIMEOUT = 3000;
+private static final int N_TRANSACTIONS = 150;
+private static final int SNAP_COUNT = 100;
+
+public void runTest(boolean leaveEmptyFile) throws Exception {
--- End diff --

@breed do you want to take @hanm's suggestion or should I merge this and we 
get that in another pass?




[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-12-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713392#comment-15713392
 ] 

Hadoop QA commented on ZOOKEEPER-2325:
--

-1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 8 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/96//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/96//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/96//console

This message is automatically generated.



[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-12-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713018#comment-15713018
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2325:
---

Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/117
  
@breed I think I commented by starting a review but never submitted it; I hadn't 
used this GitHub feature until now. I've just submitted my comments, so they should 
appear now.




[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-12-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713016#comment-15713016
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2325:
---

Github user hanm commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/117#discussion_r90516950
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -165,8 +165,22 @@ public File getSnapDir() {
  */
 public long restore(DataTree dt, Map sessions,
 PlayBackListener listener) throws IOException {
-snapLog.deserialize(dt, sessions);
+long deserializeResult = snapLog.deserialize(dt, sessions);
 FileTxnLog txnLog = new FileTxnLog(dataDir);
+if (-1L == deserializeResult) {
+/* this means that we couldn't find any snapshot, so we need to
+ * initialize an empty database */
+if (txnLog.getLastLoggedZxid() != -1) {
+throw new IOException(
+"No snapshot found, but there are log entries. " +
+"Something is broken!");
+}
+/* TODO: (br33d) we should either put a ConcurrentHashMap on restore()
+ *   or use Map on save() */
+save(dt, (ConcurrentHashMap)sessions);
--- End diff --

I think we need it here because if we are getting here then the zxid of 
this server must be -1, so it would not win leader election if at least one 
other server is sane (with a valid snapshot/txn log to recover from). This server 
will then become a follower and sync the (non-empty) snapshot from the leader. If all 
servers have empty snapshots then this save is also required to bootstrap the 
recovery process.
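
To illustrate the election argument above, here is a simplified model of how two votes might be ordered. It is an assumption-laden sketch: `ElectionOrderSketch` and `candidateWins` are made-up names, and the epoch, then zxid, then server id ordering is the commonly described rule rather than a copy of ZooKeeper's election code.

```java
public class ElectionOrderSketch {
    /**
     * Returns true if the candidate vote beats the current one.
     * Ordering assumed here: higher epoch, then higher zxid, then higher server id.
     */
    static boolean candidateWins(long newEpoch, long newZxid, long newId,
                                 long curEpoch, long curZxid, long curId) {
        if (newEpoch != curEpoch) {
            return newEpoch > curEpoch;
        }
        if (newZxid != curZxid) {
            return newZxid > curZxid;
        }
        return newId > curId;
    }
}
```

Under this ordering a server that restored nothing (zxid -1) loses to any peer in the same epoch that recovered real transactions, regardless of server id, so it ends up following and syncing from that peer.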




[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-12-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713015#comment-15713015
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2325:
---

Github user hanm commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/117#discussion_r90516233
  
--- Diff: 
src/java/test/org/apache/zookeeper/test/EmptiedSnapshotRecoveryTest.java ---
@@ -0,0 +1,134 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.test;
+
+import java.io.IOException;
+import java.io.File;
+import java.io.PrintWriter;
+import java.util.List;
+import java.util.LinkedList;
+
+import org.apache.log4j.Logger;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.PortAssignment;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZKTestCase;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.ZooDefs.Ids;
+import org.apache.zookeeper.server.quorum.Leader.Proposal;
+import org.apache.zookeeper.server.ServerCnxnFactory;
+import org.apache.zookeeper.server.SyncRequestProcessor;
+import org.apache.zookeeper.server.ZooKeeperServer;
+import org.apache.zookeeper.server.persistence.FileTxnSnapLog;
+import org.junit.Assert;
+import org.junit.Test;
+
+/** If snapshots are corrupted to the empty file or deleted, Zookeeper should
+ *  not proceed to read its transactiong log files
+ *  Test that zxid == -1 in the presence of emptied/deleted snapshots
+ */
+public class EmptiedSnapshotRecoveryTest extends ZKTestCase implements Watcher {
+private static final Logger LOG = Logger.getLogger(RestoreCommittedLogTest.class);
+private static String HOSTPORT = "127.0.0.1:" + PortAssignment.unique();
+private static final int CONNECTION_TIMEOUT = 3000;
+private static final int N_TRANSACTIONS = 150;
+private static final int SNAP_COUNT = 100;
+
+public void runTest(boolean leaveEmptyFile) throws Exception {
--- End diff --

Test coverage improvement suggestion: here we don't cover the case where 
both the transaction log files and the snapshot files are missing (either deleted or 
empty), in which case the ZK server should happily recover w/o problem. 
Something like this should work: `runTest(boolean leaveEmptySnapshotFile, 
boolean leaveEmptyTxnLogFile)`.
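
A rough sketch of the helpers such a two-flag test might lean on follows. Everything here is hypothetical (`CorruptionMatrixSketch`, `emptyOrDelete`, `expectRecovery`); it is not taken from the patch, and the expected outcomes simply restate the cases discussed in this thread.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;

public class CorruptionMatrixSketch {
    /** Either truncate the file to zero bytes or remove it entirely. */
    static void emptyOrDelete(File f, boolean leaveEmptyFile) throws IOException {
        if (leaveEmptyFile) {
            // Opening a file for writing truncates it; closing leaves it empty.
            try (PrintWriter w = new PrintWriter(f)) {
                w.flush();
            }
        } else {
            Files.delete(f.toPath());
        }
    }

    /**
     * Expected outcome per case: recovery should succeed when snapshots are
     * usable, or when neither snapshots nor log entries exist (fresh start).
     * It should fail only when log entries exist without any snapshot.
     */
    static boolean expectRecovery(boolean snapshotsUsable, boolean txnLogHasEntries) {
        return snapshotsUsable || !txnLogHasEntries;
    }
}
```

The same `emptyOrDelete` call could be applied to snapshot files and log files alike, with each flag choosing between the "empty" and "deleted" flavour of corruption.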




[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-12-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712961#comment-15712961
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2325:
---

Github user breed commented on the issue:

https://github.com/apache/zookeeper/pull/117
  
@hanm i can't seem to find your comment :)




[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-12-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712905#comment-15712905
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2325:
---

Github user breed commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/117#discussion_r90523812
  
--- Diff: 
src/java/test/org/apache/zookeeper/test/EmptiedSnapshotRecoveryTest.java ---
@@ -0,0 +1,134 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.test;
+
+import java.io.IOException;
+import java.io.File;
+import java.io.PrintWriter;
+import java.util.List;
+import java.util.LinkedList;
+
+import org.apache.log4j.Logger;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.PortAssignment;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZKTestCase;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.ZooDefs.Ids;
+import org.apache.zookeeper.server.quorum.Leader.Proposal;
+import org.apache.zookeeper.server.ServerCnxnFactory;
+import org.apache.zookeeper.server.SyncRequestProcessor;
+import org.apache.zookeeper.server.ZooKeeperServer;
+import org.apache.zookeeper.server.persistence.FileTxnSnapLog;
+import org.junit.Assert;
+import org.junit.Test;
+
+/** If snapshots are corrupted to the empty file or deleted, Zookeeper should
+ *  not proceed to read its transactiong log files
+ *  Test that zxid == -1 in the presence of emptied/deleted snapshots
+ */
+public class EmptiedSnapshotRecoveryTest extends ZKTestCase implements Watcher {
+private static final Logger LOG = Logger.getLogger(RestoreCommittedLogTest.class);
+private static String HOSTPORT = "127.0.0.1:" + PortAssignment.unique();
+private static final int CONNECTION_TIMEOUT = 3000;
+private static final int N_TRANSACTIONS = 150;
+private static final int SNAP_COUNT = 100;
+
+public void runTest(boolean leaveEmptyFile) throws Exception {
+File tmpSnapDir = ClientBase.createTmpDir();
+File tmpLogDir  = ClientBase.createTmpDir();
+ClientBase.setupTestEnv();
+ZooKeeperServer zks = new ZooKeeperServer(tmpSnapDir, tmpLogDir, 3000);
+SyncRequestProcessor.setSnapCount(SNAP_COUNT);
+final int PORT = Integer.parseInt(HOSTPORT.split(":")[1]);
+ServerCnxnFactory f = ServerCnxnFactory.createFactory(PORT, -1);
+f.startup(zks);
+Assert.assertTrue("waiting for server being up ",
+ClientBase.waitForServerUp(HOSTPORT,CONNECTION_TIMEOUT));
+ZooKeeper zk = new ZooKeeper(HOSTPORT, CONNECTION_TIMEOUT, this);
+try {
+for (int i = 0; i< N_TRANSACTIONS; i++) {
+zk.create("/node-" + i, new byte[0], Ids.OPEN_ACL_UNSAFE,
+CreateMode.PERSISTENT);
+}
+} finally {
+zk.close();
+}
+f.shutdown();
+zks.shutdown();
+Assert.assertTrue("waiting for server to shutdown",
+ClientBase.waitForServerDown(HOSTPORT, CONNECTION_TIMEOUT));
+
+// start server again with intact database
+zks = new ZooKeeperServer(tmpSnapDir, tmpLogDir, 3000);
+zks.startdata();
+long zxid = zks.getZKDatabase().getDataTreeLastProcessedZxid();
+LOG.info("After clean restart, zxid = " + zxid);
+Assert.assertTrue("zxid > 0", zxid > 0);
+zks.shutdown();
+
+// Make all snapshots empty
+FileTxnSnapLog txnLogFactory = zks.getTxnLogFactory();
+List snapshots = txnLogFactory.findNRecentSnapshots(10);
+Assert.assertTrue("We have a snapshot to corrupt", snapshots.size() > 0);
+for (File file: snapshots) {
+if 

[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-12-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712900#comment-15712900
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2325:
---

Github user breed commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/117#discussion_r90523518
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -165,8 +165,22 @@ public File getSnapDir() {
  */
 public long restore(DataTree dt, Map sessions,
 PlayBackListener listener) throws IOException {
-snapLog.deserialize(dt, sessions);
+long deserializeResult = snapLog.deserialize(dt, sessions);
 FileTxnLog txnLog = new FileTxnLog(dataDir);
+if (-1L == deserializeResult) {
+/* this means that we couldn't find any snapshot, so we need to
+ * initialize an empty database */
+if (txnLog.getLastLoggedZxid() != -1) {
+throw new IOException(
+"No snapshot found, but there are log entries. " +
+"Something is broken!");
+}
+/* TODO: (br33d) we should either put a ConcurrentHashMap on restore()
+ *   or use Map on save() */
+save(dt, (ConcurrentHashMap)sessions);
--- End diff --

yes, ZOOKEEPER-261 is still a problem. is that what you are referring to? 
brian has a patch coming that builds on this one to fix that problem.




[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-12-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712832#comment-15712832
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2325:
---

Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/117
  
+1, just one comment on test coverage.




[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-12-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712490#comment-15712490
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2325:
---

Github user rgs1 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/117#discussion_r90491384
  
--- Diff: 
src/java/test/org/apache/zookeeper/test/EmptiedSnapshotRecoveryTest.java ---
@@ -0,0 +1,134 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.test;
+
+import java.io.IOException;
+import java.io.File;
+import java.io.PrintWriter;
+import java.util.List;
+import java.util.LinkedList;
+
+import org.apache.log4j.Logger;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.PortAssignment;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZKTestCase;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.ZooDefs.Ids;
+import org.apache.zookeeper.server.quorum.Leader.Proposal;
+import org.apache.zookeeper.server.ServerCnxnFactory;
+import org.apache.zookeeper.server.SyncRequestProcessor;
+import org.apache.zookeeper.server.ZooKeeperServer;
+import org.apache.zookeeper.server.persistence.FileTxnSnapLog;
+import org.junit.Assert;
+import org.junit.Test;
+
+/** If snapshots are corrupted to the empty file or deleted, Zookeeper should
+ *  not proceed to read its transaction log files
+ *  Test that zxid == -1 in the presence of emptied/deleted snapshots
+ */
+public class EmptiedSnapshotRecoveryTest extends ZKTestCase implements Watcher {
+private static final Logger LOG = Logger.getLogger(RestoreCommittedLogTest.class);
+private static String HOSTPORT = "127.0.0.1:" + PortAssignment.unique();
+private static final int CONNECTION_TIMEOUT = 3000;
+private static final int N_TRANSACTIONS = 150;
+private static final int SNAP_COUNT = 100;
+
+public void runTest(boolean leaveEmptyFile) throws Exception {
+File tmpSnapDir = ClientBase.createTmpDir();
+File tmpLogDir  = ClientBase.createTmpDir();
+ClientBase.setupTestEnv();
+ZooKeeperServer zks = new ZooKeeperServer(tmpSnapDir, tmpLogDir, 3000);
+SyncRequestProcessor.setSnapCount(SNAP_COUNT);
+final int PORT = Integer.parseInt(HOSTPORT.split(":")[1]);
+ServerCnxnFactory f = ServerCnxnFactory.createFactory(PORT, -1);
+f.startup(zks);
+Assert.assertTrue("waiting for server being up ",
+ClientBase.waitForServerUp(HOSTPORT,CONNECTION_TIMEOUT));
+ZooKeeper zk = new ZooKeeper(HOSTPORT, CONNECTION_TIMEOUT, this);
+try {
+for (int i = 0; i< N_TRANSACTIONS; i++) {
+zk.create("/node-" + i, new byte[0], Ids.OPEN_ACL_UNSAFE,
+CreateMode.PERSISTENT);
+}
+} finally {
+zk.close();
+}
+f.shutdown();
+zks.shutdown();
+Assert.assertTrue("waiting for server to shutdown",
+ClientBase.waitForServerDown(HOSTPORT, CONNECTION_TIMEOUT));
+
+// start server again with intact database
+zks = new ZooKeeperServer(tmpSnapDir, tmpLogDir, 3000);
+zks.startdata();
+long zxid = zks.getZKDatabase().getDataTreeLastProcessedZxid();
+LOG.info("After clean restart, zxid = " + zxid);
+Assert.assertTrue("zxid > 0", zxid > 0);
+zks.shutdown();
+
+// Make all snapshots empty
+FileTxnSnapLog txnLogFactory = zks.getTxnLogFactory();
+List<File> snapshots = txnLogFactory.findNRecentSnapshots(10);
+Assert.assertTrue("We have a snapshot to corrupt", snapshots.size() > 0);
+for (File file: snapshots) {
+if 

[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-12-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712489#comment-15712489
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2325:
---

Github user rgs1 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/117#discussion_r90490925
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -165,8 +165,22 @@ public File getSnapDir() {
  */
 public long restore(DataTree dt, Map<Long, Integer> sessions,
 PlayBackListener listener) throws IOException {
-snapLog.deserialize(dt, sessions);
+long deserializeResult = snapLog.deserialize(dt, sessions);
 FileTxnLog txnLog = new FileTxnLog(dataDir);
+if (-1L == deserializeResult) {
+/* this means that we couldn't find any snapshot, so we need to
+ * initialize an empty database */
--- End diff --

nit: can you add a reference to ZOOKEEPER-2325 here?


> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Assignee: Andrew Grasso
>Priority: Critical
> Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, 
> zk.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When loading state from snapshots on startup, FileTxnSnapLog.java ignores the 
> result of FileSnap.deserialize, which is -1L if no valid snapshots are found. 
> Recovery proceeds with dt.lastProcessed == 0, its initial value.
> The result is that Zookeeper will process the transaction logs and then begin 
> serving requests with a different state than the rest of the ensemble.
> To reproduce:
> In a healthy zookeeper cluster of size >= 3, shut down one node.
> Either delete all snapshots for this node or change all to be empty files.
> Restart the node.
> We believe this can happen organically if a node runs out of disk space.
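
The reproduction step "either delete all snapshots for this node or change all to be empty files" can be scripted. What follows is only a minimal sketch, not code from the patch: the class name EmptySnapshotsSketch and the method emptyAll are hypothetical, and it assumes snapshot files live directly in the given directory under names that start with "snapshot.". Run it only against a stopped node's data directory that you are prepared to lose.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ZooKeeper or of the patch under review.
final class EmptySnapshotsSketch {

    /**
     * Empties or deletes every file matching "snapshot.*" in snapDir.
     * Assumption: snapshot files use the "snapshot." name prefix.
     * Returns the number of files that were touched.
     */
    static int emptyAll(Path snapDir, boolean leaveEmptyFile) throws IOException {
        // Collect the matches first so the directory is not modified while it is being listed.
        List<Path> snapshots = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(snapDir, "snapshot.*")) {
            for (Path p : stream) {
                snapshots.add(p);
            }
        }
        for (Path p : snapshots) {
            if (leaveEmptyFile) {
                // Truncate to zero bytes, simulating a snapshot written on a full disk.
                Files.write(p, new byte[0],
                        StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
            } else {
                Files.delete(p);
            }
        }
        return snapshots.size();
    }

    public static void main(String[] args) throws IOException {
        // Demonstration on a throwaway directory with fake files.
        Path dir = Files.createTempDirectory("zk-snap-sketch");
        Files.write(dir.resolve("snapshot.64"), new byte[] {1, 2, 3});
        Files.write(dir.resolve("log.1"), new byte[] {9});
        int touched = emptyAll(dir, true);
        System.out.println("emptied " + touched + " snapshot file(s)");
        System.out.println("snapshot size now: " + Files.size(dir.resolve("snapshot.64")));
    }
}
```

The leaveEmptyFile flag mirrors the two variants named in the description: true leaves zero-length files behind, false removes the files. Files that do not match the pattern, such as transaction logs, are left untouched.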



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-11-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710994#comment-15710994
 ] 

Hadoop QA commented on ZOOKEEPER-2325:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 8 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/95//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/95//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/95//console

This message is automatically generated.



[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710502#comment-15710502
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2325:
---

Github user lvfangmin commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/117#discussion_r90368114
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -165,8 +165,22 @@ public File getSnapDir() {
  */
 public long restore(DataTree dt, Map<Long, Integer> sessions,
 PlayBackListener listener) throws IOException {
-snapLog.deserialize(dt, sessions);
+long deserializeResult = snapLog.deserialize(dt, sessions);
 FileTxnLog txnLog = new FileTxnLog(dataDir);
+if (-1L == deserializeResult) {
+/* this means that we couldn't find any snapshot, so we need to
+ * initialize an empty database */
+if (txnLog.getLastLoggedZxid() != -1) {
+throw new IOException(
+"No snapshot found, but there are log entries. " +
+"Something is broken!");
+}
+/* TODO: (br33d) we should either put a ConcurrentHashMap on restore()
+ *   or use Map on save() */
+save(dt, (ConcurrentHashMap<Long, Integer>)sessions);
--- End diff --

n00b question: why do we need to save here? I saw there is potential that the 
follower will sync with the leader just to send over the empty snapshot.
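
For readers following the diff, the guard it adds can be looked at in isolation. The sketch below is not the ZooKeeper code: RestoreGuardSketch, Action, and decide are hypothetical names, and the two long parameters stand in for the values that snapLog.deserialize(...) and txnLog.getLastLoggedZxid() return in the diff. It only models the three outcomes visible in the quoted change.

```java
import java.io.IOException;

// Hypothetical stand-alone model of the guard in the quoted diff; not ZooKeeper code.
final class RestoreGuardSketch {

    /** Per the issue description, deserialize yields -1 when no valid snapshot is found. */
    static final long NO_SNAPSHOT = -1L;

    enum Action { REPLAY_FROM_SNAPSHOT, INITIALIZE_EMPTY_DATABASE }

    /**
     * deserializeResult stands in for the value returned by snapLog.deserialize(...);
     * lastLoggedZxid stands in for txnLog.getLastLoggedZxid().
     */
    static Action decide(long deserializeResult, long lastLoggedZxid) throws IOException {
        if (deserializeResult == NO_SNAPSHOT) {
            if (lastLoggedZxid != -1L) {
                // Log entries without any snapshot: refuse to start rather than
                // replay the log on top of an empty tree.
                throw new IOException(
                        "No snapshot found, but there are log entries. Something is broken!");
            }
            // Nothing on disk at all: treat this as a brand-new server.
            return Action.INITIALIZE_EMPTY_DATABASE;
        }
        // A snapshot was loaded; the transaction log can be replayed on top of it.
        return Action.REPLAY_FROM_SNAPSHOT;
    }

    public static void main(String[] args) {
        long[][] cases = { {100L, 150L}, {-1L, -1L}, {-1L, 150L} };
        for (long[] c : cases) {
            try {
                System.out.println(c[0] + ", " + c[1] + " -> " + decide(c[0], c[1]));
            } catch (IOException e) {
                System.out.println(c[0] + ", " + c[1] + " -> refused: " + e.getMessage());
            }
        }
    }
}
```

In the diff itself, the empty-database branch is where save(dt, ...) is called, which is the call the question above is about; the sketch leaves that step out and only returns a marker value.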




[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-11-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706861#comment-15706861
 ] 

Hadoop QA commented on ZOOKEEPER-2325:
--

-1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/93//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/93//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/93//console

This message is automatically generated.



[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-11-29 Thread Andrew Grasso (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706823#comment-15706823
 ] 

Andrew Grasso commented on ZOOKEEPER-2325:
--

This looks good to me. Thanks for putting the pull request together.



[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-11-29 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706785#comment-15706785
 ] 

Benjamin Reed commented on ZOOKEEPER-2325:
--

hey andrew, i've merged all the patches into a pull request. can you take a 
look and make sure everything looks ok?



[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706783#comment-15706783
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2325:
---

GitHub user breed opened a pull request:

https://github.com/apache/zookeeper/pull/117

ZOOKEEPER-2325: Data inconsistency if all snapshots empty or missing



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/breed/zookeeper ZOOKEEPER-2325

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/117.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #117


commit 02bf3d57786d51da205e78a070a45703da21f916
Author: Benjamin Reed 
Date:   2016-11-29T22:08:22Z

ZOOKEEPER-2325: Data inconsistency if all snapshots empty or missing






[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514078#comment-15514078
 ] 

Hadoop QA commented on ZOOKEEPER-2325:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12823508/zk.patch
  against trunk revision ec20c5434cc8a334b3fd25e27d26dccf4793c8f3.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3447//console

This message is automatically generated.



[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-04-27 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261574#comment-15261574
 ] 

Flavio Junqueira commented on ZOOKEEPER-2325:
-

It sounds like this should be easy to reproduce. Do you think you can add a 
test case for this [~agrasso]?

> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Priority: Critical
> Attachments: ZOOKEEPER-2325.001.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When loading state from snapshots on startup, FileTxnSnapLog.java ignores the 
> result of FileSnap.deserialize, which is -1L if no valid snapshots are found. 
> Recovery proceeds with dt.lastProcessed == 0, its initial value.
> The result is that Zookeeper will process the transaction logs and then begin 
> serving requests with a different state than the rest of the ensemble.
> To reproduce:
> In a healthy zookeeper cluster of size >= 3, shut down one node.
> Either delete all snapshots for this node or change all to be empty files.
> Restart the node.
> We believe this can happen organically if a node runs out of disk space.





[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2016-04-27 Thread Nicholas Wolchko (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261095#comment-15261095
 ] 

Nicholas Wolchko commented on ZOOKEEPER-2325:
-

I've seen some cases where our zookeeper servers lost data, and this looks like 
it could be the cause. Is there anything blocking this change?

> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
> Attachments: ZOOKEEPER-2325.001.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When loading state from snapshots on startup, FileTxnSnapLog.java ignores the 
> result of FileSnap.deserialize, which is -1L if no valid snapshots are found. 
> Recovery proceeds with dt.lastProcessed == 0, its initial value.
> The result is that Zookeeper will process the transaction logs and then begin 
> serving requests with a different state than the rest of the ensemble.
> To reproduce:
> In a healthy zookeeper cluster of size >= 3, shut down one node.
> Either delete all snapshots for this node or change all to be empty files.
> Restart the node.
> We believe this can happen organically if a node runs out of disk space.


