[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664967#comment-16664967 ] Andor Molnar commented on ZOOKEEPER-2325: - This issue was originally targeted to 3.5 and 3.6 and fix has been committed to these branches. Porting it to 3.4 is still open, but hasn't been updated for a while. In the process of cleaning up outstanding 3.5 tickets, I'll close this one and consider it fixed in 3.5. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Priority: Critical > Fix For: 3.6.0, 3.5.5 > > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968437#comment-15968437 ] Brian Nixon commented on ZOOKEEPER-2325: When a server starts up, it should always capture the state of the loaded database with a fresh snapshot. I don't believe it is a valid state to have a log file without a snapshot file. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Priority: Critical > Fix For: 3.5.4, 3.6.0 > > Attachments: zk.patch, ZOOKEEPER-2325.001.patch, > ZOOKEEPER-2325-test.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963604#comment-15963604 ] Abhay Bothra commented on ZOOKEEPER-2325: - We saw a scenario where the zookeeper cluster had a log.1 file, but no snapshot.0. Is this a possible state of the data dir? If yes, this change prevents Zookeeper from restoring from just a txn log > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Priority: Critical > Fix For: 3.5.4, 3.6.0 > > Attachments: zk.patch, ZOOKEEPER-2325.001.patch, > ZOOKEEPER-2325-test.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827431#comment-15827431 ] ASF GitHub Bot commented on ZOOKEEPER-2325: --- Github user rakeshadr commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/144#discussion_r96568373 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java --- @@ -37,6 +37,8 @@ import java.util.ArrayList; import java.util.HashMap; import java.util.List; +import java.util.concurrent.ConcurrentHashMap; +import java.util.Map; --- End diff -- Good catch, @afine. This test is related to ZOOKEEPER-1558. Perhaps, we need to re-look the fix to see any chance of creating snapshot with uncommitted state. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Priority: Critical > Fix For: 3.5.3, 3.6.0 > > Attachments: zk.patch, ZOOKEEPER-2325.001.patch, > ZOOKEEPER-2325-test.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827051#comment-15827051 ] ASF GitHub Bot commented on ZOOKEEPER-2325: --- Github user afine commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/144#discussion_r96525078 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java --- @@ -37,6 +37,8 @@ import java.util.ArrayList; import java.util.HashMap; import java.util.List; +import java.util.concurrent.ConcurrentHashMap; +import java.util.Map; --- End diff -- nit: i don't think that this import is needed > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Priority: Critical > Fix For: 3.5.3, 3.6.0 > > Attachments: zk.patch, ZOOKEEPER-2325.001.patch, > ZOOKEEPER-2325-test.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827050#comment-15827050 ] ASF GitHub Bot commented on ZOOKEEPER-2325: --- Github user afine commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/144#discussion_r96534854 --- Diff: src/java/test/org/apache/zookeeper/test/EmptiedSnapshotRecoveryTest.java --- @@ -0,0 +1,133 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.zookeeper.test; + +import java.io.IOException; +import java.io.File; +import java.io.PrintWriter; +import java.util.List; +import java.util.LinkedList; --- End diff -- a couple unneeded imports here as well > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Priority: Critical > Fix For: 3.5.3, 3.6.0 > > Attachments: zk.patch, ZOOKEEPER-2325.001.patch, > ZOOKEEPER-2325-test.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827048#comment-15827048 ] ASF GitHub Bot commented on ZOOKEEPER-2325: --- Github user afine commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/144#discussion_r96526559 --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java --- @@ -135,8 +135,22 @@ public File getSnapDir() { */ public long restore(DataTree dt, Mapsessions, PlayBackListener listener) throws IOException { -snapLog.deserialize(dt, sessions); +long deserializeResult = snapLog.deserialize(dt, sessions); FileTxnLog txnLog = new FileTxnLog(dataDir); +if (-1L == deserializeResult) { +/* this means that we couldn't find any snapshot, so we need to + * initialize an empty database (reported in ZOOKEEPER-2325) */ +if (txnLog.getLastLoggedZxid() != -1) { --- End diff -- would it be worth adding the -1 case to the javadoc for deserialize? > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Priority: Critical > Fix For: 3.5.3, 3.6.0 > > Attachments: zk.patch, ZOOKEEPER-2325.001.patch, > ZOOKEEPER-2325-test.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827049#comment-15827049 ] ASF GitHub Bot commented on ZOOKEEPER-2325: --- Github user afine commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/144#discussion_r96534436 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java --- @@ -37,6 +37,8 @@ import java.util.ArrayList; import java.util.HashMap; import java.util.List; +import java.util.concurrent.ConcurrentHashMap; +import java.util.Map; --- End diff -- this change appears to break `testDirtySnapshot` > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Priority: Critical > Fix For: 3.5.3, 3.6.0 > > Attachments: zk.patch, ZOOKEEPER-2325.001.patch, > ZOOKEEPER-2325-test.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15810600#comment-15810600 ] Brian Nixon commented on ZOOKEEPER-2325: Thanks [~hanm]! Glad we kept the changes for this task and 261 separate. I'll make sure that https://github.com/apache/zookeeper/pull/120 still commits cleanly and update that PR as necessary. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Priority: Critical > Fix For: 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806362#comment-15806362 ] Hudson commented on ZOOKEEPER-2325: --- SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #3227 (See [https://builds.apache.org/job/ZooKeeper-trunk/3227/]) ZOOKEEPER-2325: Data inconsistency if all snapshots empty or missing (hanm: rev 7c51b01e89acb38165553366f7e3b2a46c00aa27) * (edit) src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java * (edit) src/java/test/org/apache/zookeeper/test/TruncateTest.java * (add) src/java/test/org/apache/zookeeper/test/EmptiedSnapshotRecoveryTest.java * (edit) src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Priority: Critical > Fix For: 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806330#comment-15806330 ] ASF GitHub Bot commented on ZOOKEEPER-2325: --- GitHub user hanm opened a pull request: https://github.com/apache/zookeeper/pull/144 ZOOKEEPER-2325 for branch-3.4. There was a merge conflict on file Zab1_0Test.java when cherry-picking commit 7c51b01e89acb38165553366f7e3b2a46c00aa27 to branch-3.4. This PR resolves the merge conflict (it is trivial, just two imports conflicts). @breed @rgs1 @rakeshadr Please take a look and help committing this thanks. You can merge this pull request into a Git repository by running: $ git pull https://github.com/hanm/zookeeper ZOOKEEPER-2325 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/144.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #144 commit afd34f286c860591f1e26b1c29539b322cef7bcc Author: Benjamin ReedDate: 2017-01-07T00:09:54Z ZOOKEEPER-2325 (Data inconsistency if all snapshots empty or missing) for branch-3.4. Resolved merge conflict on Zab1_0Test.java when cherry-picking 7c51b01e89acb38165553366f7e3b2a46c00aa27. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Fix For: 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806250#comment-15806250 ] Michael Han commented on ZOOKEEPER-2325: [~nixon] [~breed] merged in master and 3.5. I think we also want this in 3.4, but there is a merge conflict, so that merge will be handled separately. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Fix For: 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806240#comment-15806240 ] ASF GitHub Bot commented on ZOOKEEPER-2325: --- Github user asfgit closed the pull request at: https://github.com/apache/zookeeper/pull/117 > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Fix For: 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805924#comment-15805924 ] Michael Han commented on ZOOKEEPER-2325: I could help committing this one as the patch is reviewed. [~nixon] I thought you just need the PR associated with ZOOKEEPER-261 whose change is a superset of this JIRA (and committing both same time would probably save your some effort on rebase, if any), but probably separation of concern is better.. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805547#comment-15805547 ] Brian Nixon commented on ZOOKEEPER-2325: Any word on committing this patch? I'd love to unblock ZOOKEEPER-261. [~fpj] ? > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752213#comment-15752213 ] Michael Han commented on ZOOKEEPER-2325: I noticed that the pull request of ZOOKEEPER-261 contains changes in ZOOKEEPER-2325 already: https://github.com/apache/zookeeper/pull/120/commits > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752196#comment-15752196 ] Benjamin Reed commented on ZOOKEEPER-2325: -- [~rgs] can you commit this? we need it to get ZOOKEEPER-261 in. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15720874#comment-15720874 ] ASF GitHub Bot commented on ZOOKEEPER-2325: --- Github user breed commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/117#discussion_r90791441 --- Diff: src/java/test/org/apache/zookeeper/test/EmptiedSnapshotRecoveryTest.java --- @@ -0,0 +1,134 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.zookeeper.test; + +import java.io.IOException; +import java.io.File; +import java.io.PrintWriter; +import java.util.List; +import java.util.LinkedList; + +import org.apache.log4j.Logger; +import org.apache.zookeeper.CreateMode; +import org.apache.zookeeper.PortAssignment; +import org.apache.zookeeper.WatchedEvent; +import org.apache.zookeeper.Watcher; +import org.apache.zookeeper.ZKTestCase; +import org.apache.zookeeper.ZooKeeper; +import org.apache.zookeeper.ZooDefs.Ids; +import org.apache.zookeeper.server.quorum.Leader.Proposal; +import org.apache.zookeeper.server.ServerCnxnFactory; +import org.apache.zookeeper.server.SyncRequestProcessor; +import org.apache.zookeeper.server.ZooKeeperServer; +import org.apache.zookeeper.server.persistence.FileTxnSnapLog; +import org.junit.Assert; +import org.junit.Test; + +/** If snapshots are corrupted to the empty file or deleted, Zookeeper should + * not proceed to read its transactiong log files + * Test that zxid == -1 in the presence of emptied/deleted snapshots + */ +public class EmptiedSnapshotRecoveryTest extends ZKTestCase implements Watcher { +private static final Logger LOG = Logger.getLogger(RestoreCommittedLogTest.class); +private static String HOSTPORT = "127.0.0.1:" + PortAssignment.unique(); +private static final int CONNECTION_TIMEOUT = 3000; +private static final int N_TRANSACTIONS = 150; +private static final int SNAP_COUNT = 100; + +public void runTest(boolean leaveEmptyFile) throws Exception { --- End diff -- i think we should skip that test. we have a fix for ZOOKEEPER-261 that fixes that specific scenario. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15718424#comment-15718424 ] ASF GitHub Bot commented on ZOOKEEPER-2325: --- Github user rgs1 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/117#discussion_r90761121 --- Diff: src/java/test/org/apache/zookeeper/test/EmptiedSnapshotRecoveryTest.java --- @@ -0,0 +1,134 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.zookeeper.test; + +import java.io.IOException; +import java.io.File; +import java.io.PrintWriter; +import java.util.List; +import java.util.LinkedList; + +import org.apache.log4j.Logger; +import org.apache.zookeeper.CreateMode; +import org.apache.zookeeper.PortAssignment; +import org.apache.zookeeper.WatchedEvent; +import org.apache.zookeeper.Watcher; +import org.apache.zookeeper.ZKTestCase; +import org.apache.zookeeper.ZooKeeper; +import org.apache.zookeeper.ZooDefs.Ids; +import org.apache.zookeeper.server.quorum.Leader.Proposal; +import org.apache.zookeeper.server.ServerCnxnFactory; +import org.apache.zookeeper.server.SyncRequestProcessor; +import org.apache.zookeeper.server.ZooKeeperServer; +import org.apache.zookeeper.server.persistence.FileTxnSnapLog; +import org.junit.Assert; +import org.junit.Test; + +/** If snapshots are corrupted to the empty file or deleted, Zookeeper should + * not proceed to read its transactiong log files + * Test that zxid == -1 in the presence of emptied/deleted snapshots + */ +public class EmptiedSnapshotRecoveryTest extends ZKTestCase implements Watcher { +private static final Logger LOG = Logger.getLogger(RestoreCommittedLogTest.class); +private static String HOSTPORT = "127.0.0.1:" + PortAssignment.unique(); +private static final int CONNECTION_TIMEOUT = 3000; +private static final int N_TRANSACTIONS = 150; +private static final int SNAP_COUNT = 100; + +public void runTest(boolean leaveEmptyFile) throws Exception { --- End diff -- @breed do you want to take @hanm's suggestion or should I merge this and we get that in another pass? > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713392#comment-15713392 ] Hadoop QA commented on ZOOKEEPER-2325: -- -1 overall. GitHub Pull Request Build +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 8 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/96//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/96//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/96//console This message is automatically generated. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713018#comment-15713018 ] ASF GitHub Bot commented on ZOOKEEPER-2325: --- Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/117 @breed I think I commented by started a review but not submitted - haven't used this github feature until now. Just submitted my comments, they should appear now. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713016#comment-15713016 ] ASF GitHub Bot commented on ZOOKEEPER-2325: --- Github user hanm commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/117#discussion_r90516950 --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java --- @@ -165,8 +165,22 @@ public File getSnapDir() { */ public long restore(DataTree dt, Mapsessions, PlayBackListener listener) throws IOException { -snapLog.deserialize(dt, sessions); +long deserializeResult = snapLog.deserialize(dt, sessions); FileTxnLog txnLog = new FileTxnLog(dataDir); +if (-1L == deserializeResult) { +/* this means that we couldn't find any snapshot, so we need to + * initialize an empty database */ +if (txnLog.getLastLoggedZxid() != -1) { +throw new IOException( +"No snapshot found, but there are log entries. " + +"Something is broken!"); +} +/* TODO: (br33d) we should either put a ConcurrentHashMap on restore() + * or use Map on save() */ +save(dt, (ConcurrentHashMap )sessions); --- End diff -- I think we need it here because if we are getting here then the zxid of this server must be -1, so it would not win leader election if at least one other server is sane (with valid snapshot/txn log to recover.), so this server will become a follow and sync the (none empty) snapshot from the leader. If all servers have empty snapshots then this save is also required to bootstrap the recover process. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713015#comment-15713015 ] ASF GitHub Bot commented on ZOOKEEPER-2325: --- Github user hanm commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/117#discussion_r90516233 --- Diff: src/java/test/org/apache/zookeeper/test/EmptiedSnapshotRecoveryTest.java --- @@ -0,0 +1,134 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.zookeeper.test; + +import java.io.IOException; +import java.io.File; +import java.io.PrintWriter; +import java.util.List; +import java.util.LinkedList; + +import org.apache.log4j.Logger; +import org.apache.zookeeper.CreateMode; +import org.apache.zookeeper.PortAssignment; +import org.apache.zookeeper.WatchedEvent; +import org.apache.zookeeper.Watcher; +import org.apache.zookeeper.ZKTestCase; +import org.apache.zookeeper.ZooKeeper; +import org.apache.zookeeper.ZooDefs.Ids; +import org.apache.zookeeper.server.quorum.Leader.Proposal; +import org.apache.zookeeper.server.ServerCnxnFactory; +import org.apache.zookeeper.server.SyncRequestProcessor; +import org.apache.zookeeper.server.ZooKeeperServer; +import org.apache.zookeeper.server.persistence.FileTxnSnapLog; +import org.junit.Assert; +import org.junit.Test; + +/** If snapshots are corrupted to the empty file or deleted, Zookeeper should + * not proceed to read its transactiong log files + * Test that zxid == -1 in the presence of emptied/deleted snapshots + */ +public class EmptiedSnapshotRecoveryTest extends ZKTestCase implements Watcher { +private static final Logger LOG = Logger.getLogger(RestoreCommittedLogTest.class); +private static String HOSTPORT = "127.0.0.1:" + PortAssignment.unique(); +private static final int CONNECTION_TIMEOUT = 3000; +private static final int N_TRANSACTIONS = 150; +private static final int SNAP_COUNT = 100; + +public void runTest(boolean leaveEmptyFile) throws Exception { --- End diff -- Test coverage improvement suggestion: here we don't cover the case where both transaction log files and snap shot files are missing (either deleted, or empty) - in which case the ZK server should happily recover w/o problem. Something like this should work: `runTest(boolean leaveEmptySnapshotFile, boolean leaveEmptyTxnLogFile)`. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712961#comment-15712961 ] ASF GitHub Bot commented on ZOOKEEPER-2325: --- Github user breed commented on the issue: https://github.com/apache/zookeeper/pull/117 @hanm i can't seem to find your comment :) > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712905#comment-15712905 ] ASF GitHub Bot commented on ZOOKEEPER-2325: --- Github user breed commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/117#discussion_r90523812 --- Diff: src/java/test/org/apache/zookeeper/test/EmptiedSnapshotRecoveryTest.java --- @@ -0,0 +1,134 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.zookeeper.test; + +import java.io.IOException; +import java.io.File; +import java.io.PrintWriter; +import java.util.List; +import java.util.LinkedList; + +import org.apache.log4j.Logger; +import org.apache.zookeeper.CreateMode; +import org.apache.zookeeper.PortAssignment; +import org.apache.zookeeper.WatchedEvent; +import org.apache.zookeeper.Watcher; +import org.apache.zookeeper.ZKTestCase; +import org.apache.zookeeper.ZooKeeper; +import org.apache.zookeeper.ZooDefs.Ids; +import org.apache.zookeeper.server.quorum.Leader.Proposal; +import org.apache.zookeeper.server.ServerCnxnFactory; +import org.apache.zookeeper.server.SyncRequestProcessor; +import org.apache.zookeeper.server.ZooKeeperServer; +import org.apache.zookeeper.server.persistence.FileTxnSnapLog; +import org.junit.Assert; +import org.junit.Test; + +/** If snapshots are corrupted to the empty file or deleted, Zookeeper should + * not proceed to read its transactiong log files + * Test that zxid == -1 in the presence of emptied/deleted snapshots + */ +public class EmptiedSnapshotRecoveryTest extends ZKTestCase implements Watcher { +private static final Logger LOG = Logger.getLogger(RestoreCommittedLogTest.class); +private static String HOSTPORT = "127.0.0.1:" + PortAssignment.unique(); +private static final int CONNECTION_TIMEOUT = 3000; +private static final int N_TRANSACTIONS = 150; +private static final int SNAP_COUNT = 100; + +public void runTest(boolean leaveEmptyFile) throws Exception { +File tmpSnapDir = ClientBase.createTmpDir(); +File tmpLogDir = ClientBase.createTmpDir(); +ClientBase.setupTestEnv(); +ZooKeeperServer zks = new ZooKeeperServer(tmpSnapDir, tmpLogDir, 3000); +SyncRequestProcessor.setSnapCount(SNAP_COUNT); +final int PORT = Integer.parseInt(HOSTPORT.split(":")[1]); +ServerCnxnFactory f = ServerCnxnFactory.createFactory(PORT, -1); +f.startup(zks); +Assert.assertTrue("waiting for server being up ", +ClientBase.waitForServerUp(HOSTPORT,CONNECTION_TIMEOUT)); +ZooKeeper zk = new ZooKeeper(HOSTPORT, CONNECTION_TIMEOUT, this); +try { +for (int i = 0; i< N_TRANSACTIONS; i++) { +zk.create("/node-" + i, new byte[0], Ids.OPEN_ACL_UNSAFE, +CreateMode.PERSISTENT); +} +} finally { +zk.close(); +} +f.shutdown(); +zks.shutdown(); +Assert.assertTrue("waiting for server to shutdown", +ClientBase.waitForServerDown(HOSTPORT, CONNECTION_TIMEOUT)); + +// start server again with intact database +zks = new ZooKeeperServer(tmpSnapDir, tmpLogDir, 3000); +zks.startdata(); +long zxid = zks.getZKDatabase().getDataTreeLastProcessedZxid(); +LOG.info("After clean restart, zxid = " + zxid); +Assert.assertTrue("zxid > 0", zxid > 0); +zks.shutdown(); + +// Make all snapshots empty +FileTxnSnapLog txnLogFactory = zks.getTxnLogFactory(); +List snapshots = txnLogFactory.findNRecentSnapshots(10); +Assert.assertTrue("We have a snapshot to corrupt", snapshots.size() > 0); +for (File file: snapshots) { +if
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712900#comment-15712900 ] ASF GitHub Bot commented on ZOOKEEPER-2325: --- Github user breed commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/117#discussion_r90523518 --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java --- @@ -165,8 +165,22 @@ public File getSnapDir() { */ public long restore(DataTree dt, Mapsessions, PlayBackListener listener) throws IOException { -snapLog.deserialize(dt, sessions); +long deserializeResult = snapLog.deserialize(dt, sessions); FileTxnLog txnLog = new FileTxnLog(dataDir); +if (-1L == deserializeResult) { +/* this means that we couldn't find any snapshot, so we need to + * initialize an empty database */ +if (txnLog.getLastLoggedZxid() != -1) { +throw new IOException( +"No snapshot found, but there are log entries. " + +"Something is broken!"); +} +/* TODO: (br33d) we should either put a ConcurrentHashMap on restore() + * or use Map on save() */ +save(dt, (ConcurrentHashMap )sessions); --- End diff -- yes, ZOOKEEPER-261 is still a problem. is that what you are referring to? brian has a patch coming that builds on this one to fix that problem. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712832#comment-15712832 ] ASF GitHub Bot commented on ZOOKEEPER-2325: --- Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/117 +1, just one comment on test coverage. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712490#comment-15712490 ] ASF GitHub Bot commented on ZOOKEEPER-2325: --- Github user rgs1 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/117#discussion_r90491384 --- Diff: src/java/test/org/apache/zookeeper/test/EmptiedSnapshotRecoveryTest.java --- @@ -0,0 +1,134 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.zookeeper.test; + +import java.io.IOException; +import java.io.File; +import java.io.PrintWriter; +import java.util.List; +import java.util.LinkedList; + +import org.apache.log4j.Logger; +import org.apache.zookeeper.CreateMode; +import org.apache.zookeeper.PortAssignment; +import org.apache.zookeeper.WatchedEvent; +import org.apache.zookeeper.Watcher; +import org.apache.zookeeper.ZKTestCase; +import org.apache.zookeeper.ZooKeeper; +import org.apache.zookeeper.ZooDefs.Ids; +import org.apache.zookeeper.server.quorum.Leader.Proposal; +import org.apache.zookeeper.server.ServerCnxnFactory; +import org.apache.zookeeper.server.SyncRequestProcessor; +import org.apache.zookeeper.server.ZooKeeperServer; +import org.apache.zookeeper.server.persistence.FileTxnSnapLog; +import org.junit.Assert; +import org.junit.Test; + +/** If snapshots are corrupted to the empty file or deleted, Zookeeper should + * not proceed to read its transactiong log files + * Test that zxid == -1 in the presence of emptied/deleted snapshots + */ +public class EmptiedSnapshotRecoveryTest extends ZKTestCase implements Watcher { +private static final Logger LOG = Logger.getLogger(RestoreCommittedLogTest.class); +private static String HOSTPORT = "127.0.0.1:" + PortAssignment.unique(); +private static final int CONNECTION_TIMEOUT = 3000; +private static final int N_TRANSACTIONS = 150; +private static final int SNAP_COUNT = 100; + +public void runTest(boolean leaveEmptyFile) throws Exception { +File tmpSnapDir = ClientBase.createTmpDir(); +File tmpLogDir = ClientBase.createTmpDir(); +ClientBase.setupTestEnv(); +ZooKeeperServer zks = new ZooKeeperServer(tmpSnapDir, tmpLogDir, 3000); +SyncRequestProcessor.setSnapCount(SNAP_COUNT); +final int PORT = Integer.parseInt(HOSTPORT.split(":")[1]); +ServerCnxnFactory f = ServerCnxnFactory.createFactory(PORT, -1); +f.startup(zks); +Assert.assertTrue("waiting for server being up ", +ClientBase.waitForServerUp(HOSTPORT,CONNECTION_TIMEOUT)); +ZooKeeper zk = new ZooKeeper(HOSTPORT, CONNECTION_TIMEOUT, this); +try { +for (int i = 0; i< N_TRANSACTIONS; i++) { +zk.create("/node-" + i, new byte[0], Ids.OPEN_ACL_UNSAFE, +CreateMode.PERSISTENT); +} +} finally { +zk.close(); +} +f.shutdown(); +zks.shutdown(); +Assert.assertTrue("waiting for server to shutdown", +ClientBase.waitForServerDown(HOSTPORT, CONNECTION_TIMEOUT)); + +// start server again with intact database +zks = new ZooKeeperServer(tmpSnapDir, tmpLogDir, 3000); +zks.startdata(); +long zxid = zks.getZKDatabase().getDataTreeLastProcessedZxid(); +LOG.info("After clean restart, zxid = " + zxid); +Assert.assertTrue("zxid > 0", zxid > 0); +zks.shutdown(); + +// Make all snapshots empty +FileTxnSnapLog txnLogFactory = zks.getTxnLogFactory(); +List snapshots = txnLogFactory.findNRecentSnapshots(10); +Assert.assertTrue("We have a snapshot to corrupt", snapshots.size() > 0); +for (File file: snapshots) { +if
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712489#comment-15712489 ] ASF GitHub Bot commented on ZOOKEEPER-2325: --- Github user rgs1 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/117#discussion_r90490925 --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java --- @@ -165,8 +165,22 @@ public File getSnapDir() { */ public long restore(DataTree dt, Mapsessions, PlayBackListener listener) throws IOException { -snapLog.deserialize(dt, sessions); +long deserializeResult = snapLog.deserialize(dt, sessions); FileTxnLog txnLog = new FileTxnLog(dataDir); +if (-1L == deserializeResult) { +/* this means that we couldn't find any snapshot, so we need to + * initialize an empty database */ --- End diff -- nit: can you add a reference to ZOOKEEPER-2325 here? > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710994#comment-15710994 ] Hadoop QA commented on ZOOKEEPER-2325: -- +1 overall. GitHub Pull Request Build +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 8 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/95//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/95//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/95//console This message is automatically generated. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710502#comment-15710502 ] ASF GitHub Bot commented on ZOOKEEPER-2325: --- Github user lvfangmin commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/117#discussion_r90368114 --- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java --- @@ -165,8 +165,22 @@ public File getSnapDir() { */ public long restore(DataTree dt, Mapsessions, PlayBackListener listener) throws IOException { -snapLog.deserialize(dt, sessions); +long deserializeResult = snapLog.deserialize(dt, sessions); FileTxnLog txnLog = new FileTxnLog(dataDir); +if (-1L == deserializeResult) { +/* this means that we couldn't find any snapshot, so we need to + * initialize an empty database */ +if (txnLog.getLastLoggedZxid() != -1) { +throw new IOException( +"No snapshot found, but there are log entries. " + +"Something is broken!"); +} +/* TODO: (br33d) we should either put a ConcurrentHashMap on restore() + * or use Map on save() */ +save(dt, (ConcurrentHashMap )sessions); --- End diff -- n00b question, why we need to save here? I saw there is potential that the follower will sync with with leader just to send over the empty snapshot. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706861#comment-15706861 ] Hadoop QA commented on ZOOKEEPER-2325: -- -1 overall. GitHub Pull Request Build +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/93//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/93//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/93//console This message is automatically generated. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706823#comment-15706823 ] Andrew Grasso commented on ZOOKEEPER-2325: -- This looks good to me. Thanks for putting the pull request together. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706785#comment-15706785 ] Benjamin Reed commented on ZOOKEEPER-2325: -- hey andrew, i've merged all the patches into a pull request. can you take a look and make sure everything looks ok? > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706783#comment-15706783 ] ASF GitHub Bot commented on ZOOKEEPER-2325: --- GitHub user breed opened a pull request: https://github.com/apache/zookeeper/pull/117 ZOOKEEPER-2325: Data inconsistency if all snapshots empty or missing You can merge this pull request into a Git repository by running: $ git pull https://github.com/breed/zookeeper ZOOKEEPER-2325 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/117.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #117 commit 02bf3d57786d51da205e78a070a45703da21f916 Author: Benjamin ReedDate: 2016-11-29T22:08:22Z ZOOKEEPER-2325: Data inconsistency if all snapshots empty or missing > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514078#comment-15514078 ] Hadoop QA commented on ZOOKEEPER-2325: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12823508/zk.patch against trunk revision ec20c5434cc8a334b3fd25e27d26dccf4793c8f3. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3447//console This message is automatically generated. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261574#comment-15261574 ] Flavio Junqueira commented on ZOOKEEPER-2325: - It sounds like this should be easy to reproduce. Do you think you can add a test case for this [~agrasso]? > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Priority: Critical > Attachments: ZOOKEEPER-2325.001.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261095#comment-15261095 ] Nicholas Wolchko commented on ZOOKEEPER-2325: - I've seen some cases where our zookeeper servers lost data, and this looks like it could be the cause. Is there anything blocking this change? > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso > Attachments: ZOOKEEPER-2325.001.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)