[GitHub] zookeeper pull request #120: ZOOKEEPER-261

2017-01-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/120


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #120: ZOOKEEPER-261

2017-01-12 Thread enixon
Github user enixon commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/120#discussion_r95866724
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -167,6 +175,16 @@ public long restore(DataTree dt, Map 
sessions,
 PlayBackListener listener) throws IOException {
 long deserializeResult = snapLog.deserialize(dt, sessions);
 FileTxnLog txnLog = new FileTxnLog(dataDir);
+boolean suspectEmptyDB;
--- End diff --

If `suspectEmptyDB` looks vague to you, that's enough reason to change it. 
The intent is to capture "if you find can't find a data base then treat that 
condition with paranoia and don't vote until someone provides you with one". 
The recovery aspect could be emphasized with `forceRecoverEmptyDB`. Or we could 
invert the condition and call it `trustEmptyDB`. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #120: ZOOKEEPER-261

2017-01-12 Thread enixon
Github user enixon commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/120#discussion_r95862018
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -175,11 +193,20 @@ public long restore(DataTree dt, Map 
sessions,
 "No snapshot found, but there are log entries. " +
 "Something is broken!");
 }
-/* TODO: (br33d) we should either put a ConcurrentHashMap on 
restore()
- *   or use Map on save() */
-save(dt, (ConcurrentHashMap)sessions);
-/* return a zxid of zero, since we the database is empty */
-return 0;
+
+if (suspectEmptyDB) {
+/* return a zxid of -1, since we are possibly missing data 
*/
+LOG.warn("Unexpected empty data tree, setting zxid to -1");
--- End diff --

We only reach this point if our `SnapShot` interface was not able to 
deserialize anything into the `DataTree`. In the usual case of `FileSnap`, this 
happens only when there aren't any non-corrupted snapshots to be loaded.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #120: ZOOKEEPER-261

2017-01-11 Thread eribeiro
Github user eribeiro commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/120#discussion_r95721046
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -175,11 +193,20 @@ public long restore(DataTree dt, Map 
sessions,
 "No snapshot found, but there are log entries. " +
 "Something is broken!");
 }
-/* TODO: (br33d) we should either put a ConcurrentHashMap on 
restore()
- *   or use Map on save() */
-save(dt, (ConcurrentHashMap)sessions);
-/* return a zxid of zero, since we the database is empty */
-return 0;
+
+if (suspectEmptyDB) {
+/* return a zxid of -1, since we are possibly missing data 
*/
+LOG.warn("Unexpected empty data tree, setting zxid to -1");
--- End diff --

Are we 100% sure the data tree is empty? Couldn't it be only partially 
complete? I mean the machine recorded up to transaction n, but lost 
transactions n+1, n+2, n+3, etc?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #120: ZOOKEEPER-261

2017-01-11 Thread enixon
Github user enixon commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/120#discussion_r95714487
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -167,6 +175,16 @@ public long restore(DataTree dt, Map 
sessions,
 PlayBackListener listener) throws IOException {
 long deserializeResult = snapLog.deserialize(dt, sessions);
 FileTxnLog txnLog = new FileTxnLog(dataDir);
+boolean suspectEmptyDB;
+File initFile = new File(dataDir.getParent(), "initialize");
+if (initFile.exists()) {
+if (!initFile.delete()) {
+throw new IOException("Unable to delete initialization 
file " + initFile.toString());
+}
+suspectEmptyDB = false;
+} else {
+suspectEmptyDB = !autoCreateDB;
--- End diff --

I tempted to do put the log line on the other side of the conditional since 
this side is the expected case. We should only delete an initialize file once 
in the lifecycle of a given server while the check against `autoCreateDB` will 
happen every other time the server is restarted.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #120: ZOOKEEPER-261

2017-01-11 Thread enixon
Github user enixon commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/120#discussion_r95714170
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -167,6 +175,16 @@ public long restore(DataTree dt, Map 
sessions,
 PlayBackListener listener) throws IOException {
 long deserializeResult = snapLog.deserialize(dt, sessions);
 FileTxnLog txnLog = new FileTxnLog(dataDir);
+boolean suspectEmptyDB;
+File initFile = new File(dataDir.getParent(), "initialize");
+if (initFile.exists()) {
--- End diff --

Nice optimization, I like it!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #120: ZOOKEEPER-261

2017-01-11 Thread enixon
Github user enixon commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/120#discussion_r95714094
  
--- Diff: bin/zkServer-initialize.sh ---
@@ -113,6 +113,8 @@ initialize() {
 else
 echo "No myid provided, be sure to specify it in $ZOO_DATADIR/myid 
if using non-standalone"
 fi
+
+date > "$ZOO_DATADIR/initialize"
--- End diff --

True enough, `touch` is sufficient. Using `date` is an optimization I've 
included in other scripts in the past as a way of sneaking a bit more 
information into an otherwise meaningless file but in this context it's 
probably just confusing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #120: ZOOKEEPER-261

2017-01-11 Thread eribeiro
Github user eribeiro commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/120#discussion_r95711916
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -167,6 +175,16 @@ public long restore(DataTree dt, Map 
sessions,
 PlayBackListener listener) throws IOException {
 long deserializeResult = snapLog.deserialize(dt, sessions);
 FileTxnLog txnLog = new FileTxnLog(dataDir);
+boolean suspectEmptyDB;
+File initFile = new File(dataDir.getParent(), "initialize");
+if (initFile.exists()) {
--- End diff --

Disclaimer: I am not used to `Files` class so you may have to make sure it 
doesn't alter the current behaviour if you decide to use it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #120: ZOOKEEPER-261

2017-01-11 Thread eribeiro
Github user eribeiro commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/120#discussion_r95710948
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -167,6 +175,16 @@ public long restore(DataTree dt, Map 
sessions,
 PlayBackListener listener) throws IOException {
 long deserializeResult = snapLog.deserialize(dt, sessions);
 FileTxnLog txnLog = new FileTxnLog(dataDir);
+boolean suspectEmptyDB;
+File initFile = new File(dataDir.getParent(), "initialize");
+if (initFile.exists()) {
+if (!initFile.delete()) {
+throw new IOException("Unable to delete initialization 
file " + initFile.toString());
+}
+suspectEmptyDB = false;
+} else {
+suspectEmptyDB = !autoCreateDB;
--- End diff --

IMO, it would be nice to put a `debug` (warn?) log message here. Something 
along the lines of "Initialize file doesn't found! Using autoCreateDB 
attribute."


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #120: ZOOKEEPER-261

2017-01-11 Thread eribeiro
Github user eribeiro commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/120#discussion_r95703179
  
--- Diff: bin/zkServer-initialize.sh ---
@@ -113,6 +113,8 @@ initialize() {
 else
 echo "No myid provided, be sure to specify it in $ZOO_DATADIR/myid 
if using non-standalone"
 fi
+
+date > "$ZOO_DATADIR/initialize"
--- End diff --

Nit: If the sole purpose of this file is to act as a marker, in spite of 
its content,  then a 

```touch $ZOO_DATADIR/initialize``` 

would be enough, wouldn't it?

Of course, `date` is fine as well, no problem.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #120: ZOOKEEPER-261

2017-01-11 Thread eribeiro
Github user eribeiro commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/120#discussion_r95707294
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -167,6 +175,16 @@ public long restore(DataTree dt, Map 
sessions,
 PlayBackListener listener) throws IOException {
 long deserializeResult = snapLog.deserialize(dt, sessions);
 FileTxnLog txnLog = new FileTxnLog(dataDir);
+boolean suspectEmptyDB;
+File initFile = new File(dataDir.getParent(), "initialize");
+if (initFile.exists()) {
--- End diff --

As Java 7 is the default we could use the code below? The benefits are that 
it automatically throws the `IOException` if an I/O error happens or return 
`false` if the file doesn't exists.

```
if (Files.deleteIfExists(initFile.toPath()) {
suspectEmptyDB = false;
} else {
(...)
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #120: ZOOKEEPER-261

2017-01-11 Thread eribeiro
Github user eribeiro commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/120#discussion_r95707659
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -132,6 +137,9 @@ public FileTxnSnapLog(File dataDir, File snapDir) 
throws IOException {
 
 txnLog = new FileTxnLog(this.dataDir);
 snapLog = new FileSnap(this.snapDir);
+
+autoCreateDB = 
Boolean.parseBoolean(System.getProperty(ZOOKEEPER_DB_AUTOCREATE,
--- End diff --

+1 with @hanm 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #120: ZOOKEEPER-261

2017-01-11 Thread eribeiro
Github user eribeiro commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/120#discussion_r95709489
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -167,6 +175,16 @@ public long restore(DataTree dt, Map 
sessions,
 PlayBackListener listener) throws IOException {
 long deserializeResult = snapLog.deserialize(dt, sessions);
 FileTxnLog txnLog = new FileTxnLog(dataDir);
+boolean suspectEmptyDB;
--- End diff --

Could we rename this to `recoveringDB` or `recoveringNode`? 

My rationale is: `suspectEmptyDB` looks vague to me, **plus**  __if I 
understood it right__ a node could have been shutdown and restarted after some 
time. So, not necessarily its DB will be empty, but it is in a recovering 
process so we want to avoid that it becoming the leader and messing up with 
transactions performed while it was offline, right?
Could we rename this to `recoveringDB` or `recoveringNode`? 

My rationale is: `suspectEmptyDB` looks vague to me, **plus** because __if 
I understood it right__ a node could have been shutdown and restarted after 
some time. So, not necessarily its DB will be empty, but it is in a recovering 
process so we want to avoid that it becoming the leader and messing up with 
transactions performed while it was offline, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #120: ZOOKEEPER-261

2016-12-06 Thread hanm
Github user hanm commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/120#discussion_r91235960
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -132,6 +137,9 @@ public FileTxnSnapLog(File dataDir, File snapDir) 
throws IOException {
 
 txnLog = new FileTxnLog(this.dataDir);
 snapLog = new FileSnap(this.snapDir);
+
+autoCreateDB = 
Boolean.parseBoolean(System.getProperty(ZOOKEEPER_DB_AUTOCREATE,
--- End diff --

>> Is that in accord with Zookeeper style?

I see - I thought the new property was not end user facing since there is 
no associated documents added here. Since the property 
"zookeeper.db.autocreate" is exposed to user some doc could be added to 
ZooKeeperAdmin.html (similarly like how the existing 
"zookeeper.datadir.autocreate" is documented there) to describe the motivation 
/ usage of the property.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #120: ZOOKEEPER-261

2016-12-06 Thread enixon
Github user enixon commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/120#discussion_r91234951
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -132,6 +137,9 @@ public FileTxnSnapLog(File dataDir, File snapDir) 
throws IOException {
 
 txnLog = new FileTxnLog(this.dataDir);
 snapLog = new FileSnap(this.snapDir);
+
+autoCreateDB = 
Boolean.parseBoolean(System.getProperty(ZOOKEEPER_DB_AUTOCREATE,
--- End diff --

I included `ZOOKEEPER_DB_AUTOCREATE` to allow users to opt out of the 
feature until they're ready to update their ensemble management tooling to 
support creating the new file. Is that in accord with Zookeeper style?

On the question of style, `ZOOKEEPER_DB_AUTOCREATE_DEFAULT` exists purely 
because `ZOOKEEPER_DATADIR_AUTOCREATE_DEFAULT` exists above it in the file. If 
including the defaults as static constants isn't Zookeeper style then I'm happy 
to replace it with a string literal in the constructor.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #120: ZOOKEEPER-261

2016-12-06 Thread hanm
Github user hanm commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/120#discussion_r91208138
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -165,8 +173,41 @@ public File getSnapDir() {
  */
 public long restore(DataTree dt, Map sessions,
 PlayBackListener listener) throws IOException {
-snapLog.deserialize(dt, sessions);
+long deserializeResult = snapLog.deserialize(dt, sessions);
 FileTxnLog txnLog = new FileTxnLog(dataDir);
+boolean suspectEmptyDB;
+File initFile = new File(dataDir.getParent(), "initialize");
+if (initFile.exists()) {
+if (!initFile.delete()) {
+LOG.warn("Unable to delete initialization file " + 
initFile.toString());
--- End diff --

It sounds pretty serious issue if the initialize file can't be cleaned up 
upon startup of a new ensemble for whatever reasons as the presence of this 
file is the key promise made here - not able to clean it up will possibly lead 
to inconsistent quorum state again that this PR is trying to fix. So, maybe 
throw an IOException here to abort server start process and let admin intervene 
instead?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #120: ZOOKEEPER-261

2016-12-06 Thread hanm
Github user hanm commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/120#discussion_r91208860
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java ---
@@ -132,6 +137,9 @@ public FileTxnSnapLog(File dataDir, File snapDir) 
throws IOException {
 
 txnLog = new FileTxnLog(this.dataDir);
 snapLog = new FileSnap(this.snapDir);
+
+autoCreateDB = 
Boolean.parseBoolean(System.getProperty(ZOOKEEPER_DB_AUTOCREATE,
--- End diff --

It seems that this variable `autoCreateDB` (and the property 
`ZOOKEEPER_DB_AUTOCREATE_DEFAULT` and `ZOOKEEPER_DB_AUTOCREATE`) is used solely 
for testing purpose (to change control flow and get code coverage). IIUC, maybe 
add some comments about these test only variables?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #120: Zookeeper 261

2016-12-06 Thread enixon
GitHub user enixon opened a pull request:

https://github.com/apache/zookeeper/pull/120

Zookeeper 261



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/enixon/zookeeper ZOOKEEPER-261

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/120.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #120


commit 68433ae695438f6f97e740b8413b1f7aca56fbd3
Author: Benjamin Reed 
Date:   2016-11-29T22:08:22Z

ZOOKEEPER-2325: Data inconsistency if all snapshots empty or missing

commit 35ce7e275c4ec8431e5dc7775d8257d4f76ec251
Author: Benjamin Reed 
Date:   2016-12-01T23:04:52Z

address comments from rgs1

commit f67e247d9fcb222563c5e5e1f363ad0fe56dcc05
Author: Brian Nixon 
Date:   2016-11-29T22:56:07Z

ZOOKEEPER-261: Reinitialized servers should not participate in leader 
election




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---