[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...

2017-04-18 Thread revans2
Github user revans2 closed the pull request at:

https://github.com/apache/zookeeper/pull/157


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...

2017-02-03 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/157#discussion_r99347551
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java 
---
@@ -839,12 +839,25 @@ public void converseWithFollower(InputArchive ia, 
OutputArchive oa,
 Assert.assertEquals(1, f.self.getAcceptedEpoch());
 Assert.assertEquals(1, f.self.getCurrentEpoch());
 
+//Wait for the transactions to be written out. The 
thread that writes them out
+// does not send anything back when it is done.
+long start = System.currentTimeMillis();
+while (createSessionZxid != 
f.fzk.getLastProcessedZxid() && (System.currentTimeMillis() - start) < 50) {
--- End diff --

OK Will see what I can do


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...

2017-02-02 Thread afine
Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/157#discussion_r99242422
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java 
---
@@ -839,12 +839,25 @@ public void converseWithFollower(InputArchive ia, 
OutputArchive oa,
 Assert.assertEquals(1, f.self.getAcceptedEpoch());
 Assert.assertEquals(1, f.self.getCurrentEpoch());
 
+//Wait for the transactions to be written out. The 
thread that writes them out
+// does not send anything back when it is done.
+long start = System.currentTimeMillis();
+while (createSessionZxid != 
f.fzk.getLastProcessedZxid() && (System.currentTimeMillis() - start) < 50) {
--- End diff --

it would be nice, a good way of actually validating everything is behaving 
as expected


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...

2017-01-31 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/157#discussion_r98744573
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java 
---
@@ -839,12 +839,25 @@ public void converseWithFollower(InputArchive ia, 
OutputArchive oa,
 Assert.assertEquals(1, f.self.getAcceptedEpoch());
 Assert.assertEquals(1, f.self.getCurrentEpoch());
 
+//Wait for the transactions to be written out. The 
thread that writes them out
+// does not send anything back when it is done.
+long start = System.currentTimeMillis();
+while (createSessionZxid != 
f.fzk.getLastProcessedZxid() && (System.currentTimeMillis() - start) < 50) {
--- End diff --

It does seem a bit beyond the scope of this.  But if you really want me to 
I can look into it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...

2017-01-31 Thread hanm
Github user hanm commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/157#discussion_r98734336
  
--- Diff: src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java ---
@@ -507,9 +507,12 @@ public synchronized void shutdown() {
 if (firstProcessor != null) {
 firstProcessor.shutdown();
 }
-if (zkDb != null) {
-zkDb.clear();
-}
+
+// There is no need to clear the database
+//  * When a new quorum is established we can still apply the diff
+//on top of the same zkDb data
+//  * If we fetch a new snapshot from leader, the zkDb will be
+//cleared anyway before loading the snapshot
 
--- End diff --

There is one case we may still want to clear db here - when one of the 
ZooKeeper critical threads (such as * processors, session trackers) fail, 
ZooKeeper server will shutdown (see runFromConfig) and consequently invoke 
ZooKeeper#shutdown. In this case, I don't see a particular reason not to clear 
the db, though not doing it might be fine (as one could argue the server will 
be dead anyway), but I tend to lean towards the safe side on cleaning the db. 
One way to conditionally do that is to add a Boolean parameter to 
ZooKeeper#shutdown so we can have fine grained control over when to clear db in 
what code path.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...

2017-01-30 Thread afine
Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/157#discussion_r98574493
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java 
---
@@ -839,12 +839,25 @@ public void converseWithFollower(InputArchive ia, 
OutputArchive oa,
 Assert.assertEquals(1, f.self.getAcceptedEpoch());
 Assert.assertEquals(1, f.self.getCurrentEpoch());
 
+//Wait for the transactions to be written out. The 
thread that writes them out
+// does not send anything back when it is done.
+long start = System.currentTimeMillis();
+while (createSessionZxid != 
f.fzk.getLastProcessedZxid() && (System.currentTimeMillis() - start) < 50) {
--- End diff --

just an idea, not sure if it is worth the effort and it may be outside the 
scope of this patch.

we could play with the test infrastructure here a little bit and do some 
dependency injection in `createFollower` that can let us track if db clearing 
and snapshotting occurs when expected.

this may help prevent hard to track race conditions with tests like this in 
the future.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...

2017-01-30 Thread afine
Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/157#discussion_r98574533
  
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/Learner.java ---
@@ -321,13 +321,16 @@ protected void syncWithLeader(long newLeaderZxid) 
throws IOException, Interrupte
 QuorumPacket ack = new QuorumPacket(Leader.ACK, 0, null, null);
 QuorumPacket qp = new QuorumPacket();
 long newEpoch = ZxidUtils.getEpochFromZxid(newLeaderZxid);
-
-readPacket(qp);   
+//In the DIFF case we don't need to do a snapshot because the 
transactions will sync on top of any existing snapshot
--- End diff --

nit: I think we generally put spaces after the //


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...

2017-01-30 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/157#discussion_r98449817
  
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/Learner.java ---
@@ -364,10 +367,12 @@ else if (qp.getType() == Leader.SNAP) {
 
 long lastQueued = 0;
 
-// in V1.0 we take a snapshot when we get the NEWLEADER 
message, but in pre V1.0
+// in Zab V1.0 (ZK 3.4+) we might take a snapshot when we get 
the NEWLEADER message, but in pre V1.0
 // we take the snapshot at the UPDATE, since V1.0 also gets 
the UPDATE (after the NEWLEADER)
 // we need to make sure that we don't take the snapshot twice.
-boolean snapshotTaken = false;
+boolean isPreZAB1_0 = true;
+//If we are not going to take the snapshot be sure the edits 
are not applied in memory
+boolean writeToEditLog = !snapshotNeeded;
--- End diff --

Sorry about that I am still a bit new at the internal terminology.  I will 
update it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...

2017-01-28 Thread fpj
Github user fpj commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/157#discussion_r98337403
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java 
---
@@ -839,6 +839,13 @@ public void converseWithFollower(InputArchive ia, 
OutputArchive oa,
 Assert.assertEquals(1, f.self.getAcceptedEpoch());
 Assert.assertEquals(1, f.self.getCurrentEpoch());
 
+//Wait for the edits to be written out
--- End diff --

I need to think some more whether it makes any sense to add test cases for 
this. The test cases we already have probably cover this enough given that 
there is no real change of behavior.

This change here is necessary, though. We don't really care about time in 
general in our tests because we can never be sure of the timing we will get 
across runs and with different settings.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...

2017-01-28 Thread fpj
Github user fpj commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/157#discussion_r98336253
  
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/Learner.java ---
@@ -364,10 +367,12 @@ else if (qp.getType() == Leader.SNAP) {
 
 long lastQueued = 0;
 
-// in V1.0 we take a snapshot when we get the NEWLEADER 
message, but in pre V1.0
+// in Zab V1.0 (ZK 3.4+) we might take a snapshot when we get 
the NEWLEADER message, but in pre V1.0
 // we take the snapshot at the UPDATE, since V1.0 also gets 
the UPDATE (after the NEWLEADER)
 // we need to make sure that we don't take the snapshot twice.
-boolean snapshotTaken = false;
+boolean isPreZAB1_0 = true;
+//If we are not going to take the snapshot be sure the edits 
are not applied in memory
+boolean writeToEditLog = !snapshotNeeded;
--- End diff --

The changes here are using `edit` to refer to `txns`. I'd rather use `txn` 
to be consistent across the project. Specifically here, you're using `EditLog` 
to refer to the `TxnLog`, please change accordingly to have it consistent 
across the project.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...

2017-01-26 Thread revans2
GitHub user revans2 opened a pull request:

https://github.com/apache/zookeeper/pull/157

ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DB

This patch addresses recovery time when a leader is lost on a large DB.  

It does this by not clearing the DB before leader election begins, and by 
avoiding taking a snapshot as part of the SYNC phase, specifically for a DIFF 
sync. It does this by buffering the proposals and commits just like the code 
currently does for proposals/commits sent after the NEWLEADER and before the 
UPTODATE messages. 

If a SNAP is sent we cannot avoid writing out the full snapshot because 
there is no other way to make sure the disk DB is in sync with what is in 
memory.  So any edits to the edit log before a background snapshot happened 
could possibly be applied on top of an incorrect snapshot.

This same optimization should work for TRUNC too, but I opted not to do it 
for TRUNC because TRUNC is rare and TRUNC by its very nature already forces the 
DB to be reread after the edit logs are modified.  So it would still not be 
fast.

In practice this makes it so instead of taking 5+ mins for the cluster to 
recover from losing a leader it now takes about 3 seconds.

I am happy to port this to 3.5. if it looks good.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2678

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/157.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #157


commit 5aa25620e0189b28d7040305272be2fda28126fb
Author: Robert (Bobby) Evans 
Date:   2017-01-19T19:50:32Z

ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DBs




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---