[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...

2018-09-28 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/605
  
That seems reasonable to me, I'll provide a separate PR for improving the 
connection loss thing in ZOOKEEPER-3157.


---


[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...

2018-09-28 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/605
  
+1 for @hanm 's concern.
Auto-retrying tests wouldn't be good here.


---


[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...

2018-09-28 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/605
  
>> since it's actually 'hiding' the flaky tests

This is my main concern. If we do this to all test cases, it's very likely 
there will be no follow ups or investigations on the flaky tests ever (it's 
just like a "TODO" in code that will never be done). Keep these flaky tests 
visible will remind us there are issues and keep a high bar for overall 
quality, as we've seen previously lots of flaky tests were caused by actual 
bugs. 

So I am leaning towards not introducing `junit.RetryRule` to the tests, at 
least, not to all tests (there might be exception case. for example, if there 
is a test we don't know how to fix, and community think it's fine to silent it 
after evaluating the impact of the test).



---


[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...

2018-09-28 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/605
  
@anmolnar thanks for the context, I'll provide a separate patch for 3.5 as 
well.

Also I created a JIRA to improve the test: ZOOKEEPER-3157.

As I mentioned in that JIRA, internally, we 'fixed' flaky test by adding 
junit.RetryRule in ZKTestCase, which is the base class for most of the tests. 
I'm not sure this is the right way to go, since it's actually 'hiding' the 
flaky tests, but this will help reducing the flaky tests a lot if we're not 
going to tackle it in short-term. And we can check the testing history to find 
out which tests are flaky and deal with them separately. So let me know if this 
is useful in short term, if it is I'll provide a patch to do that.


---


[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...

2018-09-22 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/605
  
Ooops, sorry @hanm , you're a little bit late, this one is already 
committed.
Would you or @lvfangmin mind creating a separate Jira for the flaky test 
fix?


---


[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...

2018-09-20 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/605
  
@lvfangmin Errm. I would be happy to do that, but not sure how to. The 
merge script is able to cherrry-pick in the middle of the process, but I cannot 
start it over without a pull request. Again, I can go the manual way, but 
that's a lot more cumbersome than creating a new pull request quickly. Plus it 
has the benefit of automatically running pre-commit tests against the target 
branch.


---


[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...

2018-09-19 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/605
  
@anmolnar after #631 being merged into 3.5, this patch can be cherry-picked 
back to 3.5 without conflict, can you help cherry-pick it back?


---


[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...

2018-09-18 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/605
  
@anmolnar I created a PR to port ZOOKEEPER-3127 to 3.5: #631, I'll port 
ZOOKEEPER-3125 back after that, because 3125 is depending on 3127.


---


[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...

2018-09-17 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/605
  
@anmolnar, sorry, just saw your comment, I'll send out a PR for 3.5 today.


---


[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...

2018-09-12 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/605
  
@lvfangmin Cool, thanks for the info, but unfortunately the commit script 
is quite strict and works with PRs. Maybe I could circumvent the process and 
commit the change manually, but I think it would be a lot easier to just create 
a separate PR.


---


[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...

2018-09-12 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/605
  
@anmolnar the conflict is due to the missing of FuzzySnapshotRelatedTest 
class introduced in ZOOKEEPER-3127, which is fixing the data inconsistency due 
to multi-op txn.

We need that in 3.5 as well, if you cherry pick that to 3.5 first, then we 
won't have conflict for this one.

Manually tested cherry pick ZOOKEEPER-3127 to 3.5 branch, there is a very 
minor conflict in QuorumPeerMainTest, it should be trivial to change while 
cherry-picking.


---


[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...

2018-09-12 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/605
  
Thanks @anmolnar, will create a PR for 3.5.


---


[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...

2018-09-12 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/605
  
@lvfangmin Committed to master. Conflicting with branch-3.5, please create 
separate pull request for that.


---


[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...

2018-09-11 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/605
  
Just reported another pzxid inconsistent issue related to CloseSession, I 
found that issue recently, will wait this being merged before providing that 
patch, which will make adding the unit test easier.


---


[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...

2018-09-11 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/605
  
Just found there is conflict to merge, I'm updating it now.

Btw, I just found 2 other potential data inconsistent issues about multi-op 
and ephemeral, will create Jiras and send out diffs soon.


---


[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...

2018-09-11 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/605
  
This could cause watch event not notified when setWatches because of stale 
pzxid, and all the tests have been passed, I think we should land this and port 
it back to other branches as early as possible.


---