[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user lvfangmin commented on the issue: https://github.com/apache/zookeeper/pull/605 That seems reasonable to me, I'll provide a separate PR for improving the connection loss thing in ZOOKEEPER-3157. ---
[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/605 +1 for @hanm 's concern. Auto-retrying tests wouldn't be good here. ---
[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/605 >> since it's actually 'hiding' the flaky tests This is my main concern. If we do this to all test cases, it's very likely there will be no follow ups or investigations on the flaky tests ever (it's just like a "TODO" in code that will never be done). Keep these flaky tests visible will remind us there are issues and keep a high bar for overall quality, as we've seen previously lots of flaky tests were caused by actual bugs. So I am leaning towards not introducing `junit.RetryRule` to the tests, at least, not to all tests (there might be exception case. for example, if there is a test we don't know how to fix, and community think it's fine to silent it after evaluating the impact of the test). ---
[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user lvfangmin commented on the issue: https://github.com/apache/zookeeper/pull/605 @anmolnar thanks for the context, I'll provide a separate patch for 3.5 as well. Also I created a JIRA to improve the test: ZOOKEEPER-3157. As I mentioned in that JIRA, internally, we 'fixed' flaky test by adding junit.RetryRule in ZKTestCase, which is the base class for most of the tests. I'm not sure this is the right way to go, since it's actually 'hiding' the flaky tests, but this will help reducing the flaky tests a lot if we're not going to tackle it in short-term. And we can check the testing history to find out which tests are flaky and deal with them separately. So let me know if this is useful in short term, if it is I'll provide a patch to do that. ---
[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/605 Ooops, sorry @hanm , you're a little bit late, this one is already committed. Would you or @lvfangmin mind creating a separate Jira for the flaky test fix? ---
[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/605 @lvfangmin Errm. I would be happy to do that, but not sure how to. The merge script is able to cherrry-pick in the middle of the process, but I cannot start it over without a pull request. Again, I can go the manual way, but that's a lot more cumbersome than creating a new pull request quickly. Plus it has the benefit of automatically running pre-commit tests against the target branch. ---
[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user lvfangmin commented on the issue: https://github.com/apache/zookeeper/pull/605 @anmolnar after #631 being merged into 3.5, this patch can be cherry-picked back to 3.5 without conflict, can you help cherry-pick it back? ---
[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user lvfangmin commented on the issue: https://github.com/apache/zookeeper/pull/605 @anmolnar I created a PR to port ZOOKEEPER-3127 to 3.5: #631, I'll port ZOOKEEPER-3125 back after that, because 3125 is depending on 3127. ---
[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user lvfangmin commented on the issue: https://github.com/apache/zookeeper/pull/605 @anmolnar, sorry, just saw your comment, I'll send out a PR for 3.5 today. ---
[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/605 @lvfangmin Cool, thanks for the info, but unfortunately the commit script is quite strict and works with PRs. Maybe I could circumvent the process and commit the change manually, but I think it would be a lot easier to just create a separate PR. ---
[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user lvfangmin commented on the issue: https://github.com/apache/zookeeper/pull/605 @anmolnar the conflict is due to the missing of FuzzySnapshotRelatedTest class introduced in ZOOKEEPER-3127, which is fixing the data inconsistency due to multi-op txn. We need that in 3.5 as well, if you cherry pick that to 3.5 first, then we won't have conflict for this one. Manually tested cherry pick ZOOKEEPER-3127 to 3.5 branch, there is a very minor conflict in QuorumPeerMainTest, it should be trivial to change while cherry-picking. ---
[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user lvfangmin commented on the issue: https://github.com/apache/zookeeper/pull/605 Thanks @anmolnar, will create a PR for 3.5. ---
[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/605 @lvfangmin Committed to master. Conflicting with branch-3.5, please create separate pull request for that. ---
[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user lvfangmin commented on the issue: https://github.com/apache/zookeeper/pull/605 Just reported another pzxid inconsistent issue related to CloseSession, I found that issue recently, will wait this being merged before providing that patch, which will make adding the unit test easier. ---
[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user lvfangmin commented on the issue: https://github.com/apache/zookeeper/pull/605 Just found there is conflict to merge, I'm updating it now. Btw, I just found 2 other potential data inconsistent issues about multi-op and ephemeral, will create Jiras and send out diffs soon. ---
[GitHub] zookeeper issue #605: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user lvfangmin commented on the issue: https://github.com/apache/zookeeper/pull/605 This could cause watch event not notified when setWatches because of stale pzxid, and all the tests have been passed, I think we should land this and port it back to other branches as early as possible. ---