Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/13061 )
Change subject: IMPALA-7665: Fix unwarranted query cancellation on statestore restart ...................................................................... Patch Set 2: (15 comments) http://gerrit.cloudera.org:8080/#/c/13061/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13061/1//COMMIT_MSG@17 PS1, Line 17: Testing: > Just to confirm, this test failed before your change? yup, fails after the first statestore restart http://gerrit.cloudera.org:8080/#/c/13061/1/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/13061/1/be/src/service/impala-server.cc@235 PS1, Line 235: running on backends > running on Done http://gerrit.cloudera.org:8080/#/c/13061/1/be/src/service/impala-server.cc@236 PS1, Line 236: ould be large enough to g > This value should be large enough to give the statestore... Done http://gerrit.cloudera.org:8080/#/c/13061/1/be/src/statestore/statestore-subscriber.h File be/src/statestore/statestore-subscriber.h: http://gerrit.cloudera.org:8080/#/c/13061/1/be/src/statestore/statestore-subscriber.h@130 PS1, Line 130: int64 > int64_t is the standard one, I guess int64 comes from some header we've pul Done http://gerrit.cloudera.org:8080/#/c/13061/1/be/src/statestore/statestore-subscriber.h@215 PS1, Line 215: AtomicInt64 last_registration_ms_{0}; > I think last_registration_ms_ would also convey that it's a timestamp, but Done http://gerrit.cloudera.org:8080/#/c/13061/1/tests/custom_cluster/test_restart_services.py File tests/custom_cluster/test_restart_services.py: http://gerrit.cloudera.org:8080/#/c/13061/1/tests/custom_cluster/test_restart_services.py@26 PS1, Line 26: > flake8: F811 redefinition of unused 'time' from line 23 Done http://gerrit.cloudera.org:8080/#/c/13061/1/tests/custom_cluster/test_restart_services.py@98 PS1, Line 98: > flake8: E131 continuation line unaligned for hanging indent Done http://gerrit.cloudera.org:8080/#/c/13061/1/tests/custom_cluster/test_restart_services.py@100 PS1, Line 100: timeout_s=SUBSCRIBER_TIMEOUT_S)) > If you didn't already, can you loop this test for a while, maybe with 'stre done. Also updated commit message accordingly http://gerrit.cloudera.org:8080/#/c/13061/1/tests/custom_cluster/test_restart_services.py@103 PS1, Line 103: l not r > nit: backend or impalad (lowercase d) Done http://gerrit.cloudera.org:8080/#/c/13061/1/tests/custom_cluster/test_restart_services.py@112 PS1, Line 112: sure query starts running. > Does this make sure that the query is actually running, or is there a small This would work too, but using wait_for_state would also help eliminate the assert under this http://gerrit.cloudera.org:8080/#/c/13061/1/tests/custom_cluster/test_restart_services.py@117 PS1, Line 117: > Might be slightly cleaner to also default CANCELLATION_GRACE_PERIOD_S above Done http://gerrit.cloudera.org:8080/#/c/13061/1/tests/custom_cluster/test_restart_services.py@117 PS1, Line 117: > flake8: E226 missing whitespace around arithmetic operator Done http://gerrit.cloudera.org:8080/#/c/13061/1/tests/custom_cluster/test_restart_services.py@119 PS1, Line 119: # Now restart statestore and kill a backend while it is down, and make sure the > We might need to reconcile this with IMPALA-2990, which adds another mechan Yup, after IMPALA-2990 we can probably just skip cancelling running queries and just reschedule the ones getting scheduled or in queue. http://gerrit.cloudera.org:8080/#/c/13061/1/tests/custom_cluster/test_restart_services.py@122 PS1, Line 122: self.cluster.statestored.kill() : self.cluster.impalads[1].kill() : self.cluster.statestored.start() > I think it would be interesting to test the case where the statestore comes done.The second one is implicitly verified since the statestore is restarted 3 times in this test and the time elapsed is verified. Let me know if you think I should add a separate test for that. http://gerrit.cloudera.org:8080/#/c/13061/1/tests/custom_cluster/test_restart_services.py@130 PS1, Line 130: > flake8: E226 missing whitespace around arithmetic operator Done -- To view, visit http://gerrit.cloudera.org:8080/13061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30b68bd8bde4bf589d58d42d6f683afb166de959 Gerrit-Change-Number: 13061 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Tue, 30 Apr 2019 00:26:17 +0000 Gerrit-HasComments: Yes
