Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/9718 )
Change subject: IMPALA-6338: Fix flaky test_profile_fragment_instances ...................................................................... Patch Set 1: (10 comments) Had some comments on the overall approach. http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/common/status.h File be/src/common/status.h: http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/common/status.h@98 PS1, Line 98: CANCELLED should not be reported by a fragment Can we enforce this by checking that instances only report cancellation after the query has been cancelled by the coordinator? http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.h@119 PS1, Line 119: boost::mutex* lock() { return &lock_; } If we keep this, I think this could help prevent coding errors by handing out scoped lock guards instead. http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.cc@237 PS1, Line 237: return num_remaining_instances_ == 0 || (!status_.ok() && !status_.IsCancelled()); Is it possible to move the !status_.IsCancelled() to where we expect it to have an effect? That way we wouldn't have to change the definition of IsDone() to exclude IsCancelled(). http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.cc@248 PS1, Line 248: if (IsDone()) return false; Here we would add && !status.IsCancelled(). http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.cc@249 PS1, Line 249: for (const TFragmentInstanceExecStatus& instance_exec_status: Instead of acquiring all locks in ComputeQuerySummary(), we could acquire a lock on the root profile children here. http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.cc@251 PS1, Line 251: instance_status Can we check here that (!instance_status.IsCancelled() || status_.IsCancelled()) ? http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.cc@284 PS1, Line 284: // We can't update this backend's profile if ReportQuerySummary() is running, This comment seems outdated. http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.cc@314 PS1, Line 314: return IsDone(); The new definition of IsDone() will return false for cancelled backends here, whereas it returned true before. This will prevent the coordinator from decreasing num_remaining_backends_ in UpdateBackendExecStatus(). http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.cc@397 PS1, Line 397: || status_.IsCancelled() This could then be removed. http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator.cc@1026 PS1, Line 1026: if (backend_states_.empty()) return; We could acquire a lock on the root profile of the fragment stats here instead (see my comments in the backend state). -- To view, visit http://gerrit.cloudera.org:8080/9718 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2f8295e4ec580b476f7668c77aec1348fb2e868c Gerrit-Change-Number: 9718 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Comment-Date: Tue, 20 Mar 2018 00:33:30 +0000 Gerrit-HasComments: Yes
