Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )
Change subject: IMPALA-4063: Merge report of query fragment instances per executor ...................................................................... Patch Set 1: (11 comments) Had a few comments. Overall direction of the patch makes sense to me http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/coordinator-backend-state.cc@277 PS1, Line 277: lock_guard<SpinLock> l1(exec_summary->lock); Isn't this just reverting the change that the previous patch made? http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.h File be/src/runtime/fragment-instance-state.h: http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.h@93 PS1, Line 93: /// Called periodically to get the current status of this fragment instance. Since we're describing the usage pattern, might as well mention which thread is calling it. http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.h@159 PS1, Line 159: bool final_report_sent_ = false; What's the thread safety story here? http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc File be/src/runtime/fragment-instance-state.cc: http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc@97 PS1, Line 97: intance instance http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc@239 PS1, Line 239: nit: unnecessary blank line http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc@243 PS1, Line 243: void FragmentInstanceState::GetStatusReport(FragmentInstanceExecStatusPB* instance_status, nit: could probably reduce vertical whitespace in this function. http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-exec-mgr.cc@128 PS1, Line 128: void QueryExecMgr::StartQueryHelper(QueryState* qs) { I feel like we should rename this function since it now runs until the query finishes on this backend, e.g. ExecuteQueryHelper() http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h@91 PS1, Line 91: /// the final report (it will not be overridden by the resulting cancellation). Is there anything short we can say here to clarify the relationship with the overall state on the coordinator? http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h@418 PS1, Line 418: bool IsStatusSet() const { Maybe something like "HasErrorStatus()". It's confusing that setting overall_status_ to cancelled counts as "not set". http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h@419 PS1, Line 419: return !overall_status_.ok() && !overall_status_.IsCancelled(); This is called without holding status_lock_, right? I don't think IsCancelled() is thread-safe? Since we could could overwrite msg_ with an error status from a different thread and free the previous msg_ object. I don't really like this pattern of doing unlocked reads to Status objects given that the interface of status is not documented as thread-safe - I get that .ok() works since Status is implemented as a pointer and we're on x86 but I think we need to clarify the interfaces - feels like it wouldn't take much for someone to invalidate one of the assumptions made by accident. I wonder if the optimisation of deferring acquisition of the lock really matters. If we think it's necessary, it seems like we should document which Status methods are threadsafe, and change the implementation to use atomics so that the memory ordering assumptions are explicit. http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc@236 PS1, Line 236: if (overall_status_.IsCancelled()) { Can we refine this with the internal/client-driven cancellation distinction that was added here https://gerrit.cloudera.org/#/c/11464/ -- To view, visit http://gerrit.cloudera.org:8080/11615 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa Gerrit-Change-Number: 11615 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Thomas Marshall <thomasmarsh...@cmu.edu> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Mon, 08 Oct 2018 22:56:38 +0000 Gerrit-HasComments: Yes