Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )
Change subject: IMPALA-4063: Merge report of query fragment instances per executor ...................................................................... Patch Set 1: (11 comments) http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.h File be/src/runtime/fragment-instance-state.h: http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.h@93 PS1, Line 93: /// Called periodically to get the current status of this fragment instance. > Since we're describing the usage pattern, might as well mention which threa Done http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.h@159 PS1, Line 159: bool final_report_sent_ = false; > What's the thread safety story here? Added some comments. This is exclusively accessed by the query state thread only. http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc File be/src/runtime/fragment-instance-state.cc: http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc@97 PS1, Line 97: intance > instance Done http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc@239 PS1, Line 239: > nit: unnecessary blank line Done http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc@243 PS1, Line 243: void FragmentInstanceState::GetStatusReport(FragmentInstanceExecStatusPB* instance_status, > nit: could probably reduce vertical whitespace in this function. Done http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-exec-mgr.cc@128 PS1, Line 128: void QueryExecMgr::StartQueryHelper(QueryState* qs) { > I feel like we should rename this function since it now runs until the quer Done http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h@353 PS1, Line 353: /// 'overall_status_' will be set once this is unblocked and so will 'failed_instance_id_' > line too long (92 > 90) Done http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h@418 PS1, Line 418: bool IsStatusSet() const { > Maybe something like "HasErrorStatus()". It's confusing that setting overal Done http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h@419 PS1, Line 419: return !overall_status_.ok() && !overall_status_.IsCancelled(); > This is called without holding status_lock_, right? I don't think IsCancell Good point. I think the optimization doesn't really matter that much given it's only exercised in the error path. It's okay to have a minor bit of lock contention if multiple fragment instances hit errors at the same time. http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc@a553 PS1, Line 553: : The deletion of this line warrants some explanation: In the old code, if there is any error returned from fis->Exec(), the final report for "fis" would have been sent already so the coordinator will already record the failure of "fis" before the cancellation is issued. In the new code, the reporting is done periodically by the query state thread so the final report may or may not be sent at this point after "fis" hit an error above. Calling Cancel() here may actually cause the coordinator fragment (if there is one) to prematurely set the backend status at the coordinator, causing Coordinator::BackendState::ApplyExecStatusReport() to ignore subsequent updates from "fis". As a result, the actual error status was masked. The implicit contract in the code (even before this change) is that the final report of the first fragment instance to hit an (non-cancellation) error must have been sent before initiating the cancellation. Will add a comment here. http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc@236 PS1, Line 236: if (overall_status_.IsCancelled()) { > Can we refine this with the internal/client-driven cancellation distinction Sorry, I don't quite get this comment. In either cases, won't the state still transit to BackendExecState::CANCELLED ? -- To view, visit http://gerrit.cloudera.org:8080/11615 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa Gerrit-Change-Number: 11615 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Thomas Marshall <thomasmarsh...@cmu.edu> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Wed, 10 Oct 2018 08:11:17 +0000 Gerrit-HasComments: Yes