Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10813 )
Change subject: IMPALA-7163: Implement a state machine for the QueryState class ...................................................................... Patch Set 4: (15 comments) http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-exec-mgr.cc@128 PS4, Line 128: qs->MonitorFInstances(); This seems to fit into IMPALA-4063 better. As the code stands in this patch, we are unnecessarily blocking this thread from exiting after starting all fragment instances. It makes sense in IMPALA-4063 as this is the thread responsible for aggregating and sending the profiles for all fragment instances. http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h@159 PS4, Line 159: IMPALA-2990 IMPALA-4063 http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h@278 PS4, Line 278: CANCELLED, No implication on whether all fragment instances have finished cancelling themselves, right ? http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h@292 PS4, Line 292: IMPALA-2990 Did you mean IMPALA-4063 ? Does it mean this comment needs to be removed. May be convert the stuff which needs to be added in IMPALA-4063 as a TODO instead http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h@350 PS4, Line 350: boost::scoped_ptr std::unique_ptr http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h@355 PS4, Line 355: boost::scoped_ptr std::unique_ptr http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@a350 PS4, Line 350: : Was this a bug ?! Will we hang in the for loop at line 363 below if we fail to create a thread ? http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@141 PS4, Line 141: instances_prepared_barrier_ As discussed offline, the behavior during "preparation" phase of the query is that we will wait for fragment instances to finish preparing even if some of them run into errors. So, in that sense, the existing code seems to suffice unless I misunderstood something. http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@225 PS4, Line 225: DCHECK(true) DCHECK(false) ?! Does this warrant a targeted BE test ? http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@230 PS4, Line 230: QueryState::HandleExecStateTransition This function seems like a no-op in this patch. Can this be added in the next patch when necessary ? http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@278 PS4, Line 278: if (!status.ok() && (!status.IsCancelled() || !IsTerminalState(old_state))) { : VLOG_QUERY << Substitute("BackendExecState: query id=$0 finstance=$1 ($2 -> $3) " : "status=$4", : PrintId(query_id()), : failed_instance != TUniqueId() ? PrintId(failed_instance) : "N/A", : BackendExecStateToString(old_state), BackendExecStateToString(new_state), : status.GetDetail()); Will this actually be duplicating the error message which we usually log already when there is an error ? http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@391 PS4, Line 391: return WaitForPrepareInternal().status_; As discussed offline, there is no need to have a different implementation for waiting for all fis to finish preparing unless we can simplify it further somehow. http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@434 PS4, Line 434: IMPALA-2990 IMPALA-4063 ? http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@502 PS4, Line 502: QueryState::MonitorFInstances() Please see comments in query-exec-mgr.cc http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@542 PS4, Line 542: // initiate cancellation if nobody has done so yet : if (!status.ok()) Cancel(); Not sure what you think about the following. It appears to me that we may not necessarily need to UpdateBackendState() above to track the explicit state switching. Instead we can do something like below in this function and some add on to Cancel(): if (!status.ok()) { Cancel(); instances_finished_barrier_->NotifyRemaining(status); } else { instances_finished_barrier_->Notify(); } So, in IMPALA-4063, the query state thread will wait on this instances_finished_barrier_ after all fragment instances are done preparing. Once the wait returns, it may consider sending the profile right away if there is no error or if there is any error, kick off a timer to wait for FIS to finish cancelling before sending the final profile. -- To view, visit http://gerrit.cloudera.org:8080/10813 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe Gerrit-Change-Number: 10813 Gerrit-PatchSet: 4 Gerrit-Owner: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Comment-Date: Wed, 04 Jul 2018 01:32:00 +0000 Gerrit-HasComments: Yes
