[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

Michael Ho (Code Review) Tue, 03 Jul 2018 18:33:02 -0700

Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )


Change subject: IMPALA-7163: Implement a state machine for the QueryState class
......................................................................


Patch Set 4:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-exec-mgr.cc@128
PS4, Line 128:   qs->MonitorFInstances();
This seems to fit into IMPALA-4063 better. As the code stands in this patch, we 
are unnecessarily blocking this thread from exiting after starting all fragment 
instances. It makes sense in IMPALA-4063 as this is the thread responsible for 
aggregating and sending the profiles for all fragment instances.


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h@159
PS4, Line 159: IMPALA-2990
IMPALA-4063


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h@278
PS4, Line 278:     CANCELLED,
No implication on whether all fragment instances have finished cancelling 
themselves, right ?


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h@292
PS4, Line 292: IMPALA-2990
Did you mean IMPALA-4063 ? Does it mean this comment needs to be removed. May 
be convert the stuff which needs to be added in IMPALA-4063 as a TODO instead


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h@350
PS4, Line 350: boost::scoped_ptr
std::unique_ptr


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h@355
PS4, Line 355: boost::scoped_ptr
std::unique_ptr


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@a350
PS4, Line 350:
             :
Was this a bug ?! Will we hang in the for loop at line 363 below if we fail to 
create a thread ?


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@141
PS4, Line 141: instances_prepared_barrier_
As discussed offline, the behavior during "preparation" phase of the query is 
that we will wait for fragment instances to finish preparing even if some of 
them run into errors. So, in that sense, the existing code seems to suffice 
unless I misunderstood something.


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@225
PS4, Line 225: DCHECK(true)
DCHECK(false) ?! Does this warrant a targeted BE test ?


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@230
PS4, Line 230: QueryState::HandleExecStateTransition
This function seems like a no-op in this patch. Can this be added in the next 
patch when necessary ?


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@278
PS4, Line 278:   if (!status.ok() && (!status.IsCancelled() || 
!IsTerminalState(old_state))) {
             :     VLOG_QUERY << Substitute("BackendExecState: query id=$0 
finstance=$1  ($2 -> $3) "
             :                              "status=$4",
             :         PrintId(query_id()),
             :         failed_instance != TUniqueId() ? 
PrintId(failed_instance) : "N/A",
             :         BackendExecStateToString(old_state), 
BackendExecStateToString(new_state),
             :         status.GetDetail());
Will this actually be duplicating the error message which we usually log 
already when there is an error ?


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@391
PS4, Line 391: return WaitForPrepareInternal().status_;
As discussed offline, there is no need to have a different implementation for 
waiting for all fis to finish preparing unless we can simplify it further 
somehow.


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@434
PS4, Line 434: IMPALA-2990
IMPALA-4063 ?


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@502
PS4, Line 502: QueryState::MonitorFInstances()
Please see comments in query-exec-mgr.cc


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@542
PS4, Line 542:   // initiate cancellation if nobody has done so yet
             :   if (!status.ok()) Cancel();
Not sure what you think about the following. It appears to me that we may not 
necessarily need to UpdateBackendState() above to track the explicit state 
switching. Instead we can do something like below in this function and some add 
on to Cancel():

 if (!status.ok()) {
     Cancel();
     instances_finished_barrier_->NotifyRemaining(status);
 } else {
     instances_finished_barrier_->Notify();
 }

So, in IMPALA-4063, the query state thread will wait on this 
instances_finished_barrier_ after all fragment instances are done preparing. 
Once the wait returns, it may consider sending the profile right away if there 
is no error or if there is any error, kick off a timer to wait for FIS to 
finish cancelling before sending the final profile.



--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 4
Gerrit-Owner: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Comment-Date: Wed, 04 Jul 2018 01:32:00 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

Reply via email to