[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

Michael Ho (Code Review) Fri, 20 Jul 2018 18:56:29 -0700

Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )


Change subject: IMPALA-7163: Implement a state machine for the QueryState class
......................................................................


Patch Set 6:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/fragment-instance-state.cc@a113
PS6, Line 113:
May make sense to replace this by checking the state. Doing a racy read of the 
state should be fine.


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@198
PS6, Line 198:  struct FInstanceStatus {
If you take the suggestion to only keep a single 'query_status_', there doesn't 
seem to be need for this struct. We can replace it with a single class member 
to record the fragment instance ID of the first failed fragment instance.


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@212
PS6, Line 212: TUniqueId finst_id_;
This is only meaningful when there is an error status, right ? Please document 
it.


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@234
PS6, Line 234: FragmentInstanceState
fragment instance


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@271
PS6, Line 271: INITIALIZED,
Consider calling it "PREPARING"


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@274
PS6, Line 274: PREPARED,
Consider calling it "EXECUTING"


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@308
PS6, Line 308:   /// The overall status of the Prepare phase for this query 
state.
             :   FInstanceStatus prepare_status_;
             :
             :   /// The overall status of the Exec phase (after Prepare()) for 
this query state.
             :   FInstanceStatus exec_status_;
             :
             :   /// Protects 'prepare_status_' and 'exec_status_'.
             :   SpinLock status_lock_;
It seems simpler to keep track of a single status for the first error hit 
instead of tracking the errors in different phases.

The behavior in the existing code is that the first fragment instance which 
hits the error will report it to the coordinator directly. In theory, a 
fragment instance F1 can hit an error during exec after prepare phase is done 
while another fragment instance F2 can hit error during its prepare phase after 
F1 hits the error. The new patch seems to set query_status_ to the error of F2 
while the coordinator will get the error of F1. So, this seems to cause 
divergence with the coordinator.

So, in that sense, it's necessary to keep a single status only.


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@320
PS6, Line 320:   FInstanceStatus query_status_;
Mind documenting the thread safety about this variable ?


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@238
PS6, Line 238:     case BackendExecState::INITIALIZED:
             :     case BackendExecState::PREPARED:
             :       DCHECK(query_status_.ok()) << query_status_.ToString();
             :       if (!status.ok()) {
             :         // Error while executing - go to ERROR state.
             :         query_status_ = finst_status;
             :         backend_exec_state_ = BackendExecState::ERROR;
Can this be merged with TransitionNonTerminalState() above as:

 if (query_status_.IsCancelled()) {
     new_state = BackendExecState::CANCELLED;
 } else if (!query_status_.ok()) {
     new_state = BackendExecState::ERROR;
 } else {
     new_state = state == PREPARING ? EXECUTING : FINISHED;
 }


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@250
PS6, Line 250: case BackendExecState::FINISHED
Under what circumstances would we call QueryState::UpdateBackendExecState() 
after entering a terminal state ?


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@457
PS6, Line 457: ErrorDuringPrepare(
Won't it be a problem if we call ErrorDuringPrepare() here as it only bump the 
instances_prepared_barrier_ by one instead of the number of remaining fragment 
instances ? In that sense, won't caller of WaitForPrepare() wait forever ?

Also, the thread creation failure path warrants a targeted test case. May be a 
debug action can be added in Thread::Create().


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@460
PS6, Line 460: return;
Seems wrong to return here. Don't we need to call ReportExecStatusAux() like we 
did in the past.


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@463
PS6, Line 463:   all_fis_status = WaitForPrepareInternal();
             :   if (!UpdateBackendExecState(all_fis_status).ok()) return;
Seems unnecessary to be copying all_fis_status here and having two separate 
interfaces (i.e. WaitForPrepareInternal() and WaitForPrepare());

How about we just have WaitForPrepare() like the following:

 Status QueryState::WaitForPrepare() {
   instances_prepared_barrier_->Wait();
   unique_lock<SpinLock> l(status_lock_);
   return query_status_;
 }



--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 6
Gerrit-Owner: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Comment-Date: Sat, 21 Jul 2018 01:56:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

Reply via email to