Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
......................................................................


Patch Set 3:

(21 comments)

I wasn't able to get through another entire iteration but here's some comments. 
Michael will have to take over this review.

http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@65
PS3, Line 65: query state
I think we should avoid using the term "query state" given that this class is 
called QueryState but we mean something else :) How about just saying "a state" 
and then refer to it as BackendExecState or backend exec state in future 
references?


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@65
PS3, Line 65: BackedExecState
Backend


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@68
PS3, Line 68: to the corresponding error
are there multiple error states? corresponding implies that. Or are you 
including CANCELLED here?


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@72
PS3, Line 72:  This is to avoid bugs in
            : /// the query lifecycle that were fixed as part of IMPALA-2550.
Let's not state in terms of past code. How about:
This is to simplify the query lifecycle so that Prepare() is always completed 
before handling a cancel RPC.
(I assume that's what we're talking about?)


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@149
PS3, Line 149: Blocks until all fragment instances have finished their Prepare 
phase.
put another way, I guess this blocks until threads have exited INITIALIZED?

Does it make sense to keep StartFInstances() and MonitorFInstances() separately?


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@155
PS3, Line 155:   /// Not idempotent, not thread-safe. Must only be called by a 
single thread.
it it required that this is called after StartFInstances()?


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@210
PS3, Line 210: typedef struct FInstanceStatus FInstanceStatus;
Is that needed in C++?  You'd do it for C but I think C++ effectively includes 
the typedef automatically.


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@213
PS3, Line 213: Function
nit: here and below you could delete the word "Function" since that's obvious


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@266
PS3, Line 266: Cancel
Isn't Cancel() called in case of error as well? If so, does this mean we go 
through CANCELLED and ERROR in that case?
If not, should this say CancelFInstances() RPC was handled while executing.


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@271
PS3, Line 271: query_exec_state_
how about backend_exec_state_ for consistency?


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@284
PS3, Line 284: &
here and elsewhere: given BackendExecState is a scalar (enum) value, should 
just pass by value.


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@286
PS3, Line 286: form
from


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@305
PS3, Line 305: status
overall status


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@306
PS3, Line 306:   /// Status::OK if all the fragment instances managed by this 
QS are also Status::OK
otherwise it's the status of the first non-OK FIS?
Will this thing become a FInstancesStatus (so you can report back which FIS 
failed)?


http://gerrit.cloudera.org:8080/#/c/10813/2/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/2/be/src/runtime/query-state.cc@437
PS2, Line 437:
             :   }
             :   return prepare_status;
             : }
             :
             : QueryState::FInstanceStatus QueryState::WaitForFinishInternal() {
             :   return instances_finished_barrier_->Wait();
             : }
             : 
             : void QueryState::StartFInstances() {
             :   VLOG_QUERY << "StartFInstances(): query_id=" << 
PrintId(query_id())
             :       << " #instances=" << 
rpc_params_.fragment_instance_ctxs.size();
             :   DCHECK_GT(refcnt_.Load(), 0);
             :   DCHECK_GT(exec_resource_refcnt_.Load(), 0) << "Should have 
been taken in Init()";
             :
> The Prepare() case is special as we historically don't let any action happe
My point was shouldn't the counting barrier take care of that for us? I.e. call 
Notify() rather than NotifyRemaining(). This loop is effectively implementing 
the barrier in yet another way. I get that you want to have the status from the 
first failed but wait for the last to finish prepare, but it still seems like 
it should be expressible using the barrier mechanism.


http://gerrit.cloudera.org:8080/#/c/10813/2/be/src/runtime/query-state.cc@586
PS2, Line 586: }
             :
             : Status QueryState::StartSpilling(RuntimeState* runtime_state, 
MemTracker* mem_tracker) {
             :   // Return an error message with the root cause of why spilling 
is disabled.
             :   if (query_options().scratch_limit == 0) {
             :     return mem_tracker->MemLimitExceeded(
> The Release*() calls are per fragment instance whereas the state machine im
Right, the references are held by FIS for resources shared across the query.

I think we would need a (STOPPED or FINALIZED or whatever), that requires all 
FIS to pass through the corresponding barrier.

(BTW, I didn't mean to highlight the ReleaseQueryState() part - that reference 
will still be needed and is the control structure reference.

The other reference is for resources, though, and so seems like it should fit 
into the backend exec state machine. And the Cancel() call too.


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.cc@201
PS3, Line 201: #ifdef NDEBUG
             : #define IS_LEGAL_EXEC_STATE_TRANSITION(old_state, new_state) \
             :   DCheckLegalExecStateTransition(old_state, new_state)
             : #else
             : #define IS_LEGAL_EXEC_STATE_TRANSITION(old_state, new_state)
             : #endif
given that the function is inlined this is probably unnecessary. dead code 
elimination should remove the if-stmts surrounding the DCHECK on release builds.


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.cc@226
PS3, Line 226: new_state != BackendExecState::CANCELLED && new_state != 
BackendExecState::ERROR
why doesn't that include FINISHED? This check doesn't seem very intuitive to me 
- I had to think about it to understand.

Actually, can't we just DCHECK(!IsTerminalState(old_state) at the beginning?

Is there a simpler way of expressing this? Actually, this doesn't seem 
necessary since it's just checking what we already did and is non-trivial logic 
to do so (i.e. the verification code seems almost more complex than the code 
it's verifying :)


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.cc@266
PS3, Line 266: UpdateQueryState
how about UpdateBackendExecState (since QueryState is the name of the class but 
this function is talking about the BackendExecState specifically, I think).


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.cc@290
PS3, Line 290:     case BackendExecState::CANCELLED:
how do we get into the cancelled state?


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.cc@457
PS3, Line 457: DCHECK
You shouldn't call stuff with side effects inside DCHECK. In release builds, 
the UpdateQueryState() call won't happen.



--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Comment-Date: Sat, 30 Jun 2018 00:04:16 +0000
Gerrit-HasComments: Yes

Reply via email to