Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10813 )
Change subject: IMPALA-7163: Implement a state machine for the QueryState class ...................................................................... Patch Set 3: (21 comments) I wasn't able to get through another entire iteration but here's some comments. Michael will have to take over this review. http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@65 PS3, Line 65: query state I think we should avoid using the term "query state" given that this class is called QueryState but we mean something else :) How about just saying "a state" and then refer to it as BackendExecState or backend exec state in future references? http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@65 PS3, Line 65: BackedExecState Backend http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@68 PS3, Line 68: to the corresponding error are there multiple error states? corresponding implies that. Or are you including CANCELLED here? http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@72 PS3, Line 72: This is to avoid bugs in : /// the query lifecycle that were fixed as part of IMPALA-2550. Let's not state in terms of past code. How about: This is to simplify the query lifecycle so that Prepare() is always completed before handling a cancel RPC. (I assume that's what we're talking about?) http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@149 PS3, Line 149: Blocks until all fragment instances have finished their Prepare phase. put another way, I guess this blocks until threads have exited INITIALIZED? Does it make sense to keep StartFInstances() and MonitorFInstances() separately? http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@155 PS3, Line 155: /// Not idempotent, not thread-safe. Must only be called by a single thread. it it required that this is called after StartFInstances()? http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@210 PS3, Line 210: typedef struct FInstanceStatus FInstanceStatus; Is that needed in C++? You'd do it for C but I think C++ effectively includes the typedef automatically. http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@213 PS3, Line 213: Function nit: here and below you could delete the word "Function" since that's obvious http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@266 PS3, Line 266: Cancel Isn't Cancel() called in case of error as well? If so, does this mean we go through CANCELLED and ERROR in that case? If not, should this say CancelFInstances() RPC was handled while executing. http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@271 PS3, Line 271: query_exec_state_ how about backend_exec_state_ for consistency? http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@284 PS3, Line 284: & here and elsewhere: given BackendExecState is a scalar (enum) value, should just pass by value. http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@286 PS3, Line 286: form from http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@305 PS3, Line 305: status overall status http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@306 PS3, Line 306: /// Status::OK if all the fragment instances managed by this QS are also Status::OK otherwise it's the status of the first non-OK FIS? Will this thing become a FInstancesStatus (so you can report back which FIS failed)? http://gerrit.cloudera.org:8080/#/c/10813/2/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/10813/2/be/src/runtime/query-state.cc@437 PS2, Line 437: : } : return prepare_status; : } : : QueryState::FInstanceStatus QueryState::WaitForFinishInternal() { : return instances_finished_barrier_->Wait(); : } : : void QueryState::StartFInstances() { : VLOG_QUERY << "StartFInstances(): query_id=" << PrintId(query_id()) : << " #instances=" << rpc_params_.fragment_instance_ctxs.size(); : DCHECK_GT(refcnt_.Load(), 0); : DCHECK_GT(exec_resource_refcnt_.Load(), 0) << "Should have been taken in Init()"; : > The Prepare() case is special as we historically don't let any action happe My point was shouldn't the counting barrier take care of that for us? I.e. call Notify() rather than NotifyRemaining(). This loop is effectively implementing the barrier in yet another way. I get that you want to have the status from the first failed but wait for the last to finish prepare, but it still seems like it should be expressible using the barrier mechanism. http://gerrit.cloudera.org:8080/#/c/10813/2/be/src/runtime/query-state.cc@586 PS2, Line 586: } : : Status QueryState::StartSpilling(RuntimeState* runtime_state, MemTracker* mem_tracker) { : // Return an error message with the root cause of why spilling is disabled. : if (query_options().scratch_limit == 0) { : return mem_tracker->MemLimitExceeded( > The Release*() calls are per fragment instance whereas the state machine im Right, the references are held by FIS for resources shared across the query. I think we would need a (STOPPED or FINALIZED or whatever), that requires all FIS to pass through the corresponding barrier. (BTW, I didn't mean to highlight the ReleaseQueryState() part - that reference will still be needed and is the control structure reference. The other reference is for resources, though, and so seems like it should fit into the backend exec state machine. And the Cancel() call too. http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.cc@201 PS3, Line 201: #ifdef NDEBUG : #define IS_LEGAL_EXEC_STATE_TRANSITION(old_state, new_state) \ : DCheckLegalExecStateTransition(old_state, new_state) : #else : #define IS_LEGAL_EXEC_STATE_TRANSITION(old_state, new_state) : #endif given that the function is inlined this is probably unnecessary. dead code elimination should remove the if-stmts surrounding the DCHECK on release builds. http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.cc@226 PS3, Line 226: new_state != BackendExecState::CANCELLED && new_state != BackendExecState::ERROR why doesn't that include FINISHED? This check doesn't seem very intuitive to me - I had to think about it to understand. Actually, can't we just DCHECK(!IsTerminalState(old_state) at the beginning? Is there a simpler way of expressing this? Actually, this doesn't seem necessary since it's just checking what we already did and is non-trivial logic to do so (i.e. the verification code seems almost more complex than the code it's verifying :) http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.cc@266 PS3, Line 266: UpdateQueryState how about UpdateBackendExecState (since QueryState is the name of the class but this function is talking about the BackendExecState specifically, I think). http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.cc@290 PS3, Line 290: case BackendExecState::CANCELLED: how do we get into the cancelled state? http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.cc@457 PS3, Line 457: DCHECK You shouldn't call stuff with side effects inside DCHECK. In release builds, the UpdateQueryState() call won't happen. -- To view, visit http://gerrit.cloudera.org:8080/10813 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe Gerrit-Change-Number: 10813 Gerrit-PatchSet: 3 Gerrit-Owner: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Comment-Date: Sat, 30 Jun 2018 00:04:16 +0000 Gerrit-HasComments: Yes
