Dan Hecht has posted comments on this change. Change subject: IMPALA-2550: Switch to per-query exec rpc ......................................................................
Patch Set 10: (71 comments) http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: PS10, Line 327: // if we get an error while trying to get a connection to the backend, : // keep going : what is this talking about? PS10, Line 352: keep on cancelling the other fragments seems misplaced now PS10, Line 354: eturn true; should this return false? we didn't actually succeed at sending the rpc, right? http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: PS10, Line 61: status GetStatus()? http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS10, Line 178: close i'm not sure what this is talking about. is it referring to CloseConsumer() or Close()? PS10, Line 179: it should : // cancel itself when it receives an error status after reporting its profile. I also don't see what this is trying to explain. oh, after reading FIS::Cancel() I guess this is referring to that code in there that will do CloseConsumer(). This is pretty confusing since I wouldn't assume that cancel on the FIS side is going to CloseConsumer. PS10, Line 183: Grab executor what is this referring to? PS10, Line 184: order to get a reference to coord_instance_ : // so that coord_sink_ remains valid throughout query lifetime. how does getting a reference to coord_instance_ affect the validity of coord_sink_? The FIS isn't ref counted, the QES is. But we already have the QES reference, so i'm not sure what this is talking about. PS10, Line 189: at this point, the query is done with the Prepare phase, why do we know that? Oh, because GetFinstanceState() blocks until Prepare? That's pretty subtle. PS10, Line 204: GetPrepareStatus this is a dangerous side-effect to have inside of a DCHECK -- it creates a barrier only present in debug builds. while this isn't bug, it means that release builds have a pretty big difference in behavior. Maybe we should rename GetPrepareStatus() to WaitForPrepare() or similar to highlight that side-effect and prevent this from happening. Line 263: } DCHECK(!has_coord_fragment || coord_backend_idx_ != -1) ? PS10, Line 413: backend_state->GetStatus(); what is this trying to do? Oh, I guess get the status of BackendState::Exec()? But this might also pick up the execution status of the FIS if it raced ahead fast enough, right? also is it really true this won't be CANCELLED? Couldn't we have FIS_0 hit an error during QIS::Prepare(), which sets its preapred_status_, which QES notices and calls QES::CancelInternal(), which calls FIS_1::Cancel(), which cases FIS_1 to send a report with CANCELLED as the status, which could arrive before the report for FIS_0 which contains the real error? PS10, Line 424: DCHECK(query_status_.ok()); // nobody should have been able to cancel why? because the query handle hasn't yet been returned? But what about via the debug webpage? PS10, Line 964: #if 0 : do we really want this? what are you going to do with this? http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: PS10, Line 210: query query execution? (as opposed to the query being closed) PS10, Line 299: /// True if execution has completed, false otherwise. : bool execution_completed_ = false; doesn't appear to be used http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: PS10, Line 547: ReleaseResources why is this called ReleaseResources and not Close? What are we saying the difference between those two terms are? http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/fragment-instance-state.cc File be/src/runtime/fragment-instance-state.cc: Like QES, I think the lifecycle state machine could be more clearly defined for FIS, but that can be refined later. PS10, Line 95: if (!status.ok() && !status.IsCancelled() && !status.IsMemLimitExceeded()) { : // Log error message in addition to returning in Status. Queries that do not fetch : // results (e.g. insert) may not receive the message directly and can only retrieve : // the log. : runtime_state_->LogError(status.msg()); : } you should delete this when rebasing e0d1db5 PS10, Line 104: // TODO: deal with final report failure, otherwise the coordinator doesn't know : // we're done is that a regression or something that didn't work before as well? PS10, Line 213: // TODO: why is this here? FragmentComplete() already releases the thread token : //ReleaseThreadToken(); from the comment above, it sounds like it's meant to intentionally release the token early. do you plan to delete this code or re-enable it? do we know longer see the underutilization? PS10, Line 315: // TODO: do not delete mem trackers in Close()/ReleaseResources(), they are part of : // the control structures we need to preserve until the underlying QueryState : // disappears. is this todo talking about the code here or the code that deletes the mem trackers? don't we still want to free the resources attached to the row_batch_ here, even if the mem-tracker control structures are kept around? PS10, Line 320: ReleaseResources not for this change, but again we seem to be mixing terminology close and release-resources, and need to define the various states more clearly. PS10, Line 331: DCHECK_LE(other_time, total_time); IMPALA-4631 is going to hit again. what do you plan to do about that? i don't think it's worth spending more time tracking that down right now. http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/fragment-instance-state.h File be/src/runtime/fragment-instance-state.h: PS10, Line 52: /// Tear-down happens automatically at the end of Exec() and frees all memory allocated : /// for this fragment instance and closes all data streams. Is this referring to Close()? So in FIS case, Close() releases resources? We need to sort out the lifecycle terminology. PS10, Line 77: The remaining non-getter public functions block until the Prepare phase finishes. this comment is kinda buried. maybe better to note that at the class level, or in each func comment like in QES.h. PS10, Line 84: GetPrepareStatus like QES, the blocking side-effect isn't obvious from the function name. I think calling code would be clearer if called WaitForPrepare() or similar. PS10, Line 91: GetOpenStatus same Line 121: RuntimeState* runtime_state_; // lives in obj_pool() it'd be nice to point out the lifetimes. i.e. valid after Prepare, if successful. is that true for all members, or are there exceptions? PS10, Line 149: by Prepare() PS10, Line 153: should live in obj_pool() doesn't the second clause invalidate this. i.e. this shouldn't live in obj_pool because we don't want it to have the same lifetime as the FIS. i.e. it's not purely a control structure, it's a resource. PS10, Line 159: . ... or to the Prepare status if Prepare() failed. PS10, Line 224: Blocks until Prepare() finishes. what does that mean? if Prepare() is run synchronously w.r.t. the caller of this so how could this block? PS10, Line 227: Can handle : /// partially-finished Prepare(). Blocks until Prepare() finishes. that seems to contradict. i.e. if it blocks, then Prepare() must be finished. Maybe it means partially-succeeded Prepare? http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/mem-tracker.cc File be/src/runtime/mem-tracker.cc: Line 111: //VLOG_QUERY << "AddChildTracker(): parent=" << this << " child=" << tracker << " " << tracker->label_ << "\n" << GetStackTrace(); more of these to remove http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: PS10, Line 51: Prepare why doesn't this happen on the other thread? won't codegen happen here eventually? I guess we want the prepare status to be returned synchronously from this RPC, so okay to leave this but wanted to check the plan. Line 57: // handler thread (= the caller of this function) let's mention that ownership of the qs reference count is passed to the StartQueryHelper thread. PS10, Line 59: exec-fragment-instance-$0 I don't understand the choice of name for this thread and the ones started in StartFInstances() (which is exec-query-finstances-$0). It seems the thread created here should be plural since it starts multiple instances, and the ones started by StartFInstances() should be singular since they each execute a single instance. How about "start-query-finstances-$0" for this one, and "exec-query-finstance-$0" for the other? http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/query-exec-mgr.h File be/src/runtime/query-exec-mgr.h: Line 74: /// 'created' is set to true if it was created, false otherwise. it'd be nice to document that this increments the refcount. http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: PS10, Line 195: // TODO: this might flood the log : LOG(WARNING) << "Couldn't get a client for " << query_ctx().coord_address : <<"\tReason: " << coord_status.GetDetail(); how do we plan to address that? what used to happen in this case? PS10, Line 208: status.SetTStatus(¶ms); why does 'status' get set into the params.status if it's specific to the fis? the comment in the thrift says this is only for the "overall status". PS10, Line 255: // TODO: should we try to keep rpc_status for the final report? (but the final : // report, following this Cancel(), may not succeed anyway.) what used to happen in this case? did we used to send the rpc_status back in some cases? are we in danger of ending the query early but not returning failure? what failure is returned as the ultimate query status in these cases where getting a client back to the coordinator fails? PS10, Line 306: exec-query-finstances-$0 as mentioned in query-exec-mgr, seems this should be not plural: "exec-query-finstance-$0" PS10, Line 311: don't return until every instance is prepared why is that important? we're already running on an async thread, so why should it matter if we return? i think maybe this is really talking about implementing the QES::prepared_promise_ barrier, in which case it would be good to just say that. Line 319: prepare_status = instance_status; It might simplify reasoning about lifetimes if we set some rules on when instances will start checking for cancellation. i.e. make it a rule that they don't check until after their Prepare(). (And thus couldn't possible return CANCELLED from GetPreparedStatus()). PS10, Line 323: returned status what is this referring to? PS10, Line 342: CancelInternal(); I guess this can happen before all instances have gotten through their Prepare. Is it necessary to not just call Cancel() (which would block), and get rid of CancelInternal()? That would reduce the combinations of possible concurrent phases and possible state transitions of the QES phases. It would even be nice to define that state machine of QES phases, which then simplifies reasoning about and documenting the lifetimes of QES fields (control-structures and resources). Line 343: ExecEnv::GetInstance()->query_exec_mgr()->ReleaseQueryState(this); would be nice to comment that this reference was taken in StartFInstances(). PS10, Line 353: if (!is_cancelled_.CompareAndSwap(0, 1)) return; cancellation is a bit more difficult to reason about now that CancelInternal() can return before the cancellation side-effects (at least those that are on the synchronous path) actually occur (when another thread won the race to setting is_cancelled_). FInstanceState::Cancel() also needs to be thread-safe, right? so why not just always call down and let the lower level synchronization take care of it? having this flag means we have two level of asynchronisity on the cancellation path. http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/query-state.h File be/src/runtime/query-state.h: I think defining a state machine for the lifecycle of this (and FIS) would go a long way, and then each field and operation can be documented against those states. Don't have to do that all now, but i have noted some places where maybe we could tighten up the transition into the cancelled state a bit. PS10, Line 54: all query state (contained : /// either in this class or accessible through this class, such as the : /// FragmentInstanceStates) is guaranteed to be alive. That's not the long term plan though, right? i.e. resources may be freed even when there is a reference (say, after all instances have completed). If so, how about a TODO to clarify the direction. PS10, Line 59: cancelled I think what it means to be "cancelled" is a bit vague, given the asynchronous nature of cancellation. maybe we can spell out the actual lifetime guarantee more explicitly. PS10, Line 106: Uses few cycles where will we eventually put codegen? Also, I think it's a bit confusing that the QES Prepare phase and QIS Prepare phase are unrelated. PS10, Line 112: Blocks until all fragment instances have finished : /// their Prepare phase. why is that important given that this runs on an async thread itself? PS10, Line 119: GetPrepareStatus This name is confusing given that it doesn't get the status for QES Prepare(). Also, as mentioned elsewhere, I think it would be good to highlight the barrier semantics by using the term "Block" or "Wait" in the name PS10, Line 132: query is complete what does that mean? when all fragment instances are complete, or when all references are released, or something else? We don't have to get bogged down on this change, but I think we should refine this soon to avoid confusion and set a clear direction. Line 146: /// including its profile. Add: Not legal to call until after Prepare is complete. PS10, Line 159: const TQueryCtx query_ctx_; why is this split out from rpc_params_? oh, I guess for the coordinator, hmm. let's document that this is set in the ctor, i.e. always valid. Line 174: DescriptorTbl* desc_tbl_ = nullptr; For all these fields that are set outside the ctor, how does an outsider know whether the fields are valid yet or not? I guess the assumption is that they are accessed only by FIS threads? Or are they guaranteed to be valid only after prepared_promise_ is set? Would be nice to spell out the lifetimes for an asynchronous access (even if that's to say that they can only be accessed by FIS threads). Line 177: MemTracker* query_mem_tracker_; would be nice to group the fields based on their lifetime, i.e. move this adjacent to query_ctx_ (these two field are always valid since set in ctor). PS10, Line 184: prepared_promise_ when reading the code it's hard to remember that this is a barrier for all FIS Prepare, not this object's Prepare(), given the field name. How about calling it instances_prepared_promise_ or similar? PS10, Line 194: ) Created in Prepare(). Line 199: /// Temporary files for this query (owned by obj_pool_) created in Prepare(). Line 203: For all these fields that are created after the ctor, how does an outsider know whether the fields are valid yet or not? I guess the assumption is that they are accessed only by FIS threads? PS10, Line 222: latter former? PS10, Line 228: Cancels Initiates cancellation ... since the is_cancelled_ bit can cause this thread to race ahead and return before the thread that set is_cancelled_ actually does anything. Also, is it legal to call this before prepare_promise_ is set, or not? I guess it is and that's why it's split out from Cancel()? Is that really required (more comments about that in the cc file)? http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/runtime-state.cc File be/src/runtime/runtime-state.cc: PS10, Line 335: DCHECK(query_state_ != nullptr); why? (was this meant to go in query_mem_tracker()?) http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: PS10, Line 99: Return the query's ObjectPool i guess that means only control structures, not resources, should be put in there. Is that currently true? http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/service/impala-internal-service.cc File be/src/service/impala-internal-service.cc: PS10, Line 44: // are you removing these or putting them back? http://gerrit.cloudera.org:8080/#/c/6535/10/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: PS10, Line 339: TExecQueryFragmentsParams ? PS10, Line 398: s delete 's' -- To view, visit http://gerrit.cloudera.org:8080/6535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
