[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc

Dan Hecht (Code Review) Fri, 28 Apr 2017 17:26:37 -0700

Dan Hecht has posted comments on this change.

Change subject: IMPALA-2550: Switch to per-query exec rpc
......................................................................



Patch Set 10:

(71 comments)

http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

PS10, Line 327:  // if we get an error while trying to get a connection to the 
backend,
              :   // keep going
              :  
what is this talking about?


PS10, Line 352: keep on cancelling the other fragments
seems misplaced now


PS10, Line 354: eturn true;
should this return false? we didn't actually succeed at sending the rpc, right?


http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

PS10, Line 61: status
GetStatus()?


http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS10, Line 178: close
i'm not sure what this is talking about. is it referring to CloseConsumer() or 
Close()?


PS10, Line 179: it should
              :   // cancel itself when it receives an error status after 
reporting its profile.
I also don't see what this is trying to explain.
oh, after reading FIS::Cancel() I guess this is referring to that code in there 
that will do CloseConsumer(). This is pretty confusing since I wouldn't assume 
that cancel on the FIS side is going to CloseConsumer.


PS10, Line 183:  Grab executor 
what is this referring to?


PS10, Line 184: order to get a reference to coord_instance_
              :   // so that coord_sink_ remains valid throughout query 
lifetime.
how does getting a reference to coord_instance_ affect the validity of 
coord_sink_? The FIS isn't ref counted, the QES is. But we already have the QES 
reference, so i'm not sure what this is talking about.


PS10, Line 189: at this point, the query is done with the Prepare phase,
why do we know that? Oh, because GetFinstanceState() blocks until Prepare? 
That's pretty subtle.


PS10, Line 204: GetPrepareStatus
this is a dangerous side-effect to have inside of a DCHECK -- it creates a 
barrier only present in debug builds. while this isn't bug, it means that 
release builds have a pretty big difference in behavior.

Maybe we should rename GetPrepareStatus() to WaitForPrepare() or similar to 
highlight that side-effect and prevent this from happening.


Line 263:   }
DCHECK(!has_coord_fragment || coord_backend_idx_ != -1) ?


PS10, Line 413:  backend_state->GetStatus();
what is this trying to do? Oh, I guess get the status of BackendState::Exec()? 

But this might also pick up the execution status of the FIS if it raced ahead 
fast enough, right?  

also is it really true this won't be CANCELLED?  Couldn't we have FIS_0 hit an 
error during QIS::Prepare(), which sets its preapred_status_, which QES notices 
and calls QES::CancelInternal(), which calls FIS_1::Cancel(), which cases FIS_1 
to send a report with CANCELLED as the status, which could arrive before the 
report for FIS_0 which contains the real error?


PS10, Line 424: DCHECK(query_status_.ok()); // nobody should have been able to 
cancel
why? because the query handle hasn't yet been returned?  But what about via the 
debug webpage?


PS10, Line 964: #if 0
              : do we really want this?
what are you going to do with this?


http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

PS10, Line 210: query 
query execution? (as opposed to the query being closed)


PS10, Line 299: /// True if execution has completed, false otherwise.
              :   bool execution_completed_ = false;
doesn't appear to be used


http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

PS10, Line 547: ReleaseResources
why is this called ReleaseResources and not Close? What are we saying the 
difference between those two terms are?


http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

Like QES, I think the lifecycle state machine could be more clearly defined for 
FIS, but that can be refined later.


PS10, Line 95:   if (!status.ok() && !status.IsCancelled() && 
!status.IsMemLimitExceeded()) {
             :     // Log error message in addition to returning in Status. 
Queries that do not fetch
             :     // results (e.g. insert) may not receive the message 
directly and can only retrieve
             :     // the log.
             :     runtime_state_->LogError(status.msg());
             :   }
you should delete this when rebasing e0d1db5


PS10, Line 104:   // TODO: deal with final report failure, otherwise the 
coordinator doesn't know
              :   // we're done
is that a regression or something that didn't work before as well?


PS10, Line 213:     // TODO: why is this here? FragmentComplete() already 
releases the thread token
              :     //ReleaseThreadToken();
from the comment above, it sounds like it's meant to intentionally release the 
token early. do you plan to delete this code or re-enable it?
do we know longer see the underutilization?


PS10, Line 315:   // TODO: do not delete mem trackers in 
Close()/ReleaseResources(), they are part of
              :   // the control structures we need to preserve until the 
underlying QueryState
              :   // disappears.
is this todo talking about the code here or the code that deletes the mem 
trackers? don't we still want to free the resources attached to the row_batch_ 
here, even if the mem-tracker control structures are kept around?


PS10, Line 320: ReleaseResources
not for this change, but again we seem to be mixing terminology close and 
release-resources, and need to define the various states more clearly.


PS10, Line 331: DCHECK_LE(other_time, total_time);
IMPALA-4631 is going to hit again. what do you plan to do about that? i don't 
think it's worth spending more time tracking that down right now.


http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/fragment-instance-state.h
File be/src/runtime/fragment-instance-state.h:

PS10, Line 52: /// Tear-down happens automatically at the end of Exec() and 
frees all memory allocated
             : /// for this fragment instance and closes all data streams.
Is this referring to Close()?  So in FIS case, Close() releases resources? We 
need to sort out the lifecycle terminology.


PS10, Line 77: The remaining non-getter public functions block until the 
Prepare phase finishes.
this comment is kinda buried. maybe better to note that at the class level, or 
in each func comment like in QES.h.


PS10, Line 84: GetPrepareStatus
like QES, the blocking side-effect isn't obvious from the function name. I 
think calling code would be clearer if called WaitForPrepare() or similar.


PS10, Line 91: GetOpenStatus
same


Line 121:   RuntimeState* runtime_state_;  // lives in obj_pool()
it'd be nice to point out the lifetimes. i.e. valid after Prepare, if 
successful. is that true for all members, or are there exceptions?


PS10, Line 149:  
by Prepare()


PS10, Line 153: should live in obj_pool()
doesn't the second clause invalidate this. i.e. this shouldn't live in obj_pool 
because we don't want it to have the same lifetime as the FIS. i.e. it's not 
purely a control structure, it's a resource.


PS10, Line 159: .
... or to the Prepare status if Prepare() failed.


PS10, Line 224: Blocks until Prepare() finishes.
what does that mean? if Prepare() is run synchronously w.r.t. the caller of 
this so how could this block?


PS10, Line 227:  Can handle
              :   /// partially-finished Prepare(). Blocks until Prepare() 
finishes.
that seems to contradict. i.e. if it blocks, then Prepare() must be finished. 
Maybe it means partially-succeeded Prepare?


http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/mem-tracker.cc
File be/src/runtime/mem-tracker.cc:

Line 111:   //VLOG_QUERY << "AddChildTracker(): parent=" << this << " child=" 
<< tracker << " " << tracker->label_ << "\n" << GetStackTrace();
more of these to remove


http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

PS10, Line 51: Prepare
why doesn't this happen on the other thread? won't codegen happen here 
eventually? 

I guess we want the prepare status to be returned synchronously from this RPC, 
so okay to leave this but wanted to check the plan.


Line 57:   // handler thread (= the caller of this function)
let's mention that ownership of the qs reference count is passed to the 
StartQueryHelper thread.


PS10, Line 59: exec-fragment-instance-$0
I don't understand the choice of name for this thread and the ones started in 
StartFInstances() (which is exec-query-finstances-$0).

It seems the thread created here should be plural since it starts multiple 
instances, and the ones started by StartFInstances() should be singular since 
they each execute a single instance.  

How about "start-query-finstances-$0" for this one, and 
"exec-query-finstance-$0" for the other?


http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/query-exec-mgr.h
File be/src/runtime/query-exec-mgr.h:

Line 74:   /// 'created' is set to true if it was created, false otherwise.
it'd be nice to document that this increments the refcount.


http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

PS10, Line 195:     // TODO: this might flood the log
              :     LOG(WARNING) << "Couldn't get a client for " << 
query_ctx().coord_address
              :         <<"\tReason: " << coord_status.GetDetail();
how do we plan to address that? what used to happen in this case?


PS10, Line 208: status.SetTStatus(&params);
why does 'status' get set into the params.status if it's specific to the fis? 
the comment in the thrift says this is only for the "overall status".


PS10, Line 255:     // TODO: should we try to keep rpc_status for the final 
report? (but the final
              :     // report, following this Cancel(), may not succeed anyway.)
what used to happen in this case? did we used to send the rpc_status back in 
some cases? are we in danger of ending the query early but not returning 
failure? what failure is returned as the ultimate query status in these cases 
where getting a client back to the coordinator fails?


PS10, Line 306: exec-query-finstances-$0
as mentioned in query-exec-mgr, seems this should be not plural: 
"exec-query-finstance-$0"


PS10, Line 311: don't return until every instance is prepared
why is that important? we're already running on an async thread, so why should 
it matter if we return? i think maybe this is really talking about implementing 
the QES::prepared_promise_ barrier, in which case it would be good to just say 
that.


Line 319:       prepare_status = instance_status;
It might simplify reasoning about lifetimes if we set some rules on when 
instances will start checking for cancellation. i.e. make it a rule that they 
don't check until after their Prepare().  (And thus couldn't possible return 
CANCELLED from GetPreparedStatus()).


PS10, Line 323: returned status
what is this referring to?


PS10, Line 342: CancelInternal();
I guess this can happen before all instances have gotten through their Prepare. 
Is it necessary to not just call Cancel() (which would block), and get rid of 
CancelInternal()?  That would reduce the combinations of possible concurrent 
phases and possible state transitions of the QES phases. It would even be nice 
to define that state machine of QES phases, which then simplifies reasoning 
about and documenting the lifetimes of QES fields (control-structures and 
resources).


Line 343:   ExecEnv::GetInstance()->query_exec_mgr()->ReleaseQueryState(this);
would be nice to comment that this reference was taken in StartFInstances().


PS10, Line 353: if (!is_cancelled_.CompareAndSwap(0, 1)) return;
cancellation is a bit more difficult to reason about now that CancelInternal() 
can return before the cancellation side-effects (at least those that are on the 
synchronous path) actually occur (when another thread won the race to setting 
is_cancelled_).  

FInstanceState::Cancel() also needs to be thread-safe, right? so why not just 
always call down and let the lower level synchronization take care of it?  
having this flag means we have two level of asynchronisity on the cancellation 
path.


http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

I think defining a state machine for the lifecycle of this (and FIS) would go a 
long way, and then each field and operation can be documented against those 
states. Don't have to do that all now, but i have noted some places where maybe 
we could tighten up the transition into the cancelled state a bit.


PS10, Line 54: all query state (contained
             : /// either in this class or accessible through this class, such 
as the
             : /// FragmentInstanceStates) is guaranteed to be alive.
That's not the long term plan though, right? i.e. resources may be freed even 
when there is a reference (say, after all instances have completed). If so, how 
about a TODO to clarify the direction.


PS10, Line 59: cancelled
I think what it means to be "cancelled" is a bit vague, given the asynchronous 
nature of cancellation. maybe we can spell out the actual lifetime guarantee 
more explicitly.


PS10, Line 106: Uses few cycles
where will we eventually put codegen?
Also, I think it's a bit confusing that the QES Prepare phase and QIS Prepare 
phase are unrelated.


PS10, Line 112: Blocks until all fragment instances have finished
              :   /// their Prepare phase.
why is that important given that this runs on an async thread itself?


PS10, Line 119: GetPrepareStatus
This name is confusing given that it doesn't get the status for QES Prepare(). 
Also, as mentioned elsewhere, I think it would be good to highlight the barrier 
semantics by using the term "Block" or "Wait" in the name


PS10, Line 132: query is complete
what does that mean? when all fragment instances are complete, or when all 
references are released, or something else?  We don't have to get bogged down 
on this change, but I think we should refine this soon to avoid confusion and 
set a clear direction.


Line 146:   /// including its profile.
Add: Not legal to call until after Prepare is complete.


PS10, Line 159: const TQueryCtx query_ctx_;
why is this split out from rpc_params_?  oh, I guess for the coordinator, hmm.

let's document that this is set in the ctor, i.e. always valid.


Line 174:   DescriptorTbl* desc_tbl_ = nullptr;
For all these fields that are set outside the ctor, how does an outsider know 
whether the fields are valid yet or not? I guess the assumption is that they 
are accessed only by FIS threads? Or are they guaranteed to be valid only after 
prepared_promise_ is set? Would be nice to spell out the lifetimes for an 
asynchronous access (even if that's to say that they can only be accessed by 
FIS threads).


Line 177:   MemTracker* query_mem_tracker_;
would be nice to group the fields based on their lifetime, i.e. move this 
adjacent to query_ctx_ (these two field are always valid since set in ctor).


PS10, Line 184: prepared_promise_
when reading the code it's hard to remember that this is a barrier for all FIS 
Prepare, not this object's Prepare(), given the field name. How about calling 
it instances_prepared_promise_ or similar?


PS10, Line 194: )
Created in Prepare().


Line 199:   /// Temporary files for this query (owned by obj_pool_)
created in Prepare().


Line 203: 
For all these fields that are created after the ctor, how does an outsider know 
whether the fields are valid yet or not? I guess the assumption is that they 
are accessed only by FIS threads?


PS10, Line 222: latter
former?


PS10, Line 228: Cancels
Initiates cancellation ...
since the is_cancelled_ bit can cause this thread to race ahead and return 
before the thread that set is_cancelled_ actually does anything.

Also, is it legal to call this before prepare_promise_ is set, or not? I guess 
it is and that's why it's split out from Cancel()?  Is that really required 
(more comments about that in the cc file)?


http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

PS10, Line 335: DCHECK(query_state_ != nullptr);
why? (was this meant to go in query_mem_tracker()?)


http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS10, Line 99: Return the query's ObjectPool
i guess that means only control structures, not resources, should be put in 
there.  Is that currently true?


http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/service/impala-internal-service.cc
File be/src/service/impala-internal-service.cc:

PS10, Line 44: //
are you removing these or putting them back?


http://gerrit.cloudera.org:8080/#/c/6535/10/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

PS10, Line 339: TExecQueryFragmentsParams
?


PS10, Line 398: s
delete 's'


-- 
To view, visit http://gerrit.cloudera.org:8080/6535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc

Reply via email to