Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10060 )

Change subject: IMPALA-5216: Make admission control queuing async
......................................................................


Patch Set 11:

(29 comments)

That looks good. My review comments are mostly to clarify some of the code 
comments.

http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/scheduling/admission-controller.h@44
PS9, Line 44: if
add "... or the caller wants to cancel" or something like that about 
cancellation.


http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/scheduling/admission-controller.h@204
PS9, Line 204: Returns immediately if rejected
let's say what the return value and promise value are for each case, i.e. 
something like:
If rejected, returns immediately with an error status indicating the reason and 
sets 'admit_outcome' to REJECTED_OR_TIMED_OUT.  ... and same type of 
explanation for timed out, cancel and admitted case. That will also make it 
explicit that by the time this returns, the promise is always set.


http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/scheduling/admission-controller.h@206
PS9, Line 206: ANCELLED
i.e. explain status in this case.


http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/scheduling/admission-controller.h@212
PS9, Line 212: Promise<AdmissionOutcome, PromiseMode::MULTIPLE_PRODUCER>
could create a typedef for that since it appears in several places and is kinda 
long, but okay to ignore.


http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/scheduling/admission-controller.cc@581
PS11, Line 581: result is set or the query is cancelled or it
              :   // times out.
the "result is set" also happens in the cancel case. To clarify, how about 
something like:
Block in Get() up to the time out waiting for the promise to be set when the 
query is admitted or cancelled.


http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/scheduling/admission-controller.cc@590
PS11, Line 590:   SleepIfSetInDebugOptions(schedule->query_options(), 
SLEEP_AFTER_ADMISSION_OUTCOME_MS);
what's interesting about this time interval? Is it to give the dequeue loop 
time to notice the cancellation?


http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/scheduling/admission-controller.cc@607
PS11, Line 607:  stats->Dequeue(*schedule, true);
nit: it was a bit hard to see that this block is really the same as the else 
case (which just different arguments) because things are done in a slightly 
different order. for consistency, how about moving this to after line 604?


http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.h@70
PS11, Line 70:   /// the query to the Admission controller for asynchronous 
admission control.
is there some post condition around the operation state we can document? like 
that on success, this sets operation state to RUNNING_STATE or PENDING_STATE.


http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.h@131
PS11, Line 131:  and sets the
              :   /// query status to CANCELLE
I thought you decided to hold off on that change.


http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.h@153
PS11, Line 153: T
missing space before


http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.h@286
PS11, Line 286: dequeuing thread
from this context, not clear what thread this is talking about. How about:
... set asynchronously by the admission controller (to admit).

Also the "admission control thread" is probably referring to what's now called 
the async exec thread?

This comment has a lot of admission controller details, rather than the high 
level overview of what this promise is. How about something like:

Promise used by the admission controller. AdmissionController:AdmitQuery() will 
block on this promise until the query is either rejected, admitted, times out, 
or is cancelled. Can be set to CANCELLED by the ClientRequestState in order to 
cancel, but otherwise is set by AdmissionController with the admission decision.


http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.h@442
PS11, Line 442: rocessing a query or dml execution request and submitting it to 
the
              :   /// admission controller
this is confusing since it doesn't actually do that work (it happens 
asynchronously.). I think this comment could use more of a rewrite given the 
new code structure.


http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.h@508
PS11, Line 508:
              :   /// Submits the QuerySchedule to the admission controller and 
on successful admission,
              :   /// starts up the coordinator execution, makes it accessible 
by setting
              :   /// 'coord_exec_started_' to true and advances 
operation_state_ to RUNNING. Handles
              :   /// async cancellation of queries and cleans up state if 
needed.
              :   void FinishExecQueryOrDmlRequest();
let's move that to be right after ExecAsyncQueryOrDmlRequest() since it's so 
closely related.


http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/service/client-request-state.h@153
PS9, Line 153: GetCoord
I think you ended up calling it GetCoordinator(), either update the comment or 
the method name.


http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/service/client-request-state.h@173
PS9, Line 173:   /// Only set for 'QUERY' and 'DML' statement types and is 
available only after Exec()
little hard to understand what "set" and "available" mean from just the 
interface. how about:
Returns the Coordinator for 'QUERY' and 'DML' requests once Coordinator::Exec() 
completes successfully. Otherwise returns null.


http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/service/client-request-state.h@176
PS9, Line 176: coord_exec_started_
I think I was a bit confused about the "started" in the name until I read the 
code since it kinda sounds like this is true once Exec() starts, but i think 
the name really means once async execution has been started. From the 
perspective of Exec(), this should be coord_exec_done_, right? but I think 
that's also confusing since execution is potentially still in progress. So I 
guess I don't have a great suggestion.


http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/service/client-request-state.h@182
PS9, Line 182: uses_admission_control
how about: has_async_exec_thread()? does it need to be public?


http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/service/client-request-state.h@325
PS9, Line 325:
             :   /// Not set for ddl queries. It should only be acce
I think phrasing this in terms of why may help clarify this:

... accessed through GetCoordinator() since most Coordinator methods can be 
called only after Coordinator::Exec() returns success.


http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/service/client-request-state.h@327
PS9, Line 327:  makes sure that it can onl
FinishExecQueryOrDmlRequest()


http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/service/client-request-state.h@330
PS9, Line 330: rted_' to true, and
             :   /// UpdateBacken
I think clearer to say:

... can be called before Coordinator::Exec() returns.


http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.cc@233
PS11, Line 233:     
UpdateNonErrorOperationState(TOperationState::PENDING_STATE);
The async-exec thread could race ahead and set this to RUNNING before we get 
here. But I guess UpdateNonErrorOperationState() ensures we don't go 
"backwards" in states? We could consider doing this as the first thing in the 
async-exec thread, though I suppose having it here is also nice because it 
means that we're guaranteeing that ExecStmt() RPC will always have transitioned 
away from INITIALIZING before returning. Just thinking aloud.


http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.cc@751
PS11, Line 751: operation_state_ < new_state
at this point, is it clearer to just list the possible states? it's just 
INITIALIZED, PENDING, RUNNING right? And you can dcheck that new_state != 
operation_state_ if you wanted to preserve that check.


http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.cc@915
PS11, Line 915: uses_admission_control()
once renamed, this won't make as much sense. I think we should just 
unconditionally do this Set(). That should be fine, right?


http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.cc@1151
PS11, Line 1151: void ClientRequestState::FinishExecQueryOrDmlRequest() {
like the decl, would be nice to move this up to be adjacent to 
AsyncExecQueryOrDmlRequest() since it's closely related.


http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.cc@1176
PS11, Line 1176: that coord_ is
               :       // visible to the cancellation thread if the query is 
marked cancelled after exiting
               :       // this critical section.
How about being explicit about why that's important rather than stating just 
the fact that it must be visible:

Once the lock is dropped, any future calls to Cancel() are responsible for 
calling Coordinator::Cancel(), so while holding the lock we need to both 
perform a check for cancellation and make the coord_ visible.


http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/client-request-state.cc@1199
PS11, Line 1199: updates are
               : // only received after Exec() has been called successfully on 
it
that's not quite accurate. It's more that we these are safe to call 
concurrently with Coordinator::Exec().


http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/10060/11/be/src/service/impala-server.h@429
PS11, Line 429: It represents the set of queries that
              :     /// can be cancelled: either the query is queued or has 
started executing.
That's kind of secondary. The primary purpose is to have a list of queries that 
needs to be closed if the session is closed or expires. maybe just revert that 
comment.


http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/util/debug-util.h
File be/src/util/debug-util.h:

http://gerrit.cloudera.org:8080/#/c/10060/9/be/src/util/debug-util.h@117
PS9, Line 117: sleep_option
this is really the code location label or identifier, rather than the option 
for the sleep, right?  maybe call it 'sleep_label' or 'sleep_location'?


http://gerrit.cloudera.org:8080/#/c/10060/11/www/query_backends.tmpl
File www/query_backends.tmpl:

http://gerrit.cloudera.org:8080/#/c/10060/11/www/query_backends.tmpl@90
PS11, Line 90: has completed, or has no backends or has not started
             : any backends, yet.
the comma locations are kinda weird in this user facing message. rather than 
tacking something to the end, let's word it in a way that makes sense.



--
To view, visit http://gerrit.cloudera.org:8080/10060
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I989cf5b259afb8f5bc5c35590c94961c81ce88bf
Gerrit-Change-Number: 10060
Gerrit-PatchSet: 11
Gerrit-Owner: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Mon, 21 May 2018 22:32:11 +0000
Gerrit-HasComments: Yes

Reply via email to