Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10060 )

Change subject: IMPALA-5216: Make admission control queuing async
......................................................................


Patch Set 7:

(27 comments)

I need to dig into the details some more but some initial comments/questions.

http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/scheduling/admission-controller.h@44
PS7, Line 44: ClientRequestState::admit_outcome_
if this enum is meant to be consumed publicly, then it's best to not refer to 
this private variable when explaining it. let's try to explain it in terms of 
the public portion of the AdmissionController abstraction.


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/scheduling/admission-controller.h@201
PS7, Line 201: admit_status
admit_outcome?


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/scheduling/admission-controller.h@204
PS7, Line 204: admit_status
admit_outcome


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/client-request-state.h@72
PS7, Line 72:   /// Must *not* be called with lock_ held.
given that the coordinator object is (unfortunately) current exposed by this 
class, let's document when, and in which cases, that gets created.


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/client-request-state.h@114
PS7, Line 114: non-error states (RUNNING_STATE and FINISHED_STATE)
is PENDING_STATE one of those?


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/client-request-state.h@152
PS7, Line 152: coord_
let's try not to talk about private members when documenting the public 
interface (so that we can try to make sense of the abstractions without knowing 
all the implementation details).


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/client-request-state.h@172
PS7, Line 172:   Coordinator* coord() const {
             :     return coord_exec_started_.Load() ? coord_.get() : nullptr;
             :   }
it's confusing that coord() is not actually an accessor given it's name. If we 
really need this then we should name it GetCoord() or something.

But, it seems weird that we need this. We really need to cleanup the CRS 
states, but in the mean time, why can't we just defer setting coord_ until 
after coord->Exec() returns (and maybe hold the lock while setting coord_ if 
that makes the synchronization simpler)?


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/client-request-state.h@323
PS7, Line 323: Exec() has been successfully called on it
why do we bother creating the coordinator object ahead of when we call Exec()? 
i.e. what use is it before Exec() is called?


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/impala-hs2-server.cc@448
PS7, Line 448:   DCHECK(status.ok() || 
!request_state->uses_admission_control());
why is that the case, and why does this code care?


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/impala-hs2-server.cc@464
PS7, Line 464:     if (!status.ok()) goto return_error;
if we hit an error here, how does the WaitAsync thread finish?


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/impala-http-handler.cc@769
PS7, Line 769:  == nullptr
not your change, and let's not fix it here, but given that the coord_ ptr is 
set outside of holding the lock, these checks aren't really technically safe. 
This is another example of why we should tighten up the specification of the 
CRS states.


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/impala-http-handler.cc@769
PS7, Line 769: uses_admission_control(
should that check for PENDING_STATE instead, to be more specific?


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/impala-http-handler.cc@778
PS7, Line 778: request_state->coord() != nullptr
this would also be more about the states, but okay to leave it for now.


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/impala-http-handler.cc@779
PS7, Line 779:         request_state->coord()->GetTExecSummary(&summary);
I guess PrintExecSummary() is okay with an empty summary? have you tested that? 
I think previously the summary would at least be initialized (though given that 
the CRS lock isn't held across setting the coord_ ptr and calling Exec(), there 
was a small window previously that could also lead to this empty case).


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/impala-server.h@432
PS7, Line 432: It represents the set of queries that
             :     /// have either started execution or are queued and can be 
cancelled.
if this is accurate, maybe say instead:

It represents the set of queries that can be cancelled: either the query is 
queued or has started executing.


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/impala-server.h@434
PS7, Line 434: submitted_queries
alternatively, if we clean up the state machine and how cancel works, it seems 
like we might be able to get rid of this altogether? not for this commit though.


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/impala-server.h@499
PS7, Line 499: changes when the query
             :   /// is set in-flight.
preexisting, but I'm not really sure what this is saying.


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/impala-server.cc@a910
PS7, Line 910:
is it the right thing to delay this? by delaying it it means that that the 
cluster membership topic update won't cancel queries that are queued to run 
against these hosts. I guess I can see arguments either way, but it seems safer 
to keep the behavior of cancelling them proactively, no?


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/impala-server.cc@663
PS7, Line 663:         return Status::OK();
What does this look like in the various places that get the summary? Should it 
instead return a status message saying the query is current queued in admission 
control?


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/service/impala-server.cc@875
PS7, Line 875:     if (FLAGS_stress_metadata_loading_pause_injection_ms > 0) {
             :       
SleepForMs(FLAGS_stress_metadata_loading_pause_injection_ms);
             :     }
this should probably use your new debug action thingy, but okay to do that in a 
follow on commit.


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/util/debug-util.h
File be/src/util/debug-util.h:

http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/util/debug-util.h@112
PS7, Line 112: "sleep_option"
and then you could change:
"sleep_option" input variable matches
to just:
'sleep_option' matches


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/util/debug-util.h@112
PS7, Line 112: sleep_option:sleep_time_ms
since we usually use single quotes to document function args, this might be 
clearer as:

... <sleep_option>:<sleep_time_ms> where <sleep_option> is a string and 
<sleep_time_ms> is an integer ...


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/util/debug-util.h@114
PS7, Line 114: 'sleep_time_ms
then also make this: <sleep_time_ms>


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/util/debug-util.h@115
PS7, Line 115: string
const string&


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/util/promise.h
File be/src/util/promise.h:

http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/util/promise.h@73
PS7, Line 73: (
nit: missing space


http://gerrit.cloudera.org:8080/#/c/10060/7/be/src/util/promise.h@77
PS7, Line 77:     return val_;
given the subtlty of the lock as commented above, maybe combine these into a 
SetWork<ALLOW_MULTIPLE> where you just put a DCHECK(ALLOW_MULTIPLE) in the 
if-branch. I don't feel too strongly about it though.


http://gerrit.cloudera.org:8080/#/c/10060/7/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/10060/7/common/thrift/ImpalaService.thrift@90
PS7, Line 90: "sleep_option:sleep_time_ms",
to keep with the a consistent syntax as #1, I think this would be:

"<sleep option>:<sleep time ms>"



--
To view, visit http://gerrit.cloudera.org:8080/10060
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I989cf5b259afb8f5bc5c35590c94961c81ce88bf
Gerrit-Change-Number: 10060
Gerrit-PatchSet: 7
Gerrit-Owner: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Wed, 09 May 2018 23:34:00 +0000
Gerrit-HasComments: Yes

Reply via email to