Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16369 )

Change subject: IMPALA-9636: Don't run retried query on the blacklisted nodes
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16369/8/be/src/runtime/query-driver.cc
File be/src/runtime/query-driver.cc:

http://gerrit.cloudera.org:8080/#/c/16369/8/be/src/runtime/query-driver.cc@128
PS8, Line 128:     // set blacklisted_backend_ids for client_request_state of 
the original query.
             :     if (blacklisted_backend_ids != nullptr && 
!blacklisted_backend_ids->empty()) {
             :       
client_request_state_->SetBlacklistedBackendIds(*blacklisted_backend_ids);
             :     }
             :
             :     // Triggering a retry from the INITIALIZED phase is 
possible: the
             :     /
instead of passing blacklisted_executor_addresses into this method, it seems 
cleaner to just call 'parent_request_state->SetBlacklistedExecutorAddresses()' 
inside coordinator.cc - in fact, you can replace 
SetBlacklistedExecutorAddresses with AddBlacklistedExecutorAddresses that way 
you don't have to built up the unordered_set of addresses.


http://gerrit.cloudera.org:8080/#/c/16369/4/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/16369/4/be/src/scheduling/admission-controller.cc@1572
PS4, Line 1572: chedule_executor_group != executor_group)
> Done
right now, if a query is retried and all executors are blacklisted, the query 
will get queued by the admission controller and will eventually fail with the 
error:

"Admission for query exceeded timeout 60000ms in pool default-pool. Queued 
reason: Waiting for executors to start. Only DDL queries and queries scheduled 
only on the coordinator (either NUM_NODES set to 1 or when small query 
optimization is triggered) can currently run. Additional Details: Not 
Applicable"

if the blacklisting here removes all executors then the query will fail with 
"Cannot schedule query: no registered backends available", correct? is it 
possible to keep this consistent with the other blacklisting logic and print 
out "Admission for query exceeded timeout 60000ms in pool default-pool ...".


http://gerrit.cloudera.org:8080/#/c/16369/8/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/16369/8/be/src/scheduling/admission-controller.cc@1120
PS8, Line 1120:   if (!request.blacklisted_backend_ids.empty()) {
              :     // This debug_action is to allow delay to be injected for 
the admission of a retried
              :     // query, which is triggered by blacklisted node, before 
getting membership snapshot.
              :     DebugActionNoFail(FLAGS_debug_actions, 
"ADMISSION_DELAY_FOR_RETRIED_QUERY");
              :   }
can you use CRS_BEFORE_ADMISSION instead?


http://gerrit.cloudera.org:8080/#/c/16369/8/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/16369/8/be/src/scheduling/scheduler.cc@163
PS8, Line 163:       DCHECK(executor_config.group.NumExecutors() > 0 || 
exec_at_coord);
             :
             :       const TPlanNode& node = stat
not sure I fully understand why the DCHECK needs to be removed, could you 
expand on this a bit please?


http://gerrit.cloudera.org:8080/#/c/16369/8/tests/custom_cluster/test_query_retries.py
File tests/custom_cluster/test_query_retries.py:

http://gerrit.cloudera.org:8080/#/c/16369/8/tests/custom_cluster/test_query_retries.py@287
PS8, Line 287:  statestored_args="--statestore_heartbeat
nice! I think this actually solves IMPALA-9227 as well.



--
To view, visit http://gerrit.cloudera.org:8080/16369
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I00bc1b5026efbd0670ffbe57bcebc457d34cb105
Gerrit-Change-Number: 16369
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Comment-Date: Tue, 08 Sep 2020 17:56:21 +0000
Gerrit-HasComments: Yes

Reply via email to