Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12299 )

Change subject: IMPALA-2990: timeout unresponsive queries in coordinator
......................................................................


Patch Set 7:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/runtime/coordinator-backend-state.cc@174
PS7, Line 174:   const auto trigger = MakeScopeExitTrigger(
             :       [this]() { last_report_time_ms_ = GenerateReportTimestamp(); });
Does this implicitly rely on C++'s LIFO destruction order for local variables so 
that "notifier" won't be destroyed before "trigger"? Otherwise, we could have 
notified 'exec_complete_barrier' without setting 'last_report_time_ms_'.

I may be paranoid, but can you please add a comment about this so future changes 
won't accidentally swap their order? Alternatively, we could skip the use of 
NotifyBarrierOnExit here and just notify 'exec_complete_barrier' in 'trigger' 
after setting 'last_report_time_ms_'.


http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/runtime/coordinator.cc@549
PS7, Line 549: DCHECK(
DCHECK_LE


http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/runtime/coordinator.cc@664
PS7, Line 664: DCHECK(
DCHECK_LE


http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc@398
PS4, Line 398:
> k. We could deprecate it in a separate patch I suppose. Fine with this as i
As Thomas said, this config option was useful when users faced instability with 
large Impala clusters due to IMPALA-2990 (which this patch is fixing). We can 
consider deprecating this flag once we are convinced that this patch improves 
things enough that the instability with large clusters is mostly mitigated.

The periodic reporting matters for long-running queries. The default reporting 
interval of 5 seconds could be longer than most short-running queries' 
durations. For long-running queries, it probably doesn't really matter whether 
we get a report every 5 seconds vs. say 30s or 60s. If we ever deprecate this 
flag, please consider bumping the default reporting interval to something larger.


http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc@2288
PS4, Line 2288:   return Status::OK();
              : }
              :
              : void ImpalaServer::ExpireQuery(ClientRequestState* crs, const Status& status) {
              :   DCHECK(!status.ok());
              :   cancellation_thread_pool_->Offer(
> k, would be interested what Michael thinks about that idea
The executor's reporting path currently can tolerate up to 
FLAGS_status_report_max_retries number of failures before giving up and 
declaring the coordinator as inaccessible. It's intentionally not a time-based 
approach and instead relies on the periodic reporting to handle the retry.

Stepping back a bit, my understanding of why we want to approximate the 
maximum retry time is that the coordinator may not want to prematurely cancel a 
query due to a missing report from an executor if it knows that the executor 
*may* still be retrying to send a report. However, as Todd pointed out 
elsewhere, if an Impalad can experience random pauses or random scheduling 
delays for the reporting thread, no amount of wait time will guarantee that an 
executor is not in the middle of retrying to send a report when the query is 
cancelled. I suppose that's not an end goal we should try to achieve.

This prompts me to wonder whether it would be any different if we just did the 
following:
- expose the coordinator's maximum tolerable lag time as a knob set to an 
arbitrary number
- expose the executor's maximum tolerable retry time as a knob set to some 
arbitrary number
- deprecate FLAGS_status_report_max_retries

An alternative would be to:
- expose the executor's maximum tolerable retry time as a knob set to some 
arbitrary number
- deprecate FLAGS_status_report_max_retries
- set the coordinator's maximum tolerable lag time to (100 + x)% of the maximum 
tolerable retry time, where x *may be* tunable in case the default value causes 
any trouble

Todd / Thomas, what do you think ?


http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/service/impala-server.cc@218
PS7, Line 218: DEFINE_int32(status_report_max_retries, 3,
> Is our timeout too aggressive here? If one of the backends goes into some k
I agree that the coordinator should be more lenient in handling lost reports to 
avoid false positives. On the other hand, no amount of timeout can save us from 
random pauses in Impalad, so we should work on fixing IMPALA-2800 if at all 
possible.



--
To view, visit http://gerrit.cloudera.org:8080/12299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987
Gerrit-Change-Number: 12299
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Marshall <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>
Gerrit-Reviewer: Thomas Marshall <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Tue, 02 Apr 2019 01:09:07 +0000
Gerrit-HasComments: Yes
