[Impala-ASF-CR] PROTOTYPE: IMPALA-14271: Reapply the core piece of IMPALA-6984

Joe McDonnell (Code Review) Wed, 06 Aug 2025 23:46:35 -0700

Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/23264 )

Change subject: PROTOTYPE: IMPALA-14271: Reapply the core piece of IMPALA-6984
......................................................................

PROTOTYPE: IMPALA-14271: Reapply the core piece of IMPALA-6984

IMPALA-6984 changed the behavior to cancel backends when the
query reaches the RETURNED_RESULTS state. This ran into a regression
on large clusters where a query would end up waiting 10 seconds.
IMPALA-10047 reverted the core piece of the change.

For tuple caching, we found that a scan node can get stuck waiting
for a global runtime filter. It turns out that the coordinator will
not send out global runtime filters if the query is in a terminal
state. Tuple caching was causing queries to reach the RETURNED_RESULTS
phase before the runtime filter could be sent out. Reenabling the core
part of IMPALA-6984 sends out a cancel as soon as the query transitions
to RETURNED_RESULTS and wakes up any fragment instances waiting on
runtime filters.
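
As a rough illustration of why a cancel unblocks these waiters, here is a
minimal sketch of a filter wait that returns on arrival, cancellation, or
timeout. FilterWaitSketch and all of its members are hypothetical names made
up for this example. It is not Impala's runtime filter code, only the general
condition-variable pattern under those assumptions.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a scan node's wait on a global runtime filter.
// None of these names come from the Impala code base.
class FilterWaitSketch {
 public:
  // Blocks until the filter arrives, the query is cancelled, or the timeout
  // expires. Returns true only if the filter actually arrived.
  bool WaitForFilter(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(lock_);
    cv_.wait_for(l, timeout, [this] { return arrived_ || cancelled_; });
    return arrived_;
  }

  // Called when the coordinator delivers the filter.
  void FilterArrived() {
    {
      std::lock_guard<std::mutex> l(lock_);
      arrived_ = true;
    }
    cv_.notify_all();
  }

  // Called when a cancel reaches the fragment instance. Wakes any waiter so
  // it does not sit out the full timeout for a filter that will never come.
  void Cancel() {
    {
      std::lock_guard<std::mutex> l(lock_);
      cancelled_ = true;
    }
    cv_.notify_all();
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  bool arrived_ = false;
  bool cancelled_ = false;
};
```

In this sketch, a waiter that never receives the filter and is never cancelled
only returns once the timeout expires, which corresponds to the stuck scan node
described above.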

The underlying cause of IMPALA-10047 is a tangle of locks that exhausts
the RPC threads. The coordinator holds the lock on a backend state while
it sends the cancel synchronously. Other backends that complete during
that time run Coordinator::BackendState::LogFirstInProgress(), which
iterates through the backend states to find the first one that is not
done. The check for whether a backend state is done takes that backend
state's lock. The problem case is when the coordinator sends a cancel to
a backend on the coordinator itself. Processing that cancel requires an
available RPC thread on the coordinator. If all of the RPC threads are
processing updates, they can all call LogFirstInProgress() and get stuck
on the backend state lock for the coordinator's fragment. The result is a
temporary deadlock: the cancel can't be processed, so the coordinator
won't release the lock. It is only resolved when the RPC times out.

To resolve this, this change introduces an atomic is_done_ variable on
Coordinator::BackendState that is set to true when the backend reaches a
terminal state. It also adds an IsDoneLockless() method that reads that
atomic variable without taking the lock, and switches LogFirstInProgress()
to use it. This prevents the deadlock, as the RPC threads no longer need
the lock on another backend to make progress.
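
The idea can be sketched roughly as follows. This is a simplified
illustration, not the code in the patch: BackendStateSketch, MarkDone(),
status_done_, and FirstInProgress() are hypothetical names, and only is_done_
and IsDoneLockless() are taken from the description above. The memory
orderings are likewise an assumption.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical, heavily simplified stand-in for Coordinator::BackendState.
class BackendStateSketch {
 public:
  // Transitions to a terminal state. The atomic flag mirrors the
  // lock-protected state so readers can observe it without the lock.
  void MarkDone() {
    std::lock_guard<std::mutex> l(lock_);
    status_done_ = true;
    is_done_.store(true, std::memory_order_release);
  }

  // Locked check: blocks while another thread holds lock_, for example
  // during a synchronous cancel RPC.
  bool IsDone() {
    std::lock_guard<std::mutex> l(lock_);
    return status_done_;
  }

  // Lockless check: never blocks, regardless of who holds lock_.
  bool IsDoneLockless() const {
    return is_done_.load(std::memory_order_acquire);
  }

  // Exposed only so the sketch can simulate a long-held lock.
  std::mutex& lock() { return lock_; }

 private:
  std::mutex lock_;
  bool status_done_ = false;  // protected by lock_
  std::atomic<bool> is_done_{false};
};

// Returns the index of the first backend that is not done, or -1 if all are
// done. Uses only the lockless check, so it cannot get stuck on a lock.
int FirstInProgress(
    const std::vector<std::unique_ptr<BackendStateSketch>>& states) {
  for (std::size_t i = 0; i < states.size(); ++i) {
    if (!states[i]->IsDoneLockless()) return static_cast<int>(i);
  }
  return -1;
}
```

The trade-off in this sketch is that the lockless read can briefly lag a
concurrent transition, which is acceptable for a logging-style scan that only
needs to find some backend that is still in progress.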

Testing:
 - Hand tested with 30 impalads and control_service_num_svc_threads=1
   Without the fix, it reproduces easily after reverting IMPALA-10047.
   With the fix, it doesn't reproduce.

Change-Id: Ia058b03c72cc4bb83b0bd0a19ff6c8c43a647974
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
3 files changed, 42 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/23264/1
--
To view, visit http://gerrit.cloudera.org:8080/23264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia058b03c72cc4bb83b0bd0a19ff6c8c43a647974
Gerrit-Change-Number: 23264
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>
