[ https://issues.apache.org/jira/browse/IMPALA-6362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-6362.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.12.0

IMPALA-6362: avoid Reservation/MemTracker deadlock

Avoid the circular dependency between ReservationTracker::lock_ and
MemTracker::child_trackers_lock_ by not acquiring
ReservationTracker::lock_ in GetReservation(), where an atomic
operation is sufficient.
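
The following is a minimal C++ sketch of the idea behind the fix. It assumes a simplified tracker: ReservationTrackerSketch and its members are hypothetical names, not Impala's actual ReservationTracker. Mutations still serialize on lock_, but the total is also published through a std::atomic. That lets GetReservation() read it without taking lock_, which removes lock_ from any code path that already holds another lock (such as MemTracker::child_trackers_lock_).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical, simplified tracker (not Impala's ReservationTracker).
// Writers serialize on lock_ so that any bookkeeping they do stays
// consistent, and they publish the new total through an atomic.
class ReservationTrackerSketch {
 public:
  void IncreaseReservation(int64_t bytes) {
    std::lock_guard<std::mutex> l(lock_);
    // Other state protected by lock_ would be updated here.
    reservation_.store(reservation_.load(std::memory_order_relaxed) + bytes,
                       std::memory_order_release);
  }

  void DecreaseReservation(int64_t bytes) {
    std::lock_guard<std::mutex> l(lock_);
    assert(bytes <= reservation_.load(std::memory_order_relaxed));
    reservation_.store(reservation_.load(std::memory_order_relaxed) - bytes,
                       std::memory_order_release);
  }

  // No lock_ acquisition: a caller that already holds some other lock
  // (e.g. a MemTracker's child_trackers_lock_) never waits on lock_ here,
  // so this call adds no edge to the lock-order graph.
  int64_t GetReservation() const {
    return reservation_.load(std::memory_order_acquire);
  }

 private:
  std::mutex lock_;
  std::atomic<int64_t> reservation_{0};
};
```

A reader may observe a value that is about to change. That is acceptable for a getter that only reports the current reservation. Any code that must read the value and then update it as one step still needs lock_.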

Testing:
Added a unit test that reproduced the deadlock.

Change-Id: Id7adbe961a925075422c685690dd3d1609779ced
Reviewed-on: http://gerrit.cloudera.org:8080/8933
Reviewed-by: Tim Armstrong <tarmstr...@cloudera.com>
Tested-by: Impala Public Jenkins
---

> Queries don't make progress due to what seems like a memory reservation deadlock while running the stress tests
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-6362
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6362
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.12.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Tim Armstrong
>            Priority: Critical
>              Labels: hang
>             Fix For: Impala 2.12.0
>
>         Attachments: 
> stress_debug_without_krpc_vd1304.halxg.cloudera.com_1.txt.zip, 
> stress_debug_without_krpc_vd1304.halxg.cloudera.com_2.txt.zip
>
>
> Queries stopped making progress: many of the fragment threads are trying to
> increase or decrease their memory reservation, and none of those threads is
> making progress.
> I did some quick analysis of the threads and couldn't find any thread making
> progress, so this might be a deadlock.
> cat stress_debug_without_krpc_vd1304.halxg.cloudera.com_1.txt | grep 0x0000000001b01006 -B 4 | awk '{print $4}' | sort -nr | uniq -c | sort -nr
>    1312 impala::SpinLock::lock()
>    1312 impala::ReservationTracker::IncreaseReservationInternalLocked(long,
>    1312 boost::lock_guard<impala::SpinLock>::lock_guard(impala::SpinLock&)
>    1312 base::SpinLock::SlowLock()
>    1312 base::SpinLock::Lock()
>    1311
> cat stress_debug_without_krpc_vd1304.halxg.cloudera.com_1.txt | grep 0x0000000001b017c6 -B 4 | awk '{print $4}' | sort -nr | uniq -c | sort -nr
>     688 impala::ReservationTracker::DecreaseReservation(long,
>     688 impala::ReservationTracker::DecreaseReservationLocked(long,
>     400 impala::SpinLock::lock()
>     400 boost::lock_guard<impala::SpinLock>::lock_guard(impala::SpinLock&)
>     400 base::SpinLock::Lock()
>     399
> {code}
> #0  0x0000000003bd6944 in sys_futex ()
> #1  0x0000000003bd6a85 in base::internal::SpinLockDelay(int volatile*, int, int) ()
> #2  0x0000000003bd6835 in base::SpinLock::SlowLock() ()
> #3  0x00000000015f75fd in base::SpinLock::Lock() ()
> #4  0x00000000015f7672 in impala::SpinLock::lock() ()
> #5  0x00000000015f8d4c in boost::lock_guard<impala::SpinLock>::lock_guard(impala::SpinLock&) ()
> #6  0x0000000001b015bf in impala::ReservationTracker::DecreaseReservation(long, bool) ()
> #7  0x0000000001b017c6 in impala::ReservationTracker::DecreaseReservationLocked(long, bool) ()
> #8  0x0000000001b015d6 in impala::ReservationTracker::DecreaseReservation(long, bool) ()
> #9  0x0000000001b017c6 in impala::ReservationTracker::DecreaseReservationLocked(long, bool) ()
> #10 0x0000000001b015d6 in impala::ReservationTracker::DecreaseReservation(long, bool) ()
> #11 0x00000000018aabf0 in impala::ReservationTracker::DecreaseReservation(long) ()
> #12 0x00000000018aaaee in impala::InitialReservations::Return(impala::BufferPool::ClientHandle*, long) ()
> #13 0x0000000001b5e8e9 in impala::ExecNode::Close(impala::RuntimeState*) ()
> #14 0x000000000293ef2c in impala::BlockingJoinNode::Close(impala::RuntimeState*) ()
> #15 0x00000000028d639f in impala::PartitionedHashJoinNode::Close(impala::RuntimeState*) ()
> #16 0x00000000018a51aa in impala::FragmentInstanceState::Close() ()
> #17 0x00000000018a24b8 in impala::FragmentInstanceState::Exec() ()
> #18 0x000000000188afe6 in impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) ()
> #19 0x0000000001889886 in impala::QueryState::StartFInstances()::{lambda()#1}::operator()() const ()
> #20 0x000000000188bc25 in boost::detail::function::void_function_obj_invoker0<impala::QueryState::StartFInstances()::{lambda()#1}, void>::invoke(boost::detail::function::function_buffer&) ()
> {code}
> {code}
> #0  0x0000000003bd6944 in sys_futex ()
> #1  0x0000000003bd6a85 in base::internal::SpinLockDelay(int volatile*, int, int) ()
> #2  0x0000000003bd6835 in base::SpinLock::SlowLock() ()
> #3  0x00000000015f75fd in base::SpinLock::Lock() ()
> #4  0x00000000015f7672 in impala::SpinLock::lock() ()
> #5  0x00000000015f8d4c in boost::lock_guard<impala::SpinLock>::lock_guard(impala::SpinLock&) ()
> #6  0x0000000001b01006 in impala::ReservationTracker::IncreaseReservationInternalLocked(long, bool, bool, impala::Status*) ()
> #7  0x0000000001b01031 in impala::ReservationTracker::IncreaseReservationInternalLocked(long, bool, bool, impala::Status*) ()
> #8  0x0000000001b01031 in impala::ReservationTracker::IncreaseReservationInternalLocked(long, bool, bool, impala::Status*) ()
> #9  0x0000000001b006f5 in impala::ReservationTracker::IncreaseReservationToFit(long, impala::Status*) ()
> #10 0x0000000001af738e in impala::BufferPool::ClientHandle::IncreaseReservationToFit(long) ()
> #11 0x0000000002c66574 in impala::BufferedTupleStream::AdvanceWritePage(long, bool*) ()
> #12 0x0000000002c692d9 in impala::BufferedTupleStream::AddRowCustomBeginSlow(long, impala::Status*) ()
> #13 0x0000000002c69111 in impala::BufferedTupleStream::AddRowSlow(impala::TupleRow*, impala::Status*) ()
> #14 0x0000000002c69b5e in impala::BufferedTupleStream::AddRow(impala::TupleRow*, impala::Status*) ()
> #15 0x00007f059f628148 in impala::PhjBuilder::ProcessBuildBatch ()
> #16 0x000000000295e10c in impala::PhjBuilder::Send(impala::RuntimeState*, impala::RowBatch*) ()
> #17 0x0000000002941fda in impala::Status impala::BlockingJoinNode::SendBuildInputToSink<false>(impala::RuntimeState*, impala::DataSink*) ()
> #18 0x000000000293fe59 in impala::BlockingJoinNode::ProcessBuildInputAndOpenProbe(impala::RuntimeState*, impala::DataSink*) ()
> #19 0x00000000028d58cf in impala::PartitionedHashJoinNode::Open(impala::RuntimeState*) ()
> {code}
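
In both traces, the threads sit inside nested ReservationTracker calls. One path runs DecreaseReservation -> DecreaseReservationLocked -> DecreaseReservation. The other has IncreaseReservationInternalLocked calling itself. The frame names suggest that each tracker holds its own lock while it updates its parent. The simplified C++ sketch below illustrates that child-to-parent locking pattern. TrackerNode and its members are hypothetical names, not Impala code, and the real increase path in the traces is structured differently: it uses a single recursive *InternalLocked function.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical, simplified tree of trackers (not Impala code). Each node
// guards its own state with lock_ and, while still holding it, propagates
// the change to its parent, so locks are always taken child-to-parent.
class TrackerNode {
 public:
  explicit TrackerNode(TrackerNode* parent = nullptr) : parent_(parent) {}

  void IncreaseReservation(int64_t bytes) {
    std::lock_guard<std::mutex> l(lock_);
    IncreaseReservationLocked(bytes);
  }

  void DecreaseReservation(int64_t bytes) {
    std::lock_guard<std::mutex> l(lock_);
    DecreaseReservationLocked(bytes);
  }

  int64_t reservation() {
    std::lock_guard<std::mutex> l(lock_);
    return reservation_;
  }

 private:
  // Caller holds this node's lock_. The parent call takes the parent's
  // lock_ while this node's lock_ is still held.
  void IncreaseReservationLocked(int64_t bytes) {
    if (parent_ != nullptr) parent_->IncreaseReservation(bytes);
    reservation_ += bytes;
  }

  // Caller holds this node's lock_; same child-to-parent lock order.
  void DecreaseReservationLocked(int64_t bytes) {
    assert(bytes <= reservation_);
    reservation_ -= bytes;
    if (parent_ != nullptr) parent_->DecreaseReservation(bytes);
  }

  TrackerNode* parent_;
  std::mutex lock_;
  int64_t reservation_ = 0;
};
```

Because these paths all take locks in the same child-to-parent order, they cannot deadlock with each other on their own. A cycle needs some lock acquired in the opposite order. That is consistent with the commit message, which identifies the interaction between ReservationTracker::lock_ in GetReservation() and MemTracker::child_trackers_lock_.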



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
