[ https://issues.apache.org/jira/browse/IMPALA-6362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong resolved IMPALA-6362.
-----------------------------------
Resolution: Fixed
Fix Version/s: Impala 2.12.0

IMPALA-6362: avoid Reservation/MemTracker deadlock

Avoid the circular dependency between ReservationTracker::lock_ and
MemTracker::child_trackers_lock_ by not acquiring ReservationTracker::lock_
in GetReservation(), where an atomic operation is sufficient.

Testing:
Added a unit test that reproduced the deadlock.

Change-Id: Id7adbe961a925075422c685690dd3d1609779ced
Reviewed-on: http://gerrit.cloudera.org:8080/8933
Reviewed-by: Tim Armstrong <tarmstr...@cloudera.com>
Tested-by: Impala Public Jenkins
---

> Queries don't make progress due to what seems like a memory reservation deadlock while running the stress tests
> ----------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-6362
> URL: https://issues.apache.org/jira/browse/IMPALA-6362
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.12.0
> Reporter: Mostafa Mokhtar
> Assignee: Tim Armstrong
> Priority: Critical
> Labels: hang
> Fix For: Impala 2.12.0
>
> Attachments: stress_debug_without_krpc_vd1304.halxg.cloudera.com_1.txt.zip,
> stress_debug_without_krpc_vd1304.halxg.cloudera.com_2.txt.zip
>
>
> Queries stopped making progress: many of the fragment threads are trying to
> increase or decrease memory reservation, and none of those threads is making
> progress.
> I did some quick analysis on the threads and couldn't find any thread making
> progress, so this might be a deadlock.
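The commit message states the fix in a single sentence. The following is a minimal C++ sketch of that idea under simplifying assumptions, not Impala's actual code: the classes SketchReservationTracker and SketchMemTracker, the OnReservationChanged() callback and TotalReservation() are hypothetical, std::mutex stands in for impala::SpinLock, and the two opposite lock orders (lock_ then child_trackers_lock_ on the reservation path, child_trackers_lock_ then GetReservation() on the MemTracker path) are assumed for illustration. Only the lock names, the child-to-parent recursion visible in the repeated *Locked frames of the stack traces quoted below, and the atomic GetReservation() come from the issue.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

class SketchMemTracker;  // Hypothetical stand-in for MemTracker, defined below.

// Hypothetical, simplified stand-in for a hierarchical ReservationTracker.
class SketchReservationTracker {
 public:
  explicit SketchReservationTracker(SketchReservationTracker* parent = nullptr)
      : parent_(parent) {}

  // Assumed path A: lock this tracker, then each ancestor up to the root
  // (child-to-parent order); at the root, notify the MemTracker, which takes
  // child_trackers_lock_ while the tracker locks are still held.
  void IncreaseReservation(int64_t bytes, SketchMemTracker* mem_tracker) {
    std::lock_guard<std::mutex> guard(lock_);
    IncreaseReservationLocked(bytes, mem_tracker);
  }

  // The fix as described in the commit: an atomic load instead of taking lock_.
  int64_t GetReservation() const {
    return reservation_.load(std::memory_order_acquire);
  }

 private:
  void IncreaseReservationLocked(int64_t bytes, SketchMemTracker* mem_tracker);

  SketchReservationTracker* const parent_;
  std::mutex lock_;
  // Written only while lock_ is held; atomic so GetReservation() can skip lock_.
  std::atomic<int64_t> reservation_{0};
};

// Hypothetical stand-in for MemTracker and its child_trackers_lock_.
class SketchMemTracker {
 public:
  void AddChild(const SketchReservationTracker* child) {
    std::lock_guard<std::mutex> guard(child_trackers_lock_);
    children_.push_back(child);
  }

  // Assumed path B: holds child_trackers_lock_ while reading reservations.
  // If GetReservation() took lock_, this would be the reverse of path A's
  // order and two threads could deadlock.
  int64_t TotalReservation() const {
    std::lock_guard<std::mutex> guard(child_trackers_lock_);
    int64_t total = 0;
    for (const SketchReservationTracker* child : children_) {
      total += child->GetReservation();
    }
    return total;
  }

  // Called from path A while tracker locks are held.
  void OnReservationChanged() {
    std::lock_guard<std::mutex> guard(child_trackers_lock_);
    ++num_changes_;
  }

  int64_t num_changes() const {
    std::lock_guard<std::mutex> guard(child_trackers_lock_);
    return num_changes_;
  }

 private:
  mutable std::mutex child_trackers_lock_;
  std::vector<const SketchReservationTracker*> children_;
  int64_t num_changes_ = 0;
};

void SketchReservationTracker::IncreaseReservationLocked(
    int64_t bytes, SketchMemTracker* mem_tracker) {
  if (parent_ != nullptr) {
    // The parent's lock is taken while this tracker's lock_ is held.
    std::lock_guard<std::mutex> parent_guard(parent_->lock_);
    parent_->IncreaseReservationLocked(bytes, mem_tracker);
  } else if (mem_tracker != nullptr) {
    mem_tracker->OnReservationChanged();  // lock_ held -> child_trackers_lock_
  }
  reservation_.store(reservation_.load(std::memory_order_relaxed) + bytes,
                     std::memory_order_release);
}
```

Under this model, a thread inside IncreaseReservation() can hold lock_ while waiting for child_trackers_lock_, and a thread inside TotalReservation() can hold child_trackers_lock_ while calling GetReservation(). If GetReservation() also took lock_, each thread could wait for the lock the other holds; with the atomic load the reader never waits on lock_, so that cycle cannot form.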
> cat stress_debug_without_krpc_vd1304.halxg.cloudera.com_1.txt | grep 0x0000000001b01006 -B 4 | awk '{print $4}' | sort -nr | uniq -c | sort -nr
>    1312 impala::SpinLock::lock()
>    1312 impala::ReservationTracker::IncreaseReservationInternalLocked(long,
>    1312 boost::lock_guard<impala::SpinLock>::lock_guard(impala::SpinLock&)
>    1312 base::SpinLock::SlowLock()
>    1312 base::SpinLock::Lock()
>    1311
> cat stress_debug_without_krpc_vd1304.halxg.cloudera.com_1.txt | grep 0x0000000001b017c6 -B 4 | awk '{print $4}' | sort -nr | uniq -c | sort -nr
>     688 impala::ReservationTracker::DecreaseReservation(long,
>     688 impala::ReservationTracker::DecreaseReservationLocked(long,
>     400 impala::SpinLock::lock()
>     400 boost::lock_guard<impala::SpinLock>::lock_guard(impala::SpinLock&)
>     400 base::SpinLock::Lock()
>     399
> {code}
> #0  0x0000000003bd6944 in sys_futex ()
> #1  0x0000000003bd6a85 in base::internal::SpinLockDelay(int volatile*, int, int) ()
> #2  0x0000000003bd6835 in base::SpinLock::SlowLock() ()
> #3  0x00000000015f75fd in base::SpinLock::Lock() ()
> #4  0x00000000015f7672 in impala::SpinLock::lock() ()
> #5  0x00000000015f8d4c in boost::lock_guard<impala::SpinLock>::lock_guard(impala::SpinLock&) ()
> #6  0x0000000001b015bf in impala::ReservationTracker::DecreaseReservation(long, bool) ()
> #7  0x0000000001b017c6 in impala::ReservationTracker::DecreaseReservationLocked(long, bool) ()
> #8  0x0000000001b015d6 in impala::ReservationTracker::DecreaseReservation(long, bool) ()
> #9  0x0000000001b017c6 in impala::ReservationTracker::DecreaseReservationLocked(long, bool) ()
> #10 0x0000000001b015d6 in impala::ReservationTracker::DecreaseReservation(long, bool) ()
> #11 0x00000000018aabf0 in impala::ReservationTracker::DecreaseReservation(long) ()
> #12 0x00000000018aaaee in impala::InitialReservations::Return(impala::BufferPool::ClientHandle*, long) ()
> #13 0x0000000001b5e8e9 in impala::ExecNode::Close(impala::RuntimeState*) ()
> #14 0x000000000293ef2c in impala::BlockingJoinNode::Close(impala::RuntimeState*) ()
> #15 0x00000000028d639f in impala::PartitionedHashJoinNode::Close(impala::RuntimeState*) ()
> #16 0x00000000018a51aa in impala::FragmentInstanceState::Close() ()
> #17 0x00000000018a24b8 in impala::FragmentInstanceState::Exec() ()
> #18 0x000000000188afe6 in impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) ()
> #19 0x0000000001889886 in impala::QueryState::StartFInstances()::{lambda()#1}::operator()() const ()
> #20 0x000000000188bc25 in boost::detail::function::void_function_obj_invoker0<impala::QueryState::StartFInstances()::{lambda()#1}, void>::invoke(boost::detail::function::function_buffer&) ()
> {code}
> {code}
> #0  0x0000000003bd6944 in sys_futex ()
> #1  0x0000000003bd6a85 in base::internal::SpinLockDelay(int volatile*, int, int) ()
> #2  0x0000000003bd6835 in base::SpinLock::SlowLock() ()
> #3  0x00000000015f75fd in base::SpinLock::Lock() ()
> #4  0x00000000015f7672 in impala::SpinLock::lock() ()
> #5  0x00000000015f8d4c in boost::lock_guard<impala::SpinLock>::lock_guard(impala::SpinLock&) ()
> #6  0x0000000001b01006 in impala::ReservationTracker::IncreaseReservationInternalLocked(long, bool, bool, impala::Status*) ()
> #7  0x0000000001b01031 in impala::ReservationTracker::IncreaseReservationInternalLocked(long, bool, bool, impala::Status*) ()
> #8  0x0000000001b01031 in impala::ReservationTracker::IncreaseReservationInternalLocked(long, bool, bool, impala::Status*) ()
> #9  0x0000000001b006f5 in impala::ReservationTracker::IncreaseReservationToFit(long, impala::Status*) ()
> #10 0x0000000001af738e in impala::BufferPool::ClientHandle::IncreaseReservationToFit(long) ()
> #11 0x0000000002c66574 in impala::BufferedTupleStream::AdvanceWritePage(long, bool*) ()
> #12 0x0000000002c692d9 in impala::BufferedTupleStream::AddRowCustomBeginSlow(long, impala::Status*) ()
> #13 0x0000000002c69111 in impala::BufferedTupleStream::AddRowSlow(impala::TupleRow*, impala::Status*) ()
> #14 0x0000000002c69b5e in impala::BufferedTupleStream::AddRow(impala::TupleRow*, impala::Status*) ()
> #15 0x00007f059f628148 in impala::PhjBuilder::ProcessBuildBatch ()
> #16 0x000000000295e10c in impala::PhjBuilder::Send(impala::RuntimeState*, impala::RowBatch*) ()
> #17 0x0000000002941fda in impala::Status impala::BlockingJoinNode::SendBuildInputToSink<false>(impala::RuntimeState*, impala::DataSink*) ()
> #18 0x000000000293fe59 in impala::BlockingJoinNode::ProcessBuildInputAndOpenProbe(impala::RuntimeState*, impala::DataSink*) ()
> #19 0x00000000028d58cf in impala::PartitionedHashJoinNode::Open(impala::RuntimeState*) ()
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)