Bharath Vissapragada has uploaded this change for review. (
http://gerrit.cloudera.org:8080/10696
Change subject: [DRAFT] IMPALA-6189: Add thread watchdogs for HDFS IO calls
......................................................................
[DRAFT] IMPALA-6189: Add thread watchdogs for HDFS IO calls
This commit adds scoped thread watchdogs around blocking HDFS
IO calls. These dump a user (native + JVM) and kernel stack trace
to the logs if the blocking function call runs for greater than
'threshold' milliseconds. Having such traces will help debug issues
with query hangs due to hangs in external components.
The threshold can be configured using --thread_watchdog_threshold_ms
If stack unwinding tends to be CPU heavy and causing performance
problems, the watchdogs can be disabled by setting
--thread_watchdog_threshold_ms=0 (default = 5000ms)
Usage:
-----
This patch adds a macro called SCOPED_WATCHDOG(). Following
is a sample usage:
{
SCOPED_WATCHDOG();
hdfsOpenFile(fs, fname, O_RDONLY, 0, 0, 0)
}
The above code logs kernel and user level stacks for the current thread
if hdfsOpenFile() takes more than 5000 milliseconds.
Sample output:
-------------
Native Stacks:
==============
Injected a manual sleep using SleepForMs() before hdfsOpenFile() to
simulate a hang and the following is logged in the impalad logs.
W0611 16:21:01.602687 347 kernel_stack_watchdog.cc:146] Thread 383
stuck at /home/bharath/Impala/be/src/runtime/io/handle-cache.inline.h:34
for 5036ms:
Kernel stack:
[<0000000000000000>] hrtimer_nanosleep+0xbb/0x180
[<0000000000000000>] SyS_nanosleep+0x66/0x80
[<0000000000000000>] system_call_fastpath+0x1a/0x1f
[<0000000000000000>] 0xffffffffffffffff
User stack:
@ 0x7f808e7fed40 (unknown)
@ 0x7f808eb9db9d __libc_nanosleep
@ 0x1f460e6 std::this_thread::sleep_for<>()
@ 0x1f44edd impala::SleepForMs()
@ 0x2fef2ca impala::io::HdfsFileHandle::HdfsFileHandle()
@ 0x2fef5f7
impala::io::CachedHdfsFileHandle::CachedHdfsFileHandle()
@ 0x2fefe78 impala::io::FileHandleCache::GetFileHandle()
@ 0x2ff3e3f impala::io::DiskIoMgr::GetCachedHdfsFileHandle()
@ 0x300fad7 impala::io::ScanRange::Read()
@ 0x300bca4 impala::io::ScanRange::DoRead()
@ 0x2ff3207 impala::io::DiskQueue::DiskThreadLoop()
@ 0x2ffac32 boost::_mfi::mf1<>::operator()()
@ 0x2ffa79f boost::_bi::list2<>::operator()<>()
@ 0x2ff9fbd boost::_bi::bind_t<>::operator()()
@ 0x2ff970a
boost::detail::function::void_function_obj_invoker0<>::invoke()
@ 0x1b4e388 boost::function0<>::operator()()
JVM Stack:
Thread not attached to the JVM.
JVM Stacks:
===========
If a JVM Thread is attached for a given hung native thread, the the
stack trace output looks like follows.
W0618 20:55:45.988883 1220 kernel_stack_watchdog.cc:147] Thread 1381
stuck at be/src/service/impala-server.cc:882 for 5083ms:
Kernel stack:
[<0000000000000000>] futex_wait_queue_me+0xde/0x140
[<0000000000000000>] futex_wait+0x182/0x290
[<0000000000000000>] do_futex+0xde/0x6a0
[<0000000000000000>] SyS_futex+0x71/0x150
[<0000000000000000>] system_call_fastpath+0x1a/0x1f
[<0000000000000000>] 0xffffffffffffffff
User stack:
@ 0x7fee18315d40 (unknown)
@ 0x7fee186b17ce __pthread_cond_timedwait
@ 0x7fee1b1d6a9f os::PlatformEvent::park()
@ 0x7fee1b1c2c3f ObjectMonitor::wait()
@ 0x7fee1afd5ed2 JVM_MonitorWait
@ 0x7fee03cb6ca4 (unknown)
JVM Stack:
Ljava/lang/Object;wait()
Lorg/apache/impala/catalog/ImpaladCatalog;waitForCatalogUpdate(Line:239)
Lorg/apache/impala/analysis/StmtMetadataLoader;loadTables(Line:136)
Lorg/apache/impala/analysis/StmtMetadataLoader;loadTables(Line:115)
Lorg/apache/impala/service/Frontend;createExecRequest(Line:996)
Lorg/apache/impala/service/JniFrontend;createExecRequest(Line:152)
Change-Id: I28e918761c120043d332b034450208eaf34e3e2b
---
M be/src/common/global-flags.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/kudu/util/debug-util.cc
M be/src/kudu/util/kernel_stack_watchdog.cc
M be/src/runtime/io/handle-cache.inline.h
M be/src/runtime/io/scan-range.cc
M be/src/service/CMakeLists.txt
A be/src/service/jvmti-agent.cc
A be/src/service/jvmti-agent.h
M be/src/util/CMakeLists.txt
A be/src/util/jvm-stack-trace.cc
A be/src/util/jvm-stack-trace.h
A be/src/util/scoped-watchdog.h
M bin/impala-config.sh
14 files changed, 498 insertions(+), 27 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/10696/3
--
To view, visit http://gerrit.cloudera.org:8080/10696
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I28e918761c120043d332b034450208eaf34e3e2b
Gerrit-Change-Number: 10696
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Balazs Jeszenszky <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>