Bharath Vissapragada has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10696
Change subject: [DRAFT] IMPALA-6189: Add thread watchdogs for HDFS IO calls ...................................................................... [DRAFT] IMPALA-6189: Add thread watchdogs for HDFS IO calls This commit adds scoped thread watchdogs around blocking HDFS IO calls. These dump a user (native + JVM) and kernel stack trace to the logs if the blocking function call runs for greater than 'threshold' milliseconds. Having such traces will help debug issues with query hangs due to hangs in external components. The threshold can be configured using --thread_watchdog_threshold_ms If stack unwinding tends to be CPU heavy and causing performance problems, the watchdogs can be disabled by setting --thread_watchdog_threshold_ms=0 (default = 5000ms) Usage: ----- This patch adds a macro called SCOPED_WATCHDOG(). Following is a sample usage: { SCOPED_WATCHDOG(); hdfsOpenFile(fs, fname, O_RDONLY, 0, 0, 0) } The above code logs kernel and user level stacks for the current thread if hdfsOpenFile() takes more than 5000 milliseconds. Sample output: ------------- Native Stacks: ============== Injected a manual sleep using SleepForMs() before hdfsOpenFile() to simulate a hang and the following is logged in the impalad logs. W0611 16:21:01.602687 347 kernel_stack_watchdog.cc:146] Thread 383 stuck at /home/bharath/Impala/be/src/runtime/io/handle-cache.inline.h:34 for 5036ms: Kernel stack: [<0000000000000000>] hrtimer_nanosleep+0xbb/0x180 [<0000000000000000>] SyS_nanosleep+0x66/0x80 [<0000000000000000>] system_call_fastpath+0x1a/0x1f [<0000000000000000>] 0xffffffffffffffff User stack: @ 0x7f808e7fed40 (unknown) @ 0x7f808eb9db9d __libc_nanosleep @ 0x1f460e6 std::this_thread::sleep_for<>() @ 0x1f44edd impala::SleepForMs() @ 0x2fef2ca impala::io::HdfsFileHandle::HdfsFileHandle() @ 0x2fef5f7 impala::io::CachedHdfsFileHandle::CachedHdfsFileHandle() @ 0x2fefe78 impala::io::FileHandleCache::GetFileHandle() @ 0x2ff3e3f impala::io::DiskIoMgr::GetCachedHdfsFileHandle() @ 0x300fad7 impala::io::ScanRange::Read() @ 0x300bca4 impala::io::ScanRange::DoRead() @ 0x2ff3207 impala::io::DiskQueue::DiskThreadLoop() @ 0x2ffac32 boost::_mfi::mf1<>::operator()() @ 0x2ffa79f boost::_bi::list2<>::operator()<>() @ 0x2ff9fbd boost::_bi::bind_t<>::operator()() @ 0x2ff970a boost::detail::function::void_function_obj_invoker0<>::invoke() @ 0x1b4e388 boost::function0<>::operator()() JVM Stack: Thread not attached to the JVM. JVM Stacks: =========== If a JVM Thread is attached for a given hung native thread, the the stack trace output looks like follows. W0618 20:55:45.988883 1220 kernel_stack_watchdog.cc:147] Thread 1381 stuck at be/src/service/impala-server.cc:882 for 5083ms: Kernel stack: [<0000000000000000>] futex_wait_queue_me+0xde/0x140 [<0000000000000000>] futex_wait+0x182/0x290 [<0000000000000000>] do_futex+0xde/0x6a0 [<0000000000000000>] SyS_futex+0x71/0x150 [<0000000000000000>] system_call_fastpath+0x1a/0x1f [<0000000000000000>] 0xffffffffffffffff User stack: @ 0x7fee18315d40 (unknown) @ 0x7fee186b17ce __pthread_cond_timedwait @ 0x7fee1b1d6a9f os::PlatformEvent::park() @ 0x7fee1b1c2c3f ObjectMonitor::wait() @ 0x7fee1afd5ed2 JVM_MonitorWait @ 0x7fee03cb6ca4 (unknown) JVM Stack: Ljava/lang/Object;wait() Lorg/apache/impala/catalog/ImpaladCatalog;waitForCatalogUpdate(Line:239) Lorg/apache/impala/analysis/StmtMetadataLoader;loadTables(Line:136) Lorg/apache/impala/analysis/StmtMetadataLoader;loadTables(Line:115) Lorg/apache/impala/service/Frontend;createExecRequest(Line:996) Lorg/apache/impala/service/JniFrontend;createExecRequest(Line:152) Change-Id: I28e918761c120043d332b034450208eaf34e3e2b --- M be/src/common/global-flags.cc M be/src/exec/hdfs-table-sink.cc M be/src/kudu/util/debug-util.cc M be/src/kudu/util/kernel_stack_watchdog.cc M be/src/runtime/io/handle-cache.inline.h M be/src/runtime/io/scan-range.cc M be/src/service/CMakeLists.txt A be/src/service/jvmti-agent.cc A be/src/service/jvmti-agent.h M be/src/util/CMakeLists.txt A be/src/util/jvm-stack-trace.cc A be/src/util/jvm-stack-trace.h A be/src/util/scoped-watchdog.h M bin/impala-config.sh 14 files changed, 498 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/10696/3 -- To view, visit http://gerrit.cloudera.org:8080/10696 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I28e918761c120043d332b034450208eaf34e3e2b Gerrit-Change-Number: 10696 Gerrit-PatchSet: 3 Gerrit-Owner: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Balazs Jeszenszky <jes...@gmail.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com>