Bharath Vissapragada has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10696


Change subject: [DRAFT] IMPALA-6189: Add thread watchdogs for HDFS IO calls
......................................................................

[DRAFT] IMPALA-6189: Add thread watchdogs for HDFS IO calls

This commit adds scoped thread watchdogs around blocking HDFS
IO calls. These dump a user (native + JVM) and kernel stack trace
to the logs if the blocking function call runs for greater than
'threshold' milliseconds. Having such traces will help debug issues
with query hangs due to hangs in external components.

The threshold can be configured using --thread_watchdog_threshold_ms

If stack unwinding tends to be CPU heavy and causing performance
problems, the watchdogs can be disabled by setting

--thread_watchdog_threshold_ms=0 (default = 5000ms)

Usage:
-----
This patch adds a macro called SCOPED_WATCHDOG(). Following
is a sample usage:

{
  SCOPED_WATCHDOG();
  hdfsOpenFile(fs, fname, O_RDONLY, 0, 0, 0)
}

The above code logs kernel and user level stacks for the current thread
if hdfsOpenFile() takes more than 5000 milliseconds.

Sample output:
-------------

Native Stacks:
==============

Injected a manual sleep using SleepForMs() before hdfsOpenFile() to
simulate a hang and the following is logged in the impalad logs.

W0611 16:21:01.602687   347 kernel_stack_watchdog.cc:146] Thread 383
stuck at /home/bharath/Impala/be/src/runtime/io/handle-cache.inline.h:34
for 5036ms:
Kernel stack:
[<0000000000000000>] hrtimer_nanosleep+0xbb/0x180
[<0000000000000000>] SyS_nanosleep+0x66/0x80
[<0000000000000000>] system_call_fastpath+0x1a/0x1f
[<0000000000000000>] 0xffffffffffffffff

User stack:
    @     0x7f808e7fed40  (unknown)
    @     0x7f808eb9db9d  __libc_nanosleep
    @          0x1f460e6  std::this_thread::sleep_for<>()
    @          0x1f44edd  impala::SleepForMs()
    @          0x2fef2ca  impala::io::HdfsFileHandle::HdfsFileHandle()
    @          0x2fef5f7  
impala::io::CachedHdfsFileHandle::CachedHdfsFileHandle()
    @          0x2fefe78  impala::io::FileHandleCache::GetFileHandle()
    @          0x2ff3e3f  impala::io::DiskIoMgr::GetCachedHdfsFileHandle()
    @          0x300fad7  impala::io::ScanRange::Read()
    @          0x300bca4  impala::io::ScanRange::DoRead()
    @          0x2ff3207  impala::io::DiskQueue::DiskThreadLoop()
    @          0x2ffac32  boost::_mfi::mf1<>::operator()()
    @          0x2ffa79f  boost::_bi::list2<>::operator()<>()
    @          0x2ff9fbd  boost::_bi::bind_t<>::operator()()
    @          0x2ff970a  
boost::detail::function::void_function_obj_invoker0<>::invoke()
    @          0x1b4e388  boost::function0<>::operator()()

JVM Stack:
Thread not attached to the JVM.

JVM Stacks:
===========

If a JVM Thread is attached for a given hung native thread, the the
stack trace output looks like follows.

W0618 20:55:45.988883  1220 kernel_stack_watchdog.cc:147] Thread 1381
stuck at be/src/service/impala-server.cc:882 for 5083ms:
Kernel stack:
[<0000000000000000>] futex_wait_queue_me+0xde/0x140
[<0000000000000000>] futex_wait+0x182/0x290
[<0000000000000000>] do_futex+0xde/0x6a0
[<0000000000000000>] SyS_futex+0x71/0x150
[<0000000000000000>] system_call_fastpath+0x1a/0x1f
[<0000000000000000>] 0xffffffffffffffff

User stack:
    @     0x7fee18315d40  (unknown)
    @     0x7fee186b17ce  __pthread_cond_timedwait
    @     0x7fee1b1d6a9f  os::PlatformEvent::park()
    @     0x7fee1b1c2c3f  ObjectMonitor::wait()
    @     0x7fee1afd5ed2  JVM_MonitorWait
    @     0x7fee03cb6ca4  (unknown)

JVM Stack:
    Ljava/lang/Object;wait()
    Lorg/apache/impala/catalog/ImpaladCatalog;waitForCatalogUpdate(Line:239)
    Lorg/apache/impala/analysis/StmtMetadataLoader;loadTables(Line:136)
    Lorg/apache/impala/analysis/StmtMetadataLoader;loadTables(Line:115)
    Lorg/apache/impala/service/Frontend;createExecRequest(Line:996)
    Lorg/apache/impala/service/JniFrontend;createExecRequest(Line:152)

Change-Id: I28e918761c120043d332b034450208eaf34e3e2b
---
M be/src/common/global-flags.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/kudu/util/debug-util.cc
M be/src/kudu/util/kernel_stack_watchdog.cc
M be/src/runtime/io/handle-cache.inline.h
M be/src/runtime/io/scan-range.cc
M be/src/service/CMakeLists.txt
A be/src/service/jvmti-agent.cc
A be/src/service/jvmti-agent.h
M be/src/util/CMakeLists.txt
A be/src/util/jvm-stack-trace.cc
A be/src/util/jvm-stack-trace.h
A be/src/util/scoped-watchdog.h
M bin/impala-config.sh
14 files changed, 498 insertions(+), 27 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/10696/3
--
To view, visit http://gerrit.cloudera.org:8080/10696
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I28e918761c120043d332b034450208eaf34e3e2b
Gerrit-Change-Number: 10696
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Balazs Jeszenszky <jes...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>

Reply via email to