Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/10696 )
Change subject: [DRAFT] IMPALA-6189: Add thread watchdogs for HDFS IO calls ...................................................................... Patch Set 3: (9 comments) http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/exec/hdfs-table-sink.cc@410 PS3, Line 410: hdfsFileInfo* info = hdfsGetPathInfo(output_partition->hdfs_connection, > This is an RPC too. Should we be watching it? Done http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/kudu/util/debug-util.cc File be/src/kudu/util/debug-util.cc: http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/kudu/util/debug-util.cc@62 PS3, Line 62: DEFINE_int32(stack_trace_sig_num, SIGRTMIN, "Signal used by the thread watchdog to " > Why is this configurable? Firstly, these changes in Kudu's util code are expected to be pushed to the Kudu repo and then I plan to cherry-pick them back to Impala to avoid divergence. Now, coming to your question, Kudu uses SIGUSR2 for the signal handler invocation which unfortunately does not work for Impala since JVM already uses it for something else[1]. The idea is to make it configurable in the Kudu code so that Impala can override it. If Kudu project also agrees to use a different signal handler, I suppose we don't need to make it configurable then. [1] https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/signals006.html http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/kudu/util/debug-util.cc@121 PS3, Line 121: // Stack trace collected from the JVM thread (if one exists). : // TODO: Implement this as a callback rather than polluting Kudu's util code? : impala::JVMStackTrace jstack; > Please apply those changes to Kudu code base if possible. This avoids diver Yep, I need to discuss this with Lars and will try to push as much as I can into Kudu's code base. http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/kudu/util/debug-util.cc@212 PS3, Line 212: << "Kudu will not produce thread stack traces."; > s/Kudu/Impala? Will try to make this more generic when I push a Kudu patch for review. Thanks for pointing it out. http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/kudu/util/debug-util.cc@288 PS3, Line 288: ret << g_comm.jstack.Stringify(); > This is new code. I think it may make sense to feature flag it (--kernel_wa We do have a kill switch for this whole watchdog feature. Do we need to have one specially for the JVM part? http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/runtime/io/scan-range.cc File be/src/runtime/io/scan-range.cc: http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/runtime/io/scan-range.cc@632 PS3, Line 632: fs_, hdfs_file, position_in_file > as mentioned before, we'll want to capture the argument values since that w I plan to fix this in the same patch. Kudu's logging assumes we are only passing filename:line_number (snippet below). lock_guard<simple_spinlock> l(log_lock_); LOG_STRING(WARNING, log_collector_.get()) << "Thread " << p << " stuck at " << frame->status_ << " for " << paused_ms << "ms" << ":\n" << "Kernel stack:\n" << kernel_stack << "\n" << "User stack:\n" << user_stack; } I will send out a consolidated Kudu patch (with all the required Kudu changes from the current patch) and then include this contextual information during rebase (after cherry-picking it onto Impala). http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/runtime/io/scan-range.cc@642 PS3, Line 642: current_bytes_read = hdfsRead(fs_, hdfs_file, buffer + *bytes_read, > Isn't this the main thing we're interested in? Done http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/service/CMakeLists.txt File be/src/service/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/service/CMakeLists.txt@53 PS3, Line 53: # Jvmti agent library that registers with the JVM on load. > As I mentioned elsewhere (perhaps later), I'm interested in whether this ad I don't think there is much overhead in the binary size. I'm not sure I understand the rest of your comment. May be can you point some docs to the workaround you are talking about? Without patch: -rwxrwxr-x 1 bharath bharath 380M Jun 22 11:35 libfesupport.so -rwxrwxr-x 1 bharath bharath 376M Jun 22 11:35 impalad -rwxrwxr-x 1 bharath bharath 375M Jun 22 11:36 session-expiry-test -rwxrwxr-x 1 bharath bharath 373M Jun 22 11:36 query-options-test -rwxrwxr-x 1 bharath bharath 372M Jun 22 11:36 hs2-util-test -rw-rw-r-- 1 bharath bharath 71M Jun 22 11:32 libService.a With Patch: -rwxrwxr-x 1 bharath bharath 380M Jun 22 11:24 libfesupport.so -rwxrwxr-x 1 bharath bharath 377M Jun 22 11:24 impalad -rwxrwxr-x 1 bharath bharath 376M Jun 17 09:57 session-expiry-test -rwxrwxr-x 1 bharath bharath 374M Jun 17 09:57 query-options-test -rwxrwxr-x 1 bharath bharath 373M Jun 17 09:57 hs2-util-test -rw-rw-r-- 1 bharath bharath 71M Jun 22 11:23 libService.a -rwxrwxr-x 1 bharath bharath 225K Jun 16 16:49 libjvmtiagent.so http://gerrit.cloudera.org:8080/#/c/10696/3/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/10696/3/bin/impala-config.sh@619 PS3, Line 619: LIBHDFS_OPTS+=" -agentpath:$IMPALA_HOME/be/build/latest/service/libjvmtiagent.so" I commented about the binary sizes in the other file. How is this increasing the reliance on libhdfs/jni_helper.c? It is not included anywhere. May be I'm not following you. Could you expand on your comment? > How do these settings work when we're running on clusters? We need to do the same thing. We already configure a lot of JVM parameters like heap size etc, we should do something similar I guess. -- To view, visit http://gerrit.cloudera.org:8080/10696 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I28e918761c120043d332b034450208eaf34e3e2b Gerrit-Change-Number: 10696 Gerrit-PatchSet: 3 Gerrit-Owner: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Balazs Jeszenszky <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Philip Zeyliger <[email protected]> Gerrit-Reviewer: Zoram Thanga <[email protected]> Gerrit-Comment-Date: Fri, 22 Jun 2018 19:03:23 +0000 Gerrit-HasComments: Yes
