Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10696 )

Change subject: [DRAFT] IMPALA-6189: Add thread watchdogs for HDFS IO calls
......................................................................


Patch Set 3:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/exec/hdfs-table-sink.cc@410
PS3, Line 410:     hdfsFileInfo* info = 
hdfsGetPathInfo(output_partition->hdfs_connection,
> This is an RPC too. Should we be watching it?
Done


http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/kudu/util/debug-util.cc
File be/src/kudu/util/debug-util.cc:

http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/kudu/util/debug-util.cc@62
PS3, Line 62: DEFINE_int32(stack_trace_sig_num, SIGRTMIN, "Signal used by the 
thread watchdog to "
> Why is this configurable?
Firstly, these changes in Kudu's util code are expected to be pushed to the 
Kudu repo and then I plan to cherry-pick them back to Impala to avoid 
divergence. Now, coming to your question,

Kudu uses SIGUSR2 for the signal handler invocation which unfortunately does 
not work for Impala since JVM already uses it for something else[1]. The idea 
is to make it configurable in the Kudu code so that Impala can override it. If 
Kudu project also agrees to use a different signal handler, I suppose we don't 
need to make it configurable then.

[1] 
https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/signals006.html


http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/kudu/util/debug-util.cc@121
PS3, Line 121:   // Stack trace collected from the JVM thread (if one exists).
             :   // TODO: Implement this as a callback rather than polluting 
Kudu's util code?
             :   impala::JVMStackTrace jstack;
> Please apply those changes to Kudu code base if possible. This avoids diver
Yep, I need to discuss this with Lars and will try to push as much as I can 
into Kudu's code base.


http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/kudu/util/debug-util.cc@212
PS3, Line 212:                    << "Kudu will not produce thread stack 
traces.";
> s/Kudu/Impala?
Will try to make this more generic when I push a Kudu patch for review. Thanks 
for pointing it out.


http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/kudu/util/debug-util.cc@288
PS3, Line 288:       ret << g_comm.jstack.Stringify();
> This is new code. I think it may make sense to feature flag it (--kernel_wa
We do have a kill switch for this whole watchdog feature. Do we need to have 
one specially for the JVM part?


http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/runtime/io/scan-range.cc@632
PS3, Line 632: fs_, hdfs_file, position_in_file
> as mentioned before, we'll want to capture the argument values since that w
I plan to fix this in the same patch. Kudu's logging assumes we are only 
passing filename:line_number (snippet below).

lock_guard<simple_spinlock> l(log_lock_);
          LOG_STRING(WARNING, log_collector_.get())
              << "Thread " << p << " stuck at " << frame->status_
              << " for " << paused_ms << "ms" << ":\n"
              << "Kernel stack:\n" << kernel_stack << "\n"
              << "User stack:\n" << user_stack;
        }

I will send out a consolidated Kudu patch (with all the required Kudu changes 
from the current patch) and then include this contextual information during 
rebase (after cherry-picking it onto Impala).


http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/runtime/io/scan-range.cc@642
PS3, Line 642:               current_bytes_read = hdfsRead(fs_, hdfs_file, 
buffer + *bytes_read,
> Isn't this the main thing we're interested in?
Done


http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/service/CMakeLists.txt
File be/src/service/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/service/CMakeLists.txt@53
PS3, Line 53: # Jvmti agent library that registers with the JVM on load.
> As I mentioned elsewhere (perhaps later), I'm interested in whether this ad
I don't think there is much overhead in the binary size. I'm not sure I 
understand the rest of your comment. May be can you point some docs to the 
workaround you are talking about?

Without patch:

-rwxrwxr-x 1 bharath bharath 380M Jun 22 11:35 libfesupport.so
-rwxrwxr-x 1 bharath bharath 376M Jun 22 11:35 impalad
-rwxrwxr-x 1 bharath bharath 375M Jun 22 11:36 session-expiry-test
-rwxrwxr-x 1 bharath bharath 373M Jun 22 11:36 query-options-test
-rwxrwxr-x 1 bharath bharath 372M Jun 22 11:36 hs2-util-test
-rw-rw-r-- 1 bharath bharath  71M Jun 22 11:32 libService.a


With Patch:
-rwxrwxr-x 1 bharath bharath 380M Jun 22 11:24 libfesupport.so
-rwxrwxr-x 1 bharath bharath 377M Jun 22 11:24 impalad
-rwxrwxr-x 1 bharath bharath 376M Jun 17 09:57 session-expiry-test
-rwxrwxr-x 1 bharath bharath 374M Jun 17 09:57 query-options-test
-rwxrwxr-x 1 bharath bharath 373M Jun 17 09:57 hs2-util-test
-rw-rw-r-- 1 bharath bharath  71M Jun 22 11:23 libService.a
-rwxrwxr-x 1 bharath bharath 225K Jun 16 16:49 libjvmtiagent.so


http://gerrit.cloudera.org:8080/#/c/10696/3/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/10696/3/bin/impala-config.sh@619
PS3, Line 619: LIBHDFS_OPTS+=" 
-agentpath:$IMPALA_HOME/be/build/latest/service/libjvmtiagent.so"
I commented about the binary sizes in the other file.

How is this increasing the reliance on libhdfs/jni_helper.c? It is not included 
anywhere. May be I'm not following you. Could you expand on your comment?

> How do these settings work when we're running on clusters?

We need to do the same thing. We already configure a lot of JVM parameters like 
heap size etc, we should do something similar I guess.



--
To view, visit http://gerrit.cloudera.org:8080/10696
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28e918761c120043d332b034450208eaf34e3e2b
Gerrit-Change-Number: 10696
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Balazs Jeszenszky <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>
Gerrit-Reviewer: Zoram Thanga <[email protected]>
Gerrit-Comment-Date: Fri, 22 Jun 2018 19:03:23 +0000
Gerrit-HasComments: Yes

Reply via email to