Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10696 )

Change subject: [DRAFT] IMPALA-6189: Add thread watchdogs for HDFS IO calls
......................................................................


Patch Set 3:

(7 comments)

Hi! Thanks for doing this.

The big question for me is what our policy is here. Should we be doing this for 
all blocking system calls? All blocking RPCs? How do we maintain that we have 
decent coverage?

For the Java stuff, an alternative approach is to call out via our regular 
Thrift-y/JNI-y route to ask Java to get the stack trace using management beans. 
I'm pretty sure you can match native thread ids to Java ones, though based on 
reading 
https://gist.github.com/rednaxelafx/843622/eb0b0877ff4aac77c76e5a50f7621dc32ea451eb
 it looks like it's hard. (In jstack, it's the nid=... but you need to do a hex 
to decimal conversion for pids. But it looks like it's not readily available 
out of Java.)

I think it may also make sense to expose a counter on how often this happens. A 
monitoring tool would want to alert if this is happening a lot, and it won't 
want to grep the logs. Even more interesting would be to write down when it 
happens on behalf of a query in the profile, but that's not always possible.

http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/exec/hdfs-table-sink.cc@410
PS3, Line 410:     hdfsFileInfo* info = 
hdfsGetPathInfo(output_partition->hdfs_connection,
This is an RPC too. Should we be watching it?


http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/kudu/util/debug-util.cc
File be/src/kudu/util/debug-util.cc:

http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/kudu/util/debug-util.cc@62
PS3, Line 62: DEFINE_int32(stack_trace_sig_num, SIGRTMIN, "Signal used by the 
thread watchdog to "
Why is this configurable?


http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/kudu/util/debug-util.cc@212
PS3, Line 212:                    << "Kudu will not produce thread stack 
traces.";
s/Kudu/Impala?


http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/kudu/util/debug-util.cc@288
PS3, Line 288:       ret << g_comm.jstack.Stringify();
This is new code. I think it may make sense to feature flag it 
(--kernel_watchdog_jstack_enabled) so that we can turn it off if we find that 
it's wrong.


http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/runtime/io/scan-range.cc@642
PS3, Line 642:               current_bytes_read = hdfsRead(fs_, hdfs_file, 
buffer + *bytes_read,
Isn't this the main thing we're interested in?


http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/service/CMakeLists.txt
File be/src/service/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/10696/3/be/src/service/CMakeLists.txt@53
PS3, Line 53: # Jvmti agent library that registers with the JVM on load.
As I mentioned elsewhere (perhaps later), I'm interested in whether this adds a 
lot to our binary. We're already launching the JVM from a statically linked 
binary; do we need to point it to another file to get methods? (There's a way 
around this for normal JNI libraries in JDK1.8+ though I'd have to look it up.)


http://gerrit.cloudera.org:8080/#/c/10696/3/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/10696/3/bin/impala-config.sh@619
PS3, Line 619: LIBHDFS_OPTS+=" 
-agentpath:$IMPALA_HOME/be/build/latest/service/libjvmtiagent.so"
It surprises me that we have to do this with options, rather than being able to 
do it directly. What's going on is that we're doubling down here on our 
reliance on hadoop-hdfs/src/main/native/libhdfs/jni_helper.c. It's definitely 
not a public API that we should be depending on.

Joe ran into 
https://issues.apache.org/jira/browse/HDFS-12628?jql=reporter%3Djoemcdonnell%20and%20project%3Dhdfs
 which is somewhat related.

It's also problematic that we're introducing a new .so. We shouldn't have to 
since we statically link things. What's the size of the new .SO? Are we adding 
100MB to everyone's builds?

impala-config.sh is read in our build process. How do these settings work when 
we're running on clusters?



--
To view, visit http://gerrit.cloudera.org:8080/10696
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28e918761c120043d332b034450208eaf34e3e2b
Gerrit-Change-Number: 10696
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Balazs Jeszenszky <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>
Gerrit-Reviewer: Zoram Thanga <[email protected]>
Gerrit-Comment-Date: Fri, 22 Jun 2018 15:23:48 +0000
Gerrit-HasComments: Yes

Reply via email to