Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/10696 )
Change subject: [DRAFT] IMPALA-6189: Add thread watchdogs for HDFS IO calls
......................................................................


Patch Set 4:

Forgot to respond to the top-level comment.

> The big question for me is what our policy is here. Should we be doing this
> for all blocking system calls? All blocking RPCs? How do we maintain that we
> have decent coverage?

I think we should start with HDFS calls, since they are notorious in this regard, especially for not timing out. Once this is committed and the macro is available, we can find other problematic blocking calls and cover them too.

> For the Java stuff, an alternative approach is to call out via our regular
> Thrift-y/JNI-y route to ask Java to get the stack trace using management
> beans. I'm pretty sure you can match native thread ids to Java ones, though
> based on reading
> https://gist.github.com/rednaxelafx/843622/eb0b0877ff4aac77c76e5a50f7621dc32ea451eb
> it looks like it's hard. (In jstack, it's the nid=... but you need to do a
> hex to decimal conversion for pids. But it looks like it's not readily
> available out of Java.)

I tried this, but I couldn't find a way to map nid <-> tid using the API. The Thread API in Java does not expose the nid of the thread (which is weird). Of course, we could do it the hacky way by parsing the jstack output and extracting the nid field, but I don't think that is a reasonable approach. Not sure if you have any other ideas to achieve the same.

> I think it may also make sense to expose a counter on how often this happens.
> A monitoring tool would want to alert if this is happening a lot, and it
> won't want to grep the logs. Even more interesting would be to write down
> when it happens on behalf of a query in the profile, but that's not always
> possible.

Yeah, I thought about this. Let me think a bit more and get back to you; I'm flushing out the comments in the meantime.

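For illustration only, the watchdog-plus-counter idea discussed above could be sketched roughly as follows. This is not the code from the patch: ScopedCallWatchdog and g_slow_call_count are hypothetical names, the counter stands in for whatever metric would actually be exposed, and one watcher thread per call is a simplification chosen to keep the sketch short.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <iostream>
#include <mutex>
#include <thread>

// Hypothetical process-wide counter of calls that outlived their limit.
// A monitoring tool could read this instead of grepping the logs.
std::atomic<int64_t> g_slow_call_count{0};

// Hypothetical RAII guard: while it is in scope, a watcher thread waits for
// either the guard's destruction (call finished) or the time limit.
class ScopedCallWatchdog {
 public:
  ScopedCallWatchdog(const char* name, std::chrono::milliseconds limit)
    : name_(name), limit_(limit), watcher_([this] { Watch(); }) {}

  ~ScopedCallWatchdog() {
    {
      std::lock_guard<std::mutex> l(lock_);
      done_ = true;
    }
    cv_.notify_one();
    watcher_.join();
  }

  ScopedCallWatchdog(const ScopedCallWatchdog&) = delete;
  ScopedCallWatchdog& operator=(const ScopedCallWatchdog&) = delete;

 private:
  void Watch() {
    std::unique_lock<std::mutex> l(lock_);
    // wait_for() returns false if the limit expired while done_ was still false.
    if (!cv_.wait_for(l, limit_, [this] { return done_; })) {
      ++g_slow_call_count;
      std::cerr << "Call '" << name_ << "' still running after "
                << limit_.count() << " ms" << std::endl;
      // Assumption: a real implementation would also record the stack of the
      // blocked thread here; that part is omitted from this sketch.
    }
  }

  const char* name_;
  std::chrono::milliseconds limit_;
  std::mutex lock_;
  std::condition_variable cv_;
  bool done_ = false;
  std::thread watcher_;  // Declared last so it starts after the other members.
};
```

A caller would wrap the blocking call in a scope, for example `{ ScopedCallWatchdog w("hdfs read", std::chrono::seconds(30)); /* blocking call */ }`. Because the report comes from the watcher thread rather than from the caller, a call that never returns is still counted. A production version would more plausibly share a single watchdog thread across all registered calls.
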
--
To view, visit http://gerrit.cloudera.org:8080/10696
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28e918761c120043d332b034450208eaf34e3e2b
Gerrit-Change-Number: 10696
Gerrit-PatchSet: 4
Gerrit-Owner: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Balazs Jeszenszky <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>
Gerrit-Reviewer: Zoram Thanga <[email protected]>
Gerrit-Comment-Date: Fri, 22 Jun 2018 22:26:15 +0000
Gerrit-HasComments: No
