Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/11021 )

Change subject: IMPALA-6214: Determine and warn about stuck fragment instances.
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11021/2/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/11021/2/be/src/runtime/krpc-data-stream-recvr.cc@238
PS2, Line 238: VLOG_QUERY << "wait arrival fragment_instance_id="
VLOG_QUERY is on by default, so this would become very very noisy, once per wait, no? I think we'd only want to log if we have hit a timeout from the below CV wait.

Also, I don't know the context of this code quite well enough, but isn't it normal to sometimes wait for minutes on a sender? For example, if the upstream node is a full sort, or a join with a lot of slow parents then the receiver side may block for minutes or even hours before making progress. In that case, I can see surfacing this kind of information in the profile or in some query-scoped log but maybe not in the global impalad log?

I think I don't quite understand the end goal of this JIRA well enough to evaluate whether this change is a net help. Why doesn't the existing data_wait_timer already tell us that this node is the blocking culprit, and from there we can just look at the fragment graph to understand what was blocking it?


--
To view, visit http://gerrit.cloudera.org:8080/11021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I260a1d0a3477e5c6a46094e664500c3e2ed7de62
Gerrit-Change-Number: 11021
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Tue, 24 Jul 2018 23:36:23 +0000
Gerrit-HasComments: Yes
