Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 )
Change subject: IMPALA-10178 Run-time profile shall report skews ...................................................................... Patch Set 15: (7 comments) http://gerrit.cloudera.org:8080/#/c/16474/15//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16474/15//COMMIT_MSG@22 PS15, Line 22: 3. RowsReturned in GroupingAggregator profile > It would be good to add this for sort operations too. We have SortDataSize Good point. Added the following: {"EXCHANGE_NODE", "RowsReturned"} and {"SORT_NODE", "RowsReturned"}. Since sort does not drop tuples, I guess RowsReturned should be OK. http://gerrit.cloudera.org:8080/#/c/16474/15//COMMIT_MSG@36 PS15, Line 36: skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0) > I thought a bit about whether using the info string was the right approach Yeah. The skew summary follows the current model by adding some extra info strings to the aggregated profile. http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile-counters.h@415 PS15, Line 415: o. > can you explicitly say that it's returned in 'details'. Done http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile-counters.h@416 PS15, Line 416: > nit: convention is to use pointer for output args. Done http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile.cc@1969 PS15, Line 1969: int num_valid_values = NumValidValues(); > I don't think this quite works since valid values could be added concurrent Done http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/stat-util.h File be/src/util/stat-util.h: http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/stat-util.h@28 PS15, Line 28: /// Computes standard deviation given mean > Is this the population standard deviation or the sample standard deviation? Added some comments to clarify that it is the population version that is computed. Also Add 'P' in the function name. http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/stat-util.h@28 PS15, Line 28: /// Computes standard deviation given mean > I guess this documentation was already missing but would be good to fix Done -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 15 Gerrit-Owner: Qifan Chen <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Wed, 23 Sep 2020 17:51:29 +0000 Gerrit-HasComments: Yes
