Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 )
Change subject: IMPALA-10178 Run-time profile shall report skews
......................................................................


Patch Set 21:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16474/21//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16474/21//COMMIT_MSG@28
PS21, Line 28: 2. In each corresponding operator in the averaged profile, the name
:    of the counter, the list of values of the counter across the
:    impalad backend processes, and the stddev value.
I'm a bit confused as to whether this only detects skew across the fragment instances on a single node, or whether it detects skew across all fragment instances on all nodes.


http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.h
File be/src/util/runtime-profile.h:

http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.h@202
PS21, Line 202:   // Generate a string enumerating profiles rooted at this.
:   std::string DebugString(int indent = 0);
Where is this used?


http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc@1928
PS21, Line 1928:     if (stddev > 5.0) {
How well does this work as the number of rows processed by a counter increases? E.g. if there are nodes processing billions of rows, a std-dev of more than 5 doesn't seem that statistically significant.

I'm not entirely sure how it works, but single_node_perf_benchmark.py uses various tests to check whether a difference in runtime profile counters is statistically significant. See report_benchmark_results.py, which refers to things like the "ttest t-value" and the "Mann-Whitney Z-value". I'm no stats expert, but simply hardcoding the threshold to 5 seems odd.
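
To illustrate the concern about Line 1928, here is a minimal sketch, not code from the patch. It compares an absolute stddev check with a scale-relative one (the coefficient of variation, stddev / mean). The function names below are hypothetical and do not exist in runtime-profile.cc; the only value taken from the patch is the 5.0 threshold.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: arithmetic mean of per-instance counter values.
double Mean(const std::vector<int64_t>& values) {
  if (values.empty()) return 0.0;
  double sum = 0.0;
  for (int64_t v : values) sum += static_cast<double>(v);
  return sum / static_cast<double>(values.size());
}

// Hypothetical helper: population standard deviation of the values.
double StdDev(const std::vector<int64_t>& values) {
  if (values.empty()) return 0.0;
  const double mean = Mean(values);
  double sum_sq = 0.0;
  for (int64_t v : values) {
    const double d = static_cast<double>(v) - mean;
    sum_sq += d * d;
  }
  return std::sqrt(sum_sq / static_cast<double>(values.size()));
}

// Hypothetical alternative measure: stddev relative to the mean.
// Returns 0.0 when the mean is 0 to avoid dividing by zero.
double CoefficientOfVariation(const std::vector<int64_t>& values) {
  const double mean = Mean(values);
  if (mean == 0.0) return 0.0;
  return StdDev(values) / mean;
}

// The absolute check quoted at Line 1928 (threshold taken from the patch).
bool ExceedsAbsoluteThreshold(const std::vector<int64_t>& values) {
  return StdDev(values) > 5.0;
}
```

With five instances that each process about a billion rows and differ by only a few tens of rows (e.g. 1000000000, 1000000020, 999999980, 1000000010, 999999990), the stddev is about 14, so the absolute check fires, while the coefficient of variation is on the order of 1e-8. Whether a relative measure or a proper significance test is the right fix is a separate question; the sketch only shows that the absolute threshold depends on the scale of the counter.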
-- 
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 24 Sep 2020 17:59:59 +0000
Gerrit-HasComments: Yes
