Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
......................................................................


Patch Set 21:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16474/21//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16474/21//COMMIT_MSG@28
PS21, Line 28:   2. In each corresponding operator in the averaged profile, the name
             :      of the counter, the list of values of the counter across the
             :      impalad backend processes, and the stddev value.
I'm a bit confused: does this detect skew only across the fragment instances 
on a single node, or across all fragment instances on all nodes?


http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.h
File be/src/util/runtime-profile.h:

http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.h@202
PS21, Line 202:   // Generate a string enumerating profiles rooted at this.
              :   std::string DebugString(int indent = 0);
Where is this used?


http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc@1928
PS21, Line 1928:   if (stddev > 5.0) {
How well does this work as the number of rows processed by a counter increases? 
E.g., if there are nodes processing billions of rows, a stddev of more than 5 
doesn't seem statistically significant.

I'm not entirely sure how it works, but single_node_perf_benchmark.py uses 
various tests to check whether a difference in runtime profile counters is 
statistically significant. See report_benchmark_results.py, which refers to 
things like the "ttest t-value" and the "Mann-Whitney Z-value".

I'm not a stats expert, but simply hardcoding the threshold to 5 seems odd.
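
To make the concern about an absolute threshold concrete, here is a minimal
sketch (not taken from the patch) comparing a raw stddev check with a
scale-relative one, the coefficient of variation (stddev / mean). All names
below (Mean, StdDev, CoefficientOfVariation, IsSkewedRelative) and the 0.25
cutoff are hypothetical and only for illustration; they are not part of
runtime-profile.cc, and this is one possible alternative rather than a
recommendation.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: arithmetic mean of the per-instance counter values.
double Mean(const std::vector<int64_t>& values) {
  if (values.empty()) return 0.0;
  double sum = 0.0;
  for (int64_t v : values) sum += static_cast<double>(v);
  return sum / static_cast<double>(values.size());
}

// Hypothetical helper: population standard deviation of the counter values.
double StdDev(const std::vector<int64_t>& values) {
  if (values.empty()) return 0.0;
  const double mean = Mean(values);
  double sum_sq = 0.0;
  for (int64_t v : values) {
    const double d = static_cast<double>(v) - mean;
    sum_sq += d * d;
  }
  return std::sqrt(sum_sq / static_cast<double>(values.size()));
}

// Coefficient of variation: stddev scaled by the mean, so the result does
// not grow just because the counter values themselves are large.
double CoefficientOfVariation(const std::vector<int64_t>& values) {
  const double mean = Mean(values);
  if (mean == 0.0) return 0.0;
  return StdDev(values) / mean;
}

// Assumed cutoff of 0.25, chosen purely for illustration.
bool IsSkewedRelative(const std::vector<int64_t>& values,
                      double threshold = 0.25) {
  return CoefficientOfVariation(values) > threshold;
}
```

With three instances processing about a billion rows each and differing by
only 10 rows, the raw stddev already exceeds 5 while the relative measure
stays near zero; a small but lopsided distribution such as {10, 20, 90} is
flagged by both.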



--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 24 Sep 2020 17:59:59 +0000
Gerrit-HasComments: Yes
