Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
......................................................................


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc@1928
PS21, Line 1928: if (stddev > 5.0) {
> In the past, stddev with a threshold of 5 served the purpose well.

I would be interested in seeing what evidence we have to support this. It might be worth running this logic against some larger runtime profiles to see what comes out. With Parquet encodings, Parquet filter pushdown, page skipping, runtime filters, etc., I wouldn't expect the number of rows read by a scan node to be that close together.

I just want to make sure that the skew flag doesn't start popping up on every runtime profile we get, in which case folks will start to ignore it.

Another benchmark might be to see how many TPC-DS or TPC-H profiles the skew flag pops up on. Maybe 30 GB would be enough scale; I'm not sure.


-- 
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 24 Sep 2020 20:32:43 +0000
Gerrit-HasComments: Yes
