Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
......................................................................


Patch Set 15:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/16474/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16474/15//COMMIT_MSG@22
PS15, Line 22:   3. RowsReturned in GroupingAggregator profile
> It would be good to add this for sort operations too. We have SortDataSize
Good point.

Added the following:
{"EXCHANGE_NODE", "RowsReturned"} and {"SORT_NODE", "RowsReturned"}.

Since sort does not drop tuples, I guess RowsReturned should be OK.


http://gerrit.cloudera.org:8080/#/c/16474/15//COMMIT_MSG@36
PS15, Line 36:         skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE 
(id=0)
> I thought a bit about whether using the info string was the right approach
Yeah. The skew summary follows the current model by adding some extra info 
strings to the aggregated profile.


http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile-counters.h@415
PS15, Line 415: o.
> can you explicitly say that it's returned in 'details'.
Done


http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile-counters.h@416
PS15, Line 416:
> nit: convention is to use pointer for output args.
Done


http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile.cc@1969
PS15, Line 1969:   int num_valid_values = NumValidValues();
> I don't think this quite works since valid values could be added concurrent
Done


http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/stat-util.h
File be/src/util/stat-util.h:

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/stat-util.h@28
PS15, Line 28:   /// Computes standard deviation given mean
> Is this the population standard deviation or the sample standard deviation?
Added some comments to clarify that it is the population version that is 
computed. Also Add 'P' in the function name.


http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/stat-util.h@28
PS15, Line 28:   /// Computes standard deviation given mean
> I guess this documentation was already missing but would be good to fix
Done



--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Wed, 23 Sep 2020 17:51:29 +0000
Gerrit-HasComments: Yes

Reply via email to