[Impala-ASF-CR] IMPALA-10377 Improve the accuracy of resource estimation PlanNode does not consider some factors when estimating memory, this will cause a large error rate

Aman Sinha (Code Review) Mon, 04 Jan 2021 18:24:46 -0800

Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16842 )


Change subject: IMPALA-10377 Improve the accuracy of resource estimation 
PlanNode does not consider some factors when estimating memory, this will cause 
a large error rate
......................................................................


Patch Set 10:

(5 comments)

Thanks for removing the extraneous test files.  I am sending some initial 
comments based on a quick look.  Will send more later.

http://gerrit.cloudera.org:8080/#/c/16842/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16842/10//COMMIT_MSG@7
PS10, Line 7: IMPALA-10377 Improve the accuracy of resource estimation
Few general suggestions on the commit message:
 -  The 'IMPALA-xxxx' should be followed by a colon ':'.
 - The lines are wider than the 90 chars limit that checkstyle enforces..I am 
not sure if this passed the checkstyle but would be useful to re-format to 
conform to the per line limit.
 - Add a short summary of the Testing changes


http://gerrit.cloudera.org:8080/#/c/16842/10//COMMIT_MSG@25
PS10, Line 25: 1.Estimate memory by scanning all columns,the estimated memory 
is much larger than the actual use.
Is this describing the original issue or the nature of the fix itself ? I 
assume it is the original issue. .. in that case can you rephrase to avoid 
confusion ?  Something like 'Current estimate is based on all columns being 
read which over-estimates the memory. This patch fixes it by ...'   Actually, 
the descriptions of the other nodes above could also benefit by clarifying the 
old and new behavior.


http://gerrit.cloudera.org:8080/#/c/16842/10/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/16842/10/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@66
PS10, Line 66:   private final static long MIN_AGG_MEM = 100L * 1024L;
It seems if we are fixing this estimate for plain aggregates (i.e with no 
group-by), even 100KB seems kind of large.  If we have SELECT count(*), min(a), 
sum(b) ...  each of these is a scalar value of BIGINT type and consumes 8 bytes 
each, so if on average we have 10 plain aggregates in a query, it is only 80 
bytes.  Can we make this estimate a function of the number of plain aggregate 
functions in an AggregationNode ?

Also,  the name MIN_AGG_MEM may be generic ..since this is specifically for the 
no group-by case. One suggestion is MIN_PLAIN_AGG_MEM.  Other suggestions ?


http://gerrit.cloudera.org:8080/#/c/16842/10/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@538
PS10, Line 538:             // A skew factor of 1.5 was added to account for 
data skew among
nit: you need not mention the 1.5 value here because it is already a static 
final constant.  If someone changes the value in the code, this comment will 
become inconsistent.


http://gerrit.cloudera.org:8080/#/c/16842/10/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

http://gerrit.cloudera.org:8080/#/c/16842/10/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@324
PS10, Line 324:     for (SlotDescriptor slot: desc_.getSlots()) {
You could add a new method numMaterializedSlots() to TupleDescriptor because it 
could be potentially used in other plan nodes.



--
To view, visit http://gerrit.cloudera.org:8080/16842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01db168ff2c6d6de33ee553a8175599f035d7a1
Gerrit-Change-Number: 16842
Gerrit-PatchSet: 10
Gerrit-Owner: liuyao <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: liuyao <[email protected]>
Gerrit-Comment-Date: Tue, 05 Jan 2021 02:24:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10377 Improve the accuracy of resource estimation PlanNode does not consider some factors when estimating memory, this will cause a large error rate

Reply via email to