[Impala-ASF-CR] IMPALA-6679,IMPALA-6678: reduce scan reservation

Dan Hecht (Code Review) Tue, 17 Apr 2018 15:54:57 -0700

Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9757 )


Change subject: IMPALA-6679,IMPALA-6678: reduce scan reservation
......................................................................


Patch Set 6:

(17 comments)

Will send backend comments separately.

http://gerrit.cloudera.org:8080/#/c/9757/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9757/6//COMMIT_MSG@37
PS6, Line 37: ideal reservation
above you say we no longer need a concept of ideal reservation -- so what is 
this referring to? I guess here you are talking about the ideal determined at 
runtime once we examine the footer?


http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@150
PS6, Line 150: backend
IOMgr is the terminology we use below (and I think that's clearer).


http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@157
PS6, Line 157: FOOTER_SIZE
maybe name both the same so that a grep will also find both.


http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1264
PS6, Line 1264: fileFormats_.contains(HdfsFileFormat.PARQUET)
> will estimates be biased towards parquet if there are multiple formats?
Do we have enough testing for the multiple format case when it comes to 
reservations?


http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1320
PS6, Line 1320: Bound the reservation
              :    *  based on:
is the following list a lower bound, upper bound, or both?


http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1326
PS6, Line 1326: .
and is available. or .. try to increase ..


http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1337
PS6, Line 1337: columnReservations != null
since != and == have same precedence, let's parenthesize this for clarity.


http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1373
PS6, Line 1373:
memory
(in the future we could imagine other resource reservations)


http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1374
PS6, Line 1374:  Returns one value per column.
like you say below, these values aren't yet quantized the buffer sizes, right?


http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1378
PS6, Line 1378:    * logic for non-shredded columnar formats if added.
generally, do we need to worry about orc for this patch?


http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1394
PS6, Line 1394:             // Not a top-level column, e.g. a nested collection 
- no stats available.
what does this case mean? scalar but not top level but also different than the 
case at line 1400? is this the array/list case?


http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1417
PS6, Line 1417: compute
since this one appends (rather than returns like the other functions) maybe 
calll it addMinColumnReservation() or appendMinColumnReservation().


http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1440
PS6, Line 1440: column
scalar column?


http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1448
PS6, Line 1448: computeMinColumnReservation
should this be computeMinColumnReservationForScalar() or similar?


http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1469
PS6, Line 1469:     return estimate;
have you verified how close this comes out to the actual size for various 
datatypes for some real column chunks? is it worth adding some counter to the 
profile for the actual size to debug cases where the reservation estimate was 
way off, or is that overkill?


http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1472
PS6, Line 1472:   private static long roundUpToIoBuffer(long bytes, long 
maxIoBufferSize) {
add a quick function comment.


http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1475
PS6, Line 1475: BitUtil.roundUpToPowerOf2Factor(bytes, maxIoBufferSize);
why is it the right thing to always have a multiple of maxIoBufferSize in this 
case? i think that's worth explaining briefly.



--
To view, visit http://gerrit.cloudera.org:8080/9757
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842
Gerrit-Change-Number: 9757
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Apr 2018 22:54:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6679,IMPALA-6678: reduce scan reservation

Reply via email to