Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/9757 )
Change subject: IMPALA-6679,IMPALA-6678: reduce scan reservation ...................................................................... Patch Set 6: (17 comments) Will send backend comments separately. http://gerrit.cloudera.org:8080/#/c/9757/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9757/6//COMMIT_MSG@37 PS6, Line 37: ideal reservation above you say we no longer need a concept of ideal reservation -- so what is this referring to? I guess here you are talking about the ideal determined at runtime once we examine the footer? http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@150 PS6, Line 150: backend IOMgr is the terminology we use below (and I think that's clearer). http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@157 PS6, Line 157: FOOTER_SIZE maybe name both the same so that a grep will also find both. http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1264 PS6, Line 1264: fileFormats_.contains(HdfsFileFormat.PARQUET) > will estimates be biased towards parquet if there are multiple formats? Do we have enough testing for the multiple format case when it comes to reservations? http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1320 PS6, Line 1320: Bound the reservation : * based on: is the following list a lower bound, upper bound, or both? http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1326 PS6, Line 1326: . and is available. or .. try to increase .. http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1337 PS6, Line 1337: columnReservations != null since != and == have same precedence, let's parenthesize this for clarity. http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1373 PS6, Line 1373: memory (in the future we could imagine other resource reservations) http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1374 PS6, Line 1374: Returns one value per column. like you say below, these values aren't yet quantized the buffer sizes, right? http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1378 PS6, Line 1378: * logic for non-shredded columnar formats if added. generally, do we need to worry about orc for this patch? http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1394 PS6, Line 1394: // Not a top-level column, e.g. a nested collection - no stats available. what does this case mean? scalar but not top level but also different than the case at line 1400? is this the array/list case? http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1417 PS6, Line 1417: compute since this one appends (rather than returns like the other functions) maybe calll it addMinColumnReservation() or appendMinColumnReservation(). http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1440 PS6, Line 1440: column scalar column? http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1448 PS6, Line 1448: computeMinColumnReservation should this be computeMinColumnReservationForScalar() or similar? http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1469 PS6, Line 1469: return estimate; have you verified how close this comes out to the actual size for various datatypes for some real column chunks? is it worth adding some counter to the profile for the actual size to debug cases where the reservation estimate was way off, or is that overkill? http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1472 PS6, Line 1472: private static long roundUpToIoBuffer(long bytes, long maxIoBufferSize) { add a quick function comment. http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1475 PS6, Line 1475: BitUtil.roundUpToPowerOf2Factor(bytes, maxIoBufferSize); why is it the right thing to always have a multiple of maxIoBufferSize in this case? i think that's worth explaining briefly. -- To view, visit http://gerrit.cloudera.org:8080/9757 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842 Gerrit-Change-Number: 9757 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Tue, 17 Apr 2018 22:54:21 +0000 Gerrit-HasComments: Yes