Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11001 )

Change subject: IMPALA-7234: Improve memory estimates produced by the Planner
......................................................................


Patch Set 6:

(4 comments)

Just a few nits, otherwise looks good and will be some valuable cleanup.

http://gerrit.cloudera.org:8080/#/c/11001/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/11001/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1364
PS6, Line 1364: int perHostScanRanges;
: if (fileFormats_.contains(HdfsFileFormat.PARQUET)
: || fileFormats_.contains(HdfsFileFormat.ORC)) {
I think it would be clearer if we iterated over the formats, computed the perHostScanRanges for each and took the max - this would match the intent described in the comment more obviously. I don't think perHostScanRanges for non-columnar formats is guaranteed to be lower anyway, since the formula is non-trivial.


http://gerrit.cloudera.org:8080/#/c/11001/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1370
PS6, Line 1370: the scan ranges should be allocated based
: // on column reservations.
This is a bit misleading since this calculation is purely an estimate and doesn't affect the behaviour of the query at all. I would just say something like "From the resource management purview, we want to conservatively estimate memory consumption based on the partition with the highest memory requirements."


http://gerrit.cloudera.org:8080/#/c/11001/6/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/11001/6/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@130
PS6, Line 130: we still need to reserve
: // 1GB of buffer for insertion.
We're not really reserving anything based on this estimate for now - maybe just something like "even if there are non-Parquet partitions, we want to be conservative and make a high memory estimate."


http://gerrit.cloudera.org:8080/#/c/11001/6/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@137
PS6, Line 137: return 100L * 1024L;
Yeah these estimates are pretty bogus :). We will revisit them at some point.


--
To view, visit http://gerrit.cloudera.org:8080/11001
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0666ae3d45fbd8615d3fa9a8626ebd29cf94fb4b
Gerrit-Change-Number: 11001
Gerrit-PatchSet: 6
Gerrit-Owner: Pooja Nilangekar <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Pooja Nilangekar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Mon, 30 Jul 2018 19:19:03 +0000
Gerrit-HasComments: Yes
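
For reference, the suggestion on HdfsScanNode.java line 1364 (iterate over the formats, compute perHostScanRanges for each, take the max) could look roughly like the sketch below. This is only an illustration, not the patch's code: the `FileFormat` enum is a hypothetical stand-in for `HdfsFileFormat`, and the per-format formula is passed in as a hypothetical `estimator` function because the real formula is not shown in this review.

```java
import java.util.Set;
import java.util.function.ToIntFunction;

class ScanRangeEstimateSketch {
  // Hypothetical stand-in for HdfsFileFormat; not the Impala enum.
  enum FileFormat { TEXT, PARQUET, ORC }

  // Computes the estimate for each format present in the scan and keeps the
  // largest one, rather than assuming columnar formats always need the most.
  // 'estimator' is a hypothetical placeholder for the real per-format formula.
  static int maxPerHostScanRanges(
      Set<FileFormat> formats, ToIntFunction<FileFormat> estimator) {
    int max = 0;
    for (FileFormat format : formats) {
      max = Math.max(max, estimator.applyAsInt(format));
    }
    return max;
  }
}
```

Written this way, the result stays conservative even when a non-columnar format happens to yield the larger value, which is the case the comment warns about.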
