Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/9757 )
Change subject: IMPALA-6679,IMPALA-6678: reduce scan reservation ...................................................................... Patch Set 6: (20 comments) > This isn't ready for review yet, but since you were interested. > I've been able to confirm that the mem_usage_scaling regressions > were purely because of reserving more memory than needed. I haven't > investigated if these changes are sufficient to solve the stress > test regressions. did you get time to investigate how this affects the stress test regression? http://gerrit.cloudera.org:8080/#/c/9757/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9757/6//COMMIT_MSG@13 PS6, Line 13: When starting each scan range, check to see how big the initial scan : range is (the full thing for row-based formats, the footer for : Parquet) and determine whether more reservation would be useful. : : For Parquet, check the actual column layout of the file before : increasing reservation to avoid getting reservation that we : can't actually use for the files that we're scanning. This also : means we can avoid estimating an "ideal" reservation in the planner. do we have tests that verify that the reservation is increased to expected levels after a low planner estimate? both for parquet and other formats http://gerrit.cloudera.org:8080/#/c/9757/6//COMMIT_MSG@43 PS6, Line 43: Added counters to track when threads were not spawned due to reservation : and to track when reservation increases are requested and denied. add tests to verify that reservations were denied if enough was not available http://gerrit.cloudera.org:8080/#/c/9757/6/be/src/exec/hdfs-parquet-scanner-test.cc File be/src/exec/hdfs-parquet-scanner-test.cc: http://gerrit.cloudera.org:8080/#/c/9757/6/be/src/exec/hdfs-parquet-scanner-test.cc@45 PS6, Line 45: // Should round up to nearest power-of-two buffer size if < max scan range buffer maybe add a case that is somewhere in between min and max buffer http://gerrit.cloudera.org:8080/#/c/9757/6/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/9757/6/be/src/exec/hdfs-parquet-scanner.cc@1699 PS6, Line 1699: ideal_reservation += min(BitUtil::RoundUpToPowerOf2(len, max_buffer_size), : DiskIoMgr::IDEAL_MAX_SIZED_BUFFERS_PER_SCAN_RANGE * max_buffer_size); maybe mention here briefly why we increment in steps of max_buffer_size if len>max_buffer_size. Similar to description in line 1750 -1753 http://gerrit.cloudera.org:8080/#/c/9757/6/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/9757/6/be/src/exec/hdfs-scan-node-base.cc@527 PS6, Line 527: int64_t target = BitUtil::RoundUpToPowerOf2(*reservation, max_buffer_size); : if (*reservation == target) { : // If it's already an I/O buffer multiple, increase by an I/O buffer or to ideal. : target = min(*reservation + max_buffer_size, ideal_scan_range_reservation); : } how about int64_t target = BitUtil::RoundUpToPowerOf2( *reservation + 1, max_buffer_size); this will always result an increase to an I/O buffer multiple upto the ideal_scan_range_reservation (since *reservation < ideal_scan_range_reservation) http://gerrit.cloudera.org:8080/#/c/9757/6/be/src/exec/hdfs-scan-node.h File be/src/exec/hdfs-scan-node.h: http://gerrit.cloudera.org:8080/#/c/9757/6/be/src/exec/hdfs-scan-node.h@178 PS6, Line 178: /// this thread with DeductReservationForScannerThread(). maybe mention that it releases Reservation and only holds on to minimum reservation. http://gerrit.cloudera.org:8080/#/c/9757/6/be/src/exec/hdfs-scan-node.h@204 PS6, Line 204: or nit: return all maybe mention that it holds on to minimum reservation http://gerrit.cloudera.org:8080/#/c/9757/6/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/9757/6/be/src/exec/hdfs-scan-node.cc@226 PS6, Line 226: spare_reservation_ what if between line 225 and 226, some scanner thread releases its reservation, which results in spare reservation being more than resource_profile_.min_reservation and 'increase' being negative. does that mean that buffer_pool_client_.IncreaseReservation would actually let go of reservation? If yes, is that the intended behaviour? http://gerrit.cloudera.org:8080/#/c/9757/6/be/src/exec/hdfs-scan-node.cc@392 PS6, Line 392: // Take a snapshot of num_unqueued_files_ before calling GetNextUnstartedRange(). : // We don't want num_unqueued_files_ to go to zero between the return from : // GetNextUnstartedRange() and the check for when all ranges are complete. update comment: replace GetNextUnstartedRange with StartNextScanRange. Here and at line 434 http://gerrit.cloudera.org:8080/#/c/9757/6/be/src/exec/hdfs-scan-node.cc@481 PS6, Line 481: *scanner_thread_reservation, partition, filter_ctxs, expr_results_pool); : context.AddStream(scan_range, *scanner_thread_reservation); reservation passed twice. can we remove redundancy by using context->total_reservation_ inside AddStream()? http://gerrit.cloudera.org:8080/#/c/9757/6/be/src/exec/hdfs-scan-node.cc@518 PS6, Line 518: // Reservation may have been increased by the scanner. maybe mention, that this happens only in case of column formats like parquet scanner http://gerrit.cloudera.org:8080/#/c/9757/6/be/src/exec/scanner-context.h File be/src/exec/scanner-context.h: http://gerrit.cloudera.org:8080/#/c/9757/6/be/src/exec/scanner-context.h@379 PS6, Line 379: o nit: to allow http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@150 PS6, Line 150: // Default reservation for a backend scan range for a column in columnar formats like should we mention the units just to ward off any ambiguity? http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@154 PS6, Line 154: // TODO: is it worth making this a tunable query option? not a huge fan of adding more knobs that the user doesnt know how to tune. If the purpose of providing this as a query option is to have a workaround, what kindof scenario do you think this would be useful in? a lot of small files without stats? would a better solution be to calculate stats instead? http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@158 PS6, Line 158: FOOTER_READ_SIZE nit: how about PARQUET_FOOTER_READ_SIZE, just to make it explicit http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@752 PS6, Line 752: maxScanRangeBytes_ confusing at first when you look at this and maxScanRangeLength, maybe rename maxScanRangeLength to maxScanRangeLengthLimit OR rename maxScanRangeLength to maxScanRangeLengthUsed ? one is enforced limit and other is the actual max length used. http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1264 PS6, Line 1264: fileFormats_.contains(HdfsFileFormat.PARQUET) will estimates be biased towards parquet if there are multiple formats? does it even matter if it does? http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1443 PS6, Line 1443: estimates nit: repeated word http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1444 PS6, Line 1444: In that case the reservation : * may still be usable for a different column. you mean in the case where we underestimate it for another column but try to reserve more during the scanner reservation increase phase in the backend? http://gerrit.cloudera.org:8080/#/c/9757/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1465 PS6, Line 1465: encodedDataSize nit: encodedDataByteSize -- To view, visit http://gerrit.cloudera.org:8080/9757 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842 Gerrit-Change-Number: 9757 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Mon, 16 Apr 2018 22:23:47 +0000 Gerrit-HasComments: Yes
