Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9757 )
Change subject: IMPALA-6679,IMPALA-6678: reduce scan reservation ...................................................................... Patch Set 10: (9 comments) http://gerrit.cloudera.org:8080/#/c/9757/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9757/10//COMMIT_MSG@42 PS10, Line 42: Observability: > Is there something simple to be said about having stats vs. not having stat I extended the commit message to address this. The heuristics are pretty conservative so it's unlikely that they'll grossly underestimate things unless the stats are actually wrong. If the query is memory-constrained (the only case where a runtime memory increase beyond the planner estimate doesn't solve the problem), then the best way to improve query performance is by giving it more memory. Also, if running without stats and the query is memory-constrained, it's reasonably likely the memory for a complex query will be wasted elsewhere because of over-reservation for joins, aggs, scans, etc, so generally we'd expect computing stats to increase the amount of memory available to operators that need it. http://gerrit.cloudera.org:8080/#/c/9757/10/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/9757/10/be/src/exec/hdfs-parquet-scanner.cc@1519 PS10, Line 1519: min_buffer_size, max_buffer_size > nit: if this is always going to be the min/max buffer_size taken from io_mg It makes the test in HdfsParquetScannerTest a bit easier to follow and also avoids us getting the buffer sizes from DiskIoMgr in this function and the caller, so removing the input parameters didn't seem to be a net simplification. http://gerrit.cloudera.org:8080/#/c/9757/10/be/src/exec/hdfs-parquet-scanner.cc@1526 PS10, Line 1526: int64_t HdfsParquetScanner::ComputeIdealReservation( : const vector<int64_t>& col_range_lengths) { : DiskIoMgr* io_mgr = ExecEnv::GetInstance()->disk_io_mgr(); : int64_t ideal_reservation = 0; : for (int64_t len : col_range_lengths) { : ideal_reservation += io_mgr->ComputeIdealBufferReservation(len); : } : return ideal_reservation; : } > nit: now that we have factored out the logic to io_mgr and made this method I factored it in this way so it could be tested in isolation by HdfsParquetScannerTest. http://gerrit.cloudera.org:8080/#/c/9757/10/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/9757/10/be/src/exec/hdfs-scan-node-base.cc@538 PS10, Line 538: } > I think you forgot to refactor this loop? or decided it wasn't worth it? Ah I got mixed up with what I had done and not done - sorry about that. http://gerrit.cloudera.org:8080/#/c/9757/10/be/src/exec/hdfs-scan-node.h File be/src/exec/hdfs-scan-node.h: http://gerrit.cloudera.org:8080/#/c/9757/10/be/src/exec/hdfs-scan-node.h@67 PS10, Line 67: IncreaseReservationForExtraThread() > outdated comment Done - good catch http://gerrit.cloudera.org:8080/#/c/9757/10/be/src/exec/hdfs-scan-node.h@67 PS10, Line 67: scanner thread is responsible : /// for staying within the reservation handed off to it. > mention that the scanner thread can scale up reservation in case of columna Done http://gerrit.cloudera.org:8080/#/c/9757/10/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/9757/10/be/src/exec/hdfs-scan-node.cc@228 PS10, Line 228: buffer_pool_client_.GetReservation() > curr_reservation Done http://gerrit.cloudera.org:8080/#/c/9757/10/be/src/exec/hdfs-scan-node.cc@276 PS10, Line 276: // The node's min reservation is for the first thread so we don't need to check > nit: maybe move this comment down a line Done http://gerrit.cloudera.org:8080/#/c/9757/10/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/9757/10/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@781 PS10, Line 781: partitionBytes += fileDesc.getFileLength(); > nit: can move this to the same loop at L786 if you feel it dosent impact co I remembered why I did this. We use partitionBytes in the call to updateMaxScanRangeNumRows() below, so we need to compute it before the first iteration of the loop. I factored out the function and made the variable final to document the intent a bit better. -- To view, visit http://gerrit.cloudera.org:8080/9757 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842 Gerrit-Change-Number: 9757 Gerrit-PatchSet: 10 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Thu, 26 Apr 2018 00:49:51 +0000 Gerrit-HasComments: Yes
