Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9757 )
Change subject: IMPALA-6679,IMPALA-6678: reduce scan reservation
......................................................................


Patch Set 8:

(15 comments)

Addressed comments and fixed some tests.

http://gerrit.cloudera.org:8080/#/c/9757/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9757/6//COMMIT_MSG@13
PS6, Line 13: When starting each scan range, check to see how big the initial scan
: range is (the full thing for row-based formats, the footer for
: Parquet) and determine whether more reservation would be useful.
:
: For Parquet, base the ideal reservation on the actual column layout
: of each file. This avoids reserving memory that we won't use for
: the actual files that we're scanning. This also avoids the need to
: estimate the ideal reservation in the planner.
> do we have tests that verify that the reservation is increased to expected
Done


http://gerrit.cloudera.org:8080/#/c/9757/6//COMMIT_MSG@43
PS6, Line 43: Added counters to track when threads were not spawned due to reservation
: and to track when reservation increases are requested and denied.
> add tests to verify that reservations were denied if enough was not availab
Done


http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node-base.h@502
PS8, Line 502: compute
> computed?
Done


http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node-base.cc@519
PS8, Line 519: small footer range only so we won't request an increase.
> is that always true? i.e. is the footer size really bounded?
Yeah, if the footer turns out to be larger we allocate a larger buffer (with unreserved memory) and create a new scan range.


http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node-base.cc@537
PS8, Line 537: reservation_increase_success_counter_
> will this counter be useful? Or does the ideal and actual reservation tell
We didn't have actual/ideal for the initial ranges, only Parquet. But I think that's more useful, so I did away with the increase counter and replaced it with actual/ideal for initial ranges.


http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node-base.cc@522
PS8, Line 522: // Check if we could use at least one more max-sized I/O buffer for this range. Don't
: // increase in smaller increments since we may not be able to use additional smaller
: // buffers.
: while (*reservation < ideal_scan_range_reservation) {
:   // Increase to the next I/O buffer multiple or to the ideal reservation.
:   int64_t target = min(ideal_scan_range_reservation,
:       BitUtil::RoundUpToPowerOf2(*reservation + 1, io_mgr->max_buffer_size()));
:   DCHECK_LT(*reservation, target);
:   COUNTER_ADD(reservation_increase_request_counter_, 1);
:   bool increased = buffer_pool_client_.IncreaseReservation(target - *reservation);
:   VLOG_FILE << "Increasing reservation from "
:             << PrettyPrinter::PrintBytes(*reservation) << " to "
:             << PrettyPrinter::PrintBytes(target) << " "
:             << (increased ? "succeeded" : "failed");
:   if (!increased) break;
:   COUNTER_ADD(reservation_increase_success_counter_, 1);
:   *reservation = target;
: }
> is that the same loop as in ScannerContext? If so, maybe factor it into a s
Refactored into a DiskIoMgr helper function.
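(Aside for context: below is a minimal sketch of the shape such a helper could take. FakeBufferPoolClient, RoundUpToPowerOf2 and IncreaseReservationToIdeal are simplified stand-ins, not the actual DiskIoMgr/BufferPool API; the sketch only illustrates the grow-by-buffer-multiples loop quoted above.)

#include <algorithm>
#include <cstdint>
#include <iostream>

// Simplified stand-in for Impala's BufferPool client; the real class has a
// different interface. It grants increases until its headroom runs out.
struct FakeBufferPoolClient {
  int64_t available = 0;  // Reservation headroom the pool will still grant.
  bool IncreaseReservation(int64_t delta) {
    if (delta > available) return false;
    available -= delta;
    return true;
  }
};

// Round 'v' up to the next multiple of 'factor' (factor must be a power of
// two), matching the semantics assumed for BitUtil::RoundUpToPowerOf2 above.
int64_t RoundUpToPowerOf2(int64_t v, int64_t factor) {
  return (v + factor - 1) & ~(factor - 1);
}

// Sketch of a DiskIoMgr-style helper: grow '*reservation' toward
// 'ideal_reservation' one max-sized I/O buffer multiple at a time, stopping
// at the first denied increase. Returns the final reservation.
int64_t IncreaseReservationToIdeal(FakeBufferPoolClient* client,
    int64_t* reservation, int64_t ideal_reservation, int64_t max_buffer_size) {
  while (*reservation < ideal_reservation) {
    int64_t target = std::min(ideal_reservation,
        RoundUpToPowerOf2(*reservation + 1, max_buffer_size));
    if (!client->IncreaseReservation(target - *reservation)) break;
    *reservation = target;
  }
  return *reservation;
}

int main() {
  FakeBufferPoolClient client;
  client.available = 20LL << 20;    // 20MB of headroom available.
  int64_t reservation = 1LL << 20;  // Start with 1MB reserved.
  // Ideal is 24MB with 8MB max buffers: we step to 8MB, then 16MB, then the
  // pool denies the final 8MB step, leaving us at 16MB.
  IncreaseReservationToIdeal(&client, &reservation, 24LL << 20, 8LL << 20);
  std::cout << (reservation >> 20) << "MB" << std::endl;  // Prints "16MB".
  return 0;
}

Growing in max-buffer-size multiples rather than smaller increments matters because a fraction of a max-sized buffer's worth of extra reservation can't be turned into another I/O buffer, per the comment in the quoted loop.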
http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node.h
File be/src/exec/hdfs-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node.h@189
PS8, Line 189: /// Return true if there is enough reservation to start an extra scanner thread.
: /// Tries to increase reservation if enough is not already available in
: /// 'spare_reservation_'. 'lock_' must be held via 'lock'.
: bool EnoughReservationForExtraThread(const boost::unique_lock<boost::mutex>& lock);
> now that ReturnReservationFromScannerThread() eagerly decreases the reserva
This function was removed as part of the spare_reservation_ cleanup.


http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node.h@203
PS8, Line 203: return or all
> ?
Done


http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node.cc@310
PS8, Line 310: // must be available.
> now that we return reservation eagerly and we hold the lock, will spare_res
The exception was if getting the thread token failed, but that wasn't particularly desirable anyway. I cut out as much code as possible, including spare_reservation_.


http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/scanner-context.h
File be/src/exec/scanner-context.h:

http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/scanner-context.h@376
PS8, Line 376: reservation' is the amount of reservation that
: /// is given to this stream for allocating I/O buffers
> how does this reservation relate to total_reservation_? I assume this is a
Done


http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/scanner-context.cc
File be/src/exec/scanner-context.cc:

http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/scanner-context.cc@78
PS8, Line 78: }
> where does the reservation get released? who owns the reservation?
I documented this in the comment of this method; that seemed like the place where it would be easiest to find.


http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/util/runtime-profile.h
File be/src/util/runtime-profile.h:

http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/util/runtime-profile.h@204
PS8, Line 204: Returns
> space before
Done


http://gerrit.cloudera.org:8080/#/c/9757/8/tests/query_test/test_mem_usage_scaling.py
File tests/query_test/test_mem_usage_scaling.py:

http://gerrit.cloudera.org:8080/#/c/9757/8/tests/query_test/test_mem_usage_scaling.py@133
PS8, Line 133: 48
> that's still a bit higher than on master.
I looked into it. This query is special because there were no reservation-aware operators prior to this patch series. The scan uses ~24MB of memory on master. After this change the scan can run with ~17MB of memory (16MB reservation). The regression comes from the RESERVATION_MEM_MIN_REMAINING calculation, which makes sure that there's space for at least 32MB of non-buffer-pool memory in the mem_limit. 32MB + 16MB = 48MB gives the requirement (see the sketch below). So queries without reservation-aware operators and low memory requirements can regress by up to 32MB, which seems like something we can live with.
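(Aside: the bound above, worked as code. kReservationMemMinRemaining mirrors the 32MB RESERVATION_MEM_MIN_REMAINING value mentioned in the reply; MinMemLimit is an illustrative helper, not a function in the patch.)

#include <cstdint>
#include <iostream>

// Illustrative only: the mem_limit floor implied by the reply above. A query
// must fit its minimum buffer-pool reservation plus at least this much
// non-buffer-pool memory.
constexpr int64_t kReservationMemMinRemaining = 32LL << 20;  // 32MB

// Hypothetical helper: smallest mem_limit that admits the given reservation.
int64_t MinMemLimit(int64_t min_reservation) {
  return min_reservation + kReservationMemMinRemaining;
}

int main() {
  // The scan needs a 16MB reservation, so the test's lower bound is
  // 16MB + 32MB = 48MB.
  std::cout << (MinMemLimit(16LL << 20) >> 20) << "MB" << std::endl;
  return 0;
}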
http://gerrit.cloudera.org:8080/#/c/9757/8/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/9757/8/tests/query_test/test_scanners.py@61
PS8, Line 61: class TestScannersAllTableFormats(ImpalaTestSuite):
Dan - we have a debug action dimension for some of these scanner tests to exercise the reservation denial code path. We also get coverage for Parquet (which is the most interesting case) via test_spilling.


http://gerrit.cloudera.org:8080/#/c/9757/8/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/9757/8/tests/query_test/test_sort.py@174
PS8, Line 174: new_vector = deepcopy(vector)
: # Run with num_nodes=1 to make execution more deterministic.
: new_vector.get_value('exec_option')['num_nodes'] = 1
> why does this look different than e.g. lines 136-140?
Two reasons:
* execute_query() above takes exec_option as input instead of the whole vector required by run_test_case()
* We're only updating one exec_option, so we don't really need the exec_option intermediate variable.


--
To view, visit http://gerrit.cloudera.org:8080/9757
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842
Gerrit-Change-Number: 9757
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Mon, 23 Apr 2018 21:32:56 +0000
Gerrit-HasComments: Yes
