Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9757 )

Change subject: IMPALA-6679,IMPALA-6678: reduce scan reservation
......................................................................


Patch Set 8:

(15 comments)

Addressed comments and fixed some tests.

http://gerrit.cloudera.org:8080/#/c/9757/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9757/6//COMMIT_MSG@13
PS6, Line 13: When starting each scan range, check to see how big the initial 
scan
            : range is (the full thing for row-based formats, the footer for
            : Parquet) and determine whether more reservation would be useful.
            :
            : For Parquet, base the ideal reservation on the actual column 
layout
            : of each file. This avoids reserving memory that we won't use for
            : the actual files that we're scanning. This also avoid the need to
            : estimate ideal reservation in the planner.
> do we have tests that verify that the reservation is increased to expected
Done


http://gerrit.cloudera.org:8080/#/c/9757/6//COMMIT_MSG@43
PS6, Line 43: Added counters to track when threads were not spawned due to 
reservation
            : and to track when reservation increases are requested and denied.
> add tests to verify that reservations were denied if enough was not availab
Done


http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node-base.h@502
PS8, Line 502: compute
> computed?
Done


http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node-base.cc@519
PS8, Line 519: small footer range only so we won't request an increase.
> is that always true? i.e. is the footer size really bounded?
Yeah if the footer turns out to be larger we allocate a larger buffer (with 
unreserved memory) and create a new scan range.


http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node-base.cc@537
PS8, Line 537: reservation_increase_success_counter_
> will this counter be useful? Or does the ideal and actual reservation tell
We didn't have actual/ideal for the initial ranges, only parquet. But I think 
that's more useful so I did away with the increase counter and replaced it with 
actual/ideal for initial ranges.


http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node-base.cc@522
PS8, Line 522:     // Check if we could use at least one more max-sized I/O 
buffer for this range. Don't
             :     // increase in smaller increments since we may not be able 
to use additional smaller
             :     // buffers.
             :     while (*reservation < ideal_scan_range_reservation) {
             :       // Increase to the next I/O buffer multiple or to the 
ideal reservation.
             :       int64_t target = min(ideal_scan_range_reservation,
             :           BitUtil::RoundUpToPowerOf2(*reservation + 1, 
io_mgr->max_buffer_size()));
             :       DCHECK_LT(*reservation, target);
             :       COUNTER_ADD(reservation_increase_request_counter_, 1);
             :       bool increased = 
buffer_pool_client_.IncreaseReservation(target - *reservation);
             :       VLOG_FILE << "Increasing reservation from "
             :                 << PrettyPrinter::PrintBytes(*reservation) << " 
to "
             :                 << PrettyPrinter::PrintBytes(target) << " "
             :                 << (increased ? "succeeded" : "failed");
             :       if (!increased) break;
             :       COUNTER_ADD(reservation_increase_success_counter_, 1);
             :       *reservation = target;
             :     }
> is that the same loop as in ScannerContext? If so, maybe factor it into a s
Refactored into a DiskIoMgr helper function.


http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node.h
File be/src/exec/hdfs-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node.h@189
PS8, Line 189:   /// Return true if there is enough reservation to start an 
extra scanner thread.
             :   /// Tries to increase reservation if enough is not already 
available in
             :   /// 'spare_reservation_'. 'lock_' must be held via 'lock'
             :   bool EnoughReservationForExtraThread(const 
boost::unique_lock<boost::mutex>& lock);
> now that ReturnReservationFromScannerThread() eagerly decreases the reserva
This function was removed as part of the spare_reservation_ cleanup.


http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node.h@203
PS8, Line 203: return or all
> ?
Done


http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/hdfs-scan-node.cc@310
PS8, Line 310:     // must be available.
> now that we return reservation eagerly and we hold the lock, will spare_res
The exception was if getting the thread token failed, but that wasn't 
particularly desirable anyway. I cut out as much code as possible, including 
spare_reservation_.


http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/scanner-context.h
File be/src/exec/scanner-context.h:

http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/scanner-context.h@376
PS8, Line 376: reservation' is the amount of reservation that
             :   /// is given to this stream for allocating I/O buffers
> how does this reservation relate to total_reservation_? I assume this is a
Done


http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/scanner-context.cc
File be/src/exec/scanner-context.cc:

http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/exec/scanner-context.cc@78
PS8, Line 78: }
> where does the reservation get released? who owns the reservation?
I documented this in the comment of this method -seemed like the place it would 
be easiest to find.


http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/util/runtime-profile.h
File be/src/util/runtime-profile.h:

http://gerrit.cloudera.org:8080/#/c/9757/8/be/src/util/runtime-profile.h@204
PS8, Line 204: Returns
> space before
Done


http://gerrit.cloudera.org:8080/#/c/9757/8/tests/query_test/test_mem_usage_scaling.py
File tests/query_test/test_mem_usage_scaling.py:

http://gerrit.cloudera.org:8080/#/c/9757/8/tests/query_test/test_mem_usage_scaling.py@133
PS8, Line 133: 48
> that's still a bit higher than on master.
I looked into it. This query is special because there were no reservation-aware 
operators priority to this patch series. The scan uses ~24MB of memory on 
master. After this change the scan can run with ~17MB of memory (16MB 
reservation). The regression comes from the RESERVATION_MEM_MIN_REMAINING 
calculation, which makes sure that there's space for at least 32MB of 
non-buffer-pool memory in the mem_limit. 32MB + 16MB = 48MB gives the 
requirement.

So queries without reservation-aware operators and low memory requirements can 
regress up to 32MB, which seems like something we can live with.


http://gerrit.cloudera.org:8080/#/c/9757/8/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/9757/8/tests/query_test/test_scanners.py@61
PS8, Line 61: class TestScannersAllTableFormats(ImpalaTestSuite):
Dan - we have a debug action dimension for some of these scanner tests to 
exercise the reservation denial code path. We also get coverage for parquet 
(which is the most interesting case) via test_spilling.


http://gerrit.cloudera.org:8080/#/c/9757/8/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/9757/8/tests/query_test/test_sort.py@174
PS8, Line 174:     new_vector = deepcopy(vector)
             :     # Run with num_nodes=1 to make execution more deterministic.
             :     new_vector.get_value('exec_option')['num_nodes'] = 1
> why does this look different than e.g. lines 136-140?
Two reasons:
* execute_query() above takes exec_option as input instead of the whole vector 
required by run_test_case()
* We're only updating one exec_option so we don't really need the exec_option 
intermediate variable.



--
To view, visit http://gerrit.cloudera.org:8080/9757
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842
Gerrit-Change-Number: 9757
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Mon, 23 Apr 2018 21:32:56 +0000
Gerrit-HasComments: Yes

Reply via email to