Bikramjeet Vig has posted comments on this change. ( )

Change subject: IMPALA-6679,IMPALA-6678: reduce scan reservation

Patch Set 6:


> This isn't ready for review yet, but since you were interested.
 > I've been able to confirm that the mem_usage_scaling regressions
 > were purely because of reserving more memory than needed. I haven't
 > investigated if these changes are sufficient to solve the stress
 > test regressions.

did you get time to investigate how this affects the stress test regression?
Commit Message:
PS6, Line 13: When starting each scan range, check to see how big the initial 
            : range is (the full thing for row-based formats, the footer for
            : Parquet) and determine whether more reservation would be useful.
            : For Parquet, check the actual column layout of the file before
            : increasing reservation to avoid getting reservation that we
            : can't actually use for the files that we're scanning. This also
            : means we can avoid estimating an "ideal" reservation in the 
do we have tests that verify that the reservation is increased to expected 
levels after a low planner estimate? both for parquet and other formats
PS6, Line 43: Added counters to track when threads were not spawned due to 
            : and to track when reservation increases are requested and denied.
add tests to verify that reservations were denied if enough was not available
File be/src/exec/
PS6, Line 45: // Should round up to nearest power-of-two buffer size if < max 
scan range buffer
maybe add a case that is somewhere in between min and max buffer
File be/src/exec/
PS6, Line 1699: ideal_reservation += min(BitUtil::RoundUpToPowerOf2(len, 
              :           DiskIoMgr::IDEAL_MAX_SIZED_BUFFERS_PER_SCAN_RANGE * 
maybe mention here briefly why we increment in steps of max_buffer_size if 
len>max_buffer_size. Similar to description in line 1750 -1753
File be/src/exec/
PS6, Line 527: int64_t target = BitUtil::RoundUpToPowerOf2(*reservation, 
             :       if (*reservation == target) {
             :         // If it's already an I/O buffer multiple, increase by 
an I/O buffer or to ideal.
             :         target = min(*reservation + max_buffer_size, 
             :       }
how about
int64_t target = BitUtil::RoundUpToPowerOf2( *reservation + 1, max_buffer_size);

this will always result an increase to an I/O buffer multiple upto the 
ideal_scan_range_reservation (since *reservation < ideal_scan_range_reservation)
File be/src/exec/hdfs-scan-node.h:
PS6, Line 178:   /// this thread with DeductReservationForScannerThread().
maybe mention that it releases Reservation and only holds on to minimum 
PS6, Line 204: or
nit: return all

maybe mention that it holds on to minimum reservation
File be/src/exec/
PS6, Line 226: spare_reservation_
what if between line 225 and 226, some scanner thread releases its reservation, 
which results in spare reservation being more than 
resource_profile_.min_reservation and 'increase' being negative. does that mean 
that buffer_pool_client_.IncreaseReservation would actually let go of 
reservation? If yes, is that the intended behaviour?
PS6, Line 392: // Take a snapshot of num_unqueued_files_ before calling 
             :     // We don't want num_unqueued_files_ to go to zero between 
the return from
             :     // GetNextUnstartedRange() and the check for when all ranges 
are complete.
update comment: replace GetNextUnstartedRange with StartNextScanRange.

Here and at line 434
PS6, Line 481: *scanner_thread_reservation, partition, filter_ctxs, 
             :   context.AddStream(scan_range, *scanner_thread_reservation);
reservation passed twice. can we remove redundancy by using 
context->total_reservation_ inside  AddStream()?
PS6, Line 518: // Reservation may have been increased by the scanner.
maybe mention, that this happens only in case of column formats like parquet 
File be/src/exec/scanner-context.h:
PS6, Line 379: o
nit: to allow
File fe/src/main/java/org/apache/impala/planner/
PS6, Line 150:   // Default reservation for a backend scan range for a column 
in columnar formats like
should we mention the units just to ward off any ambiguity?
PS6, Line 154: // TODO: is it worth making this a tunable query option?
not a huge fan of adding more knobs that the user doesnt know how to tune. If 
the purpose of providing this as a query option is to have a workaround, what 
kindof scenario do you think this would be useful in? a lot of small files 
without stats? would a better solution be to calculate stats instead?
nit: how about PARQUET_FOOTER_READ_SIZE, just to make it explicit
PS6, Line 752: maxScanRangeBytes_
confusing at first when you look at this and maxScanRangeLength,
maybe rename maxScanRangeLength to maxScanRangeLengthLimit
rename maxScanRangeLength to maxScanRangeLengthUsed
one is enforced limit and other is the actual max length used.
PS6, Line 1264: fileFormats_.contains(HdfsFileFormat.PARQUET)
will estimates be biased towards parquet if there are multiple formats?
does it even matter if it does?
PS6, Line 1443: estimates
nit: repeated word
PS6, Line 1444: In that case the reservation
              :    * may still be usable for a different column.
you mean in the case where we underestimate it for another column but try to 
reserve more during the scanner reservation increase phase in the backend?
PS6, Line 1465: encodedDataSize
nit: encodedDataByteSize

To view, visit
To unsubscribe, visit

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842
Gerrit-Change-Number: 9757
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Bikramjeet Vig <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-Comment-Date: Mon, 16 Apr 2018 22:23:47 +0000
Gerrit-HasComments: Yes

Reply via email to