[Impala-ASF-CR] IMPALA-4835: Part 2: Allocate scan range buffers upfront

Tim Armstrong (Code Review) Tue, 13 Feb 2018 12:03:36 -0800

Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8707 )


Change subject: IMPALA-4835: Part 2: Allocate scan range buffers upfront
......................................................................


Patch Set 24:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8707/24/be/src/runtime/io/disk-io-mgr.h
File be/src/runtime/io/disk-io-mgr.h:

http://gerrit.cloudera.org:8080/#/c/8707/24/be/src/runtime/io/disk-io-mgr.h@308
PS24, Line 308:       int64_t max_bytes);
> But given that a scanner will be scanning multiple scan ranges, don't we ha
To guarantee progress, the scanner definitely has to have enough reservation to 
allocate at least one buffer. Beyond that, it could decide to give more or less 
additional memory to a given scan range. My current implementation in Part 3 
implements the simple policy of setting aside the exact same reservation per 
range and doesn't try to do anything with unused reservation in cases where we 
have small or cached ranges. I originally was going to do something a bit more 
adaptive where scanner threads could "give up" reservation they didn't need for 
a given input split but decided to simplify that for now.

I think the dilemma for me is narrowing the interface by exploiting our 
knowledge that it's used in one specific way in the HdfsScanNode/HdfsScanner 
layer versus implementing a slightly more general interface where I can explain 
the logic behind the interface without reference to the higher layers.

Anyway, if you feel strongly I can combine them, it wouldn't be that hard to 
restore the generality later.

We could definitely move the allocation of buffers to be done on-demand in the 
DiskIoMgr. We'd still have the same logic for deciding when to block/unblock 
the ranges, but it would just be an accounting mechanism, rather than a queue 
of physical buffers. I think the main benefit of the upfront allocation 
approach is that it's more deterministic - we always allocate the same amount 
and it's always done in the scanner thread. I.e. if we mess up the reservation 
accounting, it will fail more deterministically instead of 
non-deterministically depending on how fast scanner and I/O mgr threads make 
progress.



-- 
To view, visit http://gerrit.cloudera.org:8080/8707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia243bf8b62feeea602b122e0503ea4ba7d3ee70f
Gerrit-Change-Number: 8707
Gerrit-PatchSet: 24
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Tue, 13 Feb 2018 20:03:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4835: Part 2: Allocate scan range buffers upfront

Reply via email to