Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5378: Disk IO manager needs to understand ADLS
......................................................................


Patch Set 1:

(1 comment)

Ah, regarding runtime/hdfs-fs-cache, we did some plumbing for passing the keys
through libHDFS if users didn't want to set them in core-site.xml.

We're not planning to do it for ADLS unless there's a big ask for it. Also,
there is an easier alternative: the Hadoop encrypted credential store, which
should land soon for Hadoop AdlFileSystem. The plumbing in hdfs-fs-cache was
done for S3 before there was a plan for this credential store for S3.

http://gerrit.cloudera.org:8080/#/c/7033/1/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

Line 402:   // ADLS uses buffer sizes of 4k. Given that, and the above JNI 
array allocation overhead
> but we'd still truncate to the actual length of the column's data pages in 
Yes, it would cut a buffer at 4MB or a flush, whichever comes first. We'd want
to optimize for the more likely case. Is it safe to say that in most cases we'd
have data pages > 4MB?

Regarding requiring more CPU, this was found while settling on the read chunk 
size for S3. The comment above (L392-L397) explains the overhead.

Sounds good, I'll convert it to a flag.
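
For reference, a minimal sketch of what a configurable read chunk size could
look like. This is not the actual patch: PlanChunkedRead and max_chunk_size are
hypothetical names, and in the real code the limit would come from a startup
flag and each chunk would map to one read call through libHDFS. It only shows
how the cap decides the number of calls, and so how often the per-call overhead
is paid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: splits a read of `bytes_to_read` bytes into per-call
// lengths, none larger than `max_chunk_size`. In a real build the cap would
// be supplied by a flag instead of a plain argument.
std::vector<int64_t> PlanChunkedRead(int64_t bytes_to_read,
                                     int64_t max_chunk_size) {
  assert(bytes_to_read >= 0);
  assert(max_chunk_size > 0);
  std::vector<int64_t> chunks;
  int64_t remaining = bytes_to_read;
  while (remaining > 0) {
    // Never ask for more than the cap in a single call.
    const int64_t len = std::min(remaining, max_chunk_size);
    chunks.push_back(len);
    remaining -= len;
  }
  return chunks;
}
```

With a 4MB cap, a 10MB read is planned as three calls (4MB, 4MB, 2MB). A
smaller cap means more calls for the same data, and a larger one means fewer.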


-- 
To view, visit http://gerrit.cloudera.org:8080/7033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I067f053fec941e3631610c5cc89a384f257ba906
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-HasComments: Yes
