Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5378: Disk IO manager needs to understand ADLS
......................................................................

Patch Set 1:

(1 comment)

Ah, regarding runtime/hdfs-fs-cache, we did some plumbing for passing the keys through libHDFS if users didn't want to set them in core-site.xml. We're not planning to do it for ADLS unless there's a big ask for it. Also, there is an easier alternative: the Hadoop encrypted credential store, which should land soon for Hadoop AdlFileSystem. The above work was done for S3 in hdfs-fs-cache before there was a plan for this credential store for S3.

http://gerrit.cloudera.org:8080/#/c/7033/1/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

Line 402: // ADLS uses buffer sizes of 4k. Given that, and the above JNI array allocation overhead
> but we'd still truncate to the actual length of the column's data pages in

Yes, it would cut a buffer at 4MB or a flush, whichever comes first. We'd want to optimize for the more likely case. Is it safe to say that in most cases we'd have data pages > 4MB?

Regarding requiring more CPU, this was found while settling on the read chunk size for S3. The comment above (L392-L397) explains the overhead.

Sounds good, I'll convert it to a flag.

-- 
To view, visit http://gerrit.cloudera.org:8080/7033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I067f053fec941e3631610c5cc89a384f257ba906
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-HasComments: Yes
