Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5378: Disk IO manager needs to understand ADLS
......................................................................


Patch Set 1:

(2 comments)

> (2 comments)
 > 
 > questions:
 > - what about insert staging for adls (in coordinator.cc)?

ADLS claims to have atomic renames, so we don't need to worry about insert 
staging the way we did for S3.

 > - what about hdfs-fs-cache, does that need to be extended?

I'm not sure which cache you mean, so I'll address both. The file handle cache 
doesn't currently support caching remote file handles. Also, we don't 
currently support SET CACHED for S3 or ADLS.

http://gerrit.cloudera.org:8080/#/c/7033/1/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

Line 402:   // ADLS uses buffer sizes of 4k. Given that, and the above JNI array allocation overhead
> you mean multiples of 4k?
I should have researched this better; I used 4k based on some misinformation. 
It looks like the buffer size used is actually 4MB, according to this:
https://docs.microsoft.com/en-us/java/api/com.microsoft.azure.datalake.store._a_d_l_file_input_stream

I also noticed a Hadoop JIRA that mentions better performance with larger 
buffer sizes:
https://issues.apache.org/jira/browse/HADOOP-14407


The pro of using a 4MB buffer is that we'd be aligned with ADLS and avoid 
fragmentation.

The con, however, is that we'd spend considerably more CPU allocating the JNI 
byte buffer and doing the memcpy.

Which do you think we should settle for?
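
To put rough numbers on that trade-off, here is a minimal sketch of how the 
chunk size changes the number of read calls needed for one scan range. 
NumChunkedReads and ChunkLen are hypothetical helpers, not code from this 
patch, and the sketch only counts calls; it does not measure the JNI 
allocation or memcpy cost itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Impala): number of read calls needed to
// cover a scan range of 'range_len' bytes when each call reads at most
// 'chunk_size' bytes.
inline int64_t NumChunkedReads(int64_t range_len, int64_t chunk_size) {
  assert(range_len >= 0 && chunk_size > 0);
  return (range_len + chunk_size - 1) / chunk_size;  // ceiling division
}

// Hypothetical helper: length in bytes of the i-th read (0-based). Every read
// is 'chunk_size' bytes except possibly the last one, which is shorter.
inline int64_t ChunkLen(int64_t range_len, int64_t chunk_size, int64_t i) {
  const int64_t offset = i * chunk_size;
  assert(i >= 0 && offset < range_len);
  return std::min(chunk_size, range_len - offset);
}
```

For an 8MB scan range, 4KB chunks would need 2048 read calls, while 4MB chunks 
would need 2, each with a much larger buffer to allocate and copy.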


http://gerrit.cloudera.org:8080/#/c/7033/1/be/src/runtime/disk-io-mgr.h
File be/src/runtime/disk-io-mgr.h:

Line 764:   int RemoteADLSDiskId() const { return num_local_disks() + REMOTE_ADLS_DISK_OFFSET; }
> RemoteAdlsDiskId
Done
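
For context, a rough sketch of how the remote disk IDs are laid out after the 
local disks, using the renamed accessor. DiskIdLayout is a hypothetical 
stand-in for the IO manager, and the ordering of the offsets in the enum is an 
assumption; only REMOTE_ADLS_DISK_OFFSET and num_local_disks() appear in the 
line quoted above.

```cpp
#include <cassert>

// Assumed ordering of the remote queue offsets. Only REMOTE_ADLS_DISK_OFFSET
// is taken from the quoted line; the other names are illustrative.
enum {
  REMOTE_DFS_DISK_OFFSET = 0,
  REMOTE_S3_DISK_OFFSET,
  REMOTE_ADLS_DISK_OFFSET,
  REMOTE_NUM_DISKS
};

// Hypothetical stand-in for the disk IO manager that tracks only the number
// of local disks. Remote "disks" get IDs immediately after the local ones.
class DiskIdLayout {
 public:
  explicit DiskIdLayout(int num_local_disks)
      : num_local_disks_(num_local_disks) {
    assert(num_local_disks >= 0);
  }

  int num_local_disks() const { return num_local_disks_; }
  int num_total_disks() const { return num_local_disks_ + REMOTE_NUM_DISKS; }

  int RemoteDfsDiskId() const {
    return num_local_disks() + REMOTE_DFS_DISK_OFFSET;
  }
  int RemoteS3DiskId() const {
    return num_local_disks() + REMOTE_S3_DISK_OFFSET;
  }
  int RemoteAdlsDiskId() const {
    return num_local_disks() + REMOTE_ADLS_DISK_OFFSET;
  }

 private:
  int num_local_disks_;
};
```

Under these assumptions, a node with 4 local disks would use IDs 0 to 3 for 
local disks and ID 6 for the ADLS queue.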


-- 
To view, visit http://gerrit.cloudera.org:8080/7033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I067f053fec941e3631610c5cc89a384f257ba906
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-HasComments: Yes
