Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4623: Thread level file handle caching
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6478/1/be/src/runtime/disk-io-mgr.h
File be/src/runtime/disk-io-mgr.h:

Line 233:   /// This is a single-threaded LRU cache for Hdfs file handles. The cache creates and
High-level observation: my understanding is that this changes the upper bound
on the number of open handles per file from (# of open scan ranges) to (# of
I/O threads that read from that file).

The second number can sometimes be higher than the first if many threads
service a queue (e.g. S3, remote reads, a local SSD, or a non-standard config).

E.g. suppose you create a single scan range for reading a text file from S3,
and over time it gets scheduled onto each of the S3 I/O threads. Then we might
end up with 16 open file handles rather than 1.
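To make the arithmetic concrete, here is a toy model (hypothetical names, not
Impala code). It assumes each I/O thread keeps a private cache that opens a
handle the first time it reads a file and never evicts it, and that the single
scan range's reads are dispatched round-robin over the threads:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: one private cache per I/O thread. A handle is opened
// the first time this thread reads a given file and then kept open.
struct ThreadLocalHandleCache {
  std::set<std::string> open_files;  // files this thread holds a handle for

  // Returns true if a new handle had to be opened for `file`.
  bool Acquire(const std::string& file) {
    return open_files.insert(file).second;
  }
};

// Simulates one scan range whose successive reads of `file` are dispatched
// round-robin across `num_threads` I/O threads; returns the total number of
// handles left open across all thread-local caches.
std::size_t CountOpenHandlesThreadLocal(const std::string& file,
                                        int num_threads, int num_reads) {
  assert(num_threads > 0);
  std::vector<ThreadLocalHandleCache> caches(num_threads);
  for (int i = 0; i < num_reads; ++i) {
    caches[i % num_threads].Acquire(file);
  }
  std::size_t total = 0;
  for (const auto& cache : caches) total += cache.open_files.size();
  return total;
}
```

With 16 threads and enough reads, CountOpenHandlesThreadLocal(
"s3a://bucket/file.txt", 16, 100) comes out as 16, even though only one scan
range (and never more than one concurrent read) touched the file.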

If we had a shared cache between all the threads servicing a disk queue, then I 
think we'd get min(# of open scan ranges, # of parallel reads from the file), 
which is a strict improvement over the current approach.
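A rough sketch of the shared alternative, again with hypothetical names and a
fake handle type standing in for hdfsFile: one mutex-protected cache per disk
queue, assuming a handle serves one read at a time and goes back on an idle
list when the read finishes (LRU eviction omitted). A file only gets a new
handle when every existing one is busy, so the count tracks peak concurrent
reads of that file, which is bounded by both the open scan ranges (one read
outstanding each) and the threads:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: a cache shared by all I/O threads servicing one disk
// queue. Handles are used exclusively by one read, then returned for reuse.
class SharedFileHandleCache {
 public:
  using HandleId = int;  // stand-in for a real hdfsFile

  // Reuses an idle handle for `file` if one exists, else "opens" a new one.
  HandleId Acquire(const std::string& file) {
    std::lock_guard<std::mutex> lock(mu_);
    std::vector<HandleId>& idle = idle_[file];
    if (!idle.empty()) {
      HandleId h = idle.back();
      idle.pop_back();
      return h;
    }
    return next_id_++;  // every id issued corresponds to one opened handle
  }

  // Puts a handle back on the idle list once the read using it completes.
  void Release(const std::string& file, HandleId h) {
    std::lock_guard<std::mutex> lock(mu_);
    assert(h >= 0 && h < next_id_);  // must have been issued by this cache
    idle_[file].push_back(h);
  }

  // Total number of handles opened so far.
  std::size_t num_opened() const {
    std::lock_guard<std::mutex> lock(mu_);
    return static_cast<std::size_t>(next_id_);
  }

 private:
  mutable std::mutex mu_;
  std::unordered_map<std::string, std::vector<HandleId>> idle_;
  HandleId next_id_ = 0;
};
```

A single scan range whose reads bounce across all 16 threads does
Acquire/Release back to back and only ever opens one handle; two overlapping
reads of the same file open two.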


-- 
To view, visit http://gerrit.cloudera.org:8080/6478

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
