Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4623: Thread level file handle caching
......................................................................

Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6478/1/be/src/runtime/disk-io-mgr.h
File be/src/runtime/disk-io-mgr.h:

Line 233: /// This is a single-threaded LRU cache for Hdfs file handles. The cache creates and
High-level observation: my understanding is that this changes the upper bound on the number of open handles per file from (# of open scan ranges) to (# of I/O threads that read from that file).

The second number can sometimes be higher when many threads service a queue (e.g. S3, remote reads, a local SSD, or a non-standard config). For example, suppose you create a single scan range for reading a text file from S3, and it gets scheduled at some point onto each of the S3 I/O threads. We might then end up with 16 open files rather than 1.

If we had a shared cache between all the threads servicing a disk queue, then I think we'd get min(# of open scan ranges, # of parallel reads from the file), which is a strict improvement over the current approach.

-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
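
To make the two bounds in the comment concrete, here is a rough sketch of the counting argument. It is not Impala code: the `Read` struct and all three functions are hypothetical, and it assumes each cache is large enough that nothing is evicted. It models reads of a single file and counts how many handles stay open under a per-thread cache versus a cache shared by every thread on the queue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical model of one read of a single file: which I/O thread
// served it, and the time interval [start, end) during which it ran.
struct Read {
  int thread_id;
  int start;
  int end;
};

// Per-thread cache: every thread that ever read the file keeps its own
// cached handle, so the count is the number of distinct threads.
std::size_t HandlesWithPerThreadCache(const std::vector<Read>& reads) {
  std::set<int> threads;
  for (const Read& r : reads) threads.insert(r.thread_id);
  return threads.size();
}

// Shared cache: a handle returned by one read can be reused by the next,
// so the count is the peak number of reads that overlap in time.
std::size_t HandlesWithSharedCache(const std::vector<Read>& reads) {
  std::vector<std::pair<int, int>> events;
  for (const Read& r : reads) {
    assert(r.start < r.end);
    events.push_back({r.start, +1});
    events.push_back({r.end, -1});
  }
  // Pairs sort by time, then by delta, so a read ending at time t
  // releases its handle before a read starting at time t takes one.
  std::sort(events.begin(), events.end());
  int open = 0;
  int peak = 0;
  for (const auto& e : events) {
    open += e.second;
    peak = std::max(peak, open);
  }
  return static_cast<std::size_t>(peak);
}

// The S3 scenario from the comment: one scan range whose reads are
// issued one after another, each landing on a different I/O thread.
std::vector<Read> SequentialReadsAcrossThreads(int num_threads) {
  std::vector<Read> reads;
  for (int i = 0; i < num_threads; ++i) reads.push_back({i, i, i + 1});
  return reads;
}
```

With `SequentialReadsAcrossThreads(16)`, the per-thread count is 16 and the shared count is 1, matching the example in the comment. When reads from different scan ranges genuinely overlap in time, the two counts converge, which is why the shared cache can never need more handles than the per-thread one in this model.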
