Joe McDonnell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6478

Change subject: IMPALA-4623: Thread level file handle caching
......................................................................

IMPALA-4623: Thread level file handle caching

Currently, every scan range maintains its own file handle, even when
multiple scan ranges are accessing the same file. Opening the file
handles causes load on the NameNode, which can lead to scaling issues.

There are two parts to this change:
1. Enable file handle caching by default
2. Introduce a thread file handle cache to share file handles between
   scan ranges

For thread file handle caching, the scan range no longer maintains its
own Hdfs file handle. On each read, the io thread gets the Hdfs file
handle from its cache (opening the file if necessary) and uses that
handle for the read. This allows multiple scan ranges on the same file
to use the same file handle. Since the file offset of a shared handle is
no longer consistent for an individual scan range, all Hdfs reads are
now done with hdfsPread. Additionally, since Hdfs read statistics are
maintained on the file handle, the read statistics must be retrieved
and cleared after each read.

Thread file handle caching is not used for local non-Hdfs files.

Reads for scan ranges that access data cached by Hdfs are done in the
scanner threads and do not use thread file handle caching. Instead,
they use the existing global file handle cache. These scan ranges
maintain a file handle per scan range as before.

When Impala starts up with max_cached_file_handles=N, the global cache
is given 50% of the allowed file handles. The other 50% is split evenly
between all io threads.

TODO:
1. Determine appropriate defaults.
2. Maintain appropriate metrics.
3. Write tests.
4. For scan ranges that use Hdfs caching, should there be some sharing
   at the scanner level?

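The per-thread cache and the 50/50 budget split described above can be
sketched roughly as follows. This is an illustration only, not the code
in this patch: FakeHandle, HandleBudget, SplitBudget and
ThreadFileHandleCache are hypothetical names, the LRU eviction policy
and the integer rounding are assumptions, and a real implementation
would open an hdfsFile and read through it with hdfsPread rather than
use a stand-in struct.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an open Hdfs file handle.
struct FakeHandle {
  std::string path;
};

// How max_cached_file_handles might be divided (names are assumptions).
struct HandleBudget {
  std::size_t global_capacity;      // share of the global cache
  std::size_t per_thread_capacity;  // share of each io thread's cache
};

// Half of the budget goes to the global cache; the remainder is split
// evenly between io threads. Integer division (rounding down) is an
// assumption of this sketch.
inline HandleBudget SplitBudget(std::size_t max_cached_file_handles,
                                std::size_t num_io_threads) {
  assert(num_io_threads > 0);
  HandleBudget budget;
  budget.global_capacity = max_cached_file_handles / 2;
  budget.per_thread_capacity =
      (max_cached_file_handles - budget.global_capacity) / num_io_threads;
  return budget;
}

// One instance per io thread, so no locking is needed. Handles are keyed
// by file name and evicted in least-recently-used order (an assumption).
class ThreadFileHandleCache {
 public:
  explicit ThreadFileHandleCache(std::size_t capacity) : capacity_(capacity) {}

  // Returns the handle for 'path', "opening" it on a cache miss. The
  // returned pointer is only valid until the next call to Get().
  FakeHandle* Get(const std::string& path) {
    auto it = index_.find(path);
    if (it != index_.end()) {
      // Cache hit: move the entry to the front (most recently used).
      lru_.splice(lru_.begin(), lru_, it->second);
      return &lru_.front();
    }
    // Cache miss: this is where a real cache would open the file.
    ++opens_;
    lru_.push_front(FakeHandle{path});
    index_[path] = lru_.begin();
    // Evict least recently used entries, but always keep the handle
    // that is about to be used for the current read.
    while (lru_.size() > capacity_ && lru_.size() > 1) {
      index_.erase(lru_.back().path);
      lru_.pop_back();
    }
    return &lru_.front();
  }

  std::size_t opens() const { return opens_; }
  std::size_t size() const { return lru_.size(); }

 private:
  std::size_t capacity_;
  std::size_t opens_ = 0;
  std::list<FakeHandle> lru_;  // front = most recently used
  std::unordered_map<std::string, std::list<FakeHandle>::iterator> index_;
};
```

In this sketch, two scan ranges on the same file that are serviced by
the same io thread trigger a single open, which is the NameNode saving
the change is after. Because the handle is shared, each read has to
pass an explicit offset, which is why positional reads are required.
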
Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
---
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
3 files changed, 216 insertions(+), 65 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/1
--
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <[email protected]>
