[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8945 ) Change subject: IMPALA-6364: Bypass file handle cache for ineligible files .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8945 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f Gerrit-Change-Number: 8945 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 05 Jan 2018 21:21:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8945 ) Change subject: IMPALA-6364: Bypass file handle cache for ineligible files .. IMPALA-6364: Bypass file handle cache for ineligible files Currently, all HdfsFileHandles are owned and constructed by the file handle cache. When the file handle cache is disabled or the file handle is not eligible for caching, the HdfsFileHandle is stored exclusively in ScanRange::exclusive_hdfs_fh_, but the HdfsFileHandle still comes from the file handle cache. It is created via a call to DiskIoMgr::GetCachedHdfsFileHandle() with 'require_new_handle' set to true and destroyed via DiskIoMgr::ReleaseCachedHdfsFileHandle() with 'destroy_handle' set to true. Recent testing has revealed that the lock on the file handle cache is a bottleneck for workloads with many small remote files. There is no benefit to storing these exclusive file handles in the file handle cache, as they do not participate in the caching. This change introduces DiskIoMgr::GetExclusiveHdfsFileHandle() and DiskIoMgr::ReleaseExclusiveHdfsFileHandle(). These are equivalent to the Get/ReleaseCachedHdfsFileHandle() calls, except they bypass the file handle cache and create/destroy the file handle directly. ScanRange::Open()/Close(), which populates and frees ScanRange::exclusive_hdfs_fh_, now uses these new calls rather than accessing the file handle cache. This avoids the locking entirely, solving the bottleneck. To draw a distinction between the two codepaths, HdfsFileHandle is now an abstract class with two subclasses: - CachedHdfsFileHandles cover all handles that live in file handle cache. Get/ReleaseCachedHdfsFileHandle() use this subclass. - ExclusiveHdfsFileHandles cover all cases where a file handle does not come from the cache. The new Get/ReleaseExclusiveHdfsFileHandle() use this subclass. Separately, testing revealed that increasing the number of partitions for the file handle cache also fixes the contention problem. This changes the file handle cache to make the number of partitions configurable via startup parameter num_file_handle_cache_partitions. This allows mitigation of future bottlenecks without a patch. Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f Reviewed-on: http://gerrit.cloudera.org:8080/8945 Reviewed-by: Joe McDonnellTested-by: Impala Public Jenkins --- M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/handle-cache.h M be/src/runtime/io/handle-cache.inline.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc 6 files changed, 138 insertions(+), 87 deletions(-) Approvals: Joe McDonnell: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8945 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f Gerrit-Change-Number: 8945 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8945 ) Change subject: IMPALA-6364: Bypass file handle cache for ineligible files .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1679/ -- To view, visit http://gerrit.cloudera.org:8080/8945 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f Gerrit-Change-Number: 8945 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 05 Jan 2018 17:33:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8945 ) Change subject: IMPALA-6364: Bypass file handle cache for ineligible files .. Patch Set 3: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/8945 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f Gerrit-Change-Number: 8945 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 05 Jan 2018 17:32:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8945 ) Change subject: IMPALA-6364: Bypass file handle cache for ineligible files .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8945 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f Gerrit-Change-Number: 8945 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 05 Jan 2018 17:16:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files
Mostafa Mokhtar has posted comments on this change. ( http://gerrit.cloudera.org:8080/8945 ) Change subject: IMPALA-6364: Bypass file handle cache for ineligible files .. Patch Set 2: Code-Review+1 Validated the the regression is addressed with this change. -- To view, visit http://gerrit.cloudera.org:8080/8945 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f Gerrit-Change-Number: 8945 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 05 Jan 2018 06:29:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8945 ) Change subject: IMPALA-6364: Bypass file handle cache for ineligible files .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8945/1/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/8945/1/be/src/runtime/io/disk-io-mgr.cc@123 PS1, Line 123: 16 > I'm open to thoughts on what a good default would be. The contention we hav Added a TODO to look into this. Since the file handle cache is off by default, this should not delay this change. -- To view, visit http://gerrit.cloudera.org:8080/8945 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f Gerrit-Change-Number: 8945 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 05 Jan 2018 01:26:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8945 to look at the new patch set (#2). Change subject: IMPALA-6364: Bypass file handle cache for ineligible files .. IMPALA-6364: Bypass file handle cache for ineligible files Currently, all HdfsFileHandles are owned and constructed by the file handle cache. When the file handle cache is disabled or the file handle is not eligible for caching, the HdfsFileHandle is stored exclusively in ScanRange::exclusive_hdfs_fh_, but the HdfsFileHandle still comes from the file handle cache. It is created via a call to DiskIoMgr::GetCachedHdfsFileHandle() with 'require_new_handle' set to true and destroyed via DiskIoMgr::ReleaseCachedHdfsFileHandle() with 'destroy_handle' set to true. Recent testing has revealed that the lock on the file handle cache is a bottleneck for workloads with many small remote files. There is no benefit to storing these exclusive file handles in the file handle cache, as they do not participate in the caching. This change introduces DiskIoMgr::GetExclusiveHdfsFileHandle() and DiskIoMgr::ReleaseExclusiveHdfsFileHandle(). These are equivalent to the Get/ReleaseCachedHdfsFileHandle() calls, except they bypass the file handle cache and create/destroy the file handle directly. ScanRange::Open()/Close(), which populates and frees ScanRange::exclusive_hdfs_fh_, now uses these new calls rather than accessing the file handle cache. This avoids the locking entirely, solving the bottleneck. To draw a distinction between the two codepaths, HdfsFileHandle is now an abstract class with two subclasses: - CachedHdfsFileHandles cover all handles that live in file handle cache. Get/ReleaseCachedHdfsFileHandle() use this subclass. - ExclusiveHdfsFileHandles cover all cases where a file handle does not come from the cache. The new Get/ReleaseExclusiveHdfsFileHandle() use this subclass. Separately, testing revealed that increasing the number of partitions for the file handle cache also fixes the contention problem. This changes the file handle cache to make the number of partitions configurable via startup parameter num_file_handle_cache_partitions. This allows mitigation of future bottlenecks without a patch. Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f --- M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/handle-cache.h M be/src/runtime/io/handle-cache.inline.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc 6 files changed, 138 insertions(+), 87 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/8945/2 -- To view, visit http://gerrit.cloudera.org:8080/8945 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f Gerrit-Change-Number: 8945 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8945 ) Change subject: IMPALA-6364: Bypass file handle cache for ineligible files .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8945/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8945/1//COMMIT_MSG@43 PS1, Line 43: Separately, testing revealed that increasing the number of > Have you run any tests on systems where we'd exercise the exclusive path, e I haven't run on s3. The exclusive path is also used when file handle cache is off, so all of our tests currently run through it. I have an exhaustive build running. We are also planning on verifying the performance on a cluster that does remote reads. I think that can also include verifying remote read performance even with the file handle cache on. -- To view, visit http://gerrit.cloudera.org:8080/8945 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f Gerrit-Change-Number: 8945 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 05 Jan 2018 01:21:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8945 ) Change subject: IMPALA-6364: Bypass file handle cache for ineligible files .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8945/1/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/8945/1/be/src/runtime/io/disk-io-mgr.cc@123 PS1, Line 123: 16 I'm open to thoughts on what a good default would be. The contention we have been seeing is mainly from the codepath where the file handle cache is off (remote files) and that will no longer care about this, but it might make sense to bump this just in case. -- To view, visit http://gerrit.cloudera.org:8080/8945 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f Gerrit-Change-Number: 8945 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 05 Jan 2018 00:02:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files
Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8945 Change subject: IMPALA-6364: Bypass file handle cache for ineligible files .. IMPALA-6364: Bypass file handle cache for ineligible files Currently, all HdfsFileHandles are owned and constructed by the file handle cache. When the file handle cache is disabled or the file handle is not eligible for caching, the HdfsFileHandle is stored exclusively in ScanRange::exclusive_hdfs_fh_, but the HdfsFileHandle still comes from the file handle cache. It is created via a call to DiskIoMgr::GetCachedHdfsFileHandle() with 'require_new_handle' set to true and destroyed via DiskIoMgr::ReleaseCachedHdfsFileHandle() with 'destroy_handle' set to true. Recent testing has revealed that the lock on the file handle cache is a bottleneck for workloads with many small remote files. There is no benefit to storing these exclusive file handles in the file handle cache, as they do not participate in the caching. This change introduces DiskIoMgr::GetExclusiveHdfsFileHandle() and DiskIoMgr::ReleaseExclusiveHdfsFileHandle(). These are equivalent to the Get/ReleaseCachedHdfsFileHandle() calls, except they bypass the file handle cache and create/destroy the file handle directly. ScanRange::Open()/Close(), which populates and frees ScanRange::exclusive_hdfs_fh_, now uses these new calls rather than accessing the file handle cache. This avoids the locking entirely, solving the bottleneck. To draw a distinction between the two codepaths, HdfsFileHandle is now an abstract class with two subclasses: - CachedHdfsFileHandles cover all handles that live in file handle cache. Get/ReleaseCachedHdfsFileHandle() use this subclass. - ExclusiveHdfsFileHandles cover all cases where a file handle does not come from the cache. The new Get/ReleaseExclusiveHdfsFileHandle() use this subclass. Separately, testing revealed that increasing the number of partitions for the file handle cache also fixes the contention problem. This changes the file handle cache to make the number of partitions configurable via startup parameter num_file_handle_cache_partitions. This allows mitigation of future bottlenecks without a patch. Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f --- M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/handle-cache.h M be/src/runtime/io/handle-cache.inline.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc 6 files changed, 137 insertions(+), 87 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/8945/1 -- To view, visit http://gerrit.cloudera.org:8080/8945 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f Gerrit-Change-Number: 8945 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell