[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files

2018-01-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8945 )

Change subject: IMPALA-6364: Bypass file handle cache for ineligible files
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f
Gerrit-Change-Number: 8945
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 21:21:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files

2018-01-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8945 )

Change subject: IMPALA-6364: Bypass file handle cache for ineligible files
..

IMPALA-6364: Bypass file handle cache for ineligible files

Currently, all HdfsFileHandles are owned and constructed
by the file handle cache. When the file handle cache
is disabled or the file handle is not eligible for
caching, the HdfsFileHandle is stored exclusively in
ScanRange::exclusive_hdfs_fh_, but the HdfsFileHandle still
comes from the file handle cache. It is created via a call to
DiskIoMgr::GetCachedHdfsFileHandle() with 'require_new_handle'
set to true and destroyed via
DiskIoMgr::ReleaseCachedHdfsFileHandle() with 'destroy_handle'
set to true.

Recent testing has revealed that the lock on the file handle
cache is a bottleneck for workloads with many small remote
files. There is no benefit to storing these exclusive file
handles in the file handle cache, as they do not participate
in the caching.

This change introduces DiskIoMgr::GetExclusiveHdfsFileHandle()
and DiskIoMgr::ReleaseExclusiveHdfsFileHandle(). These are
equivalent to the Get/ReleaseCachedHdfsFileHandle() calls, except
they bypass the file handle cache and create/destroy the
file handle directly. ScanRange::Open()/Close(), which
populates and frees ScanRange::exclusive_hdfs_fh_, now uses
these new calls rather than accessing the file handle cache.
This avoids the locking entirely, solving the bottleneck.

To draw a distinction between the two codepaths, HdfsFileHandle
is now an abstract class with two subclasses:
 - CachedHdfsFileHandles cover all handles that live in file handle
   cache. Get/ReleaseCachedHdfsFileHandle() use this subclass.
 - ExclusiveHdfsFileHandles cover all cases where a file handle
   does not come from the cache. The new
   Get/ReleaseExclusiveHdfsFileHandle() use this subclass.

Separately, testing revealed that increasing the number of
partitions for the file handle cache also fixes the contention
problem. This changes the file handle cache to make the number
of partitions configurable via startup parameter
num_file_handle_cache_partitions. This allows mitigation of
future bottlenecks without a patch.

Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f
Reviewed-on: http://gerrit.cloudera.org:8080/8945
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins
---
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/handle-cache.h
M be/src/runtime/io/handle-cache.inline.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
6 files changed, 138 insertions(+), 87 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f
Gerrit-Change-Number: 8945
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files

2018-01-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8945 )

Change subject: IMPALA-6364: Bypass file handle cache for ineligible files
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1679/


--
To view, visit http://gerrit.cloudera.org:8080/8945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f
Gerrit-Change-Number: 8945
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 17:33:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files

2018-01-05 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8945 )

Change subject: IMPALA-6364: Bypass file handle cache for ineligible files
..


Patch Set 3: Code-Review+2

Carry +2


--
To view, visit http://gerrit.cloudera.org:8080/8945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f
Gerrit-Change-Number: 8945
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 17:32:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files

2018-01-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8945 )

Change subject: IMPALA-6364: Bypass file handle cache for ineligible files
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f
Gerrit-Change-Number: 8945
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 17:16:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files

2018-01-04 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8945 )

Change subject: IMPALA-6364: Bypass file handle cache for ineligible files
..


Patch Set 2: Code-Review+1

Validated the the regression is addressed with this change.


--
To view, visit http://gerrit.cloudera.org:8080/8945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f
Gerrit-Change-Number: 8945
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 06:29:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files

2018-01-04 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8945 )

Change subject: IMPALA-6364: Bypass file handle cache for ineligible files
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8945/1/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8945/1/be/src/runtime/io/disk-io-mgr.cc@123
PS1, Line 123:  16
> I'm open to thoughts on what a good default would be. The contention we hav
Added a TODO to look into this. Since the file handle cache is off by default, 
this should not delay this change.



--
To view, visit http://gerrit.cloudera.org:8080/8945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f
Gerrit-Change-Number: 8945
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 01:26:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files

2018-01-04 Thread Joe McDonnell (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8945

to look at the new patch set (#2).

Change subject: IMPALA-6364: Bypass file handle cache for ineligible files
..

IMPALA-6364: Bypass file handle cache for ineligible files

Currently, all HdfsFileHandles are owned and constructed
by the file handle cache. When the file handle cache
is disabled or the file handle is not eligible for
caching, the HdfsFileHandle is stored exclusively in
ScanRange::exclusive_hdfs_fh_, but the HdfsFileHandle still
comes from the file handle cache. It is created via a call to
DiskIoMgr::GetCachedHdfsFileHandle() with 'require_new_handle'
set to true and destroyed via
DiskIoMgr::ReleaseCachedHdfsFileHandle() with 'destroy_handle'
set to true.

Recent testing has revealed that the lock on the file handle
cache is a bottleneck for workloads with many small remote
files. There is no benefit to storing these exclusive file
handles in the file handle cache, as they do not participate
in the caching.

This change introduces DiskIoMgr::GetExclusiveHdfsFileHandle()
and DiskIoMgr::ReleaseExclusiveHdfsFileHandle(). These are
equivalent to the Get/ReleaseCachedHdfsFileHandle() calls, except
they bypass the file handle cache and create/destroy the
file handle directly. ScanRange::Open()/Close(), which
populates and frees ScanRange::exclusive_hdfs_fh_, now uses
these new calls rather than accessing the file handle cache.
This avoids the locking entirely, solving the bottleneck.

To draw a distinction between the two codepaths, HdfsFileHandle
is now an abstract class with two subclasses:
 - CachedHdfsFileHandles cover all handles that live in file handle
   cache. Get/ReleaseCachedHdfsFileHandle() use this subclass.
 - ExclusiveHdfsFileHandles cover all cases where a file handle
   does not come from the cache. The new
   Get/ReleaseExclusiveHdfsFileHandle() use this subclass.

Separately, testing revealed that increasing the number of
partitions for the file handle cache also fixes the contention
problem. This changes the file handle cache to make the number
of partitions configurable via startup parameter
num_file_handle_cache_partitions. This allows mitigation of
future bottlenecks without a patch.

Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f
---
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/handle-cache.h
M be/src/runtime/io/handle-cache.inline.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
6 files changed, 138 insertions(+), 87 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/8945/2
--
To view, visit http://gerrit.cloudera.org:8080/8945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f
Gerrit-Change-Number: 8945
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files

2018-01-04 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8945 )

Change subject: IMPALA-6364: Bypass file handle cache for ineligible files
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8945/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8945/1//COMMIT_MSG@43
PS1, Line 43: Separately, testing revealed that increasing the number of
> Have you run any tests on systems where we'd exercise the exclusive path, e
I haven't run on s3. The exclusive path is also used when file handle cache is 
off, so all of our tests currently run through it. I have an exhaustive build 
running.

We are also planning on verifying the performance on a cluster that does remote 
reads. I think that can also include verifying remote read performance even 
with the file handle cache on.



--
To view, visit http://gerrit.cloudera.org:8080/8945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f
Gerrit-Change-Number: 8945
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 01:21:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files

2018-01-04 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8945 )

Change subject: IMPALA-6364: Bypass file handle cache for ineligible files
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8945/1/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8945/1/be/src/runtime/io/disk-io-mgr.cc@123
PS1, Line 123:  16
I'm open to thoughts on what a good default would be. The contention we have 
been seeing is mainly from the codepath where the file handle cache is off 
(remote files) and that will no longer care about this, but it might make sense 
to bump this just in case.



--
To view, visit http://gerrit.cloudera.org:8080/8945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f
Gerrit-Change-Number: 8945
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jan 2018 00:02:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6364: Bypass file handle cache for ineligible files

2018-01-04 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8945


Change subject: IMPALA-6364: Bypass file handle cache for ineligible files
..

IMPALA-6364: Bypass file handle cache for ineligible files

Currently, all HdfsFileHandles are owned and constructed
by the file handle cache. When the file handle cache
is disabled or the file handle is not eligible for
caching, the HdfsFileHandle is stored exclusively in
ScanRange::exclusive_hdfs_fh_, but the HdfsFileHandle still
comes from the file handle cache. It is created via a call to
DiskIoMgr::GetCachedHdfsFileHandle() with 'require_new_handle'
set to true and destroyed via
DiskIoMgr::ReleaseCachedHdfsFileHandle() with 'destroy_handle'
set to true.

Recent testing has revealed that the lock on the file handle
cache is a bottleneck for workloads with many small remote
files. There is no benefit to storing these exclusive file
handles in the file handle cache, as they do not participate
in the caching.

This change introduces DiskIoMgr::GetExclusiveHdfsFileHandle()
and DiskIoMgr::ReleaseExclusiveHdfsFileHandle(). These are
equivalent to the Get/ReleaseCachedHdfsFileHandle() calls, except
they bypass the file handle cache and create/destroy the
file handle directly. ScanRange::Open()/Close(), which
populates and frees ScanRange::exclusive_hdfs_fh_, now uses
these new calls rather than accessing the file handle cache.
This avoids the locking entirely, solving the bottleneck.

To draw a distinction between the two codepaths, HdfsFileHandle
is now an abstract class with two subclasses:
 - CachedHdfsFileHandles cover all handles that live in file handle
   cache. Get/ReleaseCachedHdfsFileHandle() use this subclass.
 - ExclusiveHdfsFileHandles cover all cases where a file handle
   does not come from the cache. The new
   Get/ReleaseExclusiveHdfsFileHandle() use this subclass.

Separately, testing revealed that increasing the number of
partitions for the file handle cache also fixes the contention
problem. This changes the file handle cache to make the number
of partitions configurable via startup parameter
num_file_handle_cache_partitions. This allows mitigation of
future bottlenecks without a patch.

Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f
---
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/handle-cache.h
M be/src/runtime/io/handle-cache.inline.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
6 files changed, 137 insertions(+), 87 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/8945/1
--
To view, visit http://gerrit.cloudera.org:8080/8945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f
Gerrit-Change-Number: 8945
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell