[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-04-01 Thread Joe McDonnell (Code Review)
Joe McDonnell has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..

IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

This adds two sets of metrics. The first is per-partition metrics
to track the performance of the underlying filesystem for the
data cache. It keeps histograms of read, write, and eviction
latency for each data cache partition along with another metric
recording the path for the partition. These are exposed as the
following metrics:
impala-server.io-mgr.remote-data-cache-partition-$0.path
impala-server.io-mgr.remote-data-cache-partition-$0.read-latency
impala-server.io-mgr.remote-data-cache-partition-$0.write-latency
impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency

This also adds metrics to keep counts of hits, misses, and entries
in the data cache. Since reducing the latency of IO is an important
feature of the data cache, the absolute count of hits and misses
is as important as the hit bytes and miss bytes. This adds the
following metrics:
impala-server.io-mgr.remote-data-cache-hit-count
impala-server.io-mgr.remote-data-cache-miss-count
impala-server.io-mgr.remote-data-cache-num-entries

To track metrics around inserts, this also adds the following
metrics:
impala-server.io-mgr.remote-data-cache-num-inserts
impala-server.io-mgr.remote-data-cache-dropped-entries
impala-server.io-mgr.remote-data-cache-instant-evictions
An instant eviction happens when inserting an entry into the cache
fails and the entry is immediately evicted during insert. This is
currently only possible for LIRS when the entry's size is larger
than the unprotected capacity. This manifests when the cache
size is very small. For example, for an 8MB entry, this would
manifest when a cache shard is smaller than 160MB. This metric
is primarily for debugging.

Testing:
 - Hand testing to verify the per-partition latency histograms
 - Modified custom_cluster/test_data_cache.py to also test
   the counts.

Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Reviewed-on: http://gerrit.cloudera.org:8080/15382
Reviewed-by: Thomas Tauber-Marshall 
Tested-by: Impala Public Jenkins 
---
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M common/thrift/metrics.json
M tests/custom_cluster/test_data_cache.py
7 files changed, 284 insertions(+), 9 deletions(-)

Approvals:
  Thomas Tauber-Marshall: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 11
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 10: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 10
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 01 Apr 2020 04:47:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 9: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 9
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 01 Apr 2020 01:23:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5673/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 10
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 01 Apr 2020 00:19:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5595/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 10
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 01 Apr 2020 00:18:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-31 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 10: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 10
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 31 Mar 2020 23:28:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-31 Thread Joe McDonnell (Code Review)
Hello Thomas Tauber-Marshall, Sahil Takiar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15382

to look at the new patch set (#10).

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..

IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

This adds two sets of metrics. The first is per-partition metrics
to track the performance of the underlying filesystem for the
data cache. It keeps histograms of read, write, and eviction
latency for each data cache partition along with another metric
recording the path for the partition. These are exposed as the
following metrics:
impala-server.io-mgr.remote-data-cache-partition-$0.path
impala-server.io-mgr.remote-data-cache-partition-$0.read-latency
impala-server.io-mgr.remote-data-cache-partition-$0.write-latency
impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency

This also adds metrics to keep counts of hits, misses, and entries
in the data cache. Since reducing the latency of IO is an important
feature of the data cache, the absolute count of hits and misses
is as important as the hit bytes and miss bytes. This adds the
following metrics:
impala-server.io-mgr.remote-data-cache-hit-count
impala-server.io-mgr.remote-data-cache-miss-count
impala-server.io-mgr.remote-data-cache-num-entries

To track metrics around inserts, this also adds the following
metrics:
impala-server.io-mgr.remote-data-cache-num-inserts
impala-server.io-mgr.remote-data-cache-dropped-entries
impala-server.io-mgr.remote-data-cache-instant-evictions
An instant eviction happens when inserting an entry into the cache
fails and the entry is immediately evicted during insert. This is
currently only possible for LIRS when the entry's size is larger
than the unprotected capacity. This manifests when the cache
size is very small. For example, for an 8MB entry, this would
manifest when a cache shard is smaller than 160MB. This metric
is primarily for debugging.

Testing:
 - Hand testing to verify the per-partition latency histograms
 - Modified custom_cluster/test_data_cache.py to also test
   the counts.

Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
---
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M common/thrift/metrics.json
M tests/custom_cluster/test_data_cache.py
7 files changed, 284 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/15382/10
--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 10
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-31 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15382/9/common/thrift/metrics.json
File common/thrift/metrics.json:

http://gerrit.cloudera.org:8080/#/c/15382/9/common/thrift/metrics.json@567
PS9, Line 567: limit
> Limit
Done


http://gerrit.cloudera.org:8080/#/c/15382/9/common/thrift/metrics.json@573
PS9, Line 573: "description": "Total number of instantaneous evictions from 
the remote data cache.",
> I don't think its very obvious from the description what's being counted he
I added an extra sentence here and a more detailed comment in the commit 
message. This metric is a pretty intricate implementation detail.

I will file a Jira about expanding/updating the data caching documentation.



--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 9
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 31 Mar 2020 23:26:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-31 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15382/9/common/thrift/metrics.json
File common/thrift/metrics.json:

http://gerrit.cloudera.org:8080/#/c/15382/9/common/thrift/metrics.json@567
PS9, Line 567: limit
Limit


http://gerrit.cloudera.org:8080/#/c/15382/9/common/thrift/metrics.json@573
PS9, Line 573: "description": "Total number of instantaneous evictions from 
the remote data cache.",
I don't think its very obvious from the description what's being counted here.

That may be okay, given that its probably hard to describe concisely and this 
is mostly something that regular users wouldn't need to be interested in and is 
for Impala experts to use, but I wonder if it can be expanded somewhat? Or 
maybe it would be helpful to describe it thoroughly in the commit message so 
its written down somewhere?

Do you have plans to add documentation around all this cache stuff, eg. a 
"Tuning Impala's Data Cache" page?



--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 9
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 31 Mar 2020 22:19:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5670/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 9
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 31 Mar 2020 21:07:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-31 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 9: Code-Review+1

Rebased, carry +1


--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 9
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 31 Mar 2020 20:50:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5591/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 9
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 31 Mar 2020 20:51:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-31 Thread Joe McDonnell (Code Review)
Hello Thomas Tauber-Marshall, Sahil Takiar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15382

to look at the new patch set (#9).

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..

IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

This adds two sets of metrics. The first is per-partition metrics
to track the performance of the underlying filesystem for the
data cache. It keeps histograms of read, write, and eviction
latency for each data cache partition along with another metric
recording the path for the partition. These are exposed as the
following metrics:
impala-server.io-mgr.remote-data-cache-partition-$0.path
impala-server.io-mgr.remote-data-cache-partition-$0.read-latency
impala-server.io-mgr.remote-data-cache-partition-$0.write-latency
impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency

This also adds metrics to keep counts of hits, misses, and entries
in the data cache. Since reducing the latency of IO is an important
feature of the data cache, the absolute count of hits and misses
is as important as the hit bytes and miss bytes. This adds the
following metrics:
impala-server.io-mgr.remote-data-cache-hit-count
impala-server.io-mgr.remote-data-cache-miss-count
impala-server.io-mgr.remote-data-cache-num-entries

To track metrics around inserts, this also adds the following
metrics:
impala-server.io-mgr.remote-data-cache-num-inserts
impala-server.io-mgr.remote-data-cache-dropped-entries
impala-server.io-mgr.remote-data-cache-instant-evictions

Testing:
 - Hand testing to verify the per-partition latency histograms
 - Modified custom_cluster/test_data_cache.py to also test
   the counts.

Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
---
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M common/thrift/metrics.json
M tests/custom_cluster/test_data_cache.py
7 files changed, 284 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/15382/9
--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 9
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-31 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15382/8/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/15382/8/be/src/runtime/io/data-cache.cc@700
PS8, Line 700:   bool write_success;
> nit: set to false by default
My style preference is to declare variables without a value when the code is 
going to set it unconditionally.


http://gerrit.cloudera.org:8080/#/c/15382/8/tests/custom_cluster/test_data_cache.py
File tests/custom_cluster/test_data_cache.py:

http://gerrit.cloudera.org:8080/#/c/15382/8/tests/custom_cluster/test_data_cache.py@64
PS8, Line 64: 0
> flake8: E501 line too long (91 > 90 characters)
Done



--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 31 Mar 2020 20:49:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-31 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 8: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15382/8/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/15382/8/be/src/runtime/io/data-cache.cc@700
PS8, Line 700:   bool write_success;
nit: set to false by default



--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 31 Mar 2020 18:19:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5650/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 30 Mar 2020 23:20:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15382/8/tests/custom_cluster/test_data_cache.py
File tests/custom_cluster/test_data_cache.py:

http://gerrit.cloudera.org:8080/#/c/15382/8/tests/custom_cluster/test_data_cache.py@64
PS8, Line 64: 0
flake8: E501 line too long (91 > 90 characters)



--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 30 Mar 2020 22:39:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-30 Thread Joe McDonnell (Code Review)
Hello Thomas Tauber-Marshall, Sahil Takiar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15382

to look at the new patch set (#8).

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..

IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

This adds two sets of metrics. The first is per-partition metrics
to track the performance of the underlying filesystem for the
data cache. It keeps histograms of read, write, and eviction
latency for each data cache partition along with another metric
recording the path for the partition. These are exposed as the
following metrics:
impala-server.io-mgr.remote-data-cache-partition-$0.path
impala-server.io-mgr.remote-data-cache-partition-$0.read-latency
impala-server.io-mgr.remote-data-cache-partition-$0.write-latency
impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency

This also adds metrics to keep counts of hits, misses, and entries
in the data cache. Since reducing the latency of IO is an important
feature of the data cache, the absolute count of hits and misses
is as important as the hit bytes and miss bytes. This adds the
following metrics:
impala-server.io-mgr.remote-data-cache-hit-count
impala-server.io-mgr.remote-data-cache-miss-count
impala-server.io-mgr.remote-data-cache-num-entries

To track metrics around inserts, this also adds the following
metrics:
impala-server.io-mgr.remote-data-cache-num-inserts
impala-server.io-mgr.remote-data-cache-dropped-entries
impala-server.io-mgr.remote-data-cache-instant-evictions

Testing:
 - Hand testing to verify the per-partition latency histograms
 - Modified custom_cluster/test_data_cache.py to also test
   the counts.

Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
---
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M common/thrift/metrics.json
M tests/custom_cluster/test_data_cache.py
7 files changed, 283 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/15382/8
--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-30 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG@25
PS3, Line 25: impala-server.io-mgr.remote-data-cache-hit-count
: impala-server.io-mgr.remote-data-cache-miss-count
: impala-server.io-mgr.remote-data-cache-num-entries
> yeah hit ratio would be useful as well, I assume Prometheus can easily do t
You're right, it makes sense to add an overall write count. It will make 
interacting with these metrics easier. I added that in the next upload.

Hit rate is definitely useful. It's probably most useful over a fixed time 
period (last hour or last 30 minutes or something), so I'm thinking it is a 
good fit to do in prometheus.

I'm thinking the ratio of lookups vs writes probably won't show much specific 
to LIRS. LIRS evicts some low priority entries faster, but everything is 
written to the cache first. LIRS is not an algorithm that rejects things up 
front, so the number of writes would be the same for LIRS as it is for LRU. We 
can try it out when we have data.



--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 30 Mar 2020 22:38:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-29 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG@25
PS3, Line 25: impala-server.io-mgr.remote-data-cache-hit-count
: impala-server.io-mgr.remote-data-cache-miss-count
: impala-server.io-mgr.remote-data-cache-num-entries
> Yes, I'm thinking that there will be a separate mechanism to consume the co
yeah hit ratio would be useful as well, I assume Prometheus can easily do that 
too, it should just be 'hit-count / (hit-count + miss-count)'. just something 
to keep in mind when exposing this via Grafana.

for LIRS specifically, do we think it is interesting to compare the ratio of 
lookups vs writes? e.g. if you have a high ratio that probably means there is a 
bunch of data that is read only once, and thus isn't cached. might be a good 
way to compare the effectiveness of LRIS vs. LRU.

I think the histogram for write-latency would provide the counts, but that 
would be per-partition; we could probably get a total count by aggregating all 
the per-partition counts in prometheus, but would it be cleaner just and an 
aggregated metric for the counts?



--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Sun, 29 Mar 2020 19:23:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5630/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Sat, 28 Mar 2020 02:21:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5629/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Sat, 28 Mar 2020 02:16:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-27 Thread Joe McDonnell (Code Review)
Hello Thomas Tauber-Marshall, Sahil Takiar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15382

to look at the new patch set (#6).

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..

IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

This adds two sets of metrics. The first is per-partition metrics
to track the performance of the underlying filesystem for the
data cache. It keeps histograms of read, write, and eviction
latency for each data cache partition along with another metric
recording the path for the partition. These are exposed as the
following metrics:
impala-server.io-mgr.remote-data-cache-partition-$0.path
impala-server.io-mgr.remote-data-cache-partition-$0.read-latency
impala-server.io-mgr.remote-data-cache-partition-$0.write-latency
impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency

This also adds metrics to keep counts of hits, misses, and entries
in the data cache. Since reducing the latency of IO is an important
feature of the data cache, the absolute count of hits and misses
is as important as the hit bytes and miss bytes. This adds the
following metrics:
impala-server.io-mgr.remote-data-cache-hit-count
impala-server.io-mgr.remote-data-cache-miss-count
impala-server.io-mgr.remote-data-cache-num-entries

To track failed inserts, this also adds the following metrics:
impala-server.io-mgr.remote-data-cache-dropped-entries
impala-server.io-mgr.remote-data-cache-failed-inserts

Testing:
 - Hand testing to verify the per-partition latency histograms
 - Modified custom_cluster/test_data_cache.py to also test
   the counts.

Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
---
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M common/thrift/metrics.json
M tests/custom_cluster/test_data_cache.py
7 files changed, 257 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/15382/6
--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-27 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15382/5/be/src/util/impalad-metrics.cc
File be/src/util/impalad-metrics.cc:

http://gerrit.cloudera.org:8080/#/c/15382/5/be/src/util/impalad-metrics.cc@360
PS5, Line 360: IO_MGR_REMOTE_DATA_CACHE_DROPPED_ENTRIES = 
IO_MGR_METRICS->AddCounter(
Missed this indentation.



--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Sat, 28 Mar 2020 01:40:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-27 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG@25
PS3, Line 25: impala-server.io-mgr.remote-data-cache-hit-count
: impala-server.io-mgr.remote-data-cache-miss-count
: impala-server.io-mgr.remote-data-cache-num-entries
> how about hit and miss rates? if we just emit the counts, I guess something
Yes, I'm thinking that there will be a separate mechanism to consume the 
counters and be able to perform rate analysis. Prometheus seems to have this 
capability:
https://www.innoq.com/en/blog/prometheus-counters/

The histograms for writes, reads, and evictions keep counts of the number of 
events. So, we have that information in the WebUI. I'm less certain about how 
this is accessible through prometheus. We embed the count in what we send to 
prometheus:
https://github.com/apache/impala/blob/master/be/src/util/histogram-metric.cc#L131
It sounds like this is exposed as its own metric by prometheus:
https://prometheus.io/docs/practices/histograms/

I think that should provide counts for writes per partition.

Since it is interesting to know when we don't do the inserts, I added two 
metrics in this upload for the two most interesting cases where we don't do the 
insert:

impala-server.io-mgr.remote-data-cache-dropped-entries - this is the counter 
version of impala-server.io-mgr.remote-data-cache-dropped-bytes

impala-server.io-mgr.remote-data-cache-failed-inserts - this tracks where the 
Cache::Insert() call fails.


http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.h
File be/src/runtime/io/data-cache.h:

http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.h@257
PS3, Line 257: int32_t index_;
> Maybe note this is used for differentiating metrics. I don't think that it
Good point, these are strictly for naming metrics (and will be used in future 
for the access trace). The index makes no functional difference. Added a 
comment here.


http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc@549
PS3, Line 549:   if (!TestInfo::is_test()
> I find the control flow in this whole function kind of confusing. Maybe ins
Good point, reworked the control flow.


http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc@876
PS3, Line 876: partition_idx++
> nit: ++partition_idx
Done



--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Sat, 28 Mar 2020 01:35:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-27 Thread Joe McDonnell (Code Review)
Hello Thomas Tauber-Marshall, Sahil Takiar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15382

to look at the new patch set (#5).

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..

IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

This adds two sets of metrics. The first is per-partition metrics
to track the performance of the underlying filesystem for the
data cache. It keeps histograms of read, write, and eviction
latency for each data cache partition along with another metric
recording the path for the partition. These are exposed as the
following metrics:
impala-server.io-mgr.remote-data-cache-partition-$0.path
impala-server.io-mgr.remote-data-cache-partition-$0.read-latency
impala-server.io-mgr.remote-data-cache-partition-$0.write-latency
impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency

This also adds metrics to keep counts of hits, misses, and entries
in the data cache. Since reducing the latency of IO is an important
feature of the data cache, the absolute count of hits and misses
is as important as the hit bytes and miss bytes. This adds the
following metrics:
impala-server.io-mgr.remote-data-cache-hit-count
impala-server.io-mgr.remote-data-cache-miss-count
impala-server.io-mgr.remote-data-cache-num-entries

To track failed inserts, this also adds the following metrics:
impala-server.io-mgr.remote-data-cache-dropped-entries
impala-server.io-mgr.remote-data-cache-failed-inserts

Testing:
 - Hand testing to verify the per-partition latency histograms
 - Modified custom_cluster/test_data_cache.py to also test
   the counts.

Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
---
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M common/thrift/metrics.json
M tests/custom_cluster/test_data_cache.py
7 files changed, 257 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/15382/5
--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-27 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 3:

(1 comment)

Mostly a high level comment about what metrics to include, code itself looks 
good.

http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG@25
PS3, Line 25: impala-server.io-mgr.remote-data-cache-hit-count
: impala-server.io-mgr.remote-data-cache-miss-count
: impala-server.io-mgr.remote-data-cache-num-entries
how about hit and miss rates? if we just emit the counts, I guess something 
like Grafana would be able to derive the hit / miss rates, right? is that the 
plan?

not sure what the LLAP-Grafana dashboard looks like, but might be worth taking 
a look at what they have?

for the miss-count, should there be an additional metric that helps 
differentiate between cache misses and cache writes? e.g. for LIRS you can have 
a cache miss, but the data won't necessarily be written to the cache. so maybe 
we can add a 'remote-data-cache-write-count' to track how many times a miss 
actually ends up writing data to the cache?

I stole some ideas from the Guava cache metrics FYI: 
https://guava.dev/releases/22.0/api/docs/com/google/common/cache/CacheStats.html



--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 27 Mar 2020 19:59:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-27 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.h
File be/src/runtime/io/data-cache.h:

http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.h@257
PS3, Line 257: int32_t index_;
Maybe note this is used for differentiating metrics. I don't think that it has 
any other relationship to a meaningful value or property of the cache?


http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc@549
PS3, Line 549:   if (!TestInfo::is_test()
I find the control flow in this whole function kind of confusing. Maybe instead 
do something like
if (is_test() && FindMetricForTesting(PATH_METRIC) != nullptr) {
  // Metrics already inited.
  return;
}
and then below you can just have the metrics created like normal without any 
more is_test or == nullptr checks.


http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc@876
PS3, Line 876: partition_idx++
nit: ++partition_idx



--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 27 Mar 2020 19:52:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5598/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 25 Mar 2020 17:34:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-25 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 3:

Just stacked this on top of the latest LIRS


--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 25 Mar 2020 16:52:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-25 Thread Joe McDonnell (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15382

to look at the new patch set (#3).

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..

IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

This adds two sets of metrics. The first is per-partition metrics
to track the performance of the underlying filesystem for the
data cache. It keeps histograms of read, write, and eviction
latency for each data cache partition along with another metric
recording the path for the partition. These are exposed as the
following metrics:
impala-server.io-mgr.remote-data-cache-partition-$0.path
impala-server.io-mgr.remote-data-cache-partition-$0.read-latency
impala-server.io-mgr.remote-data-cache-partition-$0.write-latency
impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency

This also adds metrics to keep counts of hits, misses, and entries
in the data cache. Since reducing the latency of IO is an important
feature of the data cache, the absolute count of hits and misses
is as important as the hit bytes and miss bytes. This adds the
following metrics:
impala-server.io-mgr.remote-data-cache-hit-count
impala-server.io-mgr.remote-data-cache-miss-count
impala-server.io-mgr.remote-data-cache-num-entries

Testing:
 - Hand testing to verify the per-partition latency histograms
 - Modified custom_cluster/test_data_cache.py to also test
   the counts.

Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
---
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M common/thrift/metrics.json
M tests/custom_cluster/test_data_cache.py
7 files changed, 205 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/15382/3
--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15382/1/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/15382/1/be/src/runtime/io/data-cache.cc@567
PS1, Line 567: eviction_latency_ = 
ImpaladMetrics::IO_MGR_METRICS->RegisterMetric(new HistogramMetric(
line too long (91 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 06 Mar 2020 22:06:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-06 Thread Joe McDonnell (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15382

to look at the new patch set (#2).

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..

IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

This adds two sets of metrics. The first is per-partition metrics
to track the performance of the underlying filesystem for the
data cache. It keeps histograms of read, write, and eviction
latency for each data cache partition along with another metric
recording the path for the partition. These are exposed as the
following metrics:
impala-server.io-mgr.remote-data-cache-partition-$0.path
impala-server.io-mgr.remote-data-cache-partition-$0.read-latency
impala-server.io-mgr.remote-data-cache-partition-$0.write-latency
impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency

This also adds metrics to keep counts of hits, misses, and entries
in the data cache. Since reducing the latency of IO is an important
feature of the data cache, the absolute count of hits and misses
is as important as the hit bytes and miss bytes. This adds the
following metrics:
impala-server.io-mgr.remote-data-cache-hit-count
impala-server.io-mgr.remote-data-cache-miss-count
impala-server.io-mgr.remote-data-cache-num-entries

Testing:
 - Hand testing to verify the per-partition latency histograms
 - Modified custom_cluster/test_data_cache.py to also test
   the counts.

Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
---
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M common/thrift/metrics.json
M tests/custom_cluster/test_data_cache.py
7 files changed, 205 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/15382/2
--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-06 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15382


Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..

IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

This adds two sets of metrics. The first is per-partition metrics
to track the performance of the underlying filesystem for the
data cache. It keeps histograms of read, write, and eviction
latency for each data cache partition along with another metric
recording the path for the partition. These are exposed as the
following metrics:
impala-server.io-mgr.remote-data-cache-partition-$0.path
impala-server.io-mgr.remote-data-cache-partition-$0.read-latency
impala-server.io-mgr.remote-data-cache-partition-$0.write-latency
impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency

This also adds metrics to keep counts of hits, misses, and entries
in the data cache. Since reducing the latency of IO is an important
feature of the data cache, the absolute count of hits and misses
is as important as the hit bytes and miss bytes. This adds the
following metrics:
impala-server.io-mgr.remote-data-cache-hit-count
impala-server.io-mgr.remote-data-cache-miss-count
impala-server.io-mgr.remote-data-cache-num-entries

Testing:
 - Hand testing to verify the per-partition latency histograms
 - Modified custom_cluster/test_data_cache.py to also test
   the counts.

Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
---
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M common/thrift/metrics.json
M tests/custom_cluster/test_data_cache.py
7 files changed, 204 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/15382/1
--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5448/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 06 Mar 2020 22:54:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5447/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 06 Mar 2020 22:49:41 +
Gerrit-HasComments: No