[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache This adds two sets of metrics. The first is per-partition metrics to track the performance of the underlying filesystem for the data cache. It keeps histograms of read, write, and eviction latency for each data cache partition along with another metric recording the path for the partition. These are exposed as the following metrics: impala-server.io-mgr.remote-data-cache-partition-$0.path impala-server.io-mgr.remote-data-cache-partition-$0.read-latency impala-server.io-mgr.remote-data-cache-partition-$0.write-latency impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency This also adds metrics to keep counts of hits, misses, and entries in the data cache. Since reducing the latency of IO is an important feature of the data cache, the absolute count of hits and misses is as important as the hit bytes and miss bytes. This adds the following metrics: impala-server.io-mgr.remote-data-cache-hit-count impala-server.io-mgr.remote-data-cache-miss-count impala-server.io-mgr.remote-data-cache-num-entries To track metrics around inserts, this also adds the following metrics: impala-server.io-mgr.remote-data-cache-num-inserts impala-server.io-mgr.remote-data-cache-dropped-entries impala-server.io-mgr.remote-data-cache-instant-evictions An instant eviction happens when inserting an entry into the cache fails and the entry is immediately evicted during insert. This is currently only possible for LIRS when the entry's size is larger than the unprotected capacity. This manifests when the cache size is very small. For example, for an 8MB entry, this would manifest when a cache shard is smaller than 160MB. This metric is primarily for debugging. Testing: - Hand testing to verify the per-partition latency histograms - Modified custom_cluster/test_data_cache.py to also test the counts. Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Reviewed-on: http://gerrit.cloudera.org:8080/15382 Reviewed-by: Thomas Tauber-Marshall Tested-by: Impala Public Jenkins --- M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M common/thrift/metrics.json M tests/custom_cluster/test_data_cache.py 7 files changed, 284 insertions(+), 9 deletions(-) Approvals: Thomas Tauber-Marshall: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 11 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 10 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Wed, 01 Apr 2020 04:47:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 9: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 9 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Wed, 01 Apr 2020 01:23:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5673/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 10 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Wed, 01 Apr 2020 00:19:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5595/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 10 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Wed, 01 Apr 2020 00:18:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 10 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 31 Mar 2020 23:28:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Hello Thomas Tauber-Marshall, Sahil Takiar, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15382 to look at the new patch set (#10). Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache This adds two sets of metrics. The first is per-partition metrics to track the performance of the underlying filesystem for the data cache. It keeps histograms of read, write, and eviction latency for each data cache partition along with another metric recording the path for the partition. These are exposed as the following metrics: impala-server.io-mgr.remote-data-cache-partition-$0.path impala-server.io-mgr.remote-data-cache-partition-$0.read-latency impala-server.io-mgr.remote-data-cache-partition-$0.write-latency impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency This also adds metrics to keep counts of hits, misses, and entries in the data cache. Since reducing the latency of IO is an important feature of the data cache, the absolute count of hits and misses is as important as the hit bytes and miss bytes. This adds the following metrics: impala-server.io-mgr.remote-data-cache-hit-count impala-server.io-mgr.remote-data-cache-miss-count impala-server.io-mgr.remote-data-cache-num-entries To track metrics around inserts, this also adds the following metrics: impala-server.io-mgr.remote-data-cache-num-inserts impala-server.io-mgr.remote-data-cache-dropped-entries impala-server.io-mgr.remote-data-cache-instant-evictions An instant eviction happens when inserting an entry into the cache fails and the entry is immediately evicted during insert. This is currently only possible for LIRS when the entry's size is larger than the unprotected capacity. This manifests when the cache size is very small. For example, for an 8MB entry, this would manifest when a cache shard is smaller than 160MB. This metric is primarily for debugging. Testing: - Hand testing to verify the per-partition latency histograms - Modified custom_cluster/test_data_cache.py to also test the counts. Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 --- M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M common/thrift/metrics.json M tests/custom_cluster/test_data_cache.py 7 files changed, 284 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/15382/10 -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 10 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 9: (2 comments) http://gerrit.cloudera.org:8080/#/c/15382/9/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/15382/9/common/thrift/metrics.json@567 PS9, Line 567: limit > Limit Done http://gerrit.cloudera.org:8080/#/c/15382/9/common/thrift/metrics.json@573 PS9, Line 573: "description": "Total number of instantaneous evictions from the remote data cache.", > I don't think its very obvious from the description what's being counted he I added an extra sentence here and a more detailed comment in the commit message. This metric is a pretty intricate implementation detail. I will file a Jira about expanding/updating the data caching documentation. -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 9 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 31 Mar 2020 23:26:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 9: (2 comments) http://gerrit.cloudera.org:8080/#/c/15382/9/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/15382/9/common/thrift/metrics.json@567 PS9, Line 567: limit Limit http://gerrit.cloudera.org:8080/#/c/15382/9/common/thrift/metrics.json@573 PS9, Line 573: "description": "Total number of instantaneous evictions from the remote data cache.", I don't think its very obvious from the description what's being counted here. That may be okay, given that its probably hard to describe concisely and this is mostly something that regular users wouldn't need to be interested in and is for Impala experts to use, but I wonder if it can be expanded somewhat? Or maybe it would be helpful to describe it thoroughly in the commit message so its written down somewhere? Do you have plans to add documentation around all this cache stuff, eg. a "Tuning Impala's Data Cache" page? -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 9 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 31 Mar 2020 22:19:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5670/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 9 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 31 Mar 2020 21:07:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 9: Code-Review+1 Rebased, carry +1 -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 9 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 31 Mar 2020 20:50:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5591/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 9 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 31 Mar 2020 20:51:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Hello Thomas Tauber-Marshall, Sahil Takiar, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15382 to look at the new patch set (#9). Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache This adds two sets of metrics. The first is per-partition metrics to track the performance of the underlying filesystem for the data cache. It keeps histograms of read, write, and eviction latency for each data cache partition along with another metric recording the path for the partition. These are exposed as the following metrics: impala-server.io-mgr.remote-data-cache-partition-$0.path impala-server.io-mgr.remote-data-cache-partition-$0.read-latency impala-server.io-mgr.remote-data-cache-partition-$0.write-latency impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency This also adds metrics to keep counts of hits, misses, and entries in the data cache. Since reducing the latency of IO is an important feature of the data cache, the absolute count of hits and misses is as important as the hit bytes and miss bytes. This adds the following metrics: impala-server.io-mgr.remote-data-cache-hit-count impala-server.io-mgr.remote-data-cache-miss-count impala-server.io-mgr.remote-data-cache-num-entries To track metrics around inserts, this also adds the following metrics: impala-server.io-mgr.remote-data-cache-num-inserts impala-server.io-mgr.remote-data-cache-dropped-entries impala-server.io-mgr.remote-data-cache-instant-evictions Testing: - Hand testing to verify the per-partition latency histograms - Modified custom_cluster/test_data_cache.py to also test the counts. Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 --- M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M common/thrift/metrics.json M tests/custom_cluster/test_data_cache.py 7 files changed, 284 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/15382/9 -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 9 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/15382/8/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/15382/8/be/src/runtime/io/data-cache.cc@700 PS8, Line 700: bool write_success; > nit: set to false by default My style preference is to declare variables without a value when the code is going to set it unconditionally. http://gerrit.cloudera.org:8080/#/c/15382/8/tests/custom_cluster/test_data_cache.py File tests/custom_cluster/test_data_cache.py: http://gerrit.cloudera.org:8080/#/c/15382/8/tests/custom_cluster/test_data_cache.py@64 PS8, Line 64: 0 > flake8: E501 line too long (91 > 90 characters) Done -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 31 Mar 2020 20:49:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 8: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/15382/8/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/15382/8/be/src/runtime/io/data-cache.cc@700 PS8, Line 700: bool write_success; nit: set to false by default -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 31 Mar 2020 18:19:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5650/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 30 Mar 2020 23:20:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/15382/8/tests/custom_cluster/test_data_cache.py File tests/custom_cluster/test_data_cache.py: http://gerrit.cloudera.org:8080/#/c/15382/8/tests/custom_cluster/test_data_cache.py@64 PS8, Line 64: 0 flake8: E501 line too long (91 > 90 characters) -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 30 Mar 2020 22:39:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Hello Thomas Tauber-Marshall, Sahil Takiar, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15382 to look at the new patch set (#8). Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache This adds two sets of metrics. The first is per-partition metrics to track the performance of the underlying filesystem for the data cache. It keeps histograms of read, write, and eviction latency for each data cache partition along with another metric recording the path for the partition. These are exposed as the following metrics: impala-server.io-mgr.remote-data-cache-partition-$0.path impala-server.io-mgr.remote-data-cache-partition-$0.read-latency impala-server.io-mgr.remote-data-cache-partition-$0.write-latency impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency This also adds metrics to keep counts of hits, misses, and entries in the data cache. Since reducing the latency of IO is an important feature of the data cache, the absolute count of hits and misses is as important as the hit bytes and miss bytes. This adds the following metrics: impala-server.io-mgr.remote-data-cache-hit-count impala-server.io-mgr.remote-data-cache-miss-count impala-server.io-mgr.remote-data-cache-num-entries To track metrics around inserts, this also adds the following metrics: impala-server.io-mgr.remote-data-cache-num-inserts impala-server.io-mgr.remote-data-cache-dropped-entries impala-server.io-mgr.remote-data-cache-instant-evictions Testing: - Hand testing to verify the per-partition latency histograms - Modified custom_cluster/test_data_cache.py to also test the counts. Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 --- M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M common/thrift/metrics.json M tests/custom_cluster/test_data_cache.py 7 files changed, 283 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/15382/8 -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG@25 PS3, Line 25: impala-server.io-mgr.remote-data-cache-hit-count : impala-server.io-mgr.remote-data-cache-miss-count : impala-server.io-mgr.remote-data-cache-num-entries > yeah hit ratio would be useful as well, I assume Prometheus can easily do t You're right, it makes sense to add an overall write count. It will make interacting with these metrics easier. I added that in the next upload. Hit rate is definitely useful. It's probably most useful over a fixed time period (last hour or last 30 minutes or something), so I'm thinking it is a good fit to do in prometheus. I'm thinking the ratio of lookups vs writes probably won't show much specific to LIRS. LIRS evicts some low priority entries faster, but everything is written to the cache first. LIRS is not an algorithm that rejects things up front, so the number of writes would be the same for LIRS as it is for LRU. We can try it out when we have data. -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 30 Mar 2020 22:38:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG@25 PS3, Line 25: impala-server.io-mgr.remote-data-cache-hit-count : impala-server.io-mgr.remote-data-cache-miss-count : impala-server.io-mgr.remote-data-cache-num-entries > Yes, I'm thinking that there will be a separate mechanism to consume the co yeah hit ratio would be useful as well, I assume Prometheus can easily do that too, it should just be 'hit-count / (hit-count + miss-count)'. just something to keep in mind when exposing this via Grafana. for LIRS specifically, do we think it is interesting to compare the ratio of lookups vs writes? e.g. if you have a high ratio that probably means there is a bunch of data that is read only once, and thus isn't cached. might be a good way to compare the effectiveness of LRIS vs. LRU. I think the histogram for write-latency would provide the counts, but that would be per-partition; we could probably get a total count by aggregating all the per-partition counts in prometheus, but would it be cleaner just and an aggregated metric for the counts? -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Sun, 29 Mar 2020 19:23:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5630/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Sat, 28 Mar 2020 02:21:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5629/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Sat, 28 Mar 2020 02:16:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Hello Thomas Tauber-Marshall, Sahil Takiar, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15382 to look at the new patch set (#6). Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache This adds two sets of metrics. The first is per-partition metrics to track the performance of the underlying filesystem for the data cache. It keeps histograms of read, write, and eviction latency for each data cache partition along with another metric recording the path for the partition. These are exposed as the following metrics: impala-server.io-mgr.remote-data-cache-partition-$0.path impala-server.io-mgr.remote-data-cache-partition-$0.read-latency impala-server.io-mgr.remote-data-cache-partition-$0.write-latency impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency This also adds metrics to keep counts of hits, misses, and entries in the data cache. Since reducing the latency of IO is an important feature of the data cache, the absolute count of hits and misses is as important as the hit bytes and miss bytes. This adds the following metrics: impala-server.io-mgr.remote-data-cache-hit-count impala-server.io-mgr.remote-data-cache-miss-count impala-server.io-mgr.remote-data-cache-num-entries To track failed inserts, this also adds the following metrics: impala-server.io-mgr.remote-data-cache-dropped-entries impala-server.io-mgr.remote-data-cache-failed-inserts Testing: - Hand testing to verify the per-partition latency histograms - Modified custom_cluster/test_data_cache.py to also test the counts. Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 --- M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M common/thrift/metrics.json M tests/custom_cluster/test_data_cache.py 7 files changed, 257 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/15382/6 -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/15382/5/be/src/util/impalad-metrics.cc File be/src/util/impalad-metrics.cc: http://gerrit.cloudera.org:8080/#/c/15382/5/be/src/util/impalad-metrics.cc@360 PS5, Line 360: IO_MGR_REMOTE_DATA_CACHE_DROPPED_ENTRIES = IO_MGR_METRICS->AddCounter( Missed this indentation. -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Sat, 28 Mar 2020 01:40:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG@25 PS3, Line 25: impala-server.io-mgr.remote-data-cache-hit-count : impala-server.io-mgr.remote-data-cache-miss-count : impala-server.io-mgr.remote-data-cache-num-entries > how about hit and miss rates? if we just emit the counts, I guess something Yes, I'm thinking that there will be a separate mechanism to consume the counters and be able to perform rate analysis. Prometheus seems to have this capability: https://www.innoq.com/en/blog/prometheus-counters/ The histograms for writes, reads, and evictions keep counts of the number of events. So, we have that information in the WebUI. I'm less certain about how this is accessible through prometheus. We embed the count in what we send to prometheus: https://github.com/apache/impala/blob/master/be/src/util/histogram-metric.cc#L131 It sounds like this is exposed as its own metric by prometheus: https://prometheus.io/docs/practices/histograms/ I think that should provide counts for writes per partition. Since it is interesting to know when we don't do the inserts, I added two metrics in this upload for the two most interesting cases where we don't do the insert: impala-server.io-mgr.remote-data-cache-dropped-entries - this is the counter version of impala-server.io-mgr.remote-data-cache-dropped-bytes impala-server.io-mgr.remote-data-cache-failed-inserts - this tracks where the Cache::Insert() call fails. http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.h File be/src/runtime/io/data-cache.h: http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.h@257 PS3, Line 257: int32_t index_; > Maybe note this is used for differentiating metrics. I don't think that it Good point, these are strictly for naming metrics (and will be used in future for the access trace). The index makes no functional difference. Added a comment here. http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc@549 PS3, Line 549: if (!TestInfo::is_test() > I find the control flow in this whole function kind of confusing. Maybe ins Good point, reworked the control flow. http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc@876 PS3, Line 876: partition_idx++ > nit: ++partition_idx Done -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Sat, 28 Mar 2020 01:35:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Hello Thomas Tauber-Marshall, Sahil Takiar, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15382 to look at the new patch set (#5). Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache This adds two sets of metrics. The first is per-partition metrics to track the performance of the underlying filesystem for the data cache. It keeps histograms of read, write, and eviction latency for each data cache partition along with another metric recording the path for the partition. These are exposed as the following metrics: impala-server.io-mgr.remote-data-cache-partition-$0.path impala-server.io-mgr.remote-data-cache-partition-$0.read-latency impala-server.io-mgr.remote-data-cache-partition-$0.write-latency impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency This also adds metrics to keep counts of hits, misses, and entries in the data cache. Since reducing the latency of IO is an important feature of the data cache, the absolute count of hits and misses is as important as the hit bytes and miss bytes. This adds the following metrics: impala-server.io-mgr.remote-data-cache-hit-count impala-server.io-mgr.remote-data-cache-miss-count impala-server.io-mgr.remote-data-cache-num-entries To track failed inserts, this also adds the following metrics: impala-server.io-mgr.remote-data-cache-dropped-entries impala-server.io-mgr.remote-data-cache-failed-inserts Testing: - Hand testing to verify the per-partition latency histograms - Modified custom_cluster/test_data_cache.py to also test the counts. Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 --- M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M common/thrift/metrics.json M tests/custom_cluster/test_data_cache.py 7 files changed, 257 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/15382/5 -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 3: (1 comment) Mostly a high level comment about what metrics to include, code itself looks good. http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG@25 PS3, Line 25: impala-server.io-mgr.remote-data-cache-hit-count : impala-server.io-mgr.remote-data-cache-miss-count : impala-server.io-mgr.remote-data-cache-num-entries how about hit and miss rates? if we just emit the counts, I guess something like Grafana would be able to derive the hit / miss rates, right? is that the plan? not sure what the LLAP-Grafana dashboard looks like, but might be worth taking a look at what they have? for the miss-count, should there be an additional metric that helps differentiate between cache misses and cache writes? e.g. for LIRS you can have a cache miss, but the data won't necessarily be written to the cache. so maybe we can add a 'remote-data-cache-write-count' to track how many times a miss actually ends up writing data to the cache? I stole some ideas from the Guava cache metrics FYI: https://guava.dev/releases/22.0/api/docs/com/google/common/cache/CacheStats.html -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Fri, 27 Mar 2020 19:59:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.h File be/src/runtime/io/data-cache.h: http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.h@257 PS3, Line 257: int32_t index_; Maybe note this is used for differentiating metrics. I don't think that it has any other relationship to a meaningful value or property of the cache? http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc@549 PS3, Line 549: if (!TestInfo::is_test() I find the control flow in this whole function kind of confusing. Maybe instead do something like if (is_test() && FindMetricForTesting(PATH_METRIC) != nullptr) { // Metrics already inited. return; } and then below you can just have the metrics created like normal without any more is_test or == nullptr checks. http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc@876 PS3, Line 876: partition_idx++ nit: ++partition_idx -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Fri, 27 Mar 2020 19:52:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5598/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Wed, 25 Mar 2020 17:34:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 3: Just stacked this on top of the latest LIRS -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Wed, 25 Mar 2020 16:52:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15382 to look at the new patch set (#3). Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache This adds two sets of metrics. The first is per-partition metrics to track the performance of the underlying filesystem for the data cache. It keeps histograms of read, write, and eviction latency for each data cache partition along with another metric recording the path for the partition. These are exposed as the following metrics: impala-server.io-mgr.remote-data-cache-partition-$0.path impala-server.io-mgr.remote-data-cache-partition-$0.read-latency impala-server.io-mgr.remote-data-cache-partition-$0.write-latency impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency This also adds metrics to keep counts of hits, misses, and entries in the data cache. Since reducing the latency of IO is an important feature of the data cache, the absolute count of hits and misses is as important as the hit bytes and miss bytes. This adds the following metrics: impala-server.io-mgr.remote-data-cache-hit-count impala-server.io-mgr.remote-data-cache-miss-count impala-server.io-mgr.remote-data-cache-num-entries Testing: - Hand testing to verify the per-partition latency histograms - Modified custom_cluster/test_data_cache.py to also test the counts. Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 --- M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M common/thrift/metrics.json M tests/custom_cluster/test_data_cache.py 7 files changed, 205 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/15382/3 -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/15382/1/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/15382/1/be/src/runtime/io/data-cache.cc@567 PS1, Line 567: eviction_latency_ = ImpaladMetrics::IO_MGR_METRICS->RegisterMetric(new HistogramMetric( line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 06 Mar 2020 22:06:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15382 to look at the new patch set (#2). Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache This adds two sets of metrics. The first is per-partition metrics to track the performance of the underlying filesystem for the data cache. It keeps histograms of read, write, and eviction latency for each data cache partition along with another metric recording the path for the partition. These are exposed as the following metrics: impala-server.io-mgr.remote-data-cache-partition-$0.path impala-server.io-mgr.remote-data-cache-partition-$0.read-latency impala-server.io-mgr.remote-data-cache-partition-$0.write-latency impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency This also adds metrics to keep counts of hits, misses, and entries in the data cache. Since reducing the latency of IO is an important feature of the data cache, the absolute count of hits and misses is as important as the hit bytes and miss bytes. This adds the following metrics: impala-server.io-mgr.remote-data-cache-hit-count impala-server.io-mgr.remote-data-cache-miss-count impala-server.io-mgr.remote-data-cache-num-entries Testing: - Hand testing to verify the per-partition latency histograms - Modified custom_cluster/test_data_cache.py to also test the counts. Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 --- M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M common/thrift/metrics.json M tests/custom_cluster/test_data_cache.py 7 files changed, 205 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/15382/2 -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15382 Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache This adds two sets of metrics. The first is per-partition metrics to track the performance of the underlying filesystem for the data cache. It keeps histograms of read, write, and eviction latency for each data cache partition along with another metric recording the path for the partition. These are exposed as the following metrics: impala-server.io-mgr.remote-data-cache-partition-$0.path impala-server.io-mgr.remote-data-cache-partition-$0.read-latency impala-server.io-mgr.remote-data-cache-partition-$0.write-latency impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency This also adds metrics to keep counts of hits, misses, and entries in the data cache. Since reducing the latency of IO is an important feature of the data cache, the absolute count of hits and misses is as important as the hit bytes and miss bytes. This adds the following metrics: impala-server.io-mgr.remote-data-cache-hit-count impala-server.io-mgr.remote-data-cache-miss-count impala-server.io-mgr.remote-data-cache-num-entries Testing: - Hand testing to verify the per-partition latency histograms - Modified custom_cluster/test_data_cache.py to also test the counts. Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 --- M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M common/thrift/metrics.json M tests/custom_cluster/test_data_cache.py 7 files changed, 204 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/15382/1 -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5448/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 06 Mar 2020 22:54:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15382 ) Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/5447/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15382 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478 Gerrit-Change-Number: 15382 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 06 Mar 2020 22:49:41 + Gerrit-HasComments: No