[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 10 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 27 Sep 2018 01:55:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. Following up to IMPALA-6857, it's useful for monitoring tools to see if the pause monitor is getting triggered, and to see other GC metrics. The Java side here, and the Thrift side, were easy enough. However, the Impala metric implementation here caused us to call into the frontend to read through the JMX memory beans 72 times, because each call to GetValue() was getting all the data for the pool. This structure made it hard to add additional, non-pool, metrics, and it felt wasteful. To combat this, I added a cache of 10 seconds for getting the metrics from the Frontend. The counters will typically re-use the same data. There are five metrics here, and to avoid yet another enum class, I used C++ lambdas to capture which field of the Thrift object I care about. If folks like the approach, I think it can simplify way the enums for the pool metrics as well. I measured the cost of calling into the metrics code by looping the metrics-gathering 100 times and looking at CPU time for the process using this script: START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }') for i in $(seq 100); do curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null done END_CPU=$( cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }') echo $START_CPU $END_CPU $(($END_CPU - $START_CPU)) On a release build on my development machine, gathering metrics 100 times took 0.16 cpu seconds without this change and 0.07 cpu seconds with this change. The measurement accuracy here is 0.01 (I spot-checked this with using the cpuacct cgroup infrastructure which gives you nanos, but it was more painful to script), but this convinces me that this is a net improvement. Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Reviewed-on: http://gerrit.cloudera.org:8080/11468 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/util/jni-util.cc M be/src/util/jni-util.h M be/src/util/memory-metrics.cc M be/src/util/memory-metrics.h M common/thrift/Frontend.thrift M common/thrift/metrics.json M fe/src/main/java/org/apache/impala/common/JniUtil.java M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java M tests/custom_cluster/test_pause_monitor.py A tests/observability/__init__.py A tests/observability/test_jvm_metrics.py M tests/run-tests.py 12 files changed, 389 insertions(+), 145 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 11 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/822/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 9 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 26 Sep 2018 22:42:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 10 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 26 Sep 2018 22:07:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Hello Bharath Vissapragada, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11468 to look at the new patch set (#9). Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. Following up to IMPALA-6857, it's useful for monitoring tools to see if the pause monitor is getting triggered, and to see other GC metrics. The Java side here, and the Thrift side, were easy enough. However, the Impala metric implementation here caused us to call into the frontend to read through the JMX memory beans 72 times, because each call to GetValue() was getting all the data for the pool. This structure made it hard to add additional, non-pool, metrics, and it felt wasteful. To combat this, I added a cache of 10 seconds for getting the metrics from the Frontend. The counters will typically re-use the same data. There are five metrics here, and to avoid yet another enum class, I used C++ lambdas to capture which field of the Thrift object I care about. If folks like the approach, I think it can simplify way the enums for the pool metrics as well. I measured the cost of calling into the metrics code by looping the metrics-gathering 100 times and looking at CPU time for the process using this script: START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }') for i in $(seq 100); do curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null done END_CPU=$( cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }') echo $START_CPU $END_CPU $(($END_CPU - $START_CPU)) On a release build on my development machine, gathering metrics 100 times took 0.16 cpu seconds without this change and 0.07 cpu seconds with this change. The measurement accuracy here is 0.01 (I spot-checked this with using the cpuacct cgroup infrastructure which gives you nanos, but it was more painful to script), but this convinces me that this is a net improvement. Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 --- M be/src/util/jni-util.cc M be/src/util/jni-util.h M be/src/util/memory-metrics.cc M be/src/util/memory-metrics.h M common/thrift/Frontend.thrift M common/thrift/metrics.json M fe/src/main/java/org/apache/impala/common/JniUtil.java M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java M tests/custom_cluster/test_pause_monitor.py A tests/observability/__init__.py A tests/observability/test_jvm_metrics.py M tests/run-tests.py 12 files changed, 389 insertions(+), 145 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/11468/9 -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 9 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 9: Code-Review+2 Carrying +2. -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 9 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 26 Sep 2018 22:07:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3227/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 10 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 26 Sep 2018 22:07:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/11468/8/be/src/util/memory-metrics.cc File be/src/util/memory-metrics.cc: http://gerrit.cloudera.org:8080/#/c/11468/8/be/src/util/memory-metrics.cc@246 PS8, Line 246: ok() > Looks serious enough to log rather than silently returning. Done http://gerrit.cloudera.org:8080/#/c/11468/8/be/src/util/memory-metrics.cc@295 PS8, Line 295: for (const TJvmMemoryPool& usage: last_response_.memory_pools) { > Can we DCHECK that last_response_ is not nullptr? Could happen if GrabMetri last_response_ isn't a pointer. Thrift seems to use std::vector internally (see be/generated-sources/gen-cpp/Frontend_types.h) so I think it's always initialized. That said, this pointed out that I hadn't initialized last_fetch_, which I now have. -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 8 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 26 Sep 2018 22:06:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 8: Code-Review+2 (3 comments) http://gerrit.cloudera.org:8080/#/c/11468/8/be/src/util/memory-metrics.h File be/src/util/memory-metrics.h: http://gerrit.cloudera.org:8080/#/c/11468/8/be/src/util/memory-metrics.h@226 PS8, Line 226: // nit: /// (multiple places) http://gerrit.cloudera.org:8080/#/c/11468/8/be/src/util/memory-metrics.cc File be/src/util/memory-metrics.cc: http://gerrit.cloudera.org:8080/#/c/11468/8/be/src/util/memory-metrics.cc@246 PS8, Line 246: ok() Looks serious enough to log rather than silently returning. http://gerrit.cloudera.org:8080/#/c/11468/8/be/src/util/memory-metrics.cc@295 PS8, Line 295: for (const TJvmMemoryPool& usage: last_response_.memory_pools) { Can we DCHECK that last_response_ is not nullptr? Could happen if GrabMetrics...() fails in the first run. -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 8 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 26 Sep 2018 21:01:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/821/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 8 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 26 Sep 2018 20:33:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Hello Bharath Vissapragada, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11468 to look at the new patch set (#8). Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. Following up to IMPALA-6857, it's useful for monitoring tools to see if the pause monitor is getting triggered, and to see other GC metrics. The Java side here, and the Thrift side, were easy enough. However, the Impala metric implementation here caused us to call into the frontend to read through the JMX memory beans 72 times, because each call to GetValue() was getting all the data for the pool. This structure made it hard to add additional, non-pool, metrics, and it felt wasteful. To combat this, I added a cache of 10 seconds for getting the metrics from the Frontend. The counters will typically re-use the same data. There are five metrics here, and to avoid yet another enum class, I used C++ lambdas to capture which field of the Thrift object I care about. If folks like the approach, I think it can simplify way the enums for the pool metrics as well. I measured the cost of calling into the metrics code by looping the metrics-gathering 100 times and looking at CPU time for the process using this script: START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }') for i in $(seq 100); do curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null done END_CPU=$( cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }') echo $START_CPU $END_CPU $(($END_CPU - $START_CPU)) On a release build on my development machine, gathering metrics 100 times took 0.16 cpu seconds without this change and 0.07 cpu seconds with this change. The measurement accuracy here is 0.01 (I spot-checked this with using the cpuacct cgroup infrastructure which gives you nanos, but it was more painful to script), but this convinces me that this is a net improvement. Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 --- M be/src/util/jni-util.cc M be/src/util/jni-util.h M be/src/util/memory-metrics.cc M be/src/util/memory-metrics.h M common/thrift/Frontend.thrift M common/thrift/metrics.json M fe/src/main/java/org/apache/impala/common/JniUtil.java M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java M tests/custom_cluster/test_pause_monitor.py A tests/observability/__init__.py A tests/observability/test_jvm_metrics.py M tests/run-tests.py 12 files changed, 380 insertions(+), 145 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/11468/8 -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 8 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 6: (1 comment) Thanks for the review, Tim. http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.h File be/src/util/memory-metrics.h: http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.h@240 PS6, Line 240: > I don't think we want the space before & Done -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 6 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 26 Sep 2018 19:58:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/819/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 7 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 26 Sep 2018 19:21:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/817/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 6 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 26 Sep 2018 19:06:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 7: Code-Review+1 (3 comments) http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.h File be/src/util/memory-metrics.h: http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.h@240 PS6, Line 240: I don't think we want the space before & http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h File be/src/util/memory-metrics.h: http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@172 PS5, Line 172: // lock_. > I replaced std::function with the specific function pointer type. Here, the WFM. I think a typedef would be fine since this is an internal API (I suspect at Google the concerns are related to APIs that are consumed by many dependencies). I think this is exposing the fact that I don't know all the corners of the google style guide that well :) http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@181 PS5, Line 181: static JvmMetricCache* GetInstance(); > Done Just to complete the circle, I updated our list of differences from the google style guide to mention this case. -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 7 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 26 Sep 2018 18:41:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.cc File be/src/util/memory-metrics.cc: http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.cc@205 PS6, Line 205: [](const TGetJvmMemoryMetricsResponse& r) { return r.gc_num_info_threshold_exceeded; }); > line too long (94 > 90) Done http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.cc@207 PS6, Line 207: [](const TGetJvmMemoryMetricsResponse& r) { return r.gc_num_warn_threshold_exceeded; }); > line too long (94 > 90) Done http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.cc@212 PS6, Line 212: [](const TGetJvmMemoryMetricsResponse& r) { return r.gc_total_extra_sleep_time_millis; }); > line too long (96 > 90) Done -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 6 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 26 Sep 2018 18:41:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Hello Bharath Vissapragada, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11468 to look at the new patch set (#7). Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. Following up to IMPALA-6857, it's useful for monitoring tools to see if the pause monitor is getting triggered, and to see other GC metrics. The Java side here, and the Thrift side, were easy enough. However, the Impala metric implementation here caused us to call into the frontend to read through the JMX memory beans 72 times, because each call to GetValue() was getting all the data for the pool. This structure made it hard to add additional, non-pool, metrics, and it felt wasteful. To combat this, I added a cache of 10 seconds for getting the metrics from the Frontend. The counters will typically re-use the same data. There are five metrics here, and to avoid yet another enum class, I used C++ lambdas to capture which field of the Thrift object I care about. If folks like the approach, I think it can simplify way the enums for the pool metrics as well. I measured the cost of calling into the metrics code by looping the metrics-gathering 100 times and looking at CPU time for the process using this script: START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }') for i in $(seq 100); do curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null done END_CPU=$( cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }') echo $START_CPU $END_CPU $(($END_CPU - $START_CPU)) On a release build on my development machine, gathering metrics 100 times took 0.16 cpu seconds without this change and 0.07 cpu seconds with this change. The measurement accuracy here is 0.01 (I spot-checked this with using the cpuacct cgroup infrastructure which gives you nanos, but it was more painful to script), but this convinces me that this is a net improvement. Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 --- M be/src/util/jni-util.cc M be/src/util/jni-util.h M be/src/util/memory-metrics.cc M be/src/util/memory-metrics.h M common/thrift/Frontend.thrift M common/thrift/metrics.json M fe/src/main/java/org/apache/impala/common/JniUtil.java M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java M tests/custom_cluster/test_pause_monitor.py A tests/observability/__init__.py A tests/observability/test_jvm_metrics.py M tests/run-tests.py 12 files changed, 380 insertions(+), 145 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/11468/7 -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 7 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.cc File be/src/util/memory-metrics.cc: http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.cc@205 PS6, Line 205: [](const TGetJvmMemoryMetricsResponse& r) { return r.gc_num_info_threshold_exceeded; }); line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.cc@207 PS6, Line 207: [](const TGetJvmMemoryMetricsResponse& r) { return r.gc_num_warn_threshold_exceeded; }); line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.cc@212 PS6, Line 212: [](const TGetJvmMemoryMetricsResponse& r) { return r.gc_total_extra_sleep_time_millis; }); line too long (96 > 90) -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 6 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 26 Sep 2018 18:34:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 6: (19 comments) http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h File be/src/util/memory-metrics.h: http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@167 PS5, Line 167: caching the gathe > fix this? Done http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@169 PS5, Line 169: // > Add some method docs and some details about locking Done http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@172 PS5, Line 172: // lock_. > I believe the general best practice for passing functions to lightweight fu I replaced std::function with the specific function pointer type. Here, the type is always the same (int64_t)(*f)(const TGetJvmMemorymetricsResponse&), which just rolls off the tongue, eh? I figure being specific is going to be easier to deal with than templating the class, since we need to store this thing too. I also chose not to typedef this because something in the Google style guide says you shouldn't expose typedefs in your API. Anyway--let me know. http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@181 PS5, Line 181: static JvmMetricCache* GetInstance(); > I guess the google style guide says to use this naming convention https://g Done http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@217 PS5, Line 217: > docs. Done http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@224 PS5, Line 224: }; > I guess we already materialise it to a std::function here so maybe my above Right now it's a function pointer, so I think the concern is moot. Let me know if I'm wrong. http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.cc File be/src/util/memory-metrics.cc: http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.cc@241 PS5, Line 241: if (now - last_fetch_ < CACHE_PERIOD_MILLIS) return; : if (!JniUtil::GetJvmMemoryMetrics(_response_).ok()) return; : l > nit: single line Done http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.cc@252 PS5, Line 252: > I think 'using' declarations are preferred in the source file rather than q Done http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.cc@290 PS5, Line 290: vector names; > nit: Either return a vector<> or make it a ptr? Done http://gerrit.cloudera.org:8080/#/c/11468/5/common/thrift/Frontend.thrift File common/thrift/Frontend.thrift: http://gerrit.cloudera.org:8080/#/c/11468/5/common/thrift/Frontend.thrift@769 PS5, Line 769: > Why keep the empty one? Old habits about believing that Thrift should be compatible. Doesn't apply here, so I've removed it throughout. http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/common/JniUtil.java File fe/src/main/java/org/apache/impala/common/JniUtil.java: http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/common/JniUtil.java@132 PS5, Line 132: jvmMetrics.setMemory_pools(new ArrayList()); : TJvmMemoryPool totalUsage = new TJvmMemoryPool(); > Not needed anymore? Done http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/common/JniUtil.java@200 PS5, Line 200: jvmMetrics.setGc_total_extra_sleep_time_millis( > Commented elsewhere, these are not always GCs AFAICT. If we are only intere I'm trying to keep the metric names consistent with Hadoop. http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/common/JniUtil.java@210 PS5, Line 210: jvmMetrics.setGc_count(gcCount); : jvmMetrics.setGc_time_millis(gcTimeMillis); : : r > I'm not super sure if it makes sense to add these across different GarbageC A few answers here. This is consistent with Hadoop, and that's nice, since tooling for one can work for the other. As much as anything, an overall count of the seconds spent on garbage collection is an ok proxy for how much you're paying for GC. If you're spending 1000 millis per second on GC, you've pinned a thread to GC, and that's typically bad. The gccount stuff is less defensible, but when correlated with memory usage charts, you can see what's going on. We already expose the specific GC metrics via /jmx. i.e., if you wanted to separate by collector, you could. Generically, though, the collector you're using depends on your Java flags and on your JVM, so a monitoring tool might want something more generic that's always available. Anyway: on balance, I think consistency with Hadoop is reasonable. I'm open to removing gcCount. I think gcTimeMillis is useful. For your other question, these are counters. If
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Hello Bharath Vissapragada, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11468 to look at the new patch set (#6). Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. Following up to IMPALA-6857, it's useful for monitoring tools to see if the pause monitor is getting triggered, and to see other GC metrics. The Java side here, and the Thrift side, were easy enough. However, the Impala metric implementation here caused us to call into the frontend to read through the JMX memory beans 72 times, because each call to GetValue() was getting all the data for the pool. This structure made it hard to add additional, non-pool, metrics, and it felt wasteful. To combat this, I added a cache of 10 seconds for getting the metrics from the Frontend. The counters will typically re-use the same data. There are five metrics here, and to avoid yet another enum class, I used C++ lambdas to capture which field of the Thrift object I care about. If folks like the approach, I think it can simplify way the enums for the pool metrics as well. I measured the cost of calling into the metrics code by looping the metrics-gathering 100 times and looking at CPU time for the process using this script: START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }') for i in $(seq 100); do curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null done END_CPU=$( cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }') echo $START_CPU $END_CPU $(($END_CPU - $START_CPU)) On a release build on my development machine, gathering metrics 100 times took 0.16 cpu seconds without this change and 0.07 cpu seconds with this change. The measurement accuracy here is 0.01 (I spot-checked this with using the cpuacct cgroup infrastructure which gives you nanos, but it was more painful to script), but this convinces me that this is a net improvement. Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 --- M be/src/util/jni-util.cc M be/src/util/jni-util.h M be/src/util/memory-metrics.cc M be/src/util/memory-metrics.h M common/thrift/Frontend.thrift M common/thrift/metrics.json M fe/src/main/java/org/apache/impala/common/JniUtil.java M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java M tests/custom_cluster/test_pause_monitor.py A tests/observability/__init__.py A tests/observability/test_jvm_metrics.py M tests/run-tests.py 12 files changed, 375 insertions(+), 144 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/11468/6 -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 6 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 5: (13 comments) http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h File be/src/util/memory-metrics.h: http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@167 PS5, Line 167: caching gathering fix this? http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@169 PS5, Line 169: class JvmMetricCache { Add some method docs and some details about locking http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@217 PS5, Line 217: class JvmMemoryCounterMetric : public IntCounter { docs. http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.cc File be/src/util/memory-metrics.cc: http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.cc@241 PS5, Line 241: if (now - last_fetch_ < kCachePeriodMillis) { : return; : } nit: single line http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.cc@252 PS5, Line 252: boost:: I think 'using' declarations are preferred in the source file rather than qualifying it everywhere. http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.cc@290 PS5, Line 290: void JvmMetricCache::GetPoolNames(vector& names) { nit: Either return a vector<> or make it a ptr? https://google.github.io/styleguide/cppguide.html#Output_Parameters http://gerrit.cloudera.org:8080/#/c/11468/5/common/thrift/Frontend.thrift File common/thrift/Frontend.thrift: http://gerrit.cloudera.org:8080/#/c/11468/5/common/thrift/Frontend.thrift@769 PS5, Line 769: struct TGetJvmMemoryMetricsRequest { Why keep the empty one? http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/common/JniUtil.java File fe/src/main/java/org/apache/impala/common/JniUtil.java: http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/common/JniUtil.java@132 PS5, Line 132: TGetJvmMemoryMetricsRequest request = new TGetJvmMemoryMetricsRequest(); : JniUtil.deserializeThrift(protocolFactory_, request, argument); Not needed anymore? http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/common/JniUtil.java@200 PS5, Line 200: jvmMetrics.setGc_num_warn_threshold_exceeded( Commented elsewhere, these are not always GCs AFAICT. If we are only interested in num pauses, may be rename it to num_pause_warn_threshold...? http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/common/JniUtil.java@210 PS5, Line 210: for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) { : gcCount += bean.getCollectionCount(); : gcTimeMillis += bean.getCollectionTime(); : } I'm not super sure if it makes sense to add these across different GarbageCollectors. I think we are more interested in specific Garbage collectors. Also qq, these are the counts (and elapsed times) aggregated till this point in time right? From a debugging POV, doesn't it make sense to poll the last GC information (like in JMX output)? Because we are usually interested in in certain windows when it happens. http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java File fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java: http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java@201 PS5, Line 201: numGcWarnThresholdExceeded Per my understanding, a pause is not always a GC pause (look at formatMessage()). should you distinguish between these two pauses or rename it to numPause http://gerrit.cloudera.org:8080/#/c/11468/5/tests/custom_cluster/test_pause_monitor.py File tests/custom_cluster/test_pause_monitor.py: http://gerrit.cloudera.org:8080/#/c/11468/5/tests/custom_cluster/test_pause_monitor.py@36 PS5, Line 36:# Wait for a few seconds for the logs to get flushed. : time.sleep(5) still needed? http://gerrit.cloudera.org:8080/#/c/11468/5/tests/infra/test_jvm_metrics.py File tests/infra/test_jvm_metrics.py: http://gerrit.cloudera.org:8080/#/c/11468/5/tests/infra/test_jvm_metrics.py@22 PS5, Line 22: """Class for pause monitor tests.""" Doesn't look like it? -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 5 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 5: (6 comments) Remaining comments are mainly about maintainability. http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h File be/src/util/memory-metrics.h: http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@172 PS5, Line 172: long GetCounterMetric(std::function accessor); I believe the general best practice for passing functions to lightweight functions like this is to make this a template function with the function type as a template parameter. The issue is that there's overhead in converting a lambda to a std::function, because std::function has to have a uniform representation for passing in arbitrary numbers of captured variables. https://vittorioromeo.info/index/blog/passing_functions_to_functions.html has some explanation. http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@181 PS5, Line 181: static const int64_t kCachePeriodMillis = 1000; I guess the google style guide says to use this naming convention https://google.github.io/styleguide/cppguide.html#Constant_Names, but we've pretty consistently used UPPER_CASE for constants in C++. http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@224 PS5, Line 224: std::function accessor); I guess we already materialise it to a std::function here so maybe my above comment doesn't make a difference, but maybe we should pass it by const reference above to avoid copying it? We could make it templates all the way down by making this class templated by the function type, but that feels like overkill. http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java File fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java: http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java@62 PS5, Line 62: // Overall metrics Could be worth mentioning why they're volatile, because it's an unusual and somewhat risky pattern if misapplied. E.g. "Volatile to allow populating metrics concurrently with the values being updated without staleness (but with no other synchronization guarantees)." http://gerrit.cloudera.org:8080/#/c/11468/5/tests/infra/test_jvm_metrics.py File tests/infra/test_jvm_metrics.py: http://gerrit.cloudera.org:8080/#/c/11468/5/tests/infra/test_jvm_metrics.py@21 PS5, Line 21: class TestJvmMetrics(ImpalaTestSuite): I'm not sure that infra/ is the right place for this test, since I think it means "test infra". It seems to fit with webserver/test_web_pages.py. Maybe we should consolidate into an observability/ subdirectory, since that seems to be the general theme. http://gerrit.cloudera.org:8080/#/c/11468/5/tests/infra/test_jvm_metrics.py@29 PS5, Line 29: METRICS = [ NON_ZERO_METRICS? -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 5 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 26 Sep 2018 05:56:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/798/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 5 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 25 Sep 2018 23:27:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 5: (4 comments) I think I fixed the line-length and whitespace issues. Tim--care to take another look? Thanks! http://gerrit.cloudera.org:8080/#/c/11468/4/be/src/util/memory-metrics.h File be/src/util/memory-metrics.h: http://gerrit.cloudera.org:8080/#/c/11468/4/be/src/util/memory-metrics.h@223 PS4, Line 223: const string& key, > line too long (91 > 90) Done http://gerrit.cloudera.org:8080/#/c/11468/2/be/src/util/memory-metrics.cc File be/src/util/memory-metrics.cc: http://gerrit.cloudera.org:8080/#/c/11468/2/be/src/util/memory-metrics.cc@257 PS2, Line 257: int64_t JvmMetricCache::GetPoolMetric(const std::string& mempool_name, > line too long (98 > 90) Done http://gerrit.cloudera.org:8080/#/c/11468/4/be/src/util/memory-metrics.cc File be/src/util/memory-metrics.cc: http://gerrit.cloudera.org:8080/#/c/11468/4/be/src/util/memory-metrics.cc@257 PS4, Line 257: int64_t JvmMetricCache::GetPoolMetric(const std::string& mempool_name, > line too long (98 > 90) Done http://gerrit.cloudera.org:8080/#/c/11468/4/tests/infra/test_jvm_metrics.py File tests/infra/test_jvm_metrics.py: http://gerrit.cloudera.org:8080/#/c/11468/4/tests/infra/test_jvm_metrics.py@20 PS4, Line 20: > flake8: E302 expected 2 blank lines, found 1 Done -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 5 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 25 Sep 2018 22:35:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Hello Bharath Vissapragada, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11468 to look at the new patch set (#5). Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. Following up to IMPALA-6857, it's useful for monitoring tools to see if the pause monitor is getting triggered, and to see other GC metrics. The Java side here, and the Thrift side, were easy enough. However, the Impala metric implementation here caused us to call into the frontend to read through the JMX memory beans 72 times, because each call to GetValue() was getting all the data for the pool. This structure made it hard to add additional, non-pool, metrics, and it felt wasteful. To combat this, I added a cache of 10 seconds for getting the metrics from the Frontend. The counters will typically re-use the same data. There are five metrics here, and to avoid yet another enum class, I used C++ lambdas to capture which field of the Thrift object I care about. If folks like the approach, I think it can simplify way the enums for the pool metrics as well. I measured the cost of calling into the metrics code by looping the metrics-gathering 100 times and looking at CPU time for the process using this script: START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }') for i in $(seq 100); do curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null done END_CPU=$( cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }') echo $START_CPU $END_CPU $(($END_CPU - $START_CPU)) On a release build on my development machine, gathering metrics 100 times took 0.16 cpu seconds without this change and 0.07 cpu seconds with this change. The measurement accuracy here is 0.01 (I spot-checked this with using the cpuacct cgroup infrastructure which gives you nanos, but it was more painful to script), but this convinces me that this is a net improvement. TODO: Update tests to check new metrics. Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 --- M be/src/util/memory-metrics.cc M be/src/util/memory-metrics.h M common/thrift/Frontend.thrift M common/thrift/metrics.json M fe/src/main/java/org/apache/impala/common/JniUtil.java M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java M tests/custom_cluster/test_pause_monitor.py A tests/infra/test_jvm_metrics.py 8 files changed, 355 insertions(+), 127 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/11468/5 -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 5 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/11468/4/be/src/util/memory-metrics.h File be/src/util/memory-metrics.h: http://gerrit.cloudera.org:8080/#/c/11468/4/be/src/util/memory-metrics.h@223 PS4, Line 223: const string& key, std::function accessor); line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/11468/4/be/src/util/memory-metrics.cc File be/src/util/memory-metrics.cc: http://gerrit.cloudera.org:8080/#/c/11468/4/be/src/util/memory-metrics.cc@257 PS4, Line 257: int64_t JvmMetricCache::GetPoolMetric(const std::string& mempool_name, JvmMemoryMetricType type) { line too long (98 > 90) http://gerrit.cloudera.org:8080/#/c/11468/4/tests/infra/test_jvm_metrics.py File tests/infra/test_jvm_metrics.py: http://gerrit.cloudera.org:8080/#/c/11468/4/tests/infra/test_jvm_metrics.py@20 PS4, Line 20: class TestJvmMetrics(ImpalaTestSuite): flake8: E302 expected 2 blank lines, found 1 -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 4 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 20 Sep 2018 11:49:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/11468/3/be/src/util/memory-metrics.h File be/src/util/memory-metrics.h: http://gerrit.cloudera.org:8080/#/c/11468/3/be/src/util/memory-metrics.h@223 PS3, Line 223: const string& key, std::function accessor); line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/11468/3/be/src/util/memory-metrics.cc File be/src/util/memory-metrics.cc: http://gerrit.cloudera.org:8080/#/c/11468/3/be/src/util/memory-metrics.cc@257 PS3, Line 257: int64_t JvmMetricCache::GetPoolMetric(const std::string& mempool_name, JvmMemoryMetricType type) { line too long (98 > 90) http://gerrit.cloudera.org:8080/#/c/11468/3/tests/infra/test_jvm_metrics.py File tests/infra/test_jvm_metrics.py: http://gerrit.cloudera.org:8080/#/c/11468/3/tests/infra/test_jvm_metrics.py@20 PS3, Line 20: class TestJvmMetrics(ImpalaTestSuite): flake8: E302 expected 2 blank lines, found 1 -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 20 Sep 2018 07:13:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/11468/2/be/src/util/memory-metrics.h File be/src/util/memory-metrics.h: http://gerrit.cloudera.org:8080/#/c/11468/2/be/src/util/memory-metrics.h@223 PS2, Line 223: const string& key, std::function accessor); line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/11468/2/be/src/util/memory-metrics.cc File be/src/util/memory-metrics.cc: http://gerrit.cloudera.org:8080/#/c/11468/2/be/src/util/memory-metrics.cc@257 PS2, Line 257: int64_t JvmMetricCache::GetPoolMetric(const std::string& mempool_name, JvmMemoryMetricType type) { line too long (98 > 90) http://gerrit.cloudera.org:8080/#/c/11468/2/tests/infra/test_jvm_metrics.py File tests/infra/test_jvm_metrics.py: http://gerrit.cloudera.org:8080/#/c/11468/2/tests/infra/test_jvm_metrics.py@20 PS2, Line 20: class TestJvmMetrics(ImpalaTestSuite): flake8: E302 expected 2 blank lines, found 1 -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 20 Sep 2018 06:03:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/726/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 4 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 20 Sep 2018 00:36:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Hello Bharath Vissapragada, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11468 to look at the new patch set (#4). Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. Following up to IMPALA-6857, it's useful for monitoring tools to see if the pause monitor is getting triggered, and to see other GC metrics. The Java side here, and the Thrift side, were easy enough. However, the Impala metric implementation here caused us to call into the frontend to read through the JMX memory beans 72 times, because each call to GetValue() was getting all the data for the pool. This structure made it hard to add additional, non-pool, metrics, and it felt wasteful. To combat this, I added a cache of 10 seconds for getting the metrics from the Frontend. The counters will typically re-use the same data. There are five metrics here, and to avoid yet another enum class, I used C++ lambdas to capture which field of the Thrift object I care about. If folks like the approach, I think it can simplify way the enums for the pool metrics as well. I measured the cost of calling into the metrics code by looping the metrics-gathering 100 times and looking at CPU time for the process using this script: START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }') for i in $(seq 100); do curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null done END_CPU=$( cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }') echo $START_CPU $END_CPU $(($END_CPU - $START_CPU)) On a release build on my development machine, gathering metrics 100 times took 0.16 cpu seconds without this change and 0.07 cpu seconds with this change. The measurement accuracy here is 0.01 (I spot-checked this with using the cpuacct cgroup infrastructure which gives you nanos, but it was more painful to script), but this convinces me that this is a net improvement. TODO: Update tests to check new metrics. Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 --- M be/src/util/memory-metrics.cc M be/src/util/memory-metrics.h M common/thrift/Frontend.thrift M common/thrift/metrics.json M fe/src/main/java/org/apache/impala/common/JniUtil.java M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java M tests/custom_cluster/test_pause_monitor.py A tests/infra/test_jvm_metrics.py 8 files changed, 352 insertions(+), 127 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/11468/4 -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 4 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 3: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/722/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 19 Sep 2018 23:40:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11468 ) Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. Patch Set 2: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/720/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 19 Sep 2018 23:39:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Hello Bharath Vissapragada, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11468 to look at the new patch set (#3). Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. Following up to IMPALA-6857, it's useful for monitoring tools to see if the pause monitor is getting triggered, and to see other GC metrics. The Java side here, and the Thrift side, were easy enough. However, the Impala metric implementation here caused us to call into the frontend to read through the JMX memory beans 72 times, because each call to GetValue() was getting all the data for the pool. This structure made it hard to add additional, non-pool, metrics, and it felt wasteful. To combat this, I added a cache of 10 seconds for getting the metrics from the Frontend. The counters will typically re-use the same data. There are five metrics here, and to avoid yet another enum class, I used C++ lambdas to capture which field of the Thrift object I care about. If folks like the approach, I think it can simplify way the enums for the pool metrics as well. I measured the cost of calling into the metrics code by looping the metrics-gathering 100 times and looking at CPU time for the process using this script: START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }') for i in $(seq 100); do curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null done END_CPU=$( cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }') echo $START_CPU $END_CPU $(($END_CPU - $START_CPU)) On a release build on my development machine, gathering metrics 100 times took 0.16 cpu seconds without this change and 0.07 cpu seconds with this change. The measurement accuracy here is 0.01 (I spot-checked this with using the cpuacct cgroup infrastructure which gives you nanos, but it was more painful to script), but this convinces me that this is a net improvement. TODO: Update tests to check new metrics. Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 --- M be/src/util/memory-metrics.cc M be/src/util/memory-metrics.h M common/thrift/Frontend.thrift M common/thrift/metrics.json M fe/src/main/java/org/apache/impala/common/JniUtil.java M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java M tests/custom_cluster/test_pause_monitor.py A tests/infra/test_jvm_metrics.py 8 files changed, 352 insertions(+), 128 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/11468/3 -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.
Hello Bharath Vissapragada, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11468 to look at the new patch set (#2). Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. .. IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. Following up to IMPALA-6857, it's useful for monitoring tools to see if the pause monitor is getting triggered, and to see other GC metrics. The Java side here, and the Thrift side, were easy enough. However, the Impala metric implementation here caused us to call into the frontend to read through the JMX memory beans 72 times, because each call to GetValue() was getting all the data for the pool. This structure made it hard to add additional, non-pool, metrics, and it felt wasteful. To combat this, I added a cache of 10 seconds for getting the metrics from the Frontend. The counters will typically re-use the same data. There are five metrics here, and to avoid yet another enum class, I used C++ lambdas to capture which field of the Thrift object I care about. If folks like the approach, I think it can simplify way the enums for the pool metrics as well. I measured the cost of calling into the metrics code by looping the metrics-gathering 100 times and looking at CPU time for the process using this script: START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }') for i in $(seq 100); do curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null done END_CPU=$( cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }') echo $START_CPU $END_CPU $(($END_CPU - $START_CPU)) On a release build on my development machine, gathering metrics 100 times took 0.16 cpu seconds without this change and 0.07 cpu seconds with this change. The measurement accuracy here is 0.01 (I spot-checked this with using the cpuacct cgroup infrastructure which gives you nanos, but it was more painful to script), but this convinces me that this is a net improvement. TODO: Update tests to check new metrics. Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 --- M be/src/util/memory-metrics.cc M be/src/util/memory-metrics.h M common/thrift/Frontend.thrift M common/thrift/metrics.json M fe/src/main/java/org/apache/impala/common/JniUtil.java M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java M tests/custom_cluster/test_pause_monitor.py A tests/infra/test_jvm_metrics.py 8 files changed, 352 insertions(+), 128 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/11468/2 -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong