Hello Bharath Vissapragada, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11468

to look at the new patch set (#6).

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
......................................................................

IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

Following up to IMPALA-6857, it's useful for monitoring tools to see if
the pause monitor is getting triggered, and to see other GC metrics.

The Java side here, and the Thrift side, were easy enough.

However, the Impala metric implementation here caused us to call into
the frontend to read through the JMX memory beans 72 times, because each
call to GetValue() was getting all the data for the pool. This structure
made it hard to add additional, non-pool, metrics, and it felt wasteful.
To combat this, I added a cache of 10 seconds for getting the metrics
from the Frontend. The counters will typically re-use the same data.

There are five metrics here, and to avoid yet another enum class, I used
C++ lambdas to capture which field of the Thrift object I care about. If
folks like the approach, I think it can simplify way the enums for the
pool metrics as well.

I measured the cost of calling into the metrics code by
looping the metrics-gathering 100 times and looking at CPU
time for the process using this script:

  START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk 
'{ print $14 + $15 }')
  for i in $(seq 100); do
    curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null
  done
  END_CPU=$(  cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk 
'{ print $14 + $15 }')
  echo $START_CPU $END_CPU $(($END_CPU - $START_CPU))

On a release build on my development machine, gathering metrics 100
times took 0.16 cpu seconds without this change and 0.07 cpu seconds
with this change. The measurement accuracy here is 0.01 (I spot-checked
this with using the cpuacct cgroup infrastructure which gives you nanos,
but it was more painful to script), but this convinces me that this is a
net improvement.

Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
---
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M be/src/util/memory-metrics.cc
M be/src/util/memory-metrics.h
M common/thrift/Frontend.thrift
M common/thrift/metrics.json
M fe/src/main/java/org/apache/impala/common/JniUtil.java
M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java
M tests/custom_cluster/test_pause_monitor.py
A tests/observability/__init__.py
A tests/observability/test_jvm_metrics.py
M tests/run-tests.py
12 files changed, 375 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/11468/6
--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 6
Gerrit-Owner: Philip Zeyliger <phi...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>

Reply via email to