[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 10: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 10
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 27 Sep 2018 01:55:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..

IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

Following up to IMPALA-6857, it's useful for monitoring tools to see if
the pause monitor is getting triggered, and to see other GC metrics.

The Java side here, and the Thrift side, were easy enough.

However, the Impala metric implementation here caused us to call into
the frontend to read through the JMX memory beans 72 times, because each
call to GetValue() was getting all the data for the pool. This structure
made it hard to add additional, non-pool, metrics, and it felt wasteful.
To combat this, I added a cache of 10 seconds for getting the metrics
from the Frontend. The counters will typically re-use the same data.

There are five metrics here, and to avoid yet another enum class, I used
C++ lambdas to capture which field of the Thrift object I care about. If
folks like the approach, I think it can simplify way the enums for the
pool metrics as well.

I measured the cost of calling into the metrics code by
looping the metrics-gathering 100 times and looking at CPU
time for the process using this script:

  START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk 
'{ print $14 + $15 }')
  for i in $(seq 100); do
curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null
  done
  END_CPU=$(  cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk 
'{ print $14 + $15 }')
  echo $START_CPU $END_CPU $(($END_CPU - $START_CPU))

On a release build on my development machine, gathering metrics 100
times took 0.16 cpu seconds without this change and 0.07 cpu seconds
with this change. The measurement accuracy here is 0.01 (I spot-checked
this with using the cpuacct cgroup infrastructure which gives you nanos,
but it was more painful to script), but this convinces me that this is a
net improvement.

Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Reviewed-on: http://gerrit.cloudera.org:8080/11468
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M be/src/util/memory-metrics.cc
M be/src/util/memory-metrics.h
M common/thrift/Frontend.thrift
M common/thrift/metrics.json
M fe/src/main/java/org/apache/impala/common/JniUtil.java
M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java
M tests/custom_cluster/test_pause_monitor.py
A tests/observability/__init__.py
A tests/observability/test_jvm_metrics.py
M tests/run-tests.py
12 files changed, 389 insertions(+), 145 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 11
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/822/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 9
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 26 Sep 2018 22:42:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 10: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 10
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 26 Sep 2018 22:07:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-26 Thread Philip Zeyliger (Code Review)
Hello Bharath Vissapragada, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11468

to look at the new patch set (#9).

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..

IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

Following up to IMPALA-6857, it's useful for monitoring tools to see if
the pause monitor is getting triggered, and to see other GC metrics.

The Java side here, and the Thrift side, were easy enough.

However, the Impala metric implementation here caused us to call into
the frontend to read through the JMX memory beans 72 times, because each
call to GetValue() was getting all the data for the pool. This structure
made it hard to add additional, non-pool, metrics, and it felt wasteful.
To combat this, I added a cache of 10 seconds for getting the metrics
from the Frontend. The counters will typically re-use the same data.

There are five metrics here, and to avoid yet another enum class, I used
C++ lambdas to capture which field of the Thrift object I care about. If
folks like the approach, I think it can simplify way the enums for the
pool metrics as well.

I measured the cost of calling into the metrics code by
looping the metrics-gathering 100 times and looking at CPU
time for the process using this script:

  START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk 
'{ print $14 + $15 }')
  for i in $(seq 100); do
curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null
  done
  END_CPU=$(  cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk 
'{ print $14 + $15 }')
  echo $START_CPU $END_CPU $(($END_CPU - $START_CPU))

On a release build on my development machine, gathering metrics 100
times took 0.16 cpu seconds without this change and 0.07 cpu seconds
with this change. The measurement accuracy here is 0.01 (I spot-checked
this with using the cpuacct cgroup infrastructure which gives you nanos,
but it was more painful to script), but this convinces me that this is a
net improvement.

Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
---
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M be/src/util/memory-metrics.cc
M be/src/util/memory-metrics.h
M common/thrift/Frontend.thrift
M common/thrift/metrics.json
M fe/src/main/java/org/apache/impala/common/JniUtil.java
M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java
M tests/custom_cluster/test_pause_monitor.py
A tests/observability/__init__.py
A tests/observability/test_jvm_metrics.py
M tests/run-tests.py
12 files changed, 389 insertions(+), 145 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/11468/9
--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 9
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-26 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 9: Code-Review+2

Carrying +2.


--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 9
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 26 Sep 2018 22:07:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3227/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 10
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 26 Sep 2018 22:07:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-26 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11468/8/be/src/util/memory-metrics.cc
File be/src/util/memory-metrics.cc:

http://gerrit.cloudera.org:8080/#/c/11468/8/be/src/util/memory-metrics.cc@246
PS8, Line 246: ok()
> Looks serious enough to log rather than silently returning.
Done


http://gerrit.cloudera.org:8080/#/c/11468/8/be/src/util/memory-metrics.cc@295
PS8, Line 295:   for (const TJvmMemoryPool& usage: last_response_.memory_pools) 
{
> Can we DCHECK that last_response_ is not nullptr? Could happen if GrabMetri
last_response_ isn't a pointer. Thrift seems to use std::vector internally (see 
be/generated-sources/gen-cpp/Frontend_types.h) so I think it's always 
initialized.

That said, this pointed out that I hadn't initialized last_fetch_, which I now 
have.



--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 8
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 26 Sep 2018 22:06:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-26 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 8: Code-Review+2

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11468/8/be/src/util/memory-metrics.h
File be/src/util/memory-metrics.h:

http://gerrit.cloudera.org:8080/#/c/11468/8/be/src/util/memory-metrics.h@226
PS8, Line 226: //
nit: /// (multiple places)


http://gerrit.cloudera.org:8080/#/c/11468/8/be/src/util/memory-metrics.cc
File be/src/util/memory-metrics.cc:

http://gerrit.cloudera.org:8080/#/c/11468/8/be/src/util/memory-metrics.cc@246
PS8, Line 246: ok()
Looks serious enough to log rather than silently returning.


http://gerrit.cloudera.org:8080/#/c/11468/8/be/src/util/memory-metrics.cc@295
PS8, Line 295:   for (const TJvmMemoryPool& usage: last_response_.memory_pools) 
{
Can we DCHECK that last_response_ is not nullptr? Could happen if 
GrabMetrics...() fails in the first run.



--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 8
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 26 Sep 2018 21:01:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/821/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 8
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 26 Sep 2018 20:33:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-26 Thread Philip Zeyliger (Code Review)
Hello Bharath Vissapragada, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11468

to look at the new patch set (#8).

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..

IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

Following up to IMPALA-6857, it's useful for monitoring tools to see if
the pause monitor is getting triggered, and to see other GC metrics.

The Java side here, and the Thrift side, were easy enough.

However, the Impala metric implementation here caused us to call into
the frontend to read through the JMX memory beans 72 times, because each
call to GetValue() was getting all the data for the pool. This structure
made it hard to add additional, non-pool, metrics, and it felt wasteful.
To combat this, I added a cache of 10 seconds for getting the metrics
from the Frontend. The counters will typically re-use the same data.

There are five metrics here, and to avoid yet another enum class, I used
C++ lambdas to capture which field of the Thrift object I care about. If
folks like the approach, I think it can simplify way the enums for the
pool metrics as well.

I measured the cost of calling into the metrics code by
looping the metrics-gathering 100 times and looking at CPU
time for the process using this script:

  START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk 
'{ print $14 + $15 }')
  for i in $(seq 100); do
curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null
  done
  END_CPU=$(  cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk 
'{ print $14 + $15 }')
  echo $START_CPU $END_CPU $(($END_CPU - $START_CPU))

On a release build on my development machine, gathering metrics 100
times took 0.16 cpu seconds without this change and 0.07 cpu seconds
with this change. The measurement accuracy here is 0.01 (I spot-checked
this with using the cpuacct cgroup infrastructure which gives you nanos,
but it was more painful to script), but this convinces me that this is a
net improvement.

Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
---
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M be/src/util/memory-metrics.cc
M be/src/util/memory-metrics.h
M common/thrift/Frontend.thrift
M common/thrift/metrics.json
M fe/src/main/java/org/apache/impala/common/JniUtil.java
M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java
M tests/custom_cluster/test_pause_monitor.py
A tests/observability/__init__.py
A tests/observability/test_jvm_metrics.py
M tests/run-tests.py
12 files changed, 380 insertions(+), 145 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/11468/8
--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 8
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-26 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 6:

(1 comment)

Thanks for the review, Tim.

http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.h
File be/src/util/memory-metrics.h:

http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.h@240
PS6, Line 240:
> I don't think we want the space before &
Done



--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 6
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 26 Sep 2018 19:58:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/819/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 7
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 26 Sep 2018 19:21:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/817/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 6
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 26 Sep 2018 19:06:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 7: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.h
File be/src/util/memory-metrics.h:

http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.h@240
PS6, Line 240:
I don't think we want the space before &


http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h
File be/src/util/memory-metrics.h:

http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@172
PS5, Line 172: // lock_.
> I replaced std::function with the specific function pointer type. Here, the
WFM. I think a typedef would be fine since this is an internal API (I suspect 
at Google the concerns are related to APIs that are consumed by many 
dependencies).

I think this is exposing the fact that I don't know all the corners of the 
google style guide that well :)


http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@181
PS5, Line 181: static JvmMetricCache* GetInstance();
> Done
Just to complete the circle, I updated our list of differences from the google 
style guide to mention this case.



--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 7
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 26 Sep 2018 18:41:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-26 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.cc
File be/src/util/memory-metrics.cc:

http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.cc@205
PS6, Line 205:   [](const TGetJvmMemoryMetricsResponse& r) { return 
r.gc_num_info_threshold_exceeded; });
> line too long (94 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.cc@207
PS6, Line 207:   [](const TGetJvmMemoryMetricsResponse& r) { return 
r.gc_num_warn_threshold_exceeded; });
> line too long (94 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.cc@212
PS6, Line 212:   [](const TGetJvmMemoryMetricsResponse& r) { return 
r.gc_total_extra_sleep_time_millis; });
> line too long (96 > 90)
Done



--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 6
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 26 Sep 2018 18:41:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-26 Thread Philip Zeyliger (Code Review)
Hello Bharath Vissapragada, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11468

to look at the new patch set (#7).

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..

IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

Following up to IMPALA-6857, it's useful for monitoring tools to see if
the pause monitor is getting triggered, and to see other GC metrics.

The Java side here, and the Thrift side, were easy enough.

However, the Impala metric implementation here caused us to call into
the frontend to read through the JMX memory beans 72 times, because each
call to GetValue() was getting all the data for the pool. This structure
made it hard to add additional, non-pool, metrics, and it felt wasteful.
To combat this, I added a cache of 10 seconds for getting the metrics
from the Frontend. The counters will typically re-use the same data.

There are five metrics here, and to avoid yet another enum class, I used
C++ lambdas to capture which field of the Thrift object I care about. If
folks like the approach, I think it can simplify way the enums for the
pool metrics as well.

I measured the cost of calling into the metrics code by
looping the metrics-gathering 100 times and looking at CPU
time for the process using this script:

  START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk 
'{ print $14 + $15 }')
  for i in $(seq 100); do
curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null
  done
  END_CPU=$(  cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk 
'{ print $14 + $15 }')
  echo $START_CPU $END_CPU $(($END_CPU - $START_CPU))

On a release build on my development machine, gathering metrics 100
times took 0.16 cpu seconds without this change and 0.07 cpu seconds
with this change. The measurement accuracy here is 0.01 (I spot-checked
this with using the cpuacct cgroup infrastructure which gives you nanos,
but it was more painful to script), but this convinces me that this is a
net improvement.

Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
---
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M be/src/util/memory-metrics.cc
M be/src/util/memory-metrics.h
M common/thrift/Frontend.thrift
M common/thrift/metrics.json
M fe/src/main/java/org/apache/impala/common/JniUtil.java
M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java
M tests/custom_cluster/test_pause_monitor.py
A tests/observability/__init__.py
A tests/observability/test_jvm_metrics.py
M tests/run-tests.py
12 files changed, 380 insertions(+), 145 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/11468/7
--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 7
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.cc
File be/src/util/memory-metrics.cc:

http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.cc@205
PS6, Line 205:   [](const TGetJvmMemoryMetricsResponse& r) { return 
r.gc_num_info_threshold_exceeded; });
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.cc@207
PS6, Line 207:   [](const TGetJvmMemoryMetricsResponse& r) { return 
r.gc_num_warn_threshold_exceeded; });
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/11468/6/be/src/util/memory-metrics.cc@212
PS6, Line 212:   [](const TGetJvmMemoryMetricsResponse& r) { return 
r.gc_total_extra_sleep_time_millis; });
line too long (96 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 6
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 26 Sep 2018 18:34:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-26 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 6:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h
File be/src/util/memory-metrics.h:

http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@167
PS5, Line 167: caching the gathe
> fix this?
Done


http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@169
PS5, Line 169: //
> Add some method docs and some details about locking
Done


http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@172
PS5, Line 172: // lock_.
> I believe the general best practice for passing functions to lightweight fu
I replaced std::function with the specific function pointer type. Here, the 
type is always the same (int64_t)(*f)(const TGetJvmMemorymetricsResponse&), 
which just rolls off the tongue, eh? I figure being specific is going to be 
easier to deal with than templating the class, since we need to store this 
thing too. I also chose not to typedef this because something in the Google 
style guide says you shouldn't expose typedefs in your API.

Anyway--let me know.


http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@181
PS5, Line 181: static JvmMetricCache* GetInstance();
> I guess the google style guide says to use this naming convention https://g
Done


http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@217
PS5, Line 217:
> docs.
Done


http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@224
PS5, Line 224: };
> I guess we already materialise it to a std::function here so maybe my above
Right now it's a function pointer, so I think the concern is moot. Let me know 
if I'm wrong.


http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.cc
File be/src/util/memory-metrics.cc:

http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.cc@241
PS5, Line 241: if (now - last_fetch_ < CACHE_PERIOD_MILLIS) return;
 :   if (!JniUtil::GetJvmMemoryMetrics(_response_).ok()) 
return;
 :   l
> nit: single line
Done


http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.cc@252
PS5, Line 252:
> I think 'using' declarations are preferred in the source file rather than q
Done


http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.cc@290
PS5, Line 290:   vector names;
> nit: Either return a vector<> or make it a ptr?
Done


http://gerrit.cloudera.org:8080/#/c/11468/5/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

http://gerrit.cloudera.org:8080/#/c/11468/5/common/thrift/Frontend.thrift@769
PS5, Line 769:
> Why keep the empty one?
Old habits about believing that Thrift should be compatible. Doesn't apply 
here, so I've removed it throughout.


http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/common/JniUtil.java
File fe/src/main/java/org/apache/impala/common/JniUtil.java:

http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/common/JniUtil.java@132
PS5, Line 132:  jvmMetrics.setMemory_pools(new ArrayList());
 : TJvmMemoryPool totalUsage = new TJvmMemoryPool();
> Not needed anymore?
Done


http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/common/JniUtil.java@200
PS5, Line 200: jvmMetrics.setGc_total_extra_sleep_time_millis(
> Commented elsewhere, these are not always GCs AFAICT. If we are only intere
I'm trying to keep the metric names consistent with Hadoop.


http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/common/JniUtil.java@210
PS5, Line 210:  jvmMetrics.setGc_count(gcCount);
 : jvmMetrics.setGc_time_millis(gcTimeMillis);
 :
 : r
> I'm not super sure if it makes sense to add these across different GarbageC
A few answers here.

This is consistent with Hadoop, and that's nice, since tooling for one can work 
for the other.

As much as anything, an overall count of the seconds spent on garbage 
collection is an ok proxy for how much you're paying for GC. If you're spending 
1000 millis per second on GC, you've pinned a thread to GC, and that's 
typically bad. The gccount stuff is less defensible, but when correlated with 
memory usage charts, you can see what's going on.

We already expose the specific GC metrics via /jmx. i.e., if you wanted to 
separate by collector, you could. Generically, though, the collector you're 
using depends on your Java flags and on your JVM, so a monitoring tool might 
want something more generic that's always available.

Anyway: on balance, I think consistency with Hadoop is reasonable. I'm open to 
removing gcCount. I think gcTimeMillis is useful.

For your other question, these are counters. If 

[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-26 Thread Philip Zeyliger (Code Review)
Hello Bharath Vissapragada, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11468

to look at the new patch set (#6).

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..

IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

Following up to IMPALA-6857, it's useful for monitoring tools to see if
the pause monitor is getting triggered, and to see other GC metrics.

The Java side here, and the Thrift side, were easy enough.

However, the Impala metric implementation here caused us to call into
the frontend to read through the JMX memory beans 72 times, because each
call to GetValue() was getting all the data for the pool. This structure
made it hard to add additional, non-pool, metrics, and it felt wasteful.
To combat this, I added a cache of 10 seconds for getting the metrics
from the Frontend. The counters will typically re-use the same data.

There are five metrics here, and to avoid yet another enum class, I used
C++ lambdas to capture which field of the Thrift object I care about. If
folks like the approach, I think it can simplify way the enums for the
pool metrics as well.

I measured the cost of calling into the metrics code by
looping the metrics-gathering 100 times and looking at CPU
time for the process using this script:

  START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk 
'{ print $14 + $15 }')
  for i in $(seq 100); do
curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null
  done
  END_CPU=$(  cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk 
'{ print $14 + $15 }')
  echo $START_CPU $END_CPU $(($END_CPU - $START_CPU))

On a release build on my development machine, gathering metrics 100
times took 0.16 cpu seconds without this change and 0.07 cpu seconds
with this change. The measurement accuracy here is 0.01 (I spot-checked
this with using the cpuacct cgroup infrastructure which gives you nanos,
but it was more painful to script), but this convinces me that this is a
net improvement.

Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
---
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M be/src/util/memory-metrics.cc
M be/src/util/memory-metrics.h
M common/thrift/Frontend.thrift
M common/thrift/metrics.json
M fe/src/main/java/org/apache/impala/common/JniUtil.java
M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java
M tests/custom_cluster/test_pause_monitor.py
A tests/observability/__init__.py
A tests/observability/test_jvm_metrics.py
M tests/run-tests.py
12 files changed, 375 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/11468/6
--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 6
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-26 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 5:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h
File be/src/util/memory-metrics.h:

http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@167
PS5, Line 167: caching gathering
fix this?


http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@169
PS5, Line 169: class JvmMetricCache {
Add some method docs and some details about locking


http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@217
PS5, Line 217: class JvmMemoryCounterMetric : public IntCounter {
docs.


http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.cc
File be/src/util/memory-metrics.cc:

http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.cc@241
PS5, Line 241: if (now - last_fetch_ < kCachePeriodMillis) {
 : return;
 :   }
nit: single line


http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.cc@252
PS5, Line 252:  boost::
I think 'using' declarations are preferred in the source file rather than 
qualifying it everywhere.


http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.cc@290
PS5, Line 290: void JvmMetricCache::GetPoolNames(vector& names) {
nit: Either return a vector<> or make it a ptr?

https://google.github.io/styleguide/cppguide.html#Output_Parameters


http://gerrit.cloudera.org:8080/#/c/11468/5/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

http://gerrit.cloudera.org:8080/#/c/11468/5/common/thrift/Frontend.thrift@769
PS5, Line 769: struct TGetJvmMemoryMetricsRequest {
Why keep the empty one?


http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/common/JniUtil.java
File fe/src/main/java/org/apache/impala/common/JniUtil.java:

http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/common/JniUtil.java@132
PS5, Line 132:  TGetJvmMemoryMetricsRequest request = new 
TGetJvmMemoryMetricsRequest();
 : JniUtil.deserializeThrift(protocolFactory_, request, 
argument);
Not needed anymore?


http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/common/JniUtil.java@200
PS5, Line 200: jvmMetrics.setGc_num_warn_threshold_exceeded(
Commented elsewhere, these are not always GCs AFAICT. If we are only interested 
in num pauses, may be rename it to num_pause_warn_threshold...?


http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/common/JniUtil.java@210
PS5, Line 210:  for (GarbageCollectorMXBean bean : 
ManagementFactory.getGarbageCollectorMXBeans()) {
 :   gcCount += bean.getCollectionCount();
 :   gcTimeMillis += bean.getCollectionTime();
 : }
I'm not super sure if it makes sense to add these across different 
GarbageCollectors. I think we are more interested in specific Garbage 
collectors.

Also qq, these are the counts (and elapsed times) aggregated till this point in 
time right? From a debugging POV, doesn't it make sense to poll the last GC 
information (like in JMX output)? Because we are usually interested in in 
certain windows when it happens.


http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java
File fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java:

http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java@201
PS5, Line 201: numGcWarnThresholdExceeded
Per my understanding, a pause is not always a GC pause (look at 
formatMessage()). should you distinguish between these two pauses or rename it 
to numPause


http://gerrit.cloudera.org:8080/#/c/11468/5/tests/custom_cluster/test_pause_monitor.py
File tests/custom_cluster/test_pause_monitor.py:

http://gerrit.cloudera.org:8080/#/c/11468/5/tests/custom_cluster/test_pause_monitor.py@36
PS5, Line 36:# Wait for a few seconds for the logs to get flushed.
: time.sleep(5)
still needed?


http://gerrit.cloudera.org:8080/#/c/11468/5/tests/infra/test_jvm_metrics.py
File tests/infra/test_jvm_metrics.py:

http://gerrit.cloudera.org:8080/#/c/11468/5/tests/infra/test_jvm_metrics.py@22
PS5, Line 22:   """Class for pause monitor tests."""
Doesn't look like it?



--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 

[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 5:

(6 comments)

Remaining comments are mainly about maintainability.

http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h
File be/src/util/memory-metrics.h:

http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@172
PS5, Line 172: long 
GetCounterMetric(std::function 
accessor);
I believe the general best practice for passing functions to lightweight 
functions like this is to make this a template function with the function type 
as a template parameter.

The issue is that there's overhead in converting a lambda to a std::function, 
because std::function has to have a uniform representation for passing in 
arbitrary numbers of captured variables.

https://vittorioromeo.info/index/blog/passing_functions_to_functions.html has 
some explanation.


http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@181
PS5, Line 181: static const int64_t kCachePeriodMillis = 1000;
I guess the google style guide says to use this naming convention 
https://google.github.io/styleguide/cppguide.html#Constant_Names, but we've 
pretty consistently used UPPER_CASE for constants in C++.


http://gerrit.cloudera.org:8080/#/c/11468/5/be/src/util/memory-metrics.h@224
PS5, Line 224: std::function 
accessor);
I guess we already materialise it to a std::function here so maybe my above 
comment doesn't make a difference, but maybe we should pass it by const 
reference above to avoid copying it?

We could make it templates all the way down by making this class templated by 
the function type, but that feels like overkill.


http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java
File fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java:

http://gerrit.cloudera.org:8080/#/c/11468/5/fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java@62
PS5, Line 62:   // Overall metrics
Could be worth mentioning why they're volatile, because it's an unusual and 
somewhat risky pattern if misapplied. E.g.

  "Volatile to allow populating metrics concurrently with the values being 
updated without staleness (but with no other synchronization guarantees)."


http://gerrit.cloudera.org:8080/#/c/11468/5/tests/infra/test_jvm_metrics.py
File tests/infra/test_jvm_metrics.py:

http://gerrit.cloudera.org:8080/#/c/11468/5/tests/infra/test_jvm_metrics.py@21
PS5, Line 21: class TestJvmMetrics(ImpalaTestSuite):
I'm not sure that infra/ is the right place for this test, since I think it 
means "test infra". It seems to fit with webserver/test_web_pages.py. Maybe we 
should consolidate into an observability/ subdirectory, since that seems to be 
the general theme.


http://gerrit.cloudera.org:8080/#/c/11468/5/tests/infra/test_jvm_metrics.py@29
PS5, Line 29: METRICS = [
NON_ZERO_METRICS?



--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 26 Sep 2018 05:56:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/798/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 25 Sep 2018 23:27:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-25 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 5:

(4 comments)

I think I fixed the line-length and whitespace issues.

Tim--care to take another look?

Thanks!

http://gerrit.cloudera.org:8080/#/c/11468/4/be/src/util/memory-metrics.h
File be/src/util/memory-metrics.h:

http://gerrit.cloudera.org:8080/#/c/11468/4/be/src/util/memory-metrics.h@223
PS4, Line 223: const string& key,
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/11468/2/be/src/util/memory-metrics.cc
File be/src/util/memory-metrics.cc:

http://gerrit.cloudera.org:8080/#/c/11468/2/be/src/util/memory-metrics.cc@257
PS2, Line 257: int64_t JvmMetricCache::GetPoolMetric(const std::string& 
mempool_name,
> line too long (98 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/11468/4/be/src/util/memory-metrics.cc
File be/src/util/memory-metrics.cc:

http://gerrit.cloudera.org:8080/#/c/11468/4/be/src/util/memory-metrics.cc@257
PS4, Line 257: int64_t JvmMetricCache::GetPoolMetric(const std::string& 
mempool_name,
> line too long (98 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/11468/4/tests/infra/test_jvm_metrics.py
File tests/infra/test_jvm_metrics.py:

http://gerrit.cloudera.org:8080/#/c/11468/4/tests/infra/test_jvm_metrics.py@20
PS4, Line 20:
> flake8: E302 expected 2 blank lines, found 1
Done



--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 25 Sep 2018 22:35:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-25 Thread Philip Zeyliger (Code Review)
Hello Bharath Vissapragada, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11468

to look at the new patch set (#5).

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..

IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

Following up to IMPALA-6857, it's useful for monitoring tools to see if
the pause monitor is getting triggered, and to see other GC metrics.

The Java side here, and the Thrift side, were easy enough.

However, the Impala metric implementation here caused us to call into
the frontend to read through the JMX memory beans 72 times, because each
call to GetValue() was getting all the data for the pool. This structure
made it hard to add additional, non-pool, metrics, and it felt wasteful.
To combat this, I added a cache of 10 seconds for getting the metrics
from the Frontend. The counters will typically re-use the same data.

There are five metrics here, and to avoid yet another enum class, I used
C++ lambdas to capture which field of the Thrift object I care about. If
folks like the approach, I think it can simplify way the enums for the
pool metrics as well.

I measured the cost of calling into the metrics code by
looping the metrics-gathering 100 times and looking at CPU
time for the process using this script:

  START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk 
'{ print $14 + $15 }')
  for i in $(seq 100); do
curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null
  done
  END_CPU=$(  cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk 
'{ print $14 + $15 }')
  echo $START_CPU $END_CPU $(($END_CPU - $START_CPU))

On a release build on my development machine, gathering metrics 100
times took 0.16 cpu seconds without this change and 0.07 cpu seconds
with this change. The measurement accuracy here is 0.01 (I spot-checked
this with using the cpuacct cgroup infrastructure which gives you nanos,
but it was more painful to script), but this convinces me that this is a
net improvement.

TODO: Update tests to check new metrics.

Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
---
M be/src/util/memory-metrics.cc
M be/src/util/memory-metrics.h
M common/thrift/Frontend.thrift
M common/thrift/metrics.json
M fe/src/main/java/org/apache/impala/common/JniUtil.java
M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java
M tests/custom_cluster/test_pause_monitor.py
A tests/infra/test_jvm_metrics.py
8 files changed, 355 insertions(+), 127 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/11468/5
--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11468/4/be/src/util/memory-metrics.h
File be/src/util/memory-metrics.h:

http://gerrit.cloudera.org:8080/#/c/11468/4/be/src/util/memory-metrics.h@223
PS4, Line 223: const string& key, 
std::function accessor);
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/11468/4/be/src/util/memory-metrics.cc
File be/src/util/memory-metrics.cc:

http://gerrit.cloudera.org:8080/#/c/11468/4/be/src/util/memory-metrics.cc@257
PS4, Line 257: int64_t JvmMetricCache::GetPoolMetric(const std::string& 
mempool_name, JvmMemoryMetricType type) {
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/11468/4/tests/infra/test_jvm_metrics.py
File tests/infra/test_jvm_metrics.py:

http://gerrit.cloudera.org:8080/#/c/11468/4/tests/infra/test_jvm_metrics.py@20
PS4, Line 20: class TestJvmMetrics(ImpalaTestSuite):
flake8: E302 expected 2 blank lines, found 1



--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 20 Sep 2018 11:49:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11468/3/be/src/util/memory-metrics.h
File be/src/util/memory-metrics.h:

http://gerrit.cloudera.org:8080/#/c/11468/3/be/src/util/memory-metrics.h@223
PS3, Line 223: const string& key, 
std::function accessor);
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/11468/3/be/src/util/memory-metrics.cc
File be/src/util/memory-metrics.cc:

http://gerrit.cloudera.org:8080/#/c/11468/3/be/src/util/memory-metrics.cc@257
PS3, Line 257: int64_t JvmMetricCache::GetPoolMetric(const std::string& 
mempool_name, JvmMemoryMetricType type) {
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/11468/3/tests/infra/test_jvm_metrics.py
File tests/infra/test_jvm_metrics.py:

http://gerrit.cloudera.org:8080/#/c/11468/3/tests/infra/test_jvm_metrics.py@20
PS3, Line 20: class TestJvmMetrics(ImpalaTestSuite):
flake8: E302 expected 2 blank lines, found 1



--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 20 Sep 2018 07:13:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11468/2/be/src/util/memory-metrics.h
File be/src/util/memory-metrics.h:

http://gerrit.cloudera.org:8080/#/c/11468/2/be/src/util/memory-metrics.h@223
PS2, Line 223: const string& key, 
std::function accessor);
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/11468/2/be/src/util/memory-metrics.cc
File be/src/util/memory-metrics.cc:

http://gerrit.cloudera.org:8080/#/c/11468/2/be/src/util/memory-metrics.cc@257
PS2, Line 257: int64_t JvmMetricCache::GetPoolMetric(const std::string& 
mempool_name, JvmMemoryMetricType type) {
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/11468/2/tests/infra/test_jvm_metrics.py
File tests/infra/test_jvm_metrics.py:

http://gerrit.cloudera.org:8080/#/c/11468/2/tests/infra/test_jvm_metrics.py@20
PS2, Line 20: class TestJvmMetrics(ImpalaTestSuite):
flake8: E302 expected 2 blank lines, found 1



--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 20 Sep 2018 06:03:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/726/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 20 Sep 2018 00:36:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-19 Thread Philip Zeyliger (Code Review)
Hello Bharath Vissapragada, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11468

to look at the new patch set (#4).

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..

IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

Following up to IMPALA-6857, it's useful for monitoring tools to see if
the pause monitor is getting triggered, and to see other GC metrics.

The Java side here, and the Thrift side, were easy enough.

However, the Impala metric implementation here caused us to call into
the frontend to read through the JMX memory beans 72 times, because each
call to GetValue() was getting all the data for the pool. This structure
made it hard to add additional, non-pool, metrics, and it felt wasteful.
To combat this, I added a cache of 10 seconds for getting the metrics
from the Frontend. The counters will typically re-use the same data.

There are five metrics here, and to avoid yet another enum class, I used
C++ lambdas to capture which field of the Thrift object I care about. If
folks like the approach, I think it can simplify way the enums for the
pool metrics as well.

I measured the cost of calling into the metrics code by
looping the metrics-gathering 100 times and looking at CPU
time for the process using this script:

  START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk 
'{ print $14 + $15 }')
  for i in $(seq 100); do
curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null
  done
  END_CPU=$(  cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk 
'{ print $14 + $15 }')
  echo $START_CPU $END_CPU $(($END_CPU - $START_CPU))

On a release build on my development machine, gathering metrics 100
times took 0.16 cpu seconds without this change and 0.07 cpu seconds
with this change. The measurement accuracy here is 0.01 (I spot-checked
this with using the cpuacct cgroup infrastructure which gives you nanos,
but it was more painful to script), but this convinces me that this is a
net improvement.

TODO: Update tests to check new metrics.

Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
---
M be/src/util/memory-metrics.cc
M be/src/util/memory-metrics.h
M common/thrift/Frontend.thrift
M common/thrift/metrics.json
M fe/src/main/java/org/apache/impala/common/JniUtil.java
M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java
M tests/custom_cluster/test_pause_monitor.py
A tests/infra/test_jvm_metrics.py
8 files changed, 352 insertions(+), 127 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/11468/4
--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 3:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/722/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 19 Sep 2018 23:40:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11468 )

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..


Patch Set 2:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/720/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 19 Sep 2018 23:39:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-19 Thread Philip Zeyliger (Code Review)
Hello Bharath Vissapragada, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11468

to look at the new patch set (#3).

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..

IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

Following up to IMPALA-6857, it's useful for monitoring tools to see if
the pause monitor is getting triggered, and to see other GC metrics.

The Java side here, and the Thrift side, were easy enough.

However, the Impala metric implementation here caused us to call into
the frontend to read through the JMX memory beans 72 times, because each
call to GetValue() was getting all the data for the pool. This structure
made it hard to add additional, non-pool, metrics, and it felt wasteful.
To combat this, I added a cache of 10 seconds for getting the metrics
from the Frontend. The counters will typically re-use the same data.

There are five metrics here, and to avoid yet another enum class, I used
C++ lambdas to capture which field of the Thrift object I care about. If
folks like the approach, I think it can simplify way the enums for the
pool metrics as well.

I measured the cost of calling into the metrics code by
looping the metrics-gathering 100 times and looking at CPU
time for the process using this script:

  START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk 
'{ print $14 + $15 }')
  for i in $(seq 100); do
curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null
  done
  END_CPU=$(  cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk 
'{ print $14 + $15 }')
  echo $START_CPU $END_CPU $(($END_CPU - $START_CPU))

On a release build on my development machine, gathering metrics 100
times took 0.16 cpu seconds without this change and 0.07 cpu seconds
with this change. The measurement accuracy here is 0.01 (I spot-checked
this with using the cpuacct cgroup infrastructure which gives you nanos,
but it was more painful to script), but this convinces me that this is a
net improvement.

TODO: Update tests to check new metrics.

Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
---
M be/src/util/memory-metrics.cc
M be/src/util/memory-metrics.h
M common/thrift/Frontend.thrift
M common/thrift/metrics.json
M fe/src/main/java/org/apache/impala/common/JniUtil.java
M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java
M tests/custom_cluster/test_pause_monitor.py
A tests/infra/test_jvm_metrics.py
8 files changed, 352 insertions(+), 128 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/11468/3
--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

2018-09-19 Thread Philip Zeyliger (Code Review)
Hello Bharath Vissapragada, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11468

to look at the new patch set (#2).

Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to 
Impala metrics.
..

IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics.

Following up to IMPALA-6857, it's useful for monitoring tools to see if
the pause monitor is getting triggered, and to see other GC metrics.

The Java side here, and the Thrift side, were easy enough.

However, the Impala metric implementation here caused us to call into
the frontend to read through the JMX memory beans 72 times, because each
call to GetValue() was getting all the data for the pool. This structure
made it hard to add additional, non-pool, metrics, and it felt wasteful.
To combat this, I added a cache of 10 seconds for getting the metrics
from the Frontend. The counters will typically re-use the same data.

There are five metrics here, and to avoid yet another enum class, I used
C++ lambdas to capture which field of the Thrift object I care about. If
folks like the approach, I think it can simplify way the enums for the
pool metrics as well.

I measured the cost of calling into the metrics code by
looping the metrics-gathering 100 times and looking at CPU
time for the process using this script:

  START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk 
'{ print $14 + $15 }')
  for i in $(seq 100); do
curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null
  done
  END_CPU=$(  cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk 
'{ print $14 + $15 }')
  echo $START_CPU $END_CPU $(($END_CPU - $START_CPU))

On a release build on my development machine, gathering metrics 100
times took 0.16 cpu seconds without this change and 0.07 cpu seconds
with this change. The measurement accuracy here is 0.01 (I spot-checked
this with using the cpuacct cgroup infrastructure which gives you nanos,
but it was more painful to script), but this convinces me that this is a
net improvement.

TODO: Update tests to check new metrics.

Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
---
M be/src/util/memory-metrics.cc
M be/src/util/memory-metrics.h
M common/thrift/Frontend.thrift
M common/thrift/metrics.json
M fe/src/main/java/org/apache/impala/common/JniUtil.java
M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java
M tests/custom_cluster/test_pause_monitor.py
A tests/infra/test_jvm_metrics.py
8 files changed, 352 insertions(+), 128 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/11468/2
--
To view, visit http://gerrit.cloudera.org:8080/11468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2
Gerrit-Change-Number: 11468
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong