Re: Review Request 70783: Added debug logging for metrics which are slow to become ready.

Greg Mann Mon, 10 Jun 2019 02:50:00 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70783/
-----------------------------------------------------------


(Updated June 10, 2019, 9:49 a.m.)


Review request for mesos, Benno Evers, Benjamin Mahler, Joseph Wu, and Vinod 
Kone.


Repository: mesos


Description (updated)
-------

This patch adds a hard-coded timeout to the metrics snapshot
handler which will log the names of any metrics which have not
become ready after 10 seconds. A slight refactor of the
snapshot code is done as well.


Diffs
-----

  3rdparty/libprocess/include/process/metrics/metrics.hpp 
75711edbaf46797e5eb54ba720ea11cf3de81522 
  3rdparty/libprocess/src/metrics/metrics.cpp 
623d44adbe838f995ddbe89ee26f5bcc9c600be5 
  3rdparty/libprocess/src/tests/metrics_tests.cpp 
881275693e67f3c9fb670c7e70cb5014090ed7a5 


Diff: https://reviews.apache.org/r/70783/diff/4/


Testing (updated)
-------

`make check`

Also ran some metrics benchmark tests both before and after the change.

Before patch:

[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test
[ RUN      ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/0
Test setup: 1 agents with a total of 100 frameworks
unversioned /metrics/snapshot' response took 144.821514ms
v1 'master::call::GetMetrics' application/x-protobuf response took 147.056543ms
v1 'master::call::GetMetrics' application/json response took 175.9909ms
[       OK ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/0
 (935 ms)
[ RUN      ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/1
Test setup: 1 agents with a total of 1000 frameworks
unversioned /metrics/snapshot' response took 1.083320021secs
v1 'master::call::GetMetrics' application/x-protobuf response took 
1.105169806secs
v1 'master::call::GetMetrics' application/json response took 1.238749012secs
[       OK ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/1
 (7044 ms)
[ RUN      ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/2
Test setup: 1 agents with a total of 10000 frameworks
unversioned /metrics/snapshot' response took 9.400786071secs
v1 'master::call::GetMetrics' application/x-protobuf response took 
9.709327709secs
v1 'master::call::GetMetrics' application/json response took 11.207594934secs
[       OK ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/2
 (62656 ms)
[ RUN      ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/3
Test setup: 1 agents with a total of 20000 frameworks
unversioned /metrics/snapshot' response took 17.926824609secs
v1 'master::call::GetMetrics' application/x-protobuf response took 
19.467704754secs
v1 'master::call::GetMetrics' application/json response took 21.32824915secs
[       OK ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/3
 (132619 ms)
[----------] 4 tests from 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test (203255 ms 
total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (203343 ms total)
[  PASSED  ] 4 tests.


After patch:

[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test
[ RUN      ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/0
Test setup: 1 agents with a total of 100 frameworks
unversioned /metrics/snapshot' response took 143.560831ms
v1 'master::call::GetMetrics' application/x-protobuf response took 154.460223ms
v1 'master::call::GetMetrics' application/json response took 186.55918ms
[       OK ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/0
 (987 ms)
[ RUN      ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/1
Test setup: 1 agents with a total of 1000 frameworks
unversioned /metrics/snapshot' response took 1.082617553secs
v1 'master::call::GetMetrics' application/x-protobuf response took 
1.097321636secs
v1 'master::call::GetMetrics' application/json response took 1.273446816secs
[       OK ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/1
 (6996 ms)
[ RUN      ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/2
Test setup: 1 agents with a total of 10000 frameworks
unversioned /metrics/snapshot' response took 9.327752261secs
v1 'master::call::GetMetrics' application/x-protobuf response took 
9.944210962secs
v1 'master::call::GetMetrics' application/json response took 10.800611643secs
[       OK ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/2
 (61873 ms)
[ RUN      ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/3
Test setup: 1 agents with a total of 20000 frameworks
unversioned /metrics/snapshot' response took 17.728278409secs
v1 'master::call::GetMetrics' application/x-protobuf response took 
19.944552117secs
v1 'master::call::GetMetrics' application/json response took 22.376012355secs
[       OK ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/3
 (132274 ms)
[----------] 4 tests from 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test (202132 ms 
total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (202272 ms total)
[  PASSED  ] 4 tests.


Thanks,

Greg Mann

Re: Review Request 70783: Added debug logging for metrics which are slow to become ready.

Reply via email to