Re: Review Request 70783: Added debug logging for metrics which are slow to become ready.

2019-06-12 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70783/#review215855
---


Ship it!




Refactor looks good.  One minor tweak to consider below:


3rdparty/libprocess/src/metrics/metrics.cpp
Lines 221-237 (patched)


Consider merging these two into a single block:
```
if (timeout.isNone() || timeout.get() > SLOW_METRICS_TIMEOUT) {
  ...
}
```


- Joseph Wu


On June 10, 2019, 2:49 a.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70783/
> ---
> 
> (Updated June 10, 2019, 2:49 a.m.)
> 
> 
> Review request for mesos, Benno Evers, Benjamin Mahler, Joseph Wu, and Vinod 
> Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds a hard-coded timeout to the metrics snapshot
> handler which will log the names of any metrics which have not
> become ready after 10 seconds. A slight refactor of the
> snapshot code is done as well.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/include/process/metrics/metrics.hpp 
> 75711edbaf46797e5eb54ba720ea11cf3de81522 
>   3rdparty/libprocess/src/metrics/metrics.cpp 
> 623d44adbe838f995ddbe89ee26f5bcc9c600be5 
>   3rdparty/libprocess/src/tests/metrics_tests.cpp 
> 881275693e67f3c9fb670c7e70cb5014090ed7a5 
> 
> 
> Diff: https://reviews.apache.org/r/70783/diff/4/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Also ran some metrics benchmark tests both before and after the change.
> 

Re: Review Request 70783: Added debug logging for metrics which are slow to become ready.

2019-06-10 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70783/
---

(Updated June 10, 2019, 9:49 a.m.)


Review request for mesos, Benno Evers, Benjamin Mahler, Joseph Wu, and Vinod 
Kone.


Repository: mesos


Description (updated)
---

This patch adds a hard-coded timeout to the metrics snapshot
handler which will log the names of any metrics which have not
become ready after 10 seconds. A slight refactor of the
snapshot code is done as well.


Diffs
-

  3rdparty/libprocess/include/process/metrics/metrics.hpp 
75711edbaf46797e5eb54ba720ea11cf3de81522 
  3rdparty/libprocess/src/metrics/metrics.cpp 
623d44adbe838f995ddbe89ee26f5bcc9c600be5 
  3rdparty/libprocess/src/tests/metrics_tests.cpp 
881275693e67f3c9fb670c7e70cb5014090ed7a5 


Diff: https://reviews.apache.org/r/70783/diff/4/


Testing (updated)
---

`make check`

Also ran some metrics benchmark tests both before and after the change.

Before patch:

[==] Running 4 tests from 1 test case.
[--] Global test environment set-up.
[--] 4 tests from 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test
[ RUN  ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/0
Test setup: 1 agents with a total of 100 frameworks
unversioned '/metrics/snapshot' response took 144.821514ms
v1 'master::call::GetMetrics' application/x-protobuf response took 147.056543ms
v1 'master::call::GetMetrics' application/json response took 175.9909ms
[   OK ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/0
 (935 ms)
[ RUN  ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/1
Test setup: 1 agents with a total of 1000 frameworks
unversioned '/metrics/snapshot' response took 1.083320021secs
v1 'master::call::GetMetrics' application/x-protobuf response took 
1.105169806secs
v1 'master::call::GetMetrics' application/json response took 1.238749012secs
[   OK ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/1
 (7044 ms)
[ RUN  ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/2
Test setup: 1 agents with a total of 1 frameworks
unversioned '/metrics/snapshot' response took 9.400786071secs
v1 'master::call::GetMetrics' application/x-protobuf response took 
9.709327709secs
v1 'master::call::GetMetrics' application/json response took 11.207594934secs
[   OK ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/2
 (62656 ms)
[ RUN  ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/3
Test setup: 1 agents with a total of 2 frameworks
unversioned '/metrics/snapshot' response took 17.926824609secs
v1 'master::call::GetMetrics' application/x-protobuf response took 
19.467704754secs
v1 'master::call::GetMetrics' application/json response took 21.32824915secs
[   OK ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/3
 (132619 ms)
[--] 4 tests from 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test (203255 ms 
total)

[--] Global test environment tear-down
[==] 4 tests from 1 test case ran. (203343 ms total)
[  PASSED  ] 4 tests.


After patch:

[==] Running 4 tests from 1 test case.
[--] Global test environment set-up.
[--] 4 tests from 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test
[ RUN  ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/0
Test setup: 1 agents with a total of 100 frameworks
unversioned '/metrics/snapshot' response took 143.560831ms
v1 'master::call::GetMetrics' application/x-protobuf response took 154.460223ms
v1 'master::call::GetMetrics' application/json response took 186.55918ms
[   OK ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/0
 (987 ms)
[ RUN  ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/1
Test setup: 1 agents with a total of 1000 frameworks
unversioned '/metrics/snapshot' response took 1.082617553secs
v1 'master::call::GetMetrics' application/x-protobuf response took 
1.097321636secs
v1 'master::call::GetMetrics' application/json response took 1.273446816secs
[   OK ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/1
 (6996 ms)
[ RUN  ] 
AgentFrameworkTaskCountContentType/MasterMetricsQuery_BENCHMARK_Test.GetMetrics/2
Test setup: 1 agents with a total of 1 frameworks
unversioned '/metrics/snapshot' response took 9.327752261secs
v1 'master::call::GetMetrics' application/x-protobuf response took 
9.944210962secs
v1 'master::call::GetMetrics' application/json response took 10.800611643secs
[   OK ] 

Re: Review Request 70783: Added debug logging for metrics which are slow to become ready.

2019-06-05 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70783/#review215710
---



Patch looks great!

Reviews applied: [70783]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers 
--disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On June 5, 2019, 4:13 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70783/
> ---
> 
> (Updated June 5, 2019, 4:13 p.m.)
> 
> 
> Review request for mesos, Benno Evers, Benjamin Mahler, Joseph Wu, and Vinod 
> Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added debug logging for metrics which are slow to become ready.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/include/process/metrics/metrics.hpp 
> 75711edbaf46797e5eb54ba720ea11cf3de81522 
>   3rdparty/libprocess/src/metrics/metrics.cpp 
> 623d44adbe838f995ddbe89ee26f5bcc9c600be5 
>   3rdparty/libprocess/src/tests/metrics_tests.cpp 
> 881275693e67f3c9fb670c7e70cb5014090ed7a5 
> 
> 
> Diff: https://reviews.apache.org/r/70783/diff/4/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 70783: Added debug logging for metrics which are slow to become ready.

2019-06-05 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70783/
---

(Updated June 5, 2019, 4:13 p.m.)


Review request for mesos, Benno Evers, Benjamin Mahler, Joseph Wu, and Vinod 
Kone.


Repository: mesos


Description
---

Added debug logging for metrics which are slow to become ready.


Diffs (updated)
-

  3rdparty/libprocess/include/process/metrics/metrics.hpp 
75711edbaf46797e5eb54ba720ea11cf3de81522 
  3rdparty/libprocess/src/metrics/metrics.cpp 
623d44adbe838f995ddbe89ee26f5bcc9c600be5 
  3rdparty/libprocess/src/tests/metrics_tests.cpp 
881275693e67f3c9fb670c7e70cb5014090ed7a5 


Diff: https://reviews.apache.org/r/70783/diff/4/

Changes: https://reviews.apache.org/r/70783/diff/3-4/


Testing
---


Thanks,

Greg Mann



Re: Review Request 70783: Added debug logging for metrics which are slow to become ready.

2019-06-05 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70783/#review215701
---




3rdparty/libprocess/src/metrics/metrics.cpp
Lines 212-226 (patched)


Whoops, just realized that I am not discarding these timers. Will update 
the patch again to do so.


- Greg Mann


On June 5, 2019, 3:39 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70783/
> ---
> 
> (Updated June 5, 2019, 3:39 p.m.)
> 
> 
> Review request for mesos, Benno Evers, Benjamin Mahler, Joseph Wu, and Vinod 
> Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added debug logging for metrics which are slow to become ready.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/include/process/metrics/metrics.hpp 
> 75711edbaf46797e5eb54ba720ea11cf3de81522 
>   3rdparty/libprocess/src/metrics/metrics.cpp 
> 623d44adbe838f995ddbe89ee26f5bcc9c600be5 
>   3rdparty/libprocess/src/tests/metrics_tests.cpp 
> 881275693e67f3c9fb670c7e70cb5014090ed7a5 
> 
> 
> Diff: https://reviews.apache.org/r/70783/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 70783: Added debug logging for metrics which are slow to become ready.

2019-06-05 Thread Greg Mann


> On June 5, 2019, 1:05 a.m., Benjamin Mahler wrote:
> > 3rdparty/libprocess/src/metrics/metrics.cpp
> > Lines 199-207 (patched)
> > 
> >
> > If we consolidate the logging between user timeout and slow timeout can 
> > we have one path?
> > 
> > ```
> >   Future<...> waited = ...;
> > 
> >   return waited
> >     .then(defer(self(), ::_snapshotHandler, ...));
> > }
> > 
> > _snapshotHandler(...)
> > {
> >   // LOG(INFO) not ready futures after timeout (user or built-in SLOW one)
> >   ...
> > 
> >   Future<...> waited = ...;
> > 
> >   return waited
> >     .then(defer(self(), ::__snapshotHandler, ...));
> > }
> > 
> > __snapshotHandler(...)
> > {
> >   ...
> >   return response;
> > }
> > ```
> > 
> > We would log those not ready after either the user provided one, or the 
> > built-in slow one, whichever is smaller (I guess we can use a min() for 
> > that too if we want). The downside here seems to be that if a user 
> > specifies a small one and still lots of pull gauges are used, it will lead to 
> > a lot of INFO logging. Not sure how much we should worry about this case.
> > 
> > We should probably name the endpoint handler functions and snapshot map 
> > function differently or house the handler ones in an 'Http' sub-struct, a 
> > bit confusing right now.

I had considered a similar option previously but ended up opting for a more 
minimal diff. I wrote a new solution along these lines and although the patch 
is now much more complex, I think the end result is cleaner. The complexity 
largely arises from the fact that if the user specifies a timeout longer than 
the hard-coded timeout, then we can potentially execute the logging code twice. 
Let me know what you think.


- Greg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70783/#review215693
---


On June 5, 2019, 3:39 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70783/
> ---
> 
> (Updated June 5, 2019, 3:39 p.m.)
> 
> 
> Review request for mesos, Benno Evers, Benjamin Mahler, Joseph Wu, and Vinod 
> Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added debug logging for metrics which are slow to become ready.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/include/process/metrics/metrics.hpp 
> 75711edbaf46797e5eb54ba720ea11cf3de81522 
>   3rdparty/libprocess/src/metrics/metrics.cpp 
> 623d44adbe838f995ddbe89ee26f5bcc9c600be5 
>   3rdparty/libprocess/src/tests/metrics_tests.cpp 
> 881275693e67f3c9fb670c7e70cb5014090ed7a5 
> 
> 
> Diff: https://reviews.apache.org/r/70783/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 70783: Added debug logging for metrics which are slow to become ready.

2019-06-05 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70783/
---

(Updated June 5, 2019, 3:39 p.m.)


Review request for mesos, Benno Evers, Benjamin Mahler, Joseph Wu, and Vinod 
Kone.


Repository: mesos


Description
---

Added debug logging for metrics which are slow to become ready.


Diffs (updated)
-

  3rdparty/libprocess/include/process/metrics/metrics.hpp 
75711edbaf46797e5eb54ba720ea11cf3de81522 
  3rdparty/libprocess/src/metrics/metrics.cpp 
623d44adbe838f995ddbe89ee26f5bcc9c600be5 
  3rdparty/libprocess/src/tests/metrics_tests.cpp 
881275693e67f3c9fb670c7e70cb5014090ed7a5 


Diff: https://reviews.apache.org/r/70783/diff/3/

Changes: https://reviews.apache.org/r/70783/diff/2-3/


Testing
---


Thanks,

Greg Mann



Re: Review Request 70783: Added debug logging for metrics which are slow to become ready.

2019-06-05 Thread Greg Mann


> On June 4, 2019, 8:42 p.m., Joseph Wu wrote:
> > I think it is a little unfortunate that we end up with a conditional 
> > `select()` and an extra chaining step in this endpoint.  But since we 
> > aren't saving the vector of keys/futures/statistics anywhere, the chaining 
> > is mostly unavoidable.
> > 
> > Since this adds some significant logic, I would like there to be at least 
> > an addition to the tests.  We can't read the resulting log line, but we can 
> > at least exercise the code.  For example, copying this chunk of existing 
> > test code and increasing the in-test timeout to 10+ seconds:
> > 
> > ```
> > TEST_F(MetricsTest, THREADSAFE_SnapshotTimeout)
> > {
> >   ...
> > 
> >   // Get the snapshot.
> >   Future<http::Response> response =
> >     http::get(upid, "snapshot", "timeout=2secs"); // <- Increase this timeout.
> > 
> >   // Make sure the request is pending before the timeout is exceeded.
> >   //
> >   // TODO(neilc): Replace the `sleep` here with a less flaky
> >   // synchronization method.
> >   os::sleep(Milliseconds(10));
> >   Clock::settle();
> > 
> >   ASSERT_TRUE(response.isPending());
> > 
> >   // Advance the clock to trigger the timeout.
> >   Clock::advance(Seconds(2)); // <- And this timeout.
> > 
> >   AWAIT_EXPECT_RESPONSE_STATUS_EQ(OK().status, response);
> > 
> >   // Parse the response.
> >   Try<JSON::Object> responseJSON = JSON::parse<JSON::Object>(response->body);
> >   ASSERT_SOME(responseJSON);
> > 
> >   // We can't use simple JSON equality testing here as initializing
> >   // libprocess adds metrics to the system. We want to only check if
> >   // the metrics from this test are correctly handled.
> >   map<string, JSON::Value> values = responseJSON->values;
> > 
> >   EXPECT_EQ(1u, values.count("test/counter"));
> >   EXPECT_DOUBLE_EQ(0.0, values["test/counter"].as<JSON::Number>().as<double>());
> > 
> >   EXPECT_EQ(1u, values.count("test/gauge"));
> >   EXPECT_DOUBLE_EQ(42.0, values["test/gauge"].as<JSON::Number>().as<double>());
> > 
> >   EXPECT_EQ(0u, values.count("test/gauge_fail"));
> >   EXPECT_EQ(0u, values.count("test/gauge_timeout"));
> > ```

Good call, I updated the test.


- Greg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70783/#review215689
---


On June 5, 2019, 3:39 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70783/
> ---
> 
> (Updated June 5, 2019, 3:39 p.m.)
> 
> 
> Review request for mesos, Benno Evers, Benjamin Mahler, Joseph Wu, and Vinod 
> Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added debug logging for metrics which are slow to become ready.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/include/process/metrics/metrics.hpp 
> 75711edbaf46797e5eb54ba720ea11cf3de81522 
>   3rdparty/libprocess/src/metrics/metrics.cpp 
> 623d44adbe838f995ddbe89ee26f5bcc9c600be5 
>   3rdparty/libprocess/src/tests/metrics_tests.cpp 
> 881275693e67f3c9fb670c7e70cb5014090ed7a5 
> 
> 
> Diff: https://reviews.apache.org/r/70783/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 70783: Added debug logging for metrics which are slow to become ready.

2019-06-05 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70783/
---

(Updated June 5, 2019, 3:37 p.m.)


Review request for mesos, Benno Evers, Benjamin Mahler, Joseph Wu, and Vinod 
Kone.


Repository: mesos


Description (updated)
---

Added debug logging for metrics which are slow to become ready.


Diffs (updated)
-

  3rdparty/libprocess/include/process/metrics/metrics.hpp 
75711edbaf46797e5eb54ba720ea11cf3de81522 
  3rdparty/libprocess/src/metrics/metrics.cpp 
623d44adbe838f995ddbe89ee26f5bcc9c600be5 
  3rdparty/libprocess/src/tests/metrics_tests.cpp 
881275693e67f3c9fb670c7e70cb5014090ed7a5 


Diff: https://reviews.apache.org/r/70783/diff/2/

Changes: https://reviews.apache.org/r/70783/diff/1-2/


Testing
---


Thanks,

Greg Mann



Re: Review Request 70783: Added debug logging for metrics which are slow to become ready.

2019-06-04 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70783/#review215693
---



Looks good, but I'm also wondering if there's a cleaner way to implement (see 
comment below).

Can you also include the impact this has on the benchmark (optimized build)?


3rdparty/libprocess/src/metrics/metrics.cpp
Lines 199-207 (patched)


If we consolidate the logging between user timeout and slow timeout can we 
have one path?

```
  Future<...> waited = ...;

  return waited
    .then(defer(self(), ::_snapshotHandler, ...));
}

_snapshotHandler(...)
{
  // LOG(INFO) not ready futures after timeout (user or built-in SLOW one)
  ...

  Future<...> waited = ...;

  return waited
    .then(defer(self(), ::__snapshotHandler, ...));
}

__snapshotHandler(...)
{
  ...
  return response;
}
```

We would log those not ready after either the user provided one, or the 
built-in slow one, whichever is smaller (I guess we can use a min() for that 
too if we want). The downside here seems to be that if a user specifies a small 
one and still lots of pull gauges are used, it will lead to a lot of INFO logging. 
Not sure how much we should worry about this case.

We should probably name the endpoint handler functions and snapshot map 
function differently or house the handler ones in an 'Http' sub-struct, a bit 
confusing right now.



3rdparty/libprocess/src/metrics/metrics.cpp
Lines 241-242 (original), 261-262 (patched)


It's odd to have this similar logging be inconsistent (i.e. VLOG(1) instead 
of INFO and 1 per line instead of all in one line).



3rdparty/libprocess/src/metrics/metrics.cpp
Lines 303-304 (patched)


Do we need these variables? Seems better without them


- Benjamin Mahler


On June 4, 2019, 2:54 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70783/
> ---
> 
> (Updated June 4, 2019, 2:54 p.m.)
> 
> 
> Review request for mesos, Benno Evers, Benjamin Mahler, Joseph Wu, and Vinod 
> Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds logging which, after a hard-coded delay, prints
> the keys of any metrics which have not yet become ready
> following a request for the metrics snapshot.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/include/process/metrics/metrics.hpp 
> 75711edbaf46797e5eb54ba720ea11cf3de81522 
>   3rdparty/libprocess/src/metrics/metrics.cpp 
> 623d44adbe838f995ddbe89ee26f5bcc9c600be5 
> 
> 
> Diff: https://reviews.apache.org/r/70783/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 70783: Added debug logging for metrics which are slow to become ready.

2019-06-04 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70783/#review215689
---



I think it is a little unfortunate that we end up with a conditional `select()` 
and an extra chaining step in this endpoint.  But since we aren't saving the 
vector of keys/futures/statistics anywhere, the chaining is mostly unavoidable.

Since this adds some significant logic, I would like there to be at least an 
addition to the tests.  We can't read the resulting log line, but we can at 
least exercise the code.  For example, copying this chunk of existing test code 
and increasing the in-test timeout to 10+ seconds:

```
TEST_F(MetricsTest, THREADSAFE_SnapshotTimeout)
{
  ...

  // Get the snapshot.
  Future<http::Response> response =
    http::get(upid, "snapshot", "timeout=2secs"); // <- Increase this timeout.

  // Make sure the request is pending before the timeout is exceeded.
  //
  // TODO(neilc): Replace the `sleep` here with a less flaky
  // synchronization method.
  os::sleep(Milliseconds(10));
  Clock::settle();

  ASSERT_TRUE(response.isPending());

  // Advance the clock to trigger the timeout.
  Clock::advance(Seconds(2)); // <- And this timeout.

  AWAIT_EXPECT_RESPONSE_STATUS_EQ(OK().status, response);

  // Parse the response.
  Try<JSON::Object> responseJSON = JSON::parse<JSON::Object>(response->body);
  ASSERT_SOME(responseJSON);

  // We can't use simple JSON equality testing here as initializing
  // libprocess adds metrics to the system. We want to only check if
  // the metrics from this test are correctly handled.
  map<string, JSON::Value> values = responseJSON->values;

  EXPECT_EQ(1u, values.count("test/counter"));
  EXPECT_DOUBLE_EQ(0.0, values["test/counter"].as<JSON::Number>().as<double>());

  EXPECT_EQ(1u, values.count("test/gauge"));
  EXPECT_DOUBLE_EQ(42.0, values["test/gauge"].as<JSON::Number>().as<double>());

  EXPECT_EQ(0u, values.count("test/gauge_fail"));
  EXPECT_EQ(0u, values.count("test/gauge_timeout"));
```


3rdparty/libprocess/src/metrics/metrics.cpp
Lines 40 (patched)


It would be nice to mention here (or in the `logSlowMetrics()` header 
definition + commit description) that this timeout does not cancel snapshotting 
of any metric.  It merely adds an informative log line, before continuing with 
snapshotting as normal.


- Joseph Wu


On June 4, 2019, 7:54 a.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70783/
> ---
> 
> (Updated June 4, 2019, 7:54 a.m.)
> 
> 
> Review request for mesos, Benno Evers, Benjamin Mahler, Joseph Wu, and Vinod 
> Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds logging which, after a hard-coded delay, prints
> the keys of any metrics which have not yet become ready
> following a request for the metrics snapshot.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/include/process/metrics/metrics.hpp 
> 75711edbaf46797e5eb54ba720ea11cf3de81522 
>   3rdparty/libprocess/src/metrics/metrics.cpp 
> 623d44adbe838f995ddbe89ee26f5bcc9c600be5 
> 
> 
> Diff: https://reviews.apache.org/r/70783/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Review Request 70783: Added debug logging for metrics which are slow to become ready.

2019-06-04 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70783/
---

Review request for mesos, Benno Evers, Benjamin Mahler, Joseph Wu, and Vinod 
Kone.


Repository: mesos


Description
---

This patch adds logging which, after a hard-coded delay, prints
the keys of any metrics which have not yet become ready
following a request for the metrics snapshot.


Diffs
-

  3rdparty/libprocess/include/process/metrics/metrics.hpp 
75711edbaf46797e5eb54ba720ea11cf3de81522 
  3rdparty/libprocess/src/metrics/metrics.cpp 
623d44adbe838f995ddbe89ee26f5bcc9c600be5 


Diff: https://reviews.apache.org/r/70783/diff/1/


Testing
---


Thanks,

Greg Mann