Re: Review Request 53840: Metric in the allocator to track latency in running allocations.

2017-05-23 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53840/#review175817
---


Ship it!




I'll commit with these adjustments.


docs/monitoring.md
Lines 1053 (patched)


s/algorithm/batch/



docs/monitoring.md
Lines 1053 (patched)


s/algorithm/batch/



docs/monitoring.md
Lines 1058 (patched)


Having all these lines for each timer feels like too much repetition but 
alas, it's the pattern right now and let's just maintain consistency and clean 
up later.



src/tests/hierarchical_allocator_tests.cpp
Lines 3578-3581 (original), 3578-3581 (patched)


I kept the comment but do see the need to move the assignment.



src/tests/hierarchical_allocator_tests.cpp
Lines 3673 (patched)


I simplified the test a bit. Until an abstraction is provided, tests tend 
to be copied so let's keep things as simple as possible.

```
TEST_F_TEMP_DISABLED_ON_WINDOWS(
HierarchicalAllocatorTest,
AllocationRunLatencyMetrics)
{
  Clock::pause();

  initialize();

  // These time series statistics will be generated
  // once at least 2 allocation runs occur.
  auto statistics = {
"allocator/mesos/allocation_run_latency_ms/count",
"allocator/mesos/allocation_run_latency_ms/min",
"allocator/mesos/allocation_run_latency_ms/max",
"allocator/mesos/allocation_run_latency_ms/p50",
"allocator/mesos/allocation_run_latency_ms/p95",
"allocator/mesos/allocation_run_latency_ms/p99",
"allocator/mesos/allocation_run_latency_ms/p999",
"allocator/mesos/allocation_run_latency_ms/p",
  };

  JSON::Object metrics = Metrics();
  map values = metrics.values;

  EXPECT_EQ(0u, values.count("allocator/mesos/allocation_run_latency_ms"));

  // No allocation latency statistics should appear.
  foreach (const string& statistic, statistics) {
EXPECT_EQ(0u, values.count(statistic))
  << "Expected " << statistic << " to be absent";
  }

  // Allow the allocation timer to measure time.
  Clock::resume();

  // Trigger at least two calls to allocate to generate the window 
statistics.
  SlaveInfo agent = createSlaveInfo("cpus:2;mem:1024;disk:0");
  allocator->addSlave(
  agent.id(),
  agent,
  AGENT_CAPABILITIES(),
  None(),
  agent.resources(),
  {});

  // Wait for the allocation triggered by `addSlave()` to complete.
  Clock::pause();
  Clock::settle();
  Clock::resume();

  FrameworkInfo framework = createFrameworkInfo({"role1"});
  allocator->addFramework(framework.id(), framework, {}, true);

  // Wait for the allocation triggered by `addFramework()` to complete.
  AWAIT_READY(allocations.get());

  metrics = Metrics();
  values = metrics.values;

  // A non-zero measurement should be present.
  EXPECT_EQ(1u, values.count("allocator/mesos/allocation_run_latency_ms"));

  JSON::Value value =
metrics.values["allocator/mesos/allocation_run_latency_ms"];
  ASSERT_TRUE(value.is()) << value.which();

  JSON::Number timing = value.as();
  ASSERT_EQ(JSON::Number::FLOATING, timing.type);
  EXPECT_GE(timing.as(), 0.0);

  // The statistics should be generated.
  foreach (const string& statistic, statistics) {
EXPECT_EQ(1u, values.count(statistic))
  << "Expected " << statistic << " to be present";
  }
}

```


- Jiang Yan Xu


On May 23, 2017, 9:30 a.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53840/
> ---
> 
> (Updated May 23, 2017, 9:30 a.m.)
> 
> 
> Review request for mesos, James Peach and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6579
> https://issues.apache.org/jira/browse/MESOS-6579
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added a metric 'allocator/mesos/allocation_run_interval_ms' which
> tracks the latency between scheduling an allocation to an actual
> allocation run.
> 
> 
> Diffs
> -
> 
>   docs/monitoring.md a027f4905a0e6e41ff4e1348d34fd7aa5f1cbe61 
>   src/master/allocator/mesos/hierarchical.cpp 
> b75ed9a20a9a42f958cebbacd91e5e15b0d21394 
>   src/master/allocator/mesos/metrics.hpp 
> ce486eb079dc44bf0bbfad8668426c7ae87c97b3 
>   src/master/allocator/mesos/metrics.cpp 
> ec987fe0ce8605bf9e7c0ccbe3ac384deaf53be1 
>   src/

Re: Review Request 53840: Metric in the allocator to track latency in running allocations.

2017-05-23 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53840/
---

(Updated May 23, 2017, 4:30 p.m.)


Review request for mesos, James Peach and Jiang Yan Xu.


Changes
---

Rebase.


Bugs: MESOS-6579
https://issues.apache.org/jira/browse/MESOS-6579


Repository: mesos


Description
---

Added a metric 'allocator/mesos/allocation_run_interval_ms' which
tracks the latency between scheduling an allocation to an actual
allocation run.


Diffs (updated)
-

  docs/monitoring.md a027f4905a0e6e41ff4e1348d34fd7aa5f1cbe61 
  src/master/allocator/mesos/hierarchical.cpp 
b75ed9a20a9a42f958cebbacd91e5e15b0d21394 
  src/master/allocator/mesos/metrics.hpp 
ce486eb079dc44bf0bbfad8668426c7ae87c97b3 
  src/master/allocator/mesos/metrics.cpp 
ec987fe0ce8605bf9e7c0ccbe3ac384deaf53be1 
  src/tests/hierarchical_allocator_tests.cpp 
6dee2296d5a14185dbf7eee17968b20148839bfd 


Diff: https://reviews.apache.org/r/53840/diff/7/

Changes: https://reviews.apache.org/r/53840/diff/6-7/


Testing
---

All tests passed.


Thanks,

Anindya Sinha



Re: Review Request 53840: Metric in the allocator to track latency in running allocations.

2017-05-16 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53840/
---

(Updated May 16, 2017, 7:22 a.m.)


Review request for mesos, James Peach and Jiang Yan Xu.


Changes
---

Updated docs.


Bugs: MESOS-6579
https://issues.apache.org/jira/browse/MESOS-6579


Repository: mesos


Description
---

Added a metric 'allocator/mesos/allocation_run_interval_ms' which
tracks the latency between scheduling an allocation to an actual
allocation run.


Diffs (updated)
-

  docs/monitoring.md c42afce0a26dc15e033ee1375b7cb23d55be40ab 
  src/master/allocator/mesos/hierarchical.cpp 
b75ed9a20a9a42f958cebbacd91e5e15b0d21394 
  src/master/allocator/mesos/metrics.hpp 
ce486eb079dc44bf0bbfad8668426c7ae87c97b3 
  src/master/allocator/mesos/metrics.cpp 
ec987fe0ce8605bf9e7c0ccbe3ac384deaf53be1 
  src/tests/hierarchical_allocator_tests.cpp 
08180b9975869de328f0c095dd3cddf0c84fbecf 


Diff: https://reviews.apache.org/r/53840/diff/6/

Changes: https://reviews.apache.org/r/53840/diff/5-6/


Testing
---

All tests passed.


Thanks,

Anindya Sinha



Re: Review Request 53840: Metric in the allocator to track latency in running allocations.

2017-05-15 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53840/
---

(Updated May 16, 2017, 6:23 a.m.)


Review request for mesos, James Peach and Jiang Yan Xu.


Changes
---

Addressed review comments.


Bugs: MESOS-6579
https://issues.apache.org/jira/browse/MESOS-6579


Repository: mesos


Description
---

Added a metric 'allocator/mesos/allocation_run_interval_ms' which
tracks the latency between scheduling an allocation to an actual
allocation run.


Diffs (updated)
-

  src/master/allocator/mesos/hierarchical.cpp 
b75ed9a20a9a42f958cebbacd91e5e15b0d21394 
  src/master/allocator/mesos/metrics.hpp 
ce486eb079dc44bf0bbfad8668426c7ae87c97b3 
  src/master/allocator/mesos/metrics.cpp 
ec987fe0ce8605bf9e7c0ccbe3ac384deaf53be1 
  src/tests/hierarchical_allocator_tests.cpp 
08180b9975869de328f0c095dd3cddf0c84fbecf 


Diff: https://reviews.apache.org/r/53840/diff/5/

Changes: https://reviews.apache.org/r/53840/diff/4-5/


Testing
---

All tests passed.


Thanks,

Anindya Sinha



Re: Review Request 53840: Metric in the allocator to track latency in running allocations.

2017-05-15 Thread Anindya Sinha


> On May 15, 2017, 5:25 a.m., Jiang Yan Xu wrote:
> > src/tests/hierarchical_allocator_tests.cpp
> > Lines 3459-3463 (original), 3459-3462 (patched)
> > 
> >
> > Why this change? Seems like the two ways of expressing the expectation 
> > are equivalent.

Reinstated the original.


> On May 15, 2017, 5:25 a.m., Jiang Yan Xu wrote:
> > src/tests/hierarchical_allocator_tests.cpp
> > Lines 3537 (patched)
> > 
> >
> > Seems inelegant to reuse the test for `allocation_run`, for 
> > `allocation_run` we are storing a full list of statistics in `statistics` 
> > but we are not doing the same for the new metric. Can we create an 
> > independent test?

Created a separate test for `allocation_run_latency`.


- Anindya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53840/#review174871
---


On May 16, 2017, 6:23 a.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53840/
> ---
> 
> (Updated May 16, 2017, 6:23 a.m.)
> 
> 
> Review request for mesos, James Peach and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6579
> https://issues.apache.org/jira/browse/MESOS-6579
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added a metric 'allocator/mesos/allocation_run_interval_ms' which
> tracks the latency between scheduling an allocation to an actual
> allocation run.
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.cpp 
> b75ed9a20a9a42f958cebbacd91e5e15b0d21394 
>   src/master/allocator/mesos/metrics.hpp 
> ce486eb079dc44bf0bbfad8668426c7ae87c97b3 
>   src/master/allocator/mesos/metrics.cpp 
> ec987fe0ce8605bf9e7c0ccbe3ac384deaf53be1 
>   src/tests/hierarchical_allocator_tests.cpp 
> 08180b9975869de328f0c095dd3cddf0c84fbecf 
> 
> 
> Diff: https://reviews.apache.org/r/53840/diff/5/
> 
> 
> Testing
> ---
> 
> All tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 53840: Metric in the allocator to track latency in running allocations.

2017-05-15 Thread Anindya Sinha


> On April 25, 2017, 4:41 p.m., James Peach wrote:
> > src/master/allocator/mesos/hierarchical.cpp
> > Line 1427 (original), 1428 (patched)
> > 
> >
> > You also need to stop the latency metric here, otherwise you will 
> > capture multiple allocator dispatches.

As Yan mentioned, I stop the `allocation_run_latency` as the first thing done 
in this function.


- Anindya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53840/#review172945
---


On May 16, 2017, 6:23 a.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53840/
> ---
> 
> (Updated May 16, 2017, 6:23 a.m.)
> 
> 
> Review request for mesos, James Peach and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6579
> https://issues.apache.org/jira/browse/MESOS-6579
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added a metric 'allocator/mesos/allocation_run_interval_ms' which
> tracks the latency between scheduling an allocation to an actual
> allocation run.
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.cpp 
> b75ed9a20a9a42f958cebbacd91e5e15b0d21394 
>   src/master/allocator/mesos/metrics.hpp 
> ce486eb079dc44bf0bbfad8668426c7ae87c97b3 
>   src/master/allocator/mesos/metrics.cpp 
> ec987fe0ce8605bf9e7c0ccbe3ac384deaf53be1 
>   src/tests/hierarchical_allocator_tests.cpp 
> 08180b9975869de328f0c095dd3cddf0c84fbecf 
> 
> 
> Diff: https://reviews.apache.org/r/53840/diff/5/
> 
> 
> Testing
> ---
> 
> All tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 53840: Metric in the allocator to track latency in running allocations.

2017-05-14 Thread Jiang Yan Xu


> On April 25, 2017, 9:41 a.m., James Peach wrote:
> > src/master/allocator/mesos/metrics.hpp
> > Lines 73 (patched)
> > 
> >
> > Since this metric is measuring the latency from when the allocation was 
> > dispatched to when it executed, how about calling it 
> > `allocation_run_latency`? I find "interval" misleading since it is not the 
> > interval between runs.

+1 on `allocation_run_latency`. A short description could be 

```
// The latency of allocation runs due to the batching of allocation requests.
```


- Jiang Yan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53840/#review172945
---


On April 21, 2017, 10:53 a.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53840/
> ---
> 
> (Updated April 21, 2017, 10:53 a.m.)
> 
> 
> Review request for mesos, James Peach and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6579
> https://issues.apache.org/jira/browse/MESOS-6579
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added a metric 'allocator/mesos/allocation_run_interval_ms' which
> tracks the latency between scheduling an allocation to an actual
> allocation run.
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.cpp 
> 051f749dd5921a322ca930a042c31814616d38f9 
>   src/master/allocator/mesos/metrics.hpp 
> ce486eb079dc44bf0bbfad8668426c7ae87c97b3 
>   src/master/allocator/mesos/metrics.cpp 
> ec987fe0ce8605bf9e7c0ccbe3ac384deaf53be1 
>   src/tests/hierarchical_allocator_tests.cpp 
> 33e7b455f8664858eb4f03727b076a10c80cd6e0 
> 
> 
> Diff: https://reviews.apache.org/r/53840/diff/4/
> 
> 
> Testing
> ---
> 
> All tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 53840: Metric in the allocator to track latency in running allocations.

2017-05-14 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53840/#review174871
---




src/master/allocator/mesos/hierarchical.cpp
Lines 1425-1435 (original), 1426-1437 (patched)


As James commented, we should also stop the timer even if we return early 
but instead of adding it above `return Nothing()`, it seems appropriate to just 
use `metrics.allocation_run_interval.stop()` as the first statement of the 
method to mark the end of the batching delay.



src/tests/hierarchical_allocator_tests.cpp
Lines 3459-3463 (original), 3459-3462 (patched)


Why this change? Seems like the two ways of expressing the expectation are 
equivalent.



src/tests/hierarchical_allocator_tests.cpp
Lines 3537 (patched)


Seems inelegant to reuse the test for `allocation_run`, for 
`allocation_run` we are storing a full list of statistics in `statistics` but 
we are not doing the same for the new metric. Can we create an independent test?



src/tests/hierarchical_allocator_tests.cpp
Line 3544 (original), 3544 (patched)


Can it realistically be 0.0?


- Jiang Yan Xu


On April 21, 2017, 10:53 a.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53840/
> ---
> 
> (Updated April 21, 2017, 10:53 a.m.)
> 
> 
> Review request for mesos, James Peach and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6579
> https://issues.apache.org/jira/browse/MESOS-6579
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added a metric 'allocator/mesos/allocation_run_interval_ms' which
> tracks the latency between scheduling an allocation to an actual
> allocation run.
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.cpp 
> 051f749dd5921a322ca930a042c31814616d38f9 
>   src/master/allocator/mesos/metrics.hpp 
> ce486eb079dc44bf0bbfad8668426c7ae87c97b3 
>   src/master/allocator/mesos/metrics.cpp 
> ec987fe0ce8605bf9e7c0ccbe3ac384deaf53be1 
>   src/tests/hierarchical_allocator_tests.cpp 
> 33e7b455f8664858eb4f03727b076a10c80cd6e0 
> 
> 
> Diff: https://reviews.apache.org/r/53840/diff/4/
> 
> 
> Testing
> ---
> 
> All tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 53840: Metric in the allocator to track latency in running allocations.

2017-04-25 Thread James Peach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53840/#review172945
---



I would also like to see some documentation of this metric in 
`docs/monitoring.md`.


src/master/allocator/mesos/hierarchical.cpp
Line 1427 (original), 1428 (patched)


You also need to stop the latency metric here, otherwise you will capture 
multiple allocator dispatches.



src/master/allocator/mesos/metrics.hpp
Line 69 (original), 69 (patched)


I think we should fix this comment. This measures the time spent in the 
allocation algorithm.



src/master/allocator/mesos/metrics.hpp
Lines 73 (patched)


Since this metric is measuring the latency from when the allocation was 
dispatched to when it executed, how about calling it `allocation_run_latency`? 
I find "interval" misleading since it is not the interval between runs.


- James Peach


On April 21, 2017, 5:53 p.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53840/
> ---
> 
> (Updated April 21, 2017, 5:53 p.m.)
> 
> 
> Review request for mesos, James Peach and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6579
> https://issues.apache.org/jira/browse/MESOS-6579
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added a metric 'allocator/mesos/allocation_run_interval_ms' which
> tracks the latency between scheduling an allocation to an actual
> allocation run.
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.cpp 
> 051f749dd5921a322ca930a042c31814616d38f9 
>   src/master/allocator/mesos/metrics.hpp 
> ce486eb079dc44bf0bbfad8668426c7ae87c97b3 
>   src/master/allocator/mesos/metrics.cpp 
> ec987fe0ce8605bf9e7c0ccbe3ac384deaf53be1 
>   src/tests/hierarchical_allocator_tests.cpp 
> 33e7b455f8664858eb4f03727b076a10c80cd6e0 
> 
> 
> Diff: https://reviews.apache.org/r/53840/diff/4/
> 
> 
> Testing
> ---
> 
> All tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 53840: Metric in the allocator to track latency in running allocations.

2017-04-21 Thread Anindya Sinha


> On April 20, 2017, 6:04 p.m., James Peach wrote:
> > If I understand this correctly, this is publishing a time series of the 
> > idle time between allocator runs. Is that correct? If so, what insight does 
> > this metric give and how should operators interpret it? Why is a simple 
> > counter of allocator runs not sufficient?
> 
> Anindya Sinha wrote:
> So as to determine how frequently (or infrequently) allocation cycles are 
> being run by the allocator. The counters provide the current count of 
> allocation runs.
> 
> James Peach wrote:
> As we discussed with @xujyan, this is intended to measure the queueing 
> delay for allocation dispatches. I'm still not convinced that this is a 
> helpful metric. As implemented, this appears to measure the time between 
> allocation intervals. I guess you would start the timer in 
> `HierarchicalAllocatorProcess::allocate` and stop it in 
> `HierarchicalAllocatorProcess::_allocate`.

Yes, that is right. Updated the patch.


- Anindya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53840/#review172526
---


On April 21, 2017, 5:53 p.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53840/
> ---
> 
> (Updated April 21, 2017, 5:53 p.m.)
> 
> 
> Review request for mesos, James Peach and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6579
> https://issues.apache.org/jira/browse/MESOS-6579
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added a metric 'allocator/mesos/allocation_run_interval_ms' which
> tracks the latency between scheduling an allocation to an actual
> allocation run.
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.cpp 
> 051f749dd5921a322ca930a042c31814616d38f9 
>   src/master/allocator/mesos/metrics.hpp 
> ce486eb079dc44bf0bbfad8668426c7ae87c97b3 
>   src/master/allocator/mesos/metrics.cpp 
> ec987fe0ce8605bf9e7c0ccbe3ac384deaf53be1 
>   src/tests/hierarchical_allocator_tests.cpp 
> 33e7b455f8664858eb4f03727b076a10c80cd6e0 
> 
> 
> Diff: https://reviews.apache.org/r/53840/diff/4/
> 
> 
> Testing
> ---
> 
> All tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 53840: Metric in the allocator to track latency in running allocations.

2017-04-21 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53840/
---

(Updated April 21, 2017, 5:53 p.m.)


Review request for mesos, James Peach and Jiang Yan Xu.


Summary (updated)
-

Metric in the allocator to track latency in running allocations.


Bugs: MESOS-6579
https://issues.apache.org/jira/browse/MESOS-6579


Repository: mesos


Description (updated)
---

Added a metric 'allocator/mesos/allocation_run_interval_ms' which
tracks the latency between scheduling an allocation to an actual
allocation run.


Diffs (updated)
-

  src/master/allocator/mesos/hierarchical.cpp 
051f749dd5921a322ca930a042c31814616d38f9 
  src/master/allocator/mesos/metrics.hpp 
ce486eb079dc44bf0bbfad8668426c7ae87c97b3 
  src/master/allocator/mesos/metrics.cpp 
ec987fe0ce8605bf9e7c0ccbe3ac384deaf53be1 
  src/tests/hierarchical_allocator_tests.cpp 
33e7b455f8664858eb4f03727b076a10c80cd6e0 


Diff: https://reviews.apache.org/r/53840/diff/4/

Changes: https://reviews.apache.org/r/53840/diff/3-4/


Testing
---

All tests passed.


Thanks,

Anindya Sinha