Re: Review Request 45534: Added per-role and quota share metrics to the DRFSorter.

2016-04-07 Thread Ben Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45534/#review127721
---



High level thought is that we should start with just the shares within the role 
sorter, since these are the easiest to understand. After Vinod and I talked, we 
realized the "quota" shares were difficult to reason about, so let's not 
include those for now (one thing at a time).

I also would suggest the following naming scheme:

```
allocator/mesos/roles//shares/dominant
```

Since we may want additional shares to be exposed:

```
allocator/mesos/roles//shares/dominant: 0.5
allocator/mesos/roles//shares/cpus: 0.5
allocator/mesos/roles//shares/mem: 0.2
allocator/mesos/roles//shares/disk: 0.1
```


docs/monitoring.md (lines 967 - 973)


Let's not include this one for now, I chatted with vinod about this one to 
get some more thoughts and we agree this is pretty difficult to explain to the 
users and likely the most valuable thing to expose for now is the dominant 
shares for the normal sorter.



src/master/allocator/sorter/drf/metrics.hpp (line 59)


```
// Dominant share of each client.
hashmap dominantShares;
```



src/master/allocator/sorter/sorter.hpp (lines 46 - 52)


Why a factory rather than just a string prefix? How about:

```
  // Provides the allocator's execution context (via a UPID)
  // and a name prefix in order to support metrics within the
  // sorter implementation.
  explicit Sorter(
  const process::UPID& allocator,
  const std::string& metricsPrefix) {}
```



src/tests/hierarchical_allocator_tests.cpp (lines 2748 - 2752)


How about:

```
// Verifies that per-role dominant share metrics are correctly reported.
```



src/tests/hierarchical_allocator_tests.cpp (lines 2762 - 2764)


Would you mind wrapping to reduce jaggedness in general?

```
  // Register one agent and one framework. The framework will
  // immediately receive receive an offer and make it have the
  // maximum possible dominant share.
```



src/tests/hierarchical_allocator_tests.cpp (lines 2783 - 2787)


We need to settle after these allocator calls, otherwise we may not see the 
changes reflected in the metrics. Did you run this test in repetition?



src/tests/hierarchical_allocator_tests.cpp (lines 2791 - 2792)


How about moving this up to before the decline happens, to be consistent 
with your other comments that describe what's happening in the test, so:

```
  // Decline the offered resources and expect a zero share.
  Future allocation = allocations.get();
  AWAIT_READY(allocation);
  allocator->recoverResources(
  allocation->frameworkId,
  agent1.id(),
  allocation->resources.get(agent1.id()).get(),
  None());
```



src/tests/hierarchical_allocator_tests.cpp (line 2835)


Can you move this down to be adjacent with the expectation on the metrics? 
It's also how we did it in the metrics test above:

```
  expected.values = {
  {"allocator/mesos/dominant_share/roles/roleA", 0.5},
  };

  metrics = Metrics();
  EXPECT_TRUE(metrics.contains(expected));
```


- Ben Mahler


On April 4, 2016, 12:32 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45534/
> ---
> 
> (Updated April 4, 2016, 12:32 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Ben Mahler.
> 
> 
> Bugs: MESOS-4760
> https://issues.apache.org/jira/browse/MESOS-4760
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added per-role and quota share metrics to the DRFSorter.
> 
> 
> Diffs
> -
> 
>   docs/monitoring.md 76be32842e8b437fda40c8565a34ec4f8d8dfbcc 
>   src/Makefile.am f22ae5b3bd9336a56c802e0e51d39d6cb675caf2 
>   src/master/allocator/mesos/hierarchical.hpp 
> e979fdf60da1409d1c2d08f0e9f03cef067506dd 
>   src/master/allocator/sorter/drf/metrics.hpp PRE-CREATION 
>   src/master/allocator/sorter/drf/metrics.cpp PRE-CREATION 
>   src/master/allocator/sorter/drf/sorter.hpp 
> f316bb5b8bfe93311ecac57198392e104b234b04 
>   src/master/allocator/sorter/drf/sorter.cpp 
> c14f9a45b9e2ea8d79bd8d2f092d313afa4cbac3 
>   

Re: Review Request 45534: Added per-role and quota share metrics to the DRFSorter.

2016-04-04 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45534/#review126835
---



Patch looks great!

Reviews applied: [45529, 45533, 45534]

Passed command: export OS='ubuntu:14.04' CONFIGURATION='--verbose' 
COMPILER='gcc' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker_build.sh

- Mesos ReviewBot


On April 4, 2016, 12:32 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45534/
> ---
> 
> (Updated April 4, 2016, 12:32 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Ben Mahler.
> 
> 
> Bugs: MESOS-4760
> https://issues.apache.org/jira/browse/MESOS-4760
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added per-role and quota share metrics to the DRFSorter.
> 
> 
> Diffs
> -
> 
>   docs/monitoring.md 76be32842e8b437fda40c8565a34ec4f8d8dfbcc 
>   src/Makefile.am f22ae5b3bd9336a56c802e0e51d39d6cb675caf2 
>   src/master/allocator/mesos/hierarchical.hpp 
> e979fdf60da1409d1c2d08f0e9f03cef067506dd 
>   src/master/allocator/sorter/drf/metrics.hpp PRE-CREATION 
>   src/master/allocator/sorter/drf/metrics.cpp PRE-CREATION 
>   src/master/allocator/sorter/drf/sorter.hpp 
> f316bb5b8bfe93311ecac57198392e104b234b04 
>   src/master/allocator/sorter/drf/sorter.cpp 
> c14f9a45b9e2ea8d79bd8d2f092d313afa4cbac3 
>   src/master/allocator/sorter/sorter.hpp 
> e2338d5297e11a1ca4f6e5d72a4526aa4579610c 
>   src/tests/hierarchical_allocator_tests.cpp 
> 8f78a204d296f94f515f21511710a35c33e27255 
> 
> Diff: https://reviews.apache.org/r/45534/diff/
> 
> 
> Testing
> ---
> 
> make check (OS X, clang trunk, not optimized)
> 
> I did also benchmark the slowdown of the allocator due to this addition with 
> the benchmark allocator https://reviews.apache.org/r/44853. There I saw that 
> for an unoptimized build this patch adds up to 70 ms to the time needed to 
> query the metrics endpoint (this was for the case of 5000 slaves and 1000 
> frameworks), though one could expected that an optimized build might perform 
> better. The numbers I got where
> 
> `#`slaves | #frameworks | old time [us] | new time [us] | slowdown
> |-|---|---|-
>1000 |   1 | 38980 | 23847 |0.6
>1000 |  50 | 27834 | 42091 |1.5
>1000 | 100 | 40060 | 47571 |1.2
>1000 | 200 | 63132 | 75806 |1.2
>1000 | 500 |145170 |171929 |1.2
>1000 |1000 |427721 |473822 |1.1
>5000 |   1 | 23249 | 21426 |0.9
>5000 |  50 | 41032 | 36318 |0.9
>5000 | 100 | 43636 | 45210 |1.0
>5000 | 200 | 60204 | 65570 |1.1
>5000 | 500 |121509 |196894 |1.6
>5000 |1000 |449476 |496641 |1.1
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 45534: Added per-role and quota share metrics to the DRFSorter.

2016-04-04 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45534/
---

(Updated April 4, 2016, 2:32 p.m.)


Review request for mesos, Alexander Rukletsov and Ben Mahler.


Bugs: MESOS-4760
https://issues.apache.org/jira/browse/MESOS-4760


Repository: mesos


Description
---

Added per-role and quota share metrics to the DRFSorter.


Diffs (updated)
-

  docs/monitoring.md 76be32842e8b437fda40c8565a34ec4f8d8dfbcc 
  src/Makefile.am f22ae5b3bd9336a56c802e0e51d39d6cb675caf2 
  src/master/allocator/mesos/hierarchical.hpp 
e979fdf60da1409d1c2d08f0e9f03cef067506dd 
  src/master/allocator/sorter/drf/metrics.hpp PRE-CREATION 
  src/master/allocator/sorter/drf/metrics.cpp PRE-CREATION 
  src/master/allocator/sorter/drf/sorter.hpp 
f316bb5b8bfe93311ecac57198392e104b234b04 
  src/master/allocator/sorter/drf/sorter.cpp 
c14f9a45b9e2ea8d79bd8d2f092d313afa4cbac3 
  src/master/allocator/sorter/sorter.hpp 
e2338d5297e11a1ca4f6e5d72a4526aa4579610c 
  src/tests/hierarchical_allocator_tests.cpp 
8f78a204d296f94f515f21511710a35c33e27255 

Diff: https://reviews.apache.org/r/45534/diff/


Testing
---

make check (OS X, clang trunk, not optimized)

I did also benchmark the slowdown of the allocator due to this addition with 
the benchmark allocator https://reviews.apache.org/r/44853. There I saw that 
for an unoptimized build this patch adds up to 70 ms to the time needed to 
query the metrics endpoint (this was for the case of 5000 slaves and 1000 
frameworks), though one could expected that an optimized build might perform 
better. The numbers I got where

`#`slaves | #frameworks | old time [us] | new time [us] | slowdown
|-|---|---|-
   1000 |   1 | 38980 | 23847 |0.6
   1000 |  50 | 27834 | 42091 |1.5
   1000 | 100 | 40060 | 47571 |1.2
   1000 | 200 | 63132 | 75806 |1.2
   1000 | 500 |145170 |171929 |1.2
   1000 |1000 |427721 |473822 |1.1
   5000 |   1 | 23249 | 21426 |0.9
   5000 |  50 | 41032 | 36318 |0.9
   5000 | 100 | 43636 | 45210 |1.0
   5000 | 200 | 60204 | 65570 |1.1
   5000 | 500 |121509 |196894 |1.6
   5000 |1000 |449476 |496641 |1.1


Thanks,

Benjamin Bannier



Re: Review Request 45534: Added per-role and quota share metrics to the DRFSorter.

2016-03-31 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45534/#review126342
---



Bad patch!

Reviews applied: [45534, 45533, 45529]

Failed command: ./support/apply-review.sh -n -r 45533

Error:
2016-03-31 16:52:57 URL:https://reviews.apache.org/r/45533/diff/raw/ 
[3969/3969] -> "45533.patch" [1]
45533.patch:88: trailing whitespace.
  // necessary. 
warning: 1 line adds whitespace errors.
src/master/allocator/mesos/hierarchical.cpp:138: trailing whitespace.
+  // necessary. 

Full log: https://builds.apache.org/job/mesos-reviewbot/12243/console

- Mesos ReviewBot


On March 31, 2016, 2:09 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45534/
> ---
> 
> (Updated March 31, 2016, 2:09 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Ben Mahler.
> 
> 
> Bugs: MESOS-4760
> https://issues.apache.org/jira/browse/MESOS-4760
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added per-role and quota share metrics to the DRFSorter.
> 
> 
> Diffs
> -
> 
>   docs/monitoring.md 76be32842e8b437fda40c8565a34ec4f8d8dfbcc 
>   src/Makefile.am f22ae5b3bd9336a56c802e0e51d39d6cb675caf2 
>   src/master/allocator/mesos/hierarchical.hpp 
> e979fdf60da1409d1c2d08f0e9f03cef067506dd 
>   src/master/allocator/sorter/drf/metrics.hpp PRE-CREATION 
>   src/master/allocator/sorter/drf/metrics.cpp PRE-CREATION 
>   src/master/allocator/sorter/drf/sorter.hpp 
> f316bb5b8bfe93311ecac57198392e104b234b04 
>   src/master/allocator/sorter/drf/sorter.cpp 
> c14f9a45b9e2ea8d79bd8d2f092d313afa4cbac3 
>   src/master/allocator/sorter/sorter.hpp 
> e2338d5297e11a1ca4f6e5d72a4526aa4579610c 
>   src/tests/hierarchical_allocator_tests.cpp 
> 8f78a204d296f94f515f21511710a35c33e27255 
> 
> Diff: https://reviews.apache.org/r/45534/diff/
> 
> 
> Testing
> ---
> 
> make check (OS X, clang trunk, not optimized)
> 
> I did also benchmark the slowdown of the allocator due to this addition with 
> the benchmark allocator https://reviews.apache.org/r/44853. There I saw that 
> for an unoptimized build this patch adds up to 70 ms to the time needed to 
> query the metrics endpoint (this was for the case of 5000 slaves and 1000 
> frameworks), though one could expected that an optimized build might perform 
> better. The numbers I got where
> 
> `#`slaves | #frameworks | old time [us] | new time [us] | slowdown
> |-|---|---|-
>1000 |   1 | 38980 | 23847 |0.6
>1000 |  50 | 27834 | 42091 |1.5
>1000 | 100 | 40060 | 47571 |1.2
>1000 | 200 | 63132 | 75806 |1.2
>1000 | 500 |145170 |171929 |1.2
>1000 |1000 |427721 |473822 |1.1
>5000 |   1 | 23249 | 21426 |0.9
>5000 |  50 | 41032 | 36318 |0.9
>5000 | 100 | 43636 | 45210 |1.0
>5000 | 200 | 60204 | 65570 |1.1
>5000 | 500 |121509 |196894 |1.6
>5000 |1000 |449476 |496641 |1.1
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Review Request 45534: Added per-role and quota share metrics to the DRFSorter.

2016-03-31 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45534/
---

Review request for mesos, Alexander Rukletsov and Ben Mahler.


Bugs: MESOS-4760
https://issues.apache.org/jira/browse/MESOS-4760


Repository: mesos


Description
---

Added per-role and quota share metrics to the DRFSorter.


Diffs
-

  docs/monitoring.md 76be32842e8b437fda40c8565a34ec4f8d8dfbcc 
  src/Makefile.am f22ae5b3bd9336a56c802e0e51d39d6cb675caf2 
  src/master/allocator/mesos/hierarchical.hpp 
e979fdf60da1409d1c2d08f0e9f03cef067506dd 
  src/master/allocator/sorter/drf/metrics.hpp PRE-CREATION 
  src/master/allocator/sorter/drf/metrics.cpp PRE-CREATION 
  src/master/allocator/sorter/drf/sorter.hpp 
f316bb5b8bfe93311ecac57198392e104b234b04 
  src/master/allocator/sorter/drf/sorter.cpp 
c14f9a45b9e2ea8d79bd8d2f092d313afa4cbac3 
  src/master/allocator/sorter/sorter.hpp 
e2338d5297e11a1ca4f6e5d72a4526aa4579610c 
  src/tests/hierarchical_allocator_tests.cpp 
8f78a204d296f94f515f21511710a35c33e27255 

Diff: https://reviews.apache.org/r/45534/diff/


Testing
---

make check (OS X, clang trunk, not optimized)

I did also benchmark the slowdown of the allocator due to this addition with 
the benchmark allocator https://reviews.apache.org/r/44853. There I saw that 
for an unoptimized build this patch adds up to 70 ms to the time needed to 
query the metrics endpoint (this was for the case of 5000 slaves and 1000 
frameworks), though one could expected that an optimized build might perform 
better. The numbers I got where

`#`slaves | #frameworks | old time [us] | new time [us] | slowdown
|-|---|---|-
   1000 |   1 | 38980 | 23847 |0.6
   1000 |  50 | 27834 | 42091 |1.5
   1000 | 100 | 40060 | 47571 |1.2
   1000 | 200 | 63132 | 75806 |1.2
   1000 | 500 |145170 |171929 |1.2
   1000 |1000 |427721 |473822 |1.1
   5000 |   1 | 23249 | 21426 |0.9
   5000 |  50 | 41032 | 36318 |0.9
   5000 | 100 | 43636 | 45210 |1.0
   5000 | 200 | 60204 | 65570 |1.1
   5000 | 500 |121509 |196894 |1.6
   5000 |1000 |449476 |496641 |1.1


Thanks,

Benjamin Bannier