Re: Review Request 43882: Added allocation metrics for allocation time.

Ben Mahler Wed, 23 Mar 2016 00:44:56 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43882/#review124967
-----------------------------------------------------------




Thanks! I'll make the adjustments from the comments below, please take a look 
at the commit diff.


docs/monitoring.md (lines 906 - 910)
<https://reviews.apache.org/r/43882/#comment187691>

    Hm.. how about allocation_run to be more consistent with our existing timer 
names?



docs/monitoring.md (line 909)
<https://reviews.apache.org/r/43882/#comment187693>

    s/Gauge/Timer/



src/master/allocator/mesos/metrics.hpp (lines 87 - 88)
<https://reviews.apache.org/r/43882/#comment187699>

    Could we put this adjacent to the related `allocation_runs` metric?



src/master/allocator/mesos/metrics.cpp (line 48)
<https://reviews.apache.org/r/43882/#comment187695>

    Since we expect many allocations, we should pass a window here (see the 
registrar's `state_store` metric where we used a window of `Days(1)`). Passing 
a window allows us to obtain percentiles which is critical for events that 
occur on a regular basis where the monitoring system may not be able to catch 
every timing.
    
    Here we could use 1 hour, since it's more freqent than the registry 
operations.



src/tests/hierarchical_allocator_tests.cpp (lines 2720 - 2747)
<https://reviews.apache.org/r/43882/#comment187712>

    I ended up taking a stab at re-writing this given all the other comments:
    
    ```
    // This test checks that the allocation run timer
    // metrics are reported in the metrics endpoint.
    TEST_F(HierarchicalAllocatorTest, AllocationRunTimerMetrics)
    {
      Clock::pause();
    
      initialize();
    
      Clock::settle();
    
      // These time series statistics will be generated
      // once at least 2 allocation runs occur.
      auto statistics = {
        "allocator/mesos/allocation_run_ms/count",
        "allocator/mesos/allocation_run_ms/min",
        "allocator/mesos/allocation_run_ms/max",
        "allocator/mesos/allocation_run_ms/p50",
        "allocator/mesos/allocation_run_ms/p95",
        "allocator/mesos/allocation_run_ms/p99",
        "allocator/mesos/allocation_run_ms/p999",
        "allocator/mesos/allocation_run_ms/p9999",
      };
    
      JSON::Object metrics = Metrics();
      map<string, JSON::Value> values = metrics.values;
    
      EXPECT_EQ(1u, values.count("allocator/mesos/allocation_run_ms"));
    
      // No allocation timing statistics should appear.
      foreach (const string& statistic, statistics) {
        EXPECT_EQ(0u, values.count(statistic))
          << "Expected " << statistic << " to be absent";
      }
    
      // Allow the allocation timer to measure time.
      Clock::resume();
    
      // Trigger at least two allocations to generate
      // the window statistics.
      SlaveInfo agent = createSlaveInfo("cpus:2;mem:1024;disk:0");
      allocator->addSlave(agent.id(), agent, None(), agent.resources(), {});
      AWAIT_READY(allocations.get());
    
      FrameworkInfo framework = createFrameworkInfo("role1");
      allocator->addFramework(framework.id(), framework, {});
      AWAIT_READY(allocations.get());
    
      metrics = Metrics();
      values = metrics.values;
    
      // A non-zero measurement should be present.
      JSON::Value value = metrics.values["allocator/mesos/allocation_time_ms"];
      ASSERT_TRUE(value.is<JSON::Number>());
      JSON::Number timing = value.as<JSON::Number>();
      ASSERT_EQ(JSON::Number::FLOATING, timing.type);
      EXPECT_GT(timing.as<double>(), 0.0);
    
      // The statistics should be generated.
      foreach (const string& statistic, statistics) {
        EXPECT_EQ(1u, values.count(statistic))
          << "Expected " << statistic << " to be absent";
      }
    }
    ```



src/tests/hierarchical_allocator_tests.cpp (lines 2722 - 2723)
<https://reviews.apache.org/r/43882/#comment187714>

    Hm.. the clock should be paused initially if you're assuming the metric is 
absent, otherwise the allocation interval could fire and insert the metric 
before you obtained the metrics.



src/tests/hierarchical_allocator_tests.cpp (line 2724)
<https://reviews.apache.org/r/43882/#comment187700>

    brace on the next line



src/tests/hierarchical_allocator_tests.cpp (line 2733)
<https://reviews.apache.org/r/43882/#comment187713>

    Hm.. it's unfortunate that there are a bunch of tests using "agent" as the 
variable name, seems confusing to the reader. I would have rather we did a 
consistent change across the tests, but not your fault, we can stick with agent 
here since this file seems to use it in many places.



src/tests/hierarchical_allocator_tests.cpp (lines 2736 - 2737)
<https://reviews.apache.org/r/43882/#comment187705>

    Try to wrap to reduce jagedness:
    
    ```
      // Wait until at least one allocation run occurs
      // so the allocation time could be measured.
      Future<Allocation> allocation = allocations.get();
      AWAIT_READY(allocation);
    ```


- Ben Mahler


On March 21, 2016, 11:08 a.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43882/
> -----------------------------------------------------------
> 
> (Updated March 21, 2016, 11:08 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Ben Mahler.
> 
> 
> Bugs: MESOS-4721
>     https://issues.apache.org/jira/browse/MESOS-4721
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added allocation metrics for allocation time.
> 
> 
> Diffs
> -----
> 
>   docs/monitoring.md 827f7073204fcf8575ca980a5571c8be4f5e4110 
>   src/master/allocator/mesos/hierarchical.cpp 
> 70291075c00a9a557529c2562dedcfc6c6c3ec32 
>   src/master/allocator/mesos/metrics.hpp 
> d04e9a11c77b6fb2d522608e3085f81f8a6657ca 
>   src/master/allocator/mesos/metrics.cpp 
> 46dd7f7b3d8d49c59b42e789fe5dcd49f1693ace 
>   src/tests/hierarchical_allocator_tests.cpp 
> 459e02576f6d05abbbcc83ae5cabac5c66e93f05 
> 
> Diff: https://reviews.apache.org/r/43882/diff/
> 
> 
> Testing
> -------
> 
> make check (OS X)
> 
> I confirmed that this does not lead to general performance regressions in the 
> allocator; this is partially expected since the added code only inserts 
> metrics in the allocator while the actual work is perform asynchronously. 
> These tests where performed with 
> `HierarchicalAllocator_BENCHMARK_Test.DeclineOffers` on an optimized build 
> under OS X using clang(trunk) as compiler.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>

Re: Review Request 43882: Added allocation metrics for allocation time.

Reply via email to