----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/43882/#review124967 -----------------------------------------------------------
Thanks! I'll make the adjustments from the comments below, please take a look at the commit diff. docs/monitoring.md (lines 906 - 910) <https://reviews.apache.org/r/43882/#comment187691> Hm.. how about allocation_run to be more consistent with our existing timer names? docs/monitoring.md (line 909) <https://reviews.apache.org/r/43882/#comment187693> s/Gauge/Timer/ src/master/allocator/mesos/metrics.hpp (lines 87 - 88) <https://reviews.apache.org/r/43882/#comment187699> Could we put this adjacent to the related `allocation_runs` metric? src/master/allocator/mesos/metrics.cpp (line 48) <https://reviews.apache.org/r/43882/#comment187695> Since we expect many allocations, we should pass a window here (see the registrar's `state_store` metric where we used a window of `Days(1)`). Passing a window allows us to obtain percentiles which is critical for events that occur on a regular basis where the monitoring system may not be able to catch every timing. Here we could use 1 hour, since it's more freqent than the registry operations. src/tests/hierarchical_allocator_tests.cpp (lines 2720 - 2747) <https://reviews.apache.org/r/43882/#comment187712> I ended up taking a stab at re-writing this given all the other comments: ``` // This test checks that the allocation run timer // metrics are reported in the metrics endpoint. TEST_F(HierarchicalAllocatorTest, AllocationRunTimerMetrics) { Clock::pause(); initialize(); Clock::settle(); // These time series statistics will be generated // once at least 2 allocation runs occur. auto statistics = { "allocator/mesos/allocation_run_ms/count", "allocator/mesos/allocation_run_ms/min", "allocator/mesos/allocation_run_ms/max", "allocator/mesos/allocation_run_ms/p50", "allocator/mesos/allocation_run_ms/p95", "allocator/mesos/allocation_run_ms/p99", "allocator/mesos/allocation_run_ms/p999", "allocator/mesos/allocation_run_ms/p9999", }; JSON::Object metrics = Metrics(); map<string, JSON::Value> values = metrics.values; EXPECT_EQ(1u, values.count("allocator/mesos/allocation_run_ms")); // No allocation timing statistics should appear. foreach (const string& statistic, statistics) { EXPECT_EQ(0u, values.count(statistic)) << "Expected " << statistic << " to be absent"; } // Allow the allocation timer to measure time. Clock::resume(); // Trigger at least two allocations to generate // the window statistics. SlaveInfo agent = createSlaveInfo("cpus:2;mem:1024;disk:0"); allocator->addSlave(agent.id(), agent, None(), agent.resources(), {}); AWAIT_READY(allocations.get()); FrameworkInfo framework = createFrameworkInfo("role1"); allocator->addFramework(framework.id(), framework, {}); AWAIT_READY(allocations.get()); metrics = Metrics(); values = metrics.values; // A non-zero measurement should be present. JSON::Value value = metrics.values["allocator/mesos/allocation_time_ms"]; ASSERT_TRUE(value.is<JSON::Number>()); JSON::Number timing = value.as<JSON::Number>(); ASSERT_EQ(JSON::Number::FLOATING, timing.type); EXPECT_GT(timing.as<double>(), 0.0); // The statistics should be generated. foreach (const string& statistic, statistics) { EXPECT_EQ(1u, values.count(statistic)) << "Expected " << statistic << " to be absent"; } } ``` src/tests/hierarchical_allocator_tests.cpp (lines 2722 - 2723) <https://reviews.apache.org/r/43882/#comment187714> Hm.. the clock should be paused initially if you're assuming the metric is absent, otherwise the allocation interval could fire and insert the metric before you obtained the metrics. src/tests/hierarchical_allocator_tests.cpp (line 2724) <https://reviews.apache.org/r/43882/#comment187700> brace on the next line src/tests/hierarchical_allocator_tests.cpp (line 2733) <https://reviews.apache.org/r/43882/#comment187713> Hm.. it's unfortunate that there are a bunch of tests using "agent" as the variable name, seems confusing to the reader. I would have rather we did a consistent change across the tests, but not your fault, we can stick with agent here since this file seems to use it in many places. src/tests/hierarchical_allocator_tests.cpp (lines 2736 - 2737) <https://reviews.apache.org/r/43882/#comment187705> Try to wrap to reduce jagedness: ``` // Wait until at least one allocation run occurs // so the allocation time could be measured. Future<Allocation> allocation = allocations.get(); AWAIT_READY(allocation); ``` - Ben Mahler On March 21, 2016, 11:08 a.m., Benjamin Bannier wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/43882/ > ----------------------------------------------------------- > > (Updated March 21, 2016, 11:08 a.m.) > > > Review request for mesos, Alexander Rukletsov and Ben Mahler. > > > Bugs: MESOS-4721 > https://issues.apache.org/jira/browse/MESOS-4721 > > > Repository: mesos > > > Description > ------- > > Added allocation metrics for allocation time. > > > Diffs > ----- > > docs/monitoring.md 827f7073204fcf8575ca980a5571c8be4f5e4110 > src/master/allocator/mesos/hierarchical.cpp > 70291075c00a9a557529c2562dedcfc6c6c3ec32 > src/master/allocator/mesos/metrics.hpp > d04e9a11c77b6fb2d522608e3085f81f8a6657ca > src/master/allocator/mesos/metrics.cpp > 46dd7f7b3d8d49c59b42e789fe5dcd49f1693ace > src/tests/hierarchical_allocator_tests.cpp > 459e02576f6d05abbbcc83ae5cabac5c66e93f05 > > Diff: https://reviews.apache.org/r/43882/diff/ > > > Testing > ------- > > make check (OS X) > > I confirmed that this does not lead to general performance regressions in the > allocator; this is partially expected since the added code only inserts > metrics in the allocator while the actual work is perform asynchronously. > These tests where performed with > `HierarchicalAllocator_BENCHMARK_Test.DeclineOffers` on an optimized build > under OS X using clang(trunk) as compiler. > > > Thanks, > > Benjamin Bannier > >
