----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/63174/#review189093 -----------------------------------------------------------
A couple of suggestions for speeding up the benchmark overhead: (1) Upgrade protobuf to 3.4.x, this comes with move support and rvalue setters for fields. Which will avoid some copies in the benchmark code and improve performance elsewhere too :) In the interim, you could manually use `Swap(T*)` but it means we'd probably want to re-write the code once move support is available (so that doesn't seem like a good option). (2) You could try using an arena for the test fixture, although I don't know if it's worth the complexity. Probably just reducing copying is simpler. (3) We can avoid re-parsing resources for each task and agent. src/tests/master_benchmarks.cpp Lines 63 (patched) <https://reviews.apache.org/r/63174/#comment266020> Can you avoid parsing resources for each agent? src/tests/master_benchmarks.cpp Lines 84 (patched) <https://reviews.apache.org/r/63174/#comment266018> Can you avoid parsing resources for every task? src/tests/master_benchmarks.cpp Lines 91-92 (patched) <https://reviews.apache.org/r/63174/#comment266019> Code written this way is nice because it will automatically benefit from move support when we upgrade protobuf to 3.4.x. :) Maybe you can write more of the test in such a manner that it would benefit from an upgrade to 3.4.x? I would be happy to review a 3.4.x upgrade since we need it for other performance improvements. We can see who wants to pick that up, I think Dmitry might be interested. src/tests/master_benchmarks.cpp Lines 139-140 (patched) <https://reviews.apache.org/r/63174/#comment266015> Here's an example of where you could move into `message.frameworks` if you upgrade to protobuf 3.4.x: ``` message.mutable_frameworks()->Add(createFrameworkInfo(frameworkId)); ``` Alternatively, pre-3.4.x, you can swap: ``` message.add_frameworks()->Swap(&createFrameworkInfo(frameworkId)); // maybe you need to do: FrameworkInfo f = createFrameworkInfo(frameworkId); message.add_frameworks()->Swap(&f); ``` src/tests/master_benchmarks.cpp Lines 143-147 (patched) <https://reviews.apache.org/r/63174/#comment266016> Ditto copying here. src/tests/master_benchmarks.cpp Lines 163-167 (patched) <https://reviews.apache.org/r/63174/#comment266017> Ditto copying here and elsewhere. src/tests/master_benchmarks.cpp Lines 241-243 (patched) <https://reviews.apache.org/r/63174/#comment266013> Comment about why you're using the replicated log here? src/tests/master_benchmarks.cpp Lines 261 (patched) <https://reviews.apache.org/r/63174/#comment266012> I'm a little concerned about this pattern, because if the test were to fail an assertion, the process would be destructed without terminating / waiting on it. Can you use a wrapper around the process that terminates and waits? Alternatively, if we had a SCOPE_EXIT { ... } abstraction (I had a review but never committed it), we could just do: ``` SCOPE_EXIT { process::terminate(pid); wait(pid); }; ``` E.g. https://github.com/facebook/folly/blob/v2017.10.23.00/folly/ScopeGuard.h#L285-L287 - Benjamin Mahler On Oct. 24, 2017, 6:05 p.m., Jiang Yan Xu wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/63174/ > ----------------------------------------------------------- > > (Updated Oct. 24, 2017, 6:05 p.m.) > > > Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin. > > > Bugs: MESOS-8098 > https://issues.apache.org/jira/browse/MESOS-8098 > > > Repository: mesos > > > Description > ------- > > The current benchmark is very simple: without framework involvement and > without agent retries but it's possible to add a number of others so I am > creating a new file for them. > > > Diffs > ----- > > src/Makefile.am b60a54a031260de6f1fb43584ae5083df2dc7e31 > src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 > src/tests/master_benchmarks.cpp PRE-CREATION > > > Diff: https://reviews.apache.org/r/63174/diff/2/ > > > Testing > ------- > > Benchmark based off > https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a > (close to current HEAD). > > ``` > [ RUN ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 > Starting reregistration for all agents > Reregistered 2000 agents with a total of 100000 running tasks and 100000 > completed tasks in 11.188008209secs > [ OK ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 > (22404 ms) > [ RUN ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 > Starting reregistration for all agents > Reregistered 2000 agents with a total of 200000 running tasks and 0 completed > tasks in 20.868372615secs > [ OK ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 > (37981 ms) > [ RUN ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 > Starting reregistration for all agents > Reregistered 20000 agents with a total of 100000 running tasks and 0 > completed tasks in 15.354579251secs > [ OK ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 > (33766 ms) > [----------] 3 tests from > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (94151 ms total) > > > [ RUN ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 > Starting reregistration for all agents > Reregistered 2000 agents with a total of 100000 running tasks and 100000 > completed tasks in 11.045441129secs > [ OK ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 > (19959 ms) > [ RUN ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 > Starting reregistration for all agents > Reregistered 2000 agents with a total of 200000 running tasks and 0 completed > tasks in 21.324309077secs > [ OK ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 > (38490 ms) > [ RUN ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 > Starting reregistration for all agents > Reregistered 20000 agents with a total of 100000 running tasks and 0 > completed tasks in 14.68607521secs > [ OK ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 > (32073 ms) > [----------] 3 tests from > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (90523 ms total) > > ``` > > Benchmark based off > https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d > (before https://issues.apache.org/jira/browse/MESOS-7713 was merged) > > ``` > [ RUN ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 > Starting reregistration for all agents > Reregistered 2000 agents with a total of 100000 running tasks and 100000 > completed tasks in 23.217901878secs > [ OK ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 > (38327 ms) > [ RUN ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 > Starting reregistration for all agents > Reregistered 2000 agents with a total of 200000 running tasks and 0 completed > tasks in 46.158610597secs > [ OK ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 > (75280 ms) > [ RUN ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 > Starting reregistration for all agents > Reregistered 20000 agents with a total of 100000 running tasks and 0 > completed tasks in 38.56781112secs > [ OK ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 > (68006 ms) > [----------] 3 tests from > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (181613 ms total) > > [ RUN ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 > Starting reregistration for all agents > Reregistered 2000 agents with a total of 100000 running tasks and 100000 > completed tasks in 25.752844224secs > [ OK ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 > (43509 ms) > [ RUN ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 > Starting reregistration for all agents > Reregistered 2000 agents with a total of 200000 running tasks and 0 completed > tasks in 45.190859035secs > [ OK ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 > (73966 ms) > [ RUN ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 > Starting reregistration for all agents > Reregistered 20000 agents with a total of 100000 running tasks and 0 > completed tasks in 36.322992753secs > [ OK ] > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 > (66946 ms) > [----------] 3 tests from > AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (184421 ms total) > ``` > > The recently patches cut down the time by over 50%. These were built with > `--enable-optimize --enable-lock-free-run-queue > --enable-lock-free-event-queue > --enable-last-in-first-out-fixed-size-semaphore`. > > > Thanks, > > Jiang Yan Xu > >