Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

2017-11-03 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/
---

(Updated Nov. 3, 2017, 11:10 a.m.)


Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.


Changes
---

Addressed comment. NNFR.


Bugs: MESOS-8098
https://issues.apache.org/jira/browse/MESOS-8098


Repository: mesos


Description
---

The current benchmark is very simple, with no framework involvement and no 
agent retries, but a number of other benchmarks could be added, so I am 
creating a new file for them.


Diffs (updated)
---

  src/Makefile.am 1c97b1fd8151f87c4e9e6d62884b0ef7d582c312 
  src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
  src/tests/master_benchmarks.cpp PRE-CREATION 


Diff: https://reviews.apache.org/r/63174/diff/4/

Changes: https://reviews.apache.org/r/63174/diff/3-4/


Testing
---

Benchmark based off 
https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a 
(close to current HEAD).

```
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 10 running tasks and 10 
completed tasks in 11.188008209secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
 (22404 ms)
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 20 running tasks and 0 completed 
tasks in 20.868372615secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
 (37981 ms)
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 2 agents with a total of 10 running tasks and 0 completed 
tasks in 15.354579251secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
 (33766 ms)
[--] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test 
(94151 ms total)


[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 10 running tasks and 10 
completed tasks in 11.045441129secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
 (19959 ms)
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 20 running tasks and 0 completed 
tasks in 21.324309077secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
 (38490 ms)
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 2 agents with a total of 10 running tasks and 0 completed 
tasks in 14.68607521secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
 (32073 ms)
[--] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test 
(90523 ms total)

```

Benchmark based off 
https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d 
(before https://issues.apache.org/jira/browse/MESOS-7713 was merged)

```
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 10 running tasks and 10 
completed tasks in 23.217901878secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
 (38327 ms)
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 20 running tasks and 0 completed 
tasks in 46.158610597secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
 (75280 ms)
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 2 agents with a total of 10 running tasks and 0 completed 
tasks in 38.56781112secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
 (68006 ms)
[--] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test 
(181613 ms total)

[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 10 running tasks and 10 
completed tasks in 25.752844224secs
[   OK ] 

Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

2017-11-02 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review189978
---


Ship it!




Are you able to also upload some flame graphs to MESOS-8098 for posterity? To 
avoid including unnecessary data, I guess you could temporarily tweak the 
benchmark test to sleep before and after the timed section so that you can 
start/stop profiling for just the parts we care about.
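
A minimal sketch of that tweak, assuming a hypothetical helper named 
`timeWithProfilingWindow` (it is not part of the patch). It sleeps before and 
after the timed section so a profiler can be started and stopped around just 
that part, and it reports only the time spent inside the section. It uses 
plain std::chrono rather than the utilities the benchmark itself relies on.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper: pauses, runs `timedSection`, pauses again, and
// returns only the time spent inside `timedSection`.
template <typename F>
std::chrono::steady_clock::duration timeWithProfilingWindow(
    F&& timedSection,
    std::chrono::milliseconds pause)
{
  // Window to start the profiler before the interesting work begins.
  std::this_thread::sleep_for(pause);

  const auto start = std::chrono::steady_clock::now();
  timedSection();
  const auto elapsed = std::chrono::steady_clock::now() - start;

  // Window to stop the profiler before any teardown work begins.
  std::this_thread::sleep_for(pause);

  return elapsed;
}
```

The pause length is a judgment call: it only needs to be long enough to 
start or stop the profiler by hand.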


src/tests/master_benchmarks.cpp
Lines 65-66 (patched)


Can you say we use a static here to avoid the cost of re-parsing?
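
For reference, a minimal sketch of the pattern being asked about, with 
hypothetical stand-ins: `ParsedResources`, `parseResources` and 
`agentResources` are illustrative names, and the toy "name:value" format is 
an assumption, not the Mesos resources parser. A function-local static is 
initialized on the first call only, so the parse cost is paid once.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for a parsed resource list.
using ParsedResources = std::map<std::string, double>;

// Counts how often the expensive parse actually runs.
int parseCount = 0;

// Parses "name:value;name:value" into a map.
ParsedResources parseResources(const std::string& text)
{
  ++parseCount;
  ParsedResources result;
  std::istringstream stream(text);
  std::string item;
  while (std::getline(stream, item, ';')) {
    const std::string::size_type colon = item.find(':');
    if (colon == std::string::npos) {
      continue; // Skip malformed entries.
    }
    result[item.substr(0, colon)] = std::stod(item.substr(colon + 1));
  }
  return result;
}

// Function-local static: initialized once on the first call and reused
// by every later call, so the string is parsed a single time.
const ParsedResources& agentResources()
{
  static const ParsedResources resources =
    parseResources("cpus:2;mem:1024");
  return resources;
}
```

Calling `agentResources()` for each of many agents then returns the same 
object each time instead of parsing the string again.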



src/tests/master_benchmarks.cpp
Lines 88 (patched)


Ditto here



src/tests/master_benchmarks.cpp
Lines 316 (patched)


Do you need to print this?


- Benjamin Mahler


On Nov. 1, 2017, 10:06 p.m., Jiang Yan Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63174/
> ---
> 
> (Updated Nov. 1, 2017, 10:06 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.
> 
> 
> Bugs: MESOS-8098
> https://issues.apache.org/jira/browse/MESOS-8098
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The current benchmark is very simple: without framework involvement and 
> without agent retries but it's possible to add a number of others so I am 
> creating a new file for them.
> 
> 
> Diffs
> -
> 
>   src/Makefile.am 1c97b1fd8151f87c4e9e6d62884b0ef7d582c312 
>   src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
>   src/tests/master_benchmarks.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63174/diff/3/

Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

2017-11-02 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review189892
---



Patch looks great!

Reviews applied: [63174]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 
MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Nov. 1, 2017, 10:06 p.m., Jiang Yan Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63174/
> ---
> 
> (Updated Nov. 1, 2017, 10:06 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.
> 
> 
> Bugs: MESOS-8098
> https://issues.apache.org/jira/browse/MESOS-8098
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The current benchmark is very simple: without framework involvement and 
> without agent retries but it's possible to add a number of others so I am 
> creating a new file for them.
> 
> 
> Diffs
> -
> 
>   src/Makefile.am 1c97b1fd8151f87c4e9e6d62884b0ef7d582c312 
>   src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
>   src/tests/master_benchmarks.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63174/diff/3/

Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

2017-11-01 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review189867
---



FAIL: Mesos tests failed to build.

Reviews applied: `['63174']`

Failed command: `cmake.exe --build . --target mesos-tests --config Debug`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/63174

Relevant logs:

- 
[mesos-tests-build-cmake-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/63174/logs/mesos-tests-build-cmake-stdout.log):

```
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(59): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\src\tests\uri_fetcher_tests.cpp) 
[C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(448): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\src\tests\uri_fetcher_tests.cpp) 
[C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\src\master/master.hpp(2070): warning C4244: 'return': 
conversion from 'unsigned __int64' to 'double', possible loss of data 
(compiling source file C:\DCOS\mesos\mesos\src\tests\values_tests.cpp) 
[C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\src\tests\values_tests.cpp(51): warning C4244: 
'argument': conversion from 'double' to 'float', possible loss of data 
[C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\src\tests\values_tests.cpp(51): warning C4305: 
'argument': truncation from 'double' to 'float' 
[C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(59): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\src\tests\common\recordio_tests.cpp) 
[C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(448): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\src\tests\common\recordio_tests.cpp) 
[C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(59): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\src\tests\common\http_tests.cpp) 
[C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(448): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\src\tests\common\http_tests.cpp) 
[C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(59): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\src\tests\common\type_utils_tests.cpp) 
[C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(448): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\src\tests\common\type_utils_tests.cpp) 
[C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\src\master/master.hpp(2070): warning C4244: 'return': 
conversion from 'unsigned __int64' to 'double', possible loss of data 
(compiling source file 
C:\DCOS\mesos\mesos\src\tests\common\type_utils_tests.cpp) 
[C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(59): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\src\tests\containerizer\docker_tests.cpp) 
[C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(448): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\src\tests\containerizer\docker_tests.cpp) 
[C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(59): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\src\tests\containerizer\containerizer_tests.cpp) 
[C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\3rdparty\stout\include\stout/windows/os.hpp(448): warning 
C4996: 'GetVersionExW': was declared deprecated (compiling source file 
C:\DCOS\mesos\mesos\src\tests\containerizer\containerizer_tests.cpp) 
[C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\src\master/master.hpp(2070): warning C4244: 'return': 
conversion from 'unsigned __int64' to 'double', possible loss of data 
(compiling source file 
C:\DCOS\mesos\mesos\src\tests\containerizer\containerizer_tests.cpp) 
[C:\DCOS\mesos\src\tests\mesos-tests.vcxproj]
  C:\DCOS\mesos\mesos\src\master/master.hpp(2070): warning 

Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

2017-11-01 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/
---

(Updated Nov. 1, 2017, 3:06 p.m.)


Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.


Changes
---

Addressed review comments; this reduces benchmark overhead by about 10 secs 
(roughly 10%).

```
[--] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 10 running tasks and 10 
completed tasks in 10.387637507secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
 (17506 ms)
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 20 running tasks and 0 completed 
tasks in 21.918619408secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
 (41810 ms)
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 2 agents with a total of 10 running tasks and 0 completed 
tasks in 14.680627873secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
 (24801 ms)
[--] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test 
(84117 ms total)

...

[--] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 10 running tasks and 10 
completed tasks in 10.434383702secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
 (17788 ms)
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 20 running tasks and 0 completed 
tasks in 21.597951218secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
 (36953 ms)
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 2 agents with a total of 10 running tasks and 0 completed 
tasks in 14.982351549secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
 (25360 ms)
[--] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test 
(80101 ms total)
```
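
The saving can be checked from the suite totals. This is a sketch under the 
assumption that the two totals in the Testing section (94151 ms and 90523 ms) 
are the baseline for the two totals above (84117 ms and 80101 ms); the helper 
names are illustrative only.

```cpp
// Mean of two total wall-clock times, in milliseconds.
double meanMs(double first, double second)
{
  return (first + second) / 2.0;
}

// Percentage by which `after` is lower than `before`.
double percentSaved(double before, double after)
{
  return (before - after) / before * 100.0;
}
```

With those inputs the means are 92337 ms and 82109 ms, a difference of 
10228 ms, or about 11% of the baseline.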


Bugs: MESOS-8098
https://issues.apache.org/jira/browse/MESOS-8098


Repository: mesos


Description
---

The current benchmark is very simple, with no framework involvement and no 
agent retries, but a number of other benchmarks could be added, so I am 
creating a new file for them.


Diffs (updated)
---

  src/Makefile.am 1c97b1fd8151f87c4e9e6d62884b0ef7d582c312 
  src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
  src/tests/master_benchmarks.cpp PRE-CREATION 


Diff: https://reviews.apache.org/r/63174/diff/3/

Changes: https://reviews.apache.org/r/63174/diff/2-3/


Testing
---

Benchmark based off 
https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a 
(close to current HEAD).

```
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 10 running tasks and 10 
completed tasks in 11.188008209secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
 (22404 ms)
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 20 running tasks and 0 completed 
tasks in 20.868372615secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
 (37981 ms)
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 2 agents with a total of 10 running tasks and 0 completed 
tasks in 15.354579251secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
 (33766 ms)
[--] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test 
(94151 ms total)


[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 10 running tasks and 10 
completed tasks in 

Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

2017-11-01 Thread Jiang Yan Xu


> On Oct. 24, 2017, 2:55 p.m., Benjamin Mahler wrote:
> > A couple of suggestions for speeding up the benchmark overhead:
> > 
> > (1) Upgrade protobuf to 3.4.x, this comes with move support and rvalue 
> > setters for fields. Which will avoid some copies in the benchmark code and 
> > improve performance elsewhere too :) In the interim, you could manually use 
> > `Swap(T*)` but it means we'd probably want to re-write the code once move 
> > support is available (so that doesn't seem like a good option).
> > 
> > (2) You could try using an arena for the test fixture, although I don't 
> > know if it's worth the complexity. Probably just reducing copying is 
> > simpler.
> > 
> > (3) We can avoid re-parsing resources for each task and agent.

Using `Swap` for now and will clean up after proto 3.4.


- Jiang Yan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review189093
---


On Nov. 1, 2017, 3:06 p.m., Jiang Yan Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63174/
> ---
> 
> (Updated Nov. 1, 2017, 3:06 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.
> 
> 
> Bugs: MESOS-8098
> https://issues.apache.org/jira/browse/MESOS-8098
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The current benchmark is very simple: without framework involvement and 
> without agent retries but it's possible to add a number of others so I am 
> creating a new file for them.
> 
> 
> Diffs
> -
> 
>   src/Makefile.am 1c97b1fd8151f87c4e9e6d62884b0ef7d582c312 
>   src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
>   src/tests/master_benchmarks.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63174/diff/3/

Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

2017-10-24 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review189093
---



A couple of suggestions for reducing the benchmark overhead:

(1) Upgrade protobuf to 3.4.x, which comes with move support and rvalue setters 
for fields. This will avoid some copies in the benchmark code and improve 
performance elsewhere too :) In the interim, you could manually use `Swap(T*)`, 
but it means we'd probably want to re-write the code once move support is 
available (so that doesn't seem like a good option).

(2) You could try using an arena for the test fixture, although I don't know if 
it's worth the complexity. Probably just reducing copying is simpler.

(3) We can avoid re-parsing resources for each task and agent.


src/tests/master_benchmarks.cpp
Lines 63 (patched)


Can you avoid parsing resources for each agent?



src/tests/master_benchmarks.cpp
Lines 84 (patched)


Can you avoid parsing resources for every task?



src/tests/master_benchmarks.cpp
Lines 91-92 (patched)


Code written this way is nice because it will automatically benefit from 
move support when we upgrade protobuf to 3.4.x. :)

Maybe you can write more of the test in such a manner that it would benefit 
from an upgrade to 3.4.x? I would be happy to review a 3.4.x upgrade since we 
need it for other performance improvements. We can see who wants to pick that 
up, I think Dmitry might be interested.



src/tests/master_benchmarks.cpp
Lines 139-140 (patched)


Here's an example of where you could move into `message.frameworks` if you 
upgrade to protobuf 3.4.x:

```
message.mutable_frameworks()->Add(createFrameworkInfo(frameworkId));
```

Alternatively, pre-3.4.x, you can swap:

```
message.add_frameworks()->Swap(&createFrameworkInfo(frameworkId));

// that takes the address of a temporary, so you probably need to do:

FrameworkInfo f = createFrameworkInfo(frameworkId);
message.add_frameworks()->Swap(&f);
```
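
To make the swap pattern concrete without depending on protobuf, here is a 
self-contained sketch. `FakeFrameworkInfo`, `addFramework` and `appendBySwap` 
are hypothetical stand-ins that only mimic the `Swap(T*)` and `add_...()` 
shape of generated code; they are not the generated API itself.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a generated message. Only the `Swap(T*)`
// member is mimicked here.
struct FakeFrameworkInfo
{
  std::string id;

  void Swap(FakeFrameworkInfo* other) { std::swap(id, other->id); }
};

// Hypothetical stand-in for `message.add_frameworks()`: appends a
// default-constructed element and returns a pointer to it.
FakeFrameworkInfo* addFramework(std::vector<FakeFrameworkInfo>& frameworks)
{
  frameworks.emplace_back();
  return &frameworks.back();
}

// Builds the value in a named local, then swaps it into the new slot.
// The local ends up holding the slot's empty default value.
void appendBySwap(
    std::vector<FakeFrameworkInfo>& frameworks,
    const std::string& id)
{
  FakeFrameworkInfo f;
  f.id = id;
  addFramework(frameworks)->Swap(&f);
}
```

The point of the named local is that `Swap` needs a pointer to an object 
that outlives the call, which a temporary cannot provide.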



src/tests/master_benchmarks.cpp
Lines 143-147 (patched)


Ditto copying here.



src/tests/master_benchmarks.cpp
Lines 163-167 (patched)


Ditto copying here and elsewhere.



src/tests/master_benchmarks.cpp
Lines 241-243 (patched)


Comment about why you're using the replicated log here?



src/tests/master_benchmarks.cpp
Lines 261 (patched)


I'm a little concerned about this pattern, because if the test were to fail 
an assertion, the process would be destructed without terminating / waiting on 
it.

Can you use a wrapper around the process that terminates and waits?

Alternatively, if we had a SCOPE_EXIT { ... } abstraction (I had a review 
but never committed it), we could just do:

```
SCOPE_EXIT { process::terminate(pid); wait(pid); };
```

E.g. 
https://github.com/facebook/folly/blob/v2017.10.23.00/folly/ScopeGuard.h#L285-L287
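
A minimal sketch of such a guard, to show the idea only. `ScopeGuard` and 
`makeScopeGuard` here are illustrative names, not an existing stout or 
libprocess abstraction, and this is far simpler than the folly version 
linked above.

```cpp
#include <utility>

// Minimal scope guard: runs `f` when the guard is destroyed, which
// happens on every exit path out of the enclosing scope.
template <typename F>
class ScopeGuard
{
public:
  explicit ScopeGuard(F f) : f_(std::move(f)), active_(true) {}

  // Moving transfers responsibility for running `f` to the new guard.
  ScopeGuard(ScopeGuard&& other)
    : f_(std::move(other.f_)), active_(other.active_)
  {
    other.active_ = false;
  }

  ScopeGuard(const ScopeGuard&) = delete;
  ScopeGuard& operator=(const ScopeGuard&) = delete;

  ~ScopeGuard()
  {
    if (active_) {
      f_();
    }
  }

private:
  F f_;
  bool active_;
};

// Hypothetical helper name; deduces the callable's type.
template <typename F>
ScopeGuard<F> makeScopeGuard(F f)
{
  return ScopeGuard<F>(std::move(f));
}
```

In the test, the guard's callable would hold the terminate and wait calls, 
so they run even when a failed assertion returns early from the test body.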


- Benjamin Mahler


On Oct. 24, 2017, 6:05 p.m., Jiang Yan Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63174/
> ---
> 
> (Updated Oct. 24, 2017, 6:05 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.
> 
> 
> Bugs: MESOS-8098
> https://issues.apache.org/jira/browse/MESOS-8098
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The current benchmark is very simple: without framework involvement and 
> without agent retries but it's possible to add a number of others so I am 
> creating a new file for them.
> 
> 
> Diffs
> -
> 
>   src/Makefile.am b60a54a031260de6f1fb43584ae5083df2dc7e31 
>   src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
>   src/tests/master_benchmarks.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63174/diff/2/
> 
> 
> Testing
> ---
> 
> Benchmark based off 
> https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a
>  (close to current HEAD).
> 
> ```
> [ RUN  ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 10 running tasks and 10 
> completed tasks in 11.188008209secs
> [   OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
>  (22404 ms)
> [ RUN  ] 
> 

Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

2017-10-24 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review189082
---



PASS: Mesos patch 63174 was successfully built and tested.

Reviews applied: `['63174']`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/63174

- Mesos Reviewbot Windows



Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

2017-10-24 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review189079
---



Patch looks great!

Reviews applied: [63174]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 
MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot



Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

2017-10-24 Thread Jiang Yan Xu


> On Oct. 19, 2017, 6:38 p.m., Benjamin Mahler wrote:
> > Thanks Yan! I will dig in soon.
> > 
> > Just some quick questions:
> > 
> > (1) I thought during the meeting you said it was taking a minute, but 
> > looking at all the benchmark timings they're all under a second? Is it only 
> > the benchmark setup that's expensive here?
> > (2) Is this with the lock free event & run queues? If not, how much do they 
> > help?
> > (3) As an aside, it has come up before, but it would be useful to be able 
> > to force the messages to go through the remote stack rather than the local 
> > stack. No need to think about this yet, but just something to keep in mind 
> > as not being accurate in this benchmark.
> 
> Jiang Yan Xu wrote:
> 1) Yeah looks like it. I used to include the setup time so it was large. 
> 2) Yeah I have used `--enable-optimize --enable-lock-free-run-queue 
> --enable-lock-free-event-queue 
> --enable-last-in-first-out-fixed-size-semaphore`. I could compare with the 
> perf without them.
> 3) Right right I think we should keep that in mind and we should have 
> tests that cover the remote stack. For the case here I thought it would be a 
> simple and good-enough start since the local stack already covers the proto 
> (de)serialization and the rest of the libprocess optimizations that we have 
> recently improved.

Haha... actually the sub-second numbers in revision 1 were totally meaningless. 
I did `process::await(reregistered)` instead of 
`process::await(reregistered).await();` when I intended to wait for the 
results...
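
To show the shape of that bug outside of libprocess, here is a small sketch 
using only the standard library. It is an analogy under stated assumptions: 
`std::future` stands in for the libprocess future, a sleeping thread stands in 
for the agents reregistering, and `Timings` and `measure` are hypothetical 
names. The point is only that reading the clock while merely holding a future 
measures almost nothing, while reading it after waiting covers the work.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Elapsed time read before and after actually waiting on the future.
struct Timings
{
  std::chrono::steady_clock::duration withoutWait;
  std::chrono::steady_clock::duration withWait;
};

// Runs `work` worth of simulated activity on another thread and reads the
// stopwatch twice: once without waiting (the bug) and once after waiting.
Timings measure(std::chrono::milliseconds work)
{
  std::promise<void> done;
  std::future<void> finished = done.get_future();

  const auto start = std::chrono::steady_clock::now();

  std::thread worker([&done, work] {
    std::this_thread::sleep_for(work); // Simulated reregistration work.
    done.set_value();
  });

  // Bug pattern: obtaining the future does not block until it is ready.
  const auto withoutWait = std::chrono::steady_clock::now() - start;

  // Intended pattern: block until the work has completed, then read.
  finished.wait();
  const auto withWait = std::chrono::steady_clock::now() - start;

  worker.join();

  return Timings{withoutWait, withWait};
}
```

With a 50 ms workload, `withWait` is at least 50 ms, whereas `withoutWait` 
typically covers little more than the thread start.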

I did some optimizations in rev 2 (e.g., parallelizing the message 
preparation and allocating on the stack instead of the heap), but I had to 
reduce the number of tasks to keep it from running too long. 

PTAL.


- Jiang Yan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review188799
---



Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

2017-10-24 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/
---

(Updated Oct. 24, 2017, 11:05 a.m.)


Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.


Changes
---

Refactored to put the message preparation work inside each TestSlave actor so 
it can be parallelized. Also fixed the bug where the test didn't actually wait 
for all the `SlaveReregisteredMessage`s...


Bugs: MESOS-8098
https://issues.apache.org/jira/browse/MESOS-8098


Repository: mesos


Description
---

The current benchmark is very simple: without framework involvement and without 
agent retries but it's possible to add a number of others so I am creating a 
new file for them.


Diffs (updated)
-

  src/Makefile.am b60a54a031260de6f1fb43584ae5083df2dc7e31 
  src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
  src/tests/master_benchmarks.cpp PRE-CREATION 


Diff: https://reviews.apache.org/r/63174/diff/2/

Changes: https://reviews.apache.org/r/63174/diff/1-2/


Testing (updated)
---

Benchmark based off 
https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a 
(close to current HEAD).

```
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 10 running tasks and 10 
completed tasks in 11.188008209secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
 (22404 ms)
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 20 running tasks and 0 completed 
tasks in 20.868372615secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
 (37981 ms)
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 2 agents with a total of 10 running tasks and 0 completed 
tasks in 15.354579251secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
 (33766 ms)
[--] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test 
(94151 ms total)


[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 10 running tasks and 10 
completed tasks in 11.045441129secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
 (19959 ms)
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 20 running tasks and 0 completed 
tasks in 21.324309077secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
 (38490 ms)
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 2 agents with a total of 10 running tasks and 0 completed 
tasks in 14.68607521secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
 (32073 ms)
[--] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test 
(90523 ms total)

```

Benchmark based off 
https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d 
(before https://issues.apache.org/jira/browse/MESOS-7713 was merged)

```
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
Starting reregistration for all agents
Reregistered 2000 agents with a total of 10 running tasks and 10 
completed tasks in 23.217901878secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
 (38327 ms)
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
Starting reregistration for all agents
Reregistered 2000 agents with a total of 20 running tasks and 0 completed 
tasks in 46.158610597secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
 (75280 ms)
[ RUN  ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
Starting reregistration for all agents
Reregistered 2 agents with a total of 10 running tasks and 0 completed 
tasks in 38.56781112secs
[   OK ] 
AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
 (68006 ms)
[--] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test 
(181613 ms total)

[ RUN  ] 
```

Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

2017-10-20 Thread Jiang Yan Xu


> On Oct. 19, 2017, 6:38 p.m., Benjamin Mahler wrote:
> > Thanks Yan! I will dig in soon.
> > 
> > Just some quick questions:
> > 
> > (1) I thought during the meeting you said it was taking a minute, but 
> > looking at all the benchmark timings they're all under a second? Is it only 
> > the benchmark setup that's expensive here?
> > (2) Is this with the lock free event & run queues? If not, how much do they 
> > help?
> > (3) As an aside, it has come up before, but it would be useful to be able 
> > to force the messages to go through the remote stack rather than the local 
> > stack. No need to think about this yet, but just something to keep in mind 
> > as not being accurate in this benchmark.

1) Yeah looks like it. I used to include the setup time so it was large. 
2) Yeah I have used `--enable-optimize --enable-lock-free-run-queue 
--enable-lock-free-event-queue 
--enable-last-in-first-out-fixed-size-semaphore`. I could compare with the perf 
without them.
3) Right right I think we should keep that in mind and we should have tests 
that cover the remote stack. For the case here I thought it would be a simple 
and good-enough start since the local stack already covers the proto 
(de)serialization and the rest of the libprocess optimizations that we have 
recently improved.


- Jiang Yan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review188799
---


On Oct. 19, 2017, 4:28 p.m., Jiang Yan Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63174/
> ---
> 
> (Updated Oct. 19, 2017, 4:28 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.
> 
> 
> Bugs: MESOS-8098
> https://issues.apache.org/jira/browse/MESOS-8098
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The current benchmark is very simple: without framework involvement and 
> without agent retries but it's possible to add a number of others so I am 
> creating a new file for them.
> 
> 
> Diffs
> -
> 
>   src/Makefile.am 936bc49ddfca03b9278ab11b6d317f3ff635cb00 
>   src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
>   src/tests/master_benchmarks.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63174/diff/1/
> 
> 
> Testing
> ---
> 
> Benchmark based off 
> https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a
>  (close to current HEAD).
> 
> ```
> [ RUN  ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 50 running tasks and 50 
> completed tasks in 45.075488ms
> [   OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
>  (48126 ms)
> [ RUN  ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 100 running tasks and 0 
> completed tasks in 14.172361ms
> [   OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
>  (45979 ms)
> [ RUN  ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 2 agents with a total of 100 running tasks and 0 
> completed tasks in 413.508328ms
> [   OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
>  (49487 ms)
> [--] 3 tests from 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (143596 ms total)
> 
> ...
> 
> [ RUN  ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 50 running tasks and 50 
> completed tasks in 32.787363ms
> [   OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
>  (48266 ms)
> [ RUN  ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 100 running tasks and 0 
> completed tasks in 19.735003ms
> [   OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
>  (46169 ms)
> [ RUN  ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 2 agents with a total of 100 running tasks and 0 
> completed tasks in 321.267267ms
> [   OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
>  (51550 ms)
> [--] 3 tests from 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (145987 ms total)
> ```
> 
> Benchmark based off 
> https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d
>  (before 

Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

2017-10-20 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review188819
---



PASS: Mesos patch 63174 was successfully built and tested.

Reviews applied: `['63174']`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/63174

- Mesos Reviewbot Windows


On Oct. 19, 2017, 11:28 p.m., Jiang Yan Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63174/
> ---
> 
> (Updated Oct. 19, 2017, 11:28 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.
> 
> 
> Bugs: MESOS-8098
> https://issues.apache.org/jira/browse/MESOS-8098
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The current benchmark is very simple: without framework involvement and 
> without agent retries but it's possible to add a number of others so I am 
> creating a new file for them.
> 
> 
> Diffs
> -
> 
>   src/Makefile.am 936bc49ddfca03b9278ab11b6d317f3ff635cb00 
>   src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
>   src/tests/master_benchmarks.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63174/diff/1/
> 
> 
> Testing
> ---
> 
> Benchmark based off 
> https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a
>  (close to current HEAD).
> 
> ```
> [ RUN  ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 50 running tasks and 50 
> completed tasks in 45.075488ms
> [   OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
>  (48126 ms)
> [ RUN  ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 100 running tasks and 0 
> completed tasks in 14.172361ms
> [   OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
>  (45979 ms)
> [ RUN  ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 2 agents with a total of 100 running tasks and 0 
> completed tasks in 413.508328ms
> [   OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
>  (49487 ms)
> [--] 3 tests from 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (143596 ms total)
> 
> ...
> 
> [ RUN  ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 50 running tasks and 50 
> completed tasks in 32.787363ms
> [   OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
>  (48266 ms)
> [ RUN  ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 100 running tasks and 0 
> completed tasks in 19.735003ms
> [   OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
>  (46169 ms)
> [ RUN  ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 2 agents with a total of 100 running tasks and 0 
> completed tasks in 321.267267ms
> [   OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
>  (51550 ms)
> [--] 3 tests from 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (145987 ms total)
> ```
> 
> Benchmark based off 
> https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d
>  (before https://issues.apache.org/jira/browse/MESOS-7713 was merged)
> ```
> [ RUN  ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 50 running tasks and 50 
> completed tasks in 85.800335ms
> [   OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
>  (59247 ms)
> [ RUN  ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 100 running tasks and 0 
> completed tasks in 35.342066ms
> [   OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
>  (93662 ms)
> [ RUN  ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 2 agents with a total of 100 running tasks and 0 
> completed tasks in 798.738642ms
> [   OK ] 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
>  (116078 ms)
> [--] 3 tests from 
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (268987 ms 

Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

2017-10-19 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review188802
---



Patch looks great!

Reviews applied: [63174]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 
MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Oct. 19, 2017, 11:28 p.m., Jiang Yan Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63174/
> ---
> 
> (Updated Oct. 19, 2017, 11:28 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.
> 
> 
> Bugs: MESOS-8098
> https://issues.apache.org/jira/browse/MESOS-8098
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The current benchmark is very simple: without framework involvement and 
> without agent retries but it's possible to add a number of others so I am 
> creating a new file for them.
> 
> 
> Diffs
> -
> 
>   src/Makefile.am 936bc49ddfca03b9278ab11b6d317f3ff635cb00 
>   src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
>   src/tests/master_benchmarks.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63174/diff/1/
> 
> 
> Testing
> ---
> 
> Benchmark based off 
> https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a
>  (close to current HEAD).
> 
> ```
> [ RUN  ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 50 running tasks and 50 completed tasks in 45.075488ms
> [   OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (48126 ms)
> [ RUN  ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 100 running tasks and 0 completed tasks in 14.172361ms
> [   OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (45979 ms)
> [ RUN  ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 2 agents with a total of 100 running tasks and 0 completed tasks in 413.508328ms
> [   OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (49487 ms)
> [--] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (143596 ms total)
> 
> ...
> 
> [ RUN  ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 50 running tasks and 50 completed tasks in 32.787363ms
> [   OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (48266 ms)
> [ RUN  ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 100 running tasks and 0 completed tasks in 19.735003ms
> [   OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (46169 ms)
> [ RUN  ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 2 agents with a total of 100 running tasks and 0 completed tasks in 321.267267ms
> [   OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (51550 ms)
> [--] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (145987 ms total)
> ```
> 
> Benchmark based off https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d (before https://issues.apache.org/jira/browse/MESOS-7713 was merged)
> ```
> [ RUN  ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 50 running tasks and 50 completed tasks in 85.800335ms
> [   OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (59247 ms)
> [ RUN  ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 100 running tasks and 0 completed tasks in 35.342066ms
> [   OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (93662 ms)
> [ RUN  ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 2 agents with a total of 100 running tasks and 0 completed tasks in 798.738642ms
> [   OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (116078 ms)
> [--] 3 tests from 
> 

Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

2017-10-19 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review188799
---



Thanks Yan! I will dig in soon.

Just some quick questions:

(1) I thought you said during the meeting that it was taking a minute, but the 
benchmark timings are all under a second. Is it only the benchmark setup 
that's expensive here?
(2) Is this with the lock-free event & run queues? If not, how much do they 
help?
(3) As an aside: this has come up before, but it would be useful to be able to 
force the messages to go through the remote stack rather than the local stack. 
No need to think about this yet; just keep in mind that this benchmark is not 
accurate in that respect.
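
To make the distinction in (1) concrete, here is a minimal sketch of timing 
the setup phase and the measured phase separately. It is not taken from the 
patch under review: `PhaseTimings`, `timePhases` and `measuredFraction` are 
hypothetical names, and the two callbacks merely stand in for whatever the 
benchmark does before and during reregistration.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical result type: time spent in setup vs. in the measured phase.
struct PhaseTimings
{
  std::chrono::nanoseconds setup;
  std::chrono::nanoseconds measured;
};

// Runs `setup` and then `measured`, timing each phase separately with a
// monotonic clock so wall-clock adjustments cannot skew the result.
inline PhaseTimings timePhases(
    const std::function<void()>& setup,
    const std::function<void()>& measured)
{
  assert(setup && measured); // Both callbacks must be callable.

  using Clock = std::chrono::steady_clock;

  const Clock::time_point start = Clock::now();
  setup();
  const Clock::time_point middle = Clock::now();
  measured();
  const Clock::time_point end = Clock::now();

  return PhaseTimings{
    std::chrono::duration_cast<std::chrono::nanoseconds>(middle - start),
    std::chrono::duration_cast<std::chrono::nanoseconds>(end - middle)};
}

// Fraction of the total run time spent in the measured phase.
// Returns 0 when no time was recorded at all.
inline double measuredFraction(const PhaseTimings& timings)
{
  const double total =
    static_cast<double>((timings.setup + timings.measured).count());

  return total == 0.0
    ? 0.0
    : static_cast<double>(timings.measured.count()) / total;
}
```

Plugging in the first quoted run (45.075488ms reported against a 48126 ms 
test), the measured phase is under 0.1% of the total, which is consistent 
with the rest of the time going to setup and teardown.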

- Benjamin Mahler


On Oct. 19, 2017, 11:28 p.m., Jiang Yan Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63174/
> ---
> 
> (Updated Oct. 19, 2017, 11:28 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.
> 
> 
> Bugs: MESOS-8098
> https://issues.apache.org/jira/browse/MESOS-8098
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The current benchmark is very simple: it involves no frameworks and no 
> agent retries. A number of other benchmarks could be added, so I am 
> creating a new file for them.
> 
> 
> Diffs
> -
> 
>   src/Makefile.am 936bc49ddfca03b9278ab11b6d317f3ff635cb00 
>   src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
>   src/tests/master_benchmarks.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63174/diff/1/
> 
> 
> Testing
> ---
> 
> Benchmark based off https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a (close to current HEAD).
> 
> ```
> [ RUN  ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 50 running tasks and 50 completed tasks in 45.075488ms
> [   OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (48126 ms)
> [ RUN  ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 100 running tasks and 0 completed tasks in 14.172361ms
> [   OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (45979 ms)
> [ RUN  ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 2 agents with a total of 100 running tasks and 0 completed tasks in 413.508328ms
> [   OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (49487 ms)
> [--] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (143596 ms total)
> 
> ...
> 
> [ RUN  ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 50 running tasks and 50 completed tasks in 32.787363ms
> [   OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (48266 ms)
> [ RUN  ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 100 running tasks and 0 completed tasks in 19.735003ms
> [   OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (46169 ms)
> [ RUN  ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 2 agents with a total of 100 running tasks and 0 completed tasks in 321.267267ms
> [   OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (51550 ms)
> [--] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (145987 ms total)
> ```
> 
> Benchmark based off https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d (before https://issues.apache.org/jira/browse/MESOS-7713 was merged)
> ```
> [ RUN  ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 50 running tasks and 50 completed tasks in 85.800335ms
> [   OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (59247 ms)
> [ RUN  ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 100 running tasks and 0 completed tasks in 35.342066ms
> [   OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (93662 ms)
> [ RUN  ] 
>