Re: Review Request 66449: Fixed flaky `ROOT_IsolatorFlags` test.

2018-04-04 Thread Gilbert Song


> On April 4, 2018, 11:23 a.m., Gilbert Song wrote:
> > src/tests/containerizer/linux_capabilities_isolator_tests.cpp
> > Lines 747 (patched)
> > 
> >
> > Should we call `slave.get()->terminate()` before `reset()`? See 
> > SlaveRecoveryTest.
> 
> Andrei Budnik wrote:
> I think it's not necessary since we call `terminate()` in `~Slave()`: 
> https://github.com/apache/mesos/blob/594ee20c2453dad836313769aef9f8655cd75cd5/src/tests/cluster.cpp#L630

gotcha.


- Gilbert


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66449/#review200472
---


On April 4, 2018, 5:17 a.m., Andrei Budnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66449/
> ---
> 
> (Updated April 4, 2018, 5:17 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gilbert Song, and Jie Yu.
> 
> 
> Bugs: MESOS-8489
> https://issues.apache.org/jira/browse/MESOS-8489
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Starting more than one agent simultaneously in tests leads to a race
> condition between the Linux launcher, which calls `cgroups::prepare()`
> for the first slave, and `LinuxLauncherProcess::recover()`, which
> iterates over the cgroups hierarchy for the second slave. As a result,
> the `mesos/test` cgroup, which is created to check whether the kernel
> supports nested cgroups, can be detected by the recovery process,
> because both agents use the same cgroup hierarchy path by default. That
> leads to orphan containers and makes the `ROOT_IsolatorFlags` test
> flaky. To fix the issue, this patch terminates an agent before starting
> a new one.
> 
> 
> Diffs
> -
> 
>   src/tests/containerizer/linux_capabilities_isolator_tests.cpp 
> 147f2cc09307cf8c9cf6f71d0175f8a3593c0256 
> 
> 
> Diff: https://reviews.apache.org/r/66449/diff/1/
> 
> 
> Testing
> ---
> 
> internal CI
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>



Re: Review Request 66449: Fixed flaky `ROOT_IsolatorFlags` test.

2018-04-04 Thread Andrei Budnik


> On April 4, 2018, 6:23 p.m., Gilbert Song wrote:
> > src/tests/containerizer/linux_capabilities_isolator_tests.cpp
> > Lines 747 (patched)
> > 
> >
> > Should we call `slave.get()->terminate()` before `reset()`? See 
> > SlaveRecoveryTest.

I think it's not necessary since we call `terminate()` in `~Slave()`: 
https://github.com/apache/mesos/blob/594ee20c2453dad836313769aef9f8655cd75cd5/src/tests/cluster.cpp#L630
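
To illustrate the point, here is a minimal sketch, not the real `cluster::Slave`. `FakeSlave`, `resetOnly()` and `terminateThenReset()` are hypothetical names, `std::unique_ptr` stands in for `Owned`, and the idempotent `terminate()` is an assumption of the sketch. It only shows that when the destructor calls `terminate()`, resetting the owner alone produces the same sequence of events as terminating explicitly first.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the test cluster's agent wrapper.
class FakeSlave
{
public:
  explicit FakeSlave(std::vector<std::string>* log) : log_(log) {}

  // Assumed to be idempotent in this sketch: a second call is a no-op.
  void terminate()
  {
    if (terminated_) {
      return;
    }
    terminated_ = true;
    log_->push_back("terminate");
  }

  // The destructor terminates the agent before the object goes away.
  ~FakeSlave()
  {
    terminate();
    log_->push_back("destroy");
  }

private:
  std::vector<std::string>* log_;
  bool terminated_ = false;
};

// Events recorded when the owner is reset without an explicit terminate().
std::vector<std::string> resetOnly()
{
  std::vector<std::string> log;
  std::unique_ptr<FakeSlave> slave(new FakeSlave(&log));
  slave.reset();
  return log;
}

// Events recorded when terminate() is called explicitly before reset().
std::vector<std::string> terminateThenReset()
{
  std::vector<std::string> log;
  std::unique_ptr<FakeSlave> slave(new FakeSlave(&log));
  slave->terminate();
  slave.reset();
  return log;
}
```

Both helpers record "terminate" followed by "destroy", so under these assumptions the explicit call adds nothing beyond readability.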


- Andrei


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66449/#review200472
---



Re: Review Request 66449: Fixed flaky `ROOT_IsolatorFlags` test.

2018-04-04 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66449/#review200472
---




src/tests/containerizer/linux_capabilities_isolator_tests.cpp
Lines 747 (patched)


Should we call `slave.get()->terminate()` before `reset()`? See 
SlaveRecoveryTest.


- Gilbert Song



Re: Review Request 66449: Fixed flaky `ROOT_IsolatorFlags` test.

2018-04-04 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66449/#review200453
---



Patch looks great!

Reviews applied: [66449]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot



Re: Review Request 66449: Fixed flaky `ROOT_IsolatorFlags` test.

2018-04-04 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66449/#review200449
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['66449']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/66449

Relevant logs:

- [mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/66449/logs/mesos-tests-stdout.log):

```
[   OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (116 ms)
[--] 9 tests from Endpoint/SlaveEndpointTest (1067 ms total)

[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (38 ms)
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (41 ms)
[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (81 ms total)

[--] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN  ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[   OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (807 ms)
[--] 1 test from IsolationFlag/CpuIsolatorTest (835 ms total)

[--] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN  ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (836 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (860 ms total)

[--] Global test environment tear-down
[==] 949 tests from 94 test cases ran. (448338 ms total)
[  PASSED  ] 948 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] CommandExecutorCheckTest.CommandCheckTimeout

 1 FAILED TEST
  YOU HAVE 214 DISABLED TESTS

```

- [mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/66449/logs/mesos-tests-stderr.log):

```
I0404 13:15:01.126520  6260 master.cpp:10449] Updating the state of task eb30f9c5-e691-4ac4-89de-a00a5f2cea69 of framework 970932f1-3c0e-4e32-af5d-101b00483f56- (latest state: TASK_KILLED, status update state: TASK_KILLED)
I0404 13:15:01.126520 12600 slave.cpp:3877] Shutting down fraI0404 13:15:00.947546 13736 exec.cpp:162] Version: 1.6.0
I0404 13:15:00.973520  6356 exec.cpp:236] Executor registered on agent 970932f1-3c0e-4e32-af5d-101b00483f56-S0
I0404 13:15:00.977519 12940 executor.cpp:176] Received SUBSCRIBED event
I0404 13:15:00.982553 12940 executor.cpp:180] Subscribed executor on winbldsrv-01.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I0404 13:15:00.982553 12940 executor.cpp:176] Received LAUNCH event
I0404 13:15:00.988550 12940 executor.cpp:648] Starting task eb30f9c5-e691-4ac4-89de-a00a5f2cea69
I0404 13:15:01.069747 12940 executor.cpp:483] Running 'D:\DCOS\mesos\src\mesos-containerizer.exe launch '
I0404 13:15:01.098561 12940 executor.cpp:661] Forked command at 5616
I0404 13:15:01.129519  5664 exec.cpp:445] Executor asked to shutdown
I0404 13:15:01.129519 12972 executor.cpp:176] Received SHUTDOWN event
I0404 13:15:01.129519 12972 executor.cpp:758] Shutting down
I0404 13:15:01.129519 12972 executor.cpp:868] Sending SIGTERM to process tree at pid 5mework 970932f1-3c0e-4e32-af5d-101b00483f56-
I0404 13:15:01.127542 12600 slave.cpp:6574] Shutting down executor 'eb30f9c5-e691-4ac4-89de-a00a5f2cea69' of framework 970932f1-3c0e-4e32-af5d-101b00483f56- at executor(1)@10.3.1.8:59570
I0404 13:15:01.128708 13916 slave.cpp:923] Agent terminating
W0404 13:15:01.128708 13916 slave.cpp:3873] Ignoring shutdown framework 970932f1-3c0e-4e32-af5d-101b00483f56- because it is terminating
I0404 13:15:01.129519  6260 master.cpp:10548] Removing task eb30f9c5-e691-4ac4-89de-a00a5f2cea69 with resources cpus(allocated: *):4; mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: *):[31000-32000] of framework 970932f1-3c0e-4e32-af5d-101b00483f56- on agent 970932f1-3c0e-4e32-af5d-101b00483f56-S0 at slave(418)@10.3.1.8:59549 (winbldsrv-01.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0404 13:15:01.131566  8520 containerizer.cpp:2338] Destroying container 16a2a12b-8ed1-4d4b-9f8e-66983e83a42c in RUNNING state
I0404 13:15:01.132565  8520 containerizer.cpp:2952] Transitioning the state of container 16a2a12b-8ed1-4d4b-9f8e-66983e83a42c from RUNNING to DESTROYING
I0404 13:15:01.133525  6260 master.cpp:1295] Agent 970932f1-3c0e-4e32-af5d-101b00483f56-S0 at slave(418)@10.3.1.8:59549 (winbldsrv-01.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I0404 13:15:01.133525  6260 master.cpp:3286] Disconnecting agent 970932f1-3c0e-4e32-af5d-101b00483f56-S0 at slave(418)@10.3.1.8:59549 
```



Review Request 66449: Fixed flaky `ROOT_IsolatorFlags` test.

2018-04-04 Thread Andrei Budnik

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66449/
---

Review request for mesos, Alexander Rukletsov, Gilbert Song, and Jie Yu.


Bugs: MESOS-8489
https://issues.apache.org/jira/browse/MESOS-8489


Repository: mesos


Description
---

Starting more than one agent simultaneously in tests leads to a race
condition between the Linux launcher, which calls `cgroups::prepare()`
for the first slave, and `LinuxLauncherProcess::recover()`, which
iterates over the cgroups hierarchy for the second slave. As a result,
the `mesos/test` cgroup, which is created to check whether the kernel
supports nested cgroups, can be detected by the recovery process,
because both agents use the same cgroup hierarchy path by default. That
leads to orphan containers and makes the `ROOT_IsolatorFlags` test
flaky. To fix the issue, this patch terminates an agent before starting
a new one.
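
The interleaving described above can be modelled with a small deterministic sketch. This is a toy model, not Mesos code. The hierarchy is just a set of paths, and `probeBegin()`, `probeEnd()`, `recoverOrphans()`, `overlappingAgents()` and `sequentialAgents()` are hypothetical helpers. It also assumes the first agent's probe has finished by the time that agent is terminated.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model of a shared cgroup hierarchy: a set of cgroup paths.
using Hierarchy = std::set<std::string>;

// Hypothetical two-step version of the nested-cgroup probe: create the
// temporary `mesos/test` cgroup, then remove it again.
void probeBegin(Hierarchy& h) { h.insert("mesos/test"); }
void probeEnd(Hierarchy& h) { h.erase("mesos/test"); }

// Hypothetical recovery: every cgroup under `mesos/` that is not a known
// container is reported as an orphan.
std::vector<std::string> recoverOrphans(
    const Hierarchy& h,
    const std::set<std::string>& known)
{
  std::vector<std::string> orphans;
  const std::string root = "mesos/";
  for (const std::string& path : h) {
    if (path.compare(0, root.size(), root) == 0 && known.count(path) == 0) {
      orphans.push_back(path);
    }
  }
  return orphans;
}

// Racy interleaving: the second agent recovers while the first agent's
// probe cgroup still exists in the shared hierarchy.
std::vector<std::string> overlappingAgents()
{
  Hierarchy h;
  probeBegin(h);
  std::vector<std::string> orphans = recoverOrphans(h, {});
  probeEnd(h);
  return orphans;
}

// Ordering the patch aims for: the first agent is done (probe removed)
// before the second agent starts its recovery.
std::vector<std::string> sequentialAgents()
{
  Hierarchy h;
  probeBegin(h);
  probeEnd(h);
  return recoverOrphans(h, {});
}
```

In this model `overlappingAgents()` reports `mesos/test` as an orphan, while `sequentialAgents()` reports none.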


Diffs
-

  src/tests/containerizer/linux_capabilities_isolator_tests.cpp 
147f2cc09307cf8c9cf6f71d0175f8a3593c0256 


Diff: https://reviews.apache.org/r/66449/diff/1/


Testing
---

internal CI


Thanks,

Andrei Budnik