Re: Review Request 69705: Made agent not read the forked pid and libprocess pid after reboot.

2019-01-10 Thread Qian Zhang


> On Jan. 11, 2019, 7:30 a.m., Gilbert Song wrote:
> > Could we vertify the CI failure above `SlaveRecoveryTest/0.Reboot` is not 
> > caused by our change?

Actually, it is caused by this patch :-( I have fixed it in a follow-up patch: 
https://reviews.apache.org/r/69716/


- Qian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69705/#review211845
---


On Jan. 10, 2019, 10:52 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69705/
> ---
> 
> (Updated Jan. 10, 2019, 10:52 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Gilbert Song.
> 
> 
> Bugs: MESOS-9501
> https://issues.apache.org/jira/browse/MESOS-9501
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> After agent host is rebooted, the forked pid and libprocess pid in
> agent's meta directory are obsolete, so we should not read them during
> agent recovery, otherwise containerizer may wait for an irrelevant
> process if the forked pid is reused by another process after reboot.
> 
> 
> Diffs
> -
> 
>   src/slave/state.hpp 4f3d4cefb3fdef29cce3a6abe4cf5db04d45301f 
>   src/slave/state.cpp e7cf84993c74cf6da7fe22d5112e86e039780287 
> 
> 
> Diff: https://reviews.apache.org/r/69705/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Review Request 69716: Updated `SlaveRecoveryTest.Reboot` to expect none pids.

2019-01-10 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69716/
---

Review request for mesos, Andrei Budnik, Gilbert Song, and Vinod Kone.


Bugs: MESOS-9501
https://issues.apache.org/jira/browse/MESOS-9501


Repository: mesos


Description
---

After the agent host is rebooted, the recovered executor's forked pid and
libprocess pid will be `NONE()`, since we do not read them from the agent's
meta directory, where they are obsolete after a reboot.
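
For illustration only, the recovery behavior described above could be sketched
roughly as follows. This is not the code in `src/slave/state.cpp`; the names
`RunState`, `Checkpoint` and `recoverRun` are hypothetical, and `std::optional`
(C++17) stands in for the `Option`/`None()` types mentioned in the description.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the recovered run state; the real structure
// in src/slave/state.hpp has more fields.
struct RunState
{
  std::optional<int> forkedPid;
  std::optional<std::string> libprocessPid;
};

// Hypothetical stand-in for what was checkpointed in the meta directory.
struct Checkpoint
{
  int forkedPid;
  std::string libprocessPid;
};

// Recovers the run state. When `rebooted` is true the checkpointed pids
// are ignored, because after a reboot they may refer to unrelated
// processes.
RunState recoverRun(const Checkpoint& checkpoint, bool rebooted)
{
  RunState state;

  if (!rebooted) {
    state.forkedPid = checkpoint.forkedPid;
    state.libprocessPid = checkpoint.libprocessPid;
  }

  return state;
}
```

With this shape, a test that simulates a reboot would expect both fields to be
unset, which is the expectation the updated `SlaveRecoveryTest.Reboot` is
described as checking.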


Diffs
-

  src/tests/slave_recovery_tests.cpp 0eb47e2bdf6a46fc21b59bb85b4b89181087ccd3 


Diff: https://reviews.apache.org/r/69716/diff/1/


Testing
---


Thanks,

Qian Zhang



Re: Review Request 69715: Fixed the CNI_NETNS handling in port mapper CNI plugin.

2019-01-10 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69715/#review211864
---


Ship it!




Ship It!

- Gilbert Song


On Jan. 10, 2019, 10:14 p.m., Jie Yu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69715/
> ---
> 
> (Updated Jan. 10, 2019, 10:14 p.m.)
> 
> 
> Review request for mesos, Deepak Goel, Gilbert Song, and Qian Zhang.
> 
> 
> Bugs: MESOS-9518
> https://issues.apache.org/jira/browse/MESOS-9518
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> According CNI spec, it is possible that the container runtime does not
> set CNI_NETNS environment variable when it is not available. This is
> possible in scenarios like a host reboot. In that case, the CNI plugin
> should do best effort cleanup, instead of failing.
> 
> 
> Diffs
> -
> 
>   
> src/slave/containerizer/mesos/isolators/network/cni/plugins/port_mapper/port_mapper.hpp
>  25f49f4b90ec6d0d55fc306b6ab324ba5b4e7403 
>   
> src/slave/containerizer/mesos/isolators/network/cni/plugins/port_mapper/port_mapper.cpp
>  4e784ffb4ac29861c888fdbed4fcf9902bf4182a 
> 
> 
> Diff: https://reviews.apache.org/r/69715/diff/1/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>



Re: Review Request 69705: Made agent not read the forked pid and libprocess pid after reboot.

2019-01-10 Thread Qian Zhang


> On Jan. 11, 2019, 7:29 a.m., Gilbert Song wrote:
> > src/slave/state.cpp
> > Line 561 (original), 592 (patched)
> > 
> >
> > Could we confirm we do not care about this marker case after reboot?

After a reboot, the field `state.http` will be `None()`. It is only used in 
`Framework::recoverExecutor`, where, if `state.http` is `None()`, we set the 
executor's pid to `UPID()` to signify that the connection type for this 
executor is unknown. I think that is OK, since the agent should not try to 
connect to any executors after a reboot.
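
A rough sketch of that decision, for illustration only. The enum
`ConnectionType` and the function `recoveredConnectionType` are hypothetical
names, `std::optional<bool>` (C++17) stands in for the recovered `state.http`
marker, and the handling of the two "marker present" cases is an assumption
rather than something stated in this thread.

```cpp
#include <cassert>
#include <optional>

enum class ConnectionType { HTTP, PID, UNKNOWN };

// `http` mirrors the recovered `state.http` marker: unset when it was not
// read from the meta directory, e.g. after a reboot.
ConnectionType recoveredConnectionType(const std::optional<bool>& http)
{
  if (!http.has_value()) {
    // Corresponds to setting the executor's pid to `UPID()`: the
    // connection type is unknown.
    return ConnectionType::UNKNOWN;
  }

  // Assumption for this sketch: `true` means an HTTP based executor,
  // `false` a libprocess (pid) based one.
  return *http ? ConnectionType::HTTP : ConnectionType::PID;
}
```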


- Qian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69705/#review211844
---


On Jan. 10, 2019, 10:52 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69705/
> ---
> 
> (Updated Jan. 10, 2019, 10:52 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Gilbert Song.
> 
> 
> Bugs: MESOS-9501
> https://issues.apache.org/jira/browse/MESOS-9501
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> After agent host is rebooted, the forked pid and libprocess pid in
> agent's meta directory are obsolete, so we should not read them during
> agent recovery, otherwise containerizer may wait for an irrelevant
> process if the forked pid is reused by another process after reboot.
> 
> 
> Diffs
> -
> 
>   src/slave/state.hpp 4f3d4cefb3fdef29cce3a6abe4cf5db04d45301f 
>   src/slave/state.cpp e7cf84993c74cf6da7fe22d5112e86e039780287 
> 
> 
> Diff: https://reviews.apache.org/r/69705/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 69713: Fixed a bug in health_check_tests.cpp.

2019-01-10 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69713/#review211862
---


Ship it!




Ship It!

- Gilbert Song


On Jan. 10, 2019, 9:34 p.m., Jie Yu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69713/
> ---
> 
> (Updated Jan. 10, 2019, 9:34 p.m.)
> 
> 
> Review request for mesos and Qian Zhang.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> We forgot to call MesosTest::SetUp() and MesosTest::TearDown() in the
> override methods.
> 
> 
> Diffs
> -
> 
>   src/tests/health_check_tests.cpp 3e9b2da5aa1602a5dd24007d9b14cab74e7d02ae 
> 
> 
> Diff: https://reviews.apache.org/r/69713/diff/1/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>



Re: Review Request 69714: Fixed a bug in docker_containerizer_tests.cpp.

2019-01-10 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69714/#review211863
---


Ship it!




Ship It!

- Gilbert Song


On Jan. 10, 2019, 10:14 p.m., Jie Yu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69714/
> ---
> 
> (Updated Jan. 10, 2019, 10:14 p.m.)
> 
> 
> Review request for mesos, Deepak Goel, Gilbert Song, and Qian Zhang.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Forgot to call MesosTest::SetUp() and MesosTest::TearDown() in the
> override methods.
> 
> 
> Diffs
> -
> 
>   src/tests/containerizer/docker_containerizer_tests.cpp 
> 2feead9ace26542821002531a6006fd00f7088b3 
> 
> 
> Diff: https://reviews.apache.org/r/69714/diff/1/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>



Re: Review Request 69711: Separated runtime dirs from other dirs in MesosTest.

2019-01-10 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69711/#review211861
---


Ship it!




Ship It!

- Gilbert Song


On Jan. 10, 2019, 9:13 p.m., Jie Yu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69711/
> ---
> 
> (Updated Jan. 10, 2019, 9:13 p.m.)
> 
> 
> Review request for mesos, Deepak Goel, Gilbert Song, James Peach, and Qian 
> Zhang.
> 
> 
> Bugs: MESOS-9518
> https://issues.apache.org/jira/browse/MESOS-9518
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Previously, lots of other directories are created inside the agent's
> runtime_dir. This makes it hard to cleanup agent's runtime dir without
> affecting other files for the test. This patch makes the runtime
> directory a separate directory.
> 
> 
> Diffs
> -
> 
>   src/tests/mesos.cpp 3a1101cf41995733fe7b6492781def6ac09c6130 
> 
> 
> Diff: https://reviews.apache.org/r/69711/diff/1/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>



Re: Review Request 69710: Switched to use ContainerizerTest for CNI tests.

2019-01-10 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69710/#review211859
---


Ship it!




Ship It!

- Gilbert Song


On Jan. 10, 2019, 9:13 p.m., Jie Yu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69710/
> ---
> 
> (Updated Jan. 10, 2019, 9:13 p.m.)
> 
> 
> Review request for mesos, Deepak Goel, Gilbert Song, James Peach, and Qian 
> Zhang.
> 
> 
> Bugs: MESOS-9518
> https://issues.apache.org/jira/browse/MESOS-9518
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This makes sure that cgroups for each test is independent.
> 
> 
> Diffs
> -
> 
>   src/tests/containerizer/cni_isolator_tests.cpp 
> eb20e637ecbe1b39e2dbb274c5198828f2fdf62f 
> 
> 
> Diff: https://reviews.apache.org/r/69710/diff/1/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>



Re: Review Request 69713: Fixed a bug in health_check_tests.cpp.

2019-01-10 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69713/#review211858
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['69706', '69710', '69711', '69712', '69713']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2755/mesos-review-69713

Relevant logs:

- 
[mesos-tests.log](http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2755/mesos-review-69713/logs/mesos-tests.log):

```
[   OK ] ContainerizerResourcesTest.AutoResourcesZero (15 ms)
[ RUN  ] ContainerizerResourcesTest.AutoResourcesNonZero
[   OK ] ContainerizerResourcesTest.AutoResourcesNonZero (19 ms)
[--] 3 tests from ContainerizerResourcesTest (55 ms total)

[--] 24 tests from DockerContainerizerTest
[ RUN  ] DockerContainerizerTest.ROOT_DOCKER_Launch
I0111 07:03:26.811414 11132 zookeeper_test_server.cpp:116] Shutting down 
ZooKeeperTestServer on port 61148
W0111 07:03:26.918212 11132 docker_containerizer_tests.cpp:201] Pulling 
akagup/nano-admin and akagup/inky. This might take a while...
I0111 07:03:27.200646 11132 cluster.cpp:174] Creating default 'local' authorizer
I0111 07:03:27.207648 14016 master.cpp:416] Master 
d7fe2b18-4782-4660-b3f4-87b1a32d27a7 
(windows-02.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net) started on 
192.10.1.6:60257
I0111 07:03:27.207648 14016 master.cpp:419] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="hierarchical" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" 
--authenticators="crammd5" --authorizers="local" 
--credentials="C:\Jenkins\workspace\mesos-reviewbot-testing\credentials" 
--filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--http_framework_authenticators="basic" --initialize_driver_logging="true" 
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
--max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" 
--max_operator_event_stream_subscribers="1000" 
--max_unreachable_tasks_per_framework="1000" --memory_profiling="false" 
--min_allocatable_resources="cpus:0.01|mem:32" 
--port="5050" --publish_per_framework_metrics="true" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--require_agent_domain="false" --role_sorter="drf" --root_submissions="true" 
--version="false" --webui_dir="/webui" 
--work_dir="C:\Jenkins\workspace\mesos-reviewbot-testing\master" 
--zk_session_timeout="10secs"
I0111 07:03:27.209661 14016 master.cpp:468] Master only allowing authenticated 
frameworks to register
I0111 07:03:27.209661 14016 master.cpp:474] Master only allowing authenticated 
agents to register
I0111 07:03:27.209661 14016 master.cpp:480] Master only allowing authenticated 
HTTP frameworks to register
I0111 07:03:27.209661 14016 credentials.hpp:37] Loading credentials for 
authentication from 'C:\Jenkins\workspace\mesos-reviewbot-testing\credentials'
I0111 07:03:27.210652 14016 master.cpp:524] Using default 'crammd5' 
authenticator
I0111 07:03:27.211663 14016 http.cpp:965] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0111 07:03:27.211663 14016 http.cpp:965] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0111 07:03:27.211663 14016 http.cpp:965] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0111 07:03:27.212651 14016 master.cpp:605] Authorization enabled
I0111 07:03:27.221684  9740 master.cpp:2085] Elected as the leading master!
I0111 07:03:27.221684  9740 master.cpp:1640] Recovering from registrar
I0111 07:03:27.222646 11388 registrar.cpp:383] Successfully fetched the 
registry (0B) in 0ns
I0111 07:03:27.223664 11388 registrar.cpp:487] Applied 1 operations in 0ns; 
attempting to update the registry
I0111 07:03:27.224666 15220 registrar.cpp:544] Successfully updated the 
registry in 1.00352ms
I0111 07:03:27.224666 15220 registrar.cpp:416] Successfully recovered registrar
I0111 07:03:27.225649 14016 master.cpp:1754] Recovered 0 agents from the 
registry (235B); allowing 10mins for agents to reregister
Assertion failed: isSome(), file 
d:\dcos\mesos\mesos\3rdparty\stout\include\stout\option.hpp, line 119
```

- Mesos Reviewbot Windows



Review Request 69715: Fixed the CNI_NETNS handling in port mapper CNI plugin.

2019-01-10 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69715/
---

Review request for mesos, Deepak Goel, Gilbert Song, and Qian Zhang.


Bugs: MESOS-9518
https://issues.apache.org/jira/browse/MESOS-9518


Repository: mesos


Description
---

According to the CNI spec, it is possible that the container runtime does
not set the CNI_NETNS environment variable when it is not available. This is
possible in scenarios like a host reboot. In that case, the CNI plugin
should do a best-effort cleanup instead of failing.
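
As a minimal sketch of the idea (not the port mapper plugin's actual code), a
plugin could treat a missing `CNI_NETNS` as a signal to fall back to
best-effort cleanup. `readEnv`, `DelAction` and `planDel` are hypothetical
names introduced here; only `std::getenv` and the `CNI_NETNS` variable name
come from the standard library and the CNI spec respectively.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>

// Reads an environment variable, returning an empty optional when unset.
std::optional<std::string> readEnv(const char* name)
{
  const char* value = std::getenv(name);
  if (value == nullptr) {
    return std::nullopt;
  }
  return std::string(value);
}

enum class DelAction { FULL_CLEANUP, BEST_EFFORT_CLEANUP };

// Decides how a "DEL" request is handled. A missing network namespace
// path is not an error; it only limits what can be cleaned up.
DelAction planDel(const std::optional<std::string>& netns)
{
  return netns.has_value() ? DelAction::FULL_CLEANUP
                           : DelAction::BEST_EFFORT_CLEANUP;
}
```

A plugin's DEL handler would then call something like
`planDel(readEnv("CNI_NETNS"))` and skip only the steps that need to enter the
namespace.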


Diffs
-

  
src/slave/containerizer/mesos/isolators/network/cni/plugins/port_mapper/port_mapper.hpp
 25f49f4b90ec6d0d55fc306b6ab324ba5b4e7403 
  
src/slave/containerizer/mesos/isolators/network/cni/plugins/port_mapper/port_mapper.cpp
 4e784ffb4ac29861c888fdbed4fcf9902bf4182a 


Diff: https://reviews.apache.org/r/69715/diff/1/


Testing
---

sudo make check


Thanks,

Jie Yu



Review Request 69714: Fixed a bug in docker_containerizer_tests.cpp.

2019-01-10 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69714/
---

Review request for mesos, Deepak Goel, Gilbert Song, and Qian Zhang.


Repository: mesos


Description
---

Forgot to call MesosTest::SetUp() and MesosTest::TearDown() in the
override methods.
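
To illustrate the kind of bug being fixed, here is a small sketch that does
not depend on GoogleTest. `BaseFixture` is a hypothetical stand-in for a base
fixture such as `MesosTest`, and `DerivedFixture` for a test fixture that
overrides both methods. The ordering inside `TearDown()` (derived cleanup
first, then the base) is an assumption of this sketch, not something stated in
the review.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical base fixture; it records calls so the effect of the
// overrides is observable.
class BaseFixture
{
public:
  virtual ~BaseFixture() = default;

  virtual void SetUp() { log.push_back("Base::SetUp"); }
  virtual void TearDown() { log.push_back("Base::TearDown"); }

  std::vector<std::string> log;
};

class DerivedFixture : public BaseFixture
{
public:
  void SetUp() override
  {
    // Without this call the base class setup is silently skipped.
    BaseFixture::SetUp();
    log.push_back("Derived::SetUp");
  }

  void TearDown() override
  {
    log.push_back("Derived::TearDown");
    // Without this call the base class cleanup is silently skipped.
    BaseFixture::TearDown();
  }
};
```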


Diffs
-

  src/tests/containerizer/docker_containerizer_tests.cpp 
2feead9ace26542821002531a6006fd00f7088b3 


Diff: https://reviews.apache.org/r/69714/diff/1/


Testing
---

sudo make check


Thanks,

Jie Yu



Re: Review Request 69712: Added a CNI reboot test.

2019-01-10 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69712/#review211856
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['69706', '69710', '69711', '69712']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2754/mesos-review-69712

Relevant logs:

- 
[mesos-tests.log](http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2754/mesos-review-69712/logs/mesos-tests.log):

```
I0111 06:06:37.088624 13492 containerizer.cpp:2975] Container 
bc085dee-9f66-4f61-806c-8c6b9c756611 has exited
I0111 06:06:37.118649  9304 master.cpp:] Master [   OK ] 
HealthCheckTest.HealthyTaskViaTCP (1191 ms)
[--] 12 tests from HealthCheckTest (17437 ms total)

[--] 3 tests from DockerContainerizerHealthCheckTest
[ RUN  ] DockerContainerizerHealthCheckTest.ROOT_DOCKER_DockerHealthyTask
terminating
I0111 06:06:37.119628 13840 hierarchical.cpp:644] Removed agent 
bc64bbf3-846e-4e54-aa84-041d7185dc57-S0
W0111 06:06:37.174670  9304 health_check_tests.cpp:2116] Pulling 
akagup/nano-admin, akagup/https-server and akagup/https-server. This might take 
a while...
I0111 06:06:37.412647  9304 cluster.cpp:174] Creating default 'local' authorizer
I0111 06:06:37.420631 13840 master.cpp:416] Master 
e70d2911-8c8b-4f0f-b686-31657aef2249 
(windows-02.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net) started on 
192.10.1.6:59861
I0111 06:06:37.420631 13840 master.cpp:419] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="hierarchical" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" 
--authenticators="crammd5" --authorizers="local" 
--credentials="C:\Jenkins\workspace\mesos-reviewbot-testing\credentials" 
--filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--http_framework_authenticators="basic" --initialize_driver_logging="true" 
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
--max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" 
--max_operator_event_stream_subscribers="1000" 
--max_unreachable_tasks_per_framework="1000" --memory_profiling="false" 
--min_allocatable_resources="cpus:0.01|mem:32" 
--port="5050" --publish_per_framework_metrics="true" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--require_agent_domain="false" --role_sorter="drf" --root_submissions="true" 
--version="false" --webui_dir="/webui" 
--work_dir="C:\Jenkins\workspace\mesos-reviewbot-testing\master" 
--zk_session_timeout="10secs"
I0111 06:06:37.422675 13840 master.cpp:468] Master only allowing authenticated 
frameworks to register
I0111 06:06:37.422675 13840 master.cpp:474] Master only allowing authenticated 
agents to register
I0111 06:06:37.422675 13840 master.cpp:480] Master only allowing authenticated 
HTTP frameworks to register
I0111 06:06:37.422675 13840 credentials.hpp:37] Loading credentials for 
authentication from 'C:\Jenkins\workspace\mesos-reviewbot-testing\credentials'
I0111 06:06:37.423686 13840 master.cpp:524] Using default 'crammd5' 
authenticator
I0111 06:06:37.424644 13840 http.cpp:965] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0111 06:06:37.424644 13840 http.cpp:965] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0111 06:06:37.425648 13840 http.cpp:965] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0111 06:06:37.425648 13840 master.cpp:605] Authorization enabled
I0111 06:06:37.435660  8228 master.cpp:2085] Elected as the leading master!
I0111 06:06:37.435660  8228 master.cpp:1640] Recovering from registrar
I0111 06:06:37.436669  8240 registrar.cpp:383] Successfully fetched the 
registry (0B) in 1.008896ms
I0111 06:06:37.436669  8240 registrar.cpp:487] Applied 1 operations in 0ns; 
attempting to update the registry
I0111 06:06:37.437690 13492 registrar.cpp:544] Successfully updated the 
registry in 1.02144ms
I0111 06:06:37.438663 13492 registrar.cpp:416] Successfully recovered registrar
I0111 06:06:37.439663 14180 master.cpp:1754] Recovered 0 agents from the 
registry (235B); allowing 10mins for agents to reregister
Assertion failed: isSome(), file 
d:\dcos\mesos\mesos\3rdpa
```

Review Request 69713: Fixed a bug in health_check_tests.cpp.

2019-01-10 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69713/
---

Review request for mesos and Qian Zhang.


Repository: mesos


Description
---

We forgot to call MesosTest::SetUp() and MesosTest::TearDown() in the
override methods.


Diffs
-

  src/tests/health_check_tests.cpp 3e9b2da5aa1602a5dd24007d9b14cab74e7d02ae 


Diff: https://reviews.apache.org/r/69713/diff/1/


Testing
---

sudo make check


Thanks,

Jie Yu



Re: Review Request 69708: Fixed gRPC CMake build issue on Ubuntu 14.04.

2019-01-10 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69708/#review211855
---



PASS: Mesos patch 69708 was successfully built and tested.

Reviews applied: `['69708']`

All the build artifacts available at: 
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2753/mesos-review-69708

- Mesos Reviewbot Windows


On Jan. 11, 2019, 3:38 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69708/
> ---
> 
> (Updated Jan. 11, 2019, 3:38 a.m.)
> 
> 
> Review request for mesos, Gastón Kleiman and Vinod Kone.
> 
> 
> Bugs: MESOS-9519
> https://issues.apache.org/jira/browse/MESOS-9519
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed gRPC CMake build issue on Ubuntu 14.04.
> 
> 
> Diffs
> -
> 
>   3rdparty/CMakeLists.txt 703808d063e4bba58f647b5d48b78724003bcc4e 
>   3rdparty/grpc-1.10.0.patch 655f00387a6b308b653b23053419ec05c8b22144 
>   3rdparty/grpc.md e06843c8b6038eb9fb809241686fd611d1daedc8 
> 
> 
> Diff: https://reviews.apache.org/r/69708/diff/1/
> 
> 
> Testing
> ---
> 
> OS=ubuntu:14.04 BUILDTOOL=cmake COMPILER=gcc CONFIGURATION='--verbose 
> --enable-libevent --enable-ssl' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1' 
> support/docker-build.sh
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 69710: Switched to use ContainerizerTest for CNI tests.

2019-01-10 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69710/
---

(Updated Jan. 11, 2019, 5:13 a.m.)


Review request for mesos, Deepak Goel, Gilbert Song, James Peach, and Qian 
Zhang.


Bugs: MESOS-9518
https://issues.apache.org/jira/browse/MESOS-9518


Repository: mesos


Description
---

This makes sure that the cgroups for each test are independent.


Diffs
-

  src/tests/containerizer/cni_isolator_tests.cpp 
eb20e637ecbe1b39e2dbb274c5198828f2fdf62f 


Diff: https://reviews.apache.org/r/69710/diff/1/


Testing
---

sudo make check


Thanks,

Jie Yu



Re: Review Request 69711: Separated runtime dirs from other dirs in MesosTest.

2019-01-10 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69711/
---

(Updated Jan. 11, 2019, 5:13 a.m.)


Review request for mesos, Deepak Goel, Gilbert Song, James Peach, and Qian 
Zhang.


Bugs: MESOS-9518
https://issues.apache.org/jira/browse/MESOS-9518


Repository: mesos


Description
---

Previously, lots of other directories were created inside the agent's
runtime_dir. This made it hard to clean up the agent's runtime dir without
affecting other files for the test. This patch makes the runtime
directory a separate directory.


Diffs
-

  src/tests/mesos.cpp 3a1101cf41995733fe7b6492781def6ac09c6130 


Diff: https://reviews.apache.org/r/69711/diff/1/


Testing
---

sudo make check


Thanks,

Jie Yu



Review Request 69710: Switched to use ContainerizerTest for CNI tests.

2019-01-10 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69710/
---

Review request for mesos, Deepak Goel, Gilbert Song, James Peach, and Qian 
Zhang.


Repository: mesos


Description
---

This makes sure that the cgroups for each test are independent.


Diffs
-

  src/tests/containerizer/cni_isolator_tests.cpp 
eb20e637ecbe1b39e2dbb274c5198828f2fdf62f 


Diff: https://reviews.apache.org/r/69710/diff/1/


Testing
---

sudo make check


Thanks,

Jie Yu



Review Request 69711: Separated runtime dirs from other dirs in MesosTest.

2019-01-10 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69711/
---

Review request for mesos, Deepak Goel, Gilbert Song, James Peach, and Qian 
Zhang.


Repository: mesos


Description
---

Previously, lots of other directories were created inside the agent's
runtime_dir. This made it hard to clean up the agent's runtime dir without
affecting other files for the test. This patch makes the runtime
directory a separate directory.


Diffs
-

  src/tests/mesos.cpp 3a1101cf41995733fe7b6492781def6ac09c6130 


Diff: https://reviews.apache.org/r/69711/diff/1/


Testing
---

sudo make check


Thanks,

Jie Yu



Review Request 69712: Added a CNI reboot test.

2019-01-10 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69712/
---

Review request for mesos, Deepak Goel, Gilbert Song, James Peach, and Qian 
Zhang.


Bugs: MESOS-9518
https://issues.apache.org/jira/browse/MESOS-9518


Repository: mesos


Description
---

This test verifies that CNI DEL is properly called even after the agent
host is rebooted, assuming the `--network_cni_root_dir_persist` flag is set
to true.


Diffs
-

  src/tests/containerizer/cni_isolator_tests.cpp 
eb20e637ecbe1b39e2dbb274c5198828f2fdf62f 


Diff: https://reviews.apache.org/r/69712/diff/1/


Testing
---

sudo make check


Thanks,

Jie Yu



Re: Review Request 69706: Kept `CNI_NETNS` unset in detach if network namespace is gone.

2019-01-10 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69706/
---

(Updated Jan. 11, 2019, 5:08 a.m.)


Review request for mesos, Deepak Goel, Gilbert Song, James Peach, and Qian 
Zhang.


Changes
---

Need to reorder the code a bit so that `stat` is not called if detach is to be 
skipped.

Also, factored out a helper, which is used in cleanup to skip unmount if needed.


Bugs: MESOS-9518
https://issues.apache.org/jira/browse/MESOS-9518


Repository: mesos


Description
---

We introduced a new agent flag in MESOS-9492 so that CNI configs can be
persisted across reboots. This lets some CNI plugins clean up the IPs
allocated to containers after a sudden reboot of the host (not all CNI
plugins need this).

It's important to leave the `CNI_NETNS` environment variable unset after a
reboot when invoking the CNI plugin "DEL" command so that it conforms to the
spec.
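
A minimal sketch of the agent-side idea, for illustration only and not the
code in `cni.cpp`. The function `delEnvironment` and its parameters are
hypothetical; the variable names `CNI_COMMAND`, `CNI_CONTAINERID`, `CNI_IFNAME`
and `CNI_NETNS` are the ones defined by the CNI spec. How the agent decides
whether the network namespace still exists is deliberately left to the caller.

```cpp
#include <cassert>
#include <map>
#include <string>

// Builds the environment for a plugin "DEL" invocation. `CNI_NETNS` is
// only set when the network namespace is still available.
std::map<std::string, std::string> delEnvironment(
    const std::string& containerId,
    const std::string& ifName,
    const std::string& netnsPath,
    bool netnsExists)
{
  std::map<std::string, std::string> env = {
    {"CNI_COMMAND", "DEL"},
    {"CNI_CONTAINERID", containerId},
    {"CNI_IFNAME", ifName},
  };

  if (netnsExists) {
    env["CNI_NETNS"] = netnsPath;
  }

  return env;
}
```

Leaving the key out entirely, rather than setting it to an empty string, is
what lets a plugin distinguish "no namespace available" from a malformed
request.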


Diffs (updated)
-

  src/slave/containerizer/mesos/isolators/network/cni/cni.cpp 
cc23428d27d40be8c4ff1476e6e984c7d12760c4 


Diff: https://reviews.apache.org/r/69706/diff/2/

Changes: https://reviews.apache.org/r/69706/diff/1-2/


Testing
---

sudo make check


Thanks,

Jie Yu



Review Request 69708: Fixed gRPC CMake build issue on Ubuntu 14.04.

2019-01-10 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69708/
---

Review request for mesos and Gastón Kleiman.


Bugs: MESOS-9519
https://issues.apache.org/jira/browse/MESOS-9519


Repository: mesos


Description
---

Fixed gRPC CMake build issue on Ubuntu 14.04.


Diffs
-

  3rdparty/CMakeLists.txt 703808d063e4bba58f647b5d48b78724003bcc4e 
  3rdparty/grpc-1.10.0.patch 655f00387a6b308b653b23053419ec05c8b22144 
  3rdparty/grpc.md e06843c8b6038eb9fb809241686fd611d1daedc8 


Diff: https://reviews.apache.org/r/69708/diff/1/


Testing
---

OS=ubuntu:14.04 BUILDTOOL=cmake COMPILER=gcc CONFIGURATION='--verbose 
--enable-libevent --enable-ssl' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1' 
support/docker-build.sh


Thanks,

Chun-Hung Hsiao



Re: Review Request 69694: Tester.

2019-01-10 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69694/#review211851
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['69694']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2752/mesos-review-69694

Relevant logs:

- 
[stout-tests-cmake.log](http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2752/mesos-review-69694/logs/stout-tests-cmake.log):

```
Microsoft (R) Build Engine version 15.9.20+g88f5fadfbe for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.

MSBUILD : error MSB1009: Project file does not exist.
Switch: stout-tests.vcxproj
```

- 
[libprocess-tests-cmake.log](http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2752/mesos-review-69694/logs/libprocess-tests-cmake.log):

```
Microsoft (R) Build Engine version 15.9.20+g88f5fadfbe for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.

MSBUILD : error MSB1009: Project file does not exist.
Switch: libprocess-tests.vcxproj
```

- 
[mesos-tests-cmake.log](http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2752/mesos-review-69694/logs/mesos-tests-cmake.log):

```
Microsoft (R) Build Engine version 15.9.20+g88f5fadfbe for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.

MSBUILD : error MSB1009: Project file does not exist.
Switch: mesos-tests.vcxproj
```

- Mesos Reviewbot Windows


On Jan. 10, 2019, 3:52 a.m., Till Toenshoff wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69694/
> ---
> 
> (Updated Jan. 10, 2019, 3:52 a.m.)
> 
> 
> Review request for mesos.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Tester.
> 
> 
> Diffs
> -
> 
>   CMakeLists.txt f2885faa25f1d718d0f451aa2199e3f7692317a3 
>   configure.ac 6778f119570def1838e26cddf7b0192bfe6e37d4 
> 
> 
> Diff: https://reviews.apache.org/r/69694/diff/4/
> 
> 
> Testing
> ---
> 
> Dont review - just a test!
> 
> 
> Thanks,
> 
> Till Toenshoff
> 
>



Re: Review Request 69705: Made agent not read the forked pid and libprocess pid after reboot.

2019-01-10 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69705/#review211845
---



Could we verify that the CI failure above, `SlaveRecoveryTest/0.Reboot`, is not 
caused by our change?

- Gilbert Song


On Jan. 10, 2019, 6:52 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69705/
> ---
> 
> (Updated Jan. 10, 2019, 6:52 a.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Gilbert Song.
> 
> 
> Bugs: MESOS-9501
> https://issues.apache.org/jira/browse/MESOS-9501
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> After agent host is rebooted, the forked pid and libprocess pid in
> agent's meta directory are obsolete, so we should not read them during
> agent recovery, otherwise containerizer may wait for an irrelevant
> process if the forked pid is reused by another process after reboot.
> 
> 
> Diffs
> -
> 
>   src/slave/state.hpp 4f3d4cefb3fdef29cce3a6abe4cf5db04d45301f 
>   src/slave/state.cpp e7cf84993c74cf6da7fe22d5112e86e039780287 
> 
> 
> Diff: https://reviews.apache.org/r/69705/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 69705: Made agent not read the forked pid and libprocess pid after reboot.

2019-01-10 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69705/#review211844
---


Ship it!





src/slave/state.cpp
Line 561 (original), 592 (patched)


Could we confirm we do not care about this marker case after reboot?


- Gilbert Song


On Jan. 10, 2019, 6:52 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69705/
> ---
> 
> (Updated Jan. 10, 2019, 6:52 a.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Gilbert Song.
> 
> 
> Bugs: MESOS-9501
> https://issues.apache.org/jira/browse/MESOS-9501
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> After agent host is rebooted, the forked pid and libprocess pid in
> agent's meta directory are obsolete, so we should not read them during
> agent recovery, otherwise containerizer may wait for an irrelevant
> process if the forked pid is reused by another process after reboot.
> 
> 
> Diffs
> -
> 
>   src/slave/state.hpp 4f3d4cefb3fdef29cce3a6abe4cf5db04d45301f 
>   src/slave/state.cpp e7cf84993c74cf6da7fe22d5112e86e039780287 
> 
> 
> Diff: https://reviews.apache.org/r/69705/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 69706: Kept `CNI_NETNS` unset in detach if network namespace is gone.

2019-01-10 Thread Deepak Goel

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69706/#review211842
---


Ship it!




Ship It!

- Deepak Goel


On Jan. 10, 2019, 8:44 p.m., Jie Yu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69706/
> ---
> 
> (Updated Jan. 10, 2019, 8:44 p.m.)
> 
> 
> Review request for mesos, Deepak Goel, Gilbert Song, James Peach, and Qian 
> Zhang.
> 
> 
> Bugs: MESOS-9518
> https://issues.apache.org/jira/browse/MESOS-9518
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> We introduced a new agent flag in MESOS-9492 so that CNI configs can be
> persisted across reboot. This is for some CNI plugins to be able to
> cleanup IP allocated to the containers after a sudden reboot of the host
> (not all CNI plugins need this).
> 
> It's important to unset `CNI_NETNS` environment variable after reboot
> when invoking CNI plugin "DEL" command so that it conforms to the spec.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/network/cni/cni.cpp 
> cc23428d27d40be8c4ff1476e6e984c7d12760c4 
> 
> 
> Diff: https://reviews.apache.org/r/69706/diff/1/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>



Re: Review Request 69706: Kept `CNI_NETNS` unset in detach if network namespace is gone.

2019-01-10 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69706/#review211841
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['69706']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2751/mesos-review-69706

Relevant logs:

- 
[mesos-tests.log](http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2751/mesos-review-69706/logs/mesos-tests.log):

```
W0110 21:42:01.689435 11952 slave.cpp:3933] Ignoring shutdown framework 
985048f2-6962-4c57-986b-ff2db0065705- because it is terminating
I0110 21:42:01.692430 11124 master.cpp:1271] Agent 
985048f2-6962-4c57-986b-ff2db0065705-S0 at slave(466)@192.10.1.6:55395 
(windows-02.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net) disconnected
I0110 21:42:01.692430 11124 master.cpp:3274] Disconnecting agent 
985048f2-6962-4c57-986b-ff2db0065705-S0 at slave(466)@192.10.1.6:55395 
(windows-02.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net)
I0110 21:42:01.692430 11124 master.cpp:3293] Deactivating agent 
985048f2-6962-4c57-986b-ff2db0065705-S0 at slave(466)@192.10.1.6:55395 
(windows-02.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net)
I0110 21:42:01.692430  4892 hierarchical.cpp:358] Removed framework 
985048f2-6962-4c57-986b-ff2db0065705-
I0110 21:42:01.693429  4892 hierarchical.cpp:802] Agent 
985048f2-6962-4c57-986b-ff2db0065705-S0 deactivated
I0110 21:42:01.693429 10676 containerizer.cpp:2469] Destroying container 
d1210809-7a13-4cbd-b1ee-fd20fa9c5e54 in RUNNING state
I0110 21:42:01.694437 10676 containerizer.cpp:3136] Transitioning the state of 
container d1210809-7a13-4cbd-b1ee-fd20fa9c5e54 from RUNNING to DESTROYING
I0110 21:42:01.694437 10676 launcher.cpp:161] Asked to destroy container 
d1210809-7a13-4cbd-b1ee-fd20fa9c5e54
W0110 21:42:01.695430  5416 process.cpp:1423] Failed to recv on socket 
WindowsFD::Type::SOCKET=9188 to peer '192.10.1.6:57240': IO failed with error 
code: The specified network name is no longer available.

[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (686 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (705 ms total)

[--] Global test environment tear-down
[==] 1084 tests from 104 test cases ran. (500763 ms total)
[  PASSED  ] 1083 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] DockerFetcherPluginTest.INTERNET_CURL_InvokeFetchByName

 1 FAILED TEST
  YOU HAVE 231 DISABLED TESTS

W0110 21:42:01.696420  5416 process.cpp:838] Failed to recv on socket 
WindowsFD::Type::SOCKET=9200 to peer '192.10.1.6:57241': IO failed with error 
code: The specified network name is no longer available.

I0110 21:42:01.789948  9704 containerizer.cpp:2975] Container 
d1210809-7a13-4cbd-b1ee-fd20fa9c5e54 has exited
I0110 21:42:01.820937 11864 master.cpp:] Master terminating
I0110 21:42:01.822932 10092 hierarchical.cpp:644] Removed agent 
985048f2-6962-4c57-986b-ff2db0065705-S0
I0110 21:42:02.097929  5416 process.cpp:927] Stopped the socket accept loop
```

- Mesos Reviewbot Windows


On Jan. 10, 2019, 12:44 p.m., Jie Yu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69706/
> ---
> 
> (Updated Jan. 10, 2019, 12:44 p.m.)
> 
> 
> Review request for mesos, Deepak Goel, Gilbert Song, James Peach, and Qian 
> Zhang.
> 
> 
> Bugs: MESOS-9518
> https://issues.apache.org/jira/browse/MESOS-9518
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> We introduced a new agent flag in MESOS-9492 so that CNI configs can be
> persisted across reboot. This is for some CNI plugins to be able to
> cleanup IP allocated to the containers after a sudden reboot of the host
> (not all CNI plugins need this).
> 
> It's important to unset `CNI_NETNS` environment variable after reboot
> when invoking CNI plugin "DEL" command so that it conforms to the spec.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/network/cni/cni.cpp 
> cc23428d27d40be8c4ff1476e6e984c7d12760c4 
> 
> 
> Diff: https://reviews.apache.org/r/69706/diff/1/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>



Re: Review Request 69706: Kept `CNI_NETNS` unset in detach if network namespace is gone.

2019-01-10 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69706/#review211840
---


Ship it!




Ship It!

- Gilbert Song


On Jan. 10, 2019, 12:44 p.m., Jie Yu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69706/
> ---
> 
> (Updated Jan. 10, 2019, 12:44 p.m.)
> 
> 
> Review request for mesos, Deepak Goel, Gilbert Song, James Peach, and Qian 
> Zhang.
> 
> 
> Bugs: MESOS-9518
> https://issues.apache.org/jira/browse/MESOS-9518
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> We introduced a new agent flag in MESOS-9492 so that CNI configs can be
> persisted across reboot. This is for some CNI plugins to be able to
> cleanup IP allocated to the containers after a sudden reboot of the host
> (not all CNI plugins need this).
> 
> It's important to unset `CNI_NETNS` environment variable after reboot
> when invoking CNI plugin "DEL" command so that it conforms to the spec.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/network/cni/cni.cpp 
> cc23428d27d40be8c4ff1476e6e984c7d12760c4 
> 
> 
> Diff: https://reviews.apache.org/r/69706/diff/1/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>



Re: Review Request 69706: Kept `CNI_NETNS` unset in detach if network namespace is gone.

2019-01-10 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69706/#review211838
---




src/slave/containerizer/mesos/isolators/network/cni/cni.cpp
Lines 1715 (patched)


`CNI_NETNS`,

comma to period?



src/slave/containerizer/mesos/isolators/network/cni/cni.cpp
Lines 1717 (patched)


`gone.  According` double spaces?

delete one?


- Gilbert Song


On Jan. 10, 2019, 12:44 p.m., Jie Yu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69706/
> ---
> 
> (Updated Jan. 10, 2019, 12:44 p.m.)
> 
> 
> Review request for mesos, Deepak Goel, Gilbert Song, James Peach, and Qian 
> Zhang.
> 
> 
> Bugs: MESOS-9518
> https://issues.apache.org/jira/browse/MESOS-9518
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> We introduced a new agent flag in MESOS-9492 so that CNI configs can be
> persisted across reboot. This is for some CNI plugins to be able to
> cleanup IP allocated to the containers after a sudden reboot of the host
> (not all CNI plugins need this).
> 
> It's important to unset `CNI_NETNS` environment variable after reboot
> when invoking CNI plugin "DEL" command so that it conforms to the spec.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/network/cni/cni.cpp 
> cc23428d27d40be8c4ff1476e6e984c7d12760c4 
> 
> 
> Diff: https://reviews.apache.org/r/69706/diff/1/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>



Re: Review Request 69669: Notified frameworks when operations are marked as unreachable.

2019-01-10 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/#review211836
---




src/master/master.cpp
Line 8954 (original), 8982-8984 (patched)


Nit: fits on one line.



src/tests/api_tests.cpp
Lines 5127 (patched)


Since we don't reference the contents of `slaveFlags` anywhere, you can 
omit this variable; `StartSlave()` will use the default argument value of 
`None()` and create the slave flags itself before calling 
`cluster::Slave::create()`.



src/tests/api_tests.cpp
Lines 5161-5164 (patched)


I think this variable is unused?



src/tests/api_tests.cpp
Lines 5203-5204 (patched)


Let's get rid of the parentheses:

"Try to reserve the resources managed by the resource provider, because 
currently operation feedback is only supported for that case."



src/tests/api_tests.cpp
Lines 5236-5239 (patched)


Could we just pause the clock for the whole test? It might be necessary to 
retain the `slaveFlags` variable if you do this.

We should also resume the clock at the end of the test.


- Greg Mann


On Jan. 4, 2019, 4:57 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69669/
> ---
> 
> (Updated Jan. 4, 2019, 4:57 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and 
> Joseph Wu.
> 
> 
> Bugs: MESOS-8783
> https://issues.apache.org/jira/browse/MESOS-8783
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When an agent is being marked as unreachable due to missing
> the reregistration timeout, all operations on that agent
> are implicitly transitioned to status `OPERATION_UNREACHABLE`.
> 
> This commit adds an explicit notification for this transition
> to frameworks which opted-in to operation feedback.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 99549ab857b16d722f0dd991f98dbe54e9ed19a1 
>   src/master/master.cpp b4faf2b077a0288ba36195b7a21402932489d316 
>   src/tests/api_tests.cpp b6064cd749e42e45c2b471c71e9769a41b59f726 
> 
> 
> Diff: https://reviews.apache.org/r/69669/diff/1/
> 
> 
> Testing
> ---
> 
> Internal CI run.
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Review Request 69706: Kept `CNI_NETNS` unset in detach if network namespace is gone.

2019-01-10 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69706/
---

Review request for mesos, Deepak Goel, James Peach, and Qian Zhang.


Bugs: MESOS-9518
https://issues.apache.org/jira/browse/MESOS-9518


Repository: mesos


Description
---

We introduced a new agent flag in MESOS-9492 so that CNI configs can be
persisted across reboot. This is for some CNI plugins to be able to
cleanup IP allocated to the containers after a sudden reboot of the host
(not all CNI plugins need this).

It's important to unset `CNI_NETNS` environment variable after reboot
when invoking CNI plugin "DEL" command so that it conforms to the spec.
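
As a rough illustration of the behaviour described above (not the code in
`cni.cpp`), the sketch below builds the environment for a plugin "DEL" call
and leaves `CNI_NETNS` out entirely when the network namespace is gone. The
helper name `detachEnvironment` and its parameters are hypothetical; only the
`CNI_*` variable names come from the CNI conventions.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical helper: builds the environment passed to a CNI plugin for
// the "DEL" command. `netnsPath` is empty when the network namespace no
// longer exists (for example after a host reboot).
std::map<std::string, std::string> detachEnvironment(
    const std::string& containerId,
    const std::string& ifName,
    const std::optional<std::string>& netnsPath)
{
  std::map<std::string, std::string> env = {
    {"CNI_COMMAND", "DEL"},
    {"CNI_CONTAINERID", containerId},
    {"CNI_IFNAME", ifName},
  };

  // Only set `CNI_NETNS` when the namespace still exists; otherwise the
  // variable stays unset rather than pointing at a stale path.
  if (netnsPath.has_value()) {
    env["CNI_NETNS"] = *netnsPath;
  }

  return env;
}
```

A caller would pass `std::nullopt` for `netnsPath` when the namespace handle
could not be found during recovery.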


Diffs
-

  src/slave/containerizer/mesos/isolators/network/cni/cni.cpp 
cc23428d27d40be8c4ff1476e6e984c7d12760c4 


Diff: https://reviews.apache.org/r/69706/diff/1/


Testing
---

sudo make check


Thanks,

Jie Yu



Re: Review Request 69615: Disable containerizer ptrace attach.

2019-01-10 Thread Andrei Budnik

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69615/#review211831
---




src/slave/slave.cpp
Lines 6183 (patched)


```
#ifdef __linux__
```
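
For illustration only, a guard of this kind might wrap the `prctl` call as
in the sketch below. The function names `disablePtraceAttach` and
`dumpableState` are hypothetical and are not taken from the patch; the
`prctl(PR_SET_DUMPABLE)` and `prctl(PR_GET_DUMPABLE)` calls are the standard
Linux ones.

```cpp
#ifdef __linux__
#include <sys/prctl.h>
#endif

// Hypothetical helper: marks the current process as non-dumpable on Linux.
// Returns true on success, and also on other platforms where this is a no-op.
bool disablePtraceAttach()
{
#ifdef __linux__
  return ::prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) == 0;
#else
  return true;
#endif
}

// Hypothetical helper: returns the current "dumpable" value reported by the
// kernel on Linux (0 means not dumpable), or -1 on other platforms or on
// error.
int dumpableState()
{
#ifdef __linux__
  return ::prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
#else
  return -1;
#endif
}
```

Keeping the platform check inside the helper means call sites stay free of
preprocessor conditionals.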


- Andrei Budnik


On Jan. 2, 2019, 5:15 p.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69615/
> ---
> 
> (Updated Jan. 2, 2019, 5:15 p.m.)
> 
> 
> Review request for mesos, Xudong Ni, Gilbert Song, Jie Yu, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-9349
> https://issues.apache.org/jira/browse/MESOS-9349
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Use `prctl(PR_SET_DUMPABLE)` to disable the ability to attach to
> the containerizer process(es) on Linux systems. This prevents
> unprivileged containerized processes from reading information
> about the containerizer process(es) from `/proc`. This gives an
> additional layer of protection against leaking information to
> untrusted container processes.
> 
> 
> Diffs
> -
> 
>   docs/configuration/agent.md 330283f4e3957075dd4310de4a841feac23de36c 
>   src/launcher/executor.cpp f962e800f23d5582b1bc04a263253893492a5054 
>   src/slave/containerizer/mesos/containerizer.cpp 
> a5cf2da55c046c5c45e0c2ca3400f64de12de62b 
>   src/slave/containerizer/mesos/launch.hpp 
> 0a6394d56321948ad760ac69c05456319a254842 
>   src/slave/containerizer/mesos/launch.cpp 
> 2f1c9e7a8748c9d7eab25bc8567ca68308e680f9 
>   src/slave/flags.hpp 494ae02ab5eb365e2cda5017be573691107c3f28 
>   src/slave/flags.cpp 6bac8e1409f04d639204c45eda8a90c098e3dbd0 
>   src/slave/slave.cpp ad3b693a716cf6103345a157bf28dd60a7b07d32 
>   src/tests/containerizer/mesos_containerizer_tests.cpp 
> 449928c10b897061642af8ad267f8b70695940e6 
>   src/tests/slave_tests.cpp 4aed5d68e9a408821880ffaede482937be1999f4 
> 
> 
> Diff: https://reviews.apache.org/r/69615/diff/2/
> 
> 
> Testing
> ---
> 
> make check (Fedora 29)
> 
> 
> Thanks,
> 
> James Peach
> 
>



Re: Review Request 69705: Made agent not read the forked pid and libprocess pid after reboot.

2019-01-10 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69705/#review211830
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['69705']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2750/mesos-review-69705

Relevant logs:

- 
[mesos-tests.log](http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2750/mesos-review-69705/logs/mesos-tests.log):

```
W0110 16:01:56.494325 12084 slave.cpp:3933] Ignoring shutdown framework 
68d97b13-a7c5-4a37-933b-31279343ea71- because it is terminating
I0110 16:01:56.496320   864 master.cpp:1271] Agent 
68d97b13-a7c5-4a37-933b-31279343ea71-S0 at slave(464)@192.10.1.6:51097 
(windows-02.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net) disconnected
I0110 16:01:56.496320   864 master.cpp:3274] Disconnecting agent 
68d97b13-a7c5-4a37-933b-31279343ea71-S0 at slave(464)@192.10.1.6:51097 
(windows-02.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net)
I0110 16:01:56.496320   864 master.cpp:3293] Deactivating agent 
68d97b13-a7c5-4a37-933b-31279343ea71-S0 at slave(464)@192.10.1.6:51097 
(windows-02.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net)
I0110 16:01:56.497376  2248 hierarchical.cpp:358] Removed framework 
68d97b13-a7c5-4a37-933b-31279343ea71-
I0110 16:01:56.497376  2248 hierarchical.cpp:802] Agent 
68d97b13-a7c5-4a37-933b-31279343ea71-S0 deactivated
I0110 16:01:56.498402   864 containerizer.cpp:2469] Destroying container 
ad70a447-bd3b-49ad-918f-54a871fe6be9 in RUNNING state
I0110 16:01:56.498402   864 containerizer.cpp:3136] Transitioning the state of 
container ad70a447-bd3b-49ad-918f-54a871fe6be9 from RUNNING to DESTROYING
I0110 16:01:56.499404   864 launcher.cpp:161] Asked to destroy container 
ad70a447-bd3b-49ad-918f-54a871fe6be9
W0110 16:01:56.500319  6488 process.cpp:1423] Failed to recv on socket 
WindowsFD::Type::SOCKET=2588 to peer '192.10.1.6:52930': IO failed with error 
code: The specified network name is no longer available.

[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (685 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (704 ms total)

[--] Global test environment tear-down
[==] 1082 tests from 104 test cases ran. (491907 ms total)
[  PASSED  ] 1081 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] SlaveRecoveryTest/0.Reboot, where TypeParam = class 
mesos::internal::slave::MesosContainerizer

 1 FAILED TEST
  YOU HAVE 231 DISABLED TESTS

W0110 16:01:56.500319  6488 process.cpp:838] Failed to recv on socket 
WindowsFD::Type::SOCKET=2500 to peer '192.10.1.6:52931': IO failed with error 
code: The specified network name is no longer available.

I0110 16:01:56.597012  2248 containerizer.cpp:2975] Container 
ad70a447-bd3b-49ad-918f-54a871fe6be9 has exited
I0110 16:01:56.625031  8360 master.cpp:] Master terminating
I0110 16:01:56.627017  6148 hierarchical.cpp:644] Removed agent 
68d97b13-a7c5-4a37-933b-31279343ea71-S0
I0110 16:01:56.899015  6488 process.cpp:927] Stopped the socket accept loop
```

- Mesos Reviewbot Windows


On Jan. 10, 2019, 2:52 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69705/
> ---
> 
> (Updated Jan. 10, 2019, 2:52 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Gilbert Song.
> 
> 
> Bugs: MESOS-9501
> https://issues.apache.org/jira/browse/MESOS-9501
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> After agent host is rebooted, the forked pid and libprocess pid in
> agent's meta directory are obsolete, so we should not read them during
> agent recovery, otherwise containerizer may wait for an irrelevant
> process if the forked pid is reused by another process after reboot.
> 
> 
> Diffs
> -
> 
>   src/slave/state.hpp 4f3d4cefb3fdef29cce3a6abe4cf5db04d45301f 
>   src/slave/state.cpp e7cf84993c74cf6da7fe22d5112e86e039780287 
> 
> 
> Diff: https://reviews.apache.org/r/69705/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 69705: Made agent not read the forked pid and libprocess pid after reboot.

2019-01-10 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69705/#review211829
---



Can you write a unit test for this by spoofing the reboot?

- Vinod Kone


On Jan. 10, 2019, 2:52 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69705/
> ---
> 
> (Updated Jan. 10, 2019, 2:52 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Gilbert Song.
> 
> 
> Bugs: MESOS-9501
> https://issues.apache.org/jira/browse/MESOS-9501
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> After agent host is rebooted, the forked pid and libprocess pid in
> agent's meta directory are obsolete, so we should not read them during
> agent recovery, otherwise containerizer may wait for an irrelevant
> process if the forked pid is reused by another process after reboot.
> 
> 
> Diffs
> -
> 
>   src/slave/state.hpp 4f3d4cefb3fdef29cce3a6abe4cf5db04d45301f 
>   src/slave/state.cpp e7cf84993c74cf6da7fe22d5112e86e039780287 
> 
> 
> Diff: https://reviews.apache.org/r/69705/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Review Request 69705: Made agent not read the forked pid and libprocess pid after reboot.

2019-01-10 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69705/
---

Review request for mesos, Andrei Budnik and Gilbert Song.


Bugs: MESOS-9501
https://issues.apache.org/jira/browse/MESOS-9501


Repository: mesos


Description
---

After agent host is rebooted, the forked pid and libprocess pid in
agent's meta directory are obsolete, so we should not read them during
agent recovery, otherwise containerizer may wait for an irrelevant
process if the forked pid is reused by another process after reboot.
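
A minimal sketch of the idea, not the actual patch: the types and names
below (`Checkpoint`, `RunState`, `recoverRun`, the `rebooted` flag) are
hypothetical stand-ins and do not mirror the real code in
`src/slave/state.cpp`. It only shows the checkpointed pids being skipped
when a reboot has been detected, so that callers see them as unset.

```cpp
#include <optional>
#include <string>

// Hypothetical view of what was checkpointed for one executor run.
struct Checkpoint
{
  std::optional<int> forkedPid;
  std::optional<std::string> libprocessPid;
};

// Hypothetical recovered state handed to the rest of the agent.
struct RunState
{
  std::optional<int> forkedPid;
  std::optional<std::string> libprocessPid;
};

// After a reboot the checkpointed pids are obsolete (the forked pid may
// even have been reused by an unrelated process), so they are not read.
RunState recoverRun(const Checkpoint& checkpoint, bool rebooted)
{
  RunState state;

  if (rebooted) {
    return state; // Both pids stay unset.
  }

  state.forkedPid = checkpoint.forkedPid;
  state.libprocessPid = checkpoint.libprocessPid;
  return state;
}
```

With this shape, code that waits on the forked pid has to handle the unset
case explicitly instead of waiting on a possibly unrelated process.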


Diffs
-

  src/slave/state.hpp 4f3d4cefb3fdef29cce3a6abe4cf5db04d45301f 
  src/slave/state.cpp e7cf84993c74cf6da7fe22d5112e86e039780287 


Diff: https://reviews.apache.org/r/69705/diff/1/


Testing
---


Thanks,

Qian Zhang



Re: Review Request 69662: Displayed resource provider information in the Mesos webui.

2019-01-10 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69662/#review211825
---



PASS: Mesos patch 69662 was successfully built and tested.

Reviews applied: `['69661', '69662']`

All the build artifacts available at: 
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2749/mesos-review-69662

- Mesos Reviewbot Windows


On Jan. 10, 2019, 5:19 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69662/
> ---
> 
> (Updated Jan. 10, 2019, 5:19 p.m.)
> 
> 
> Review request for mesos, Armand Grillet, Benjamin Mahler, and Chun-Hung 
> Hsiao.
> 
> 
> Bugs: MESOS-8380
> https://issues.apache.org/jira/browse/MESOS-8380
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Displayed resource provider information in the Mesos webui.
> 
> 
> Diffs
> -
> 
>   src/webui/app/agents/agent.html a101a93dcdb95f257fe0ee967c92d2cdc1c84f84 
>   src/webui/app/controllers.js 8049cf611895edea7c54b3c58d71e00d823a1fd3 
> 
> 
> Diff: https://reviews.apache.org/r/69662/diff/5/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Ran a local test with a `./src/test-csi-plugin`.
> 
> 
> File Attachments
> 
> 
> Screenshot Agent screen
>   
> https://reviews.apache.org/media/uploaded/files/2019/01/04/ed920e7b-4072-49be-8801-3b875d529fad__Screen_Shot_2019-01-04_at_11.11.51_AM.png
> Screenshot Agent screen
>   
> https://reviews.apache.org/media/uploaded/files/2019/01/07/8f494c0f-1c76-4734-9aae-7fb899589120__Screen_Shot_2019-01-07_at_9.42.51_PM.png
> Screenshot Agent screen
>   
> https://reviews.apache.org/media/uploaded/files/2019/01/07/8a850bcd-dd30-4d25-bb37-60cd872ddd62__Screen_Shot_2019-01-07_at_11.27.49_PM.png
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 69337: Garbage collected disappeared resource providers from master state.

2019-01-10 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69337/
---

(Updated Jan. 10, 2019, 11:29 a.m.)


Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.


Changes
---

Consolidated RP GC into the block that is only executed if RP info is received.


Bugs: MESOS-9384
https://issues.apache.org/jira/browse/MESOS-9384


Repository: mesos


Description
---

The master previously kept information on resource providers
indefinitely. This was confusing to API users who saw resource
providers reported which were not present anymore, and also made it
harder to derive actual cluster state.

With this patch we remove resource providers not reported by the agent
from master state. We still need to update the agent to not report
removed resource providers in a follow-up patch.
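
The pruning step can be pictured roughly as in the sketch below. This is an
illustration under assumptions, not the code in `master.cpp`: the function
`removeUnreported` and the use of plain string IDs in standard containers
are hypothetical simplifications.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical helper: drops every known resource provider whose ID is not
// in the set most recently reported by the agent. Returns the number of
// entries removed.
std::size_t removeUnreported(
    std::map<std::string, std::string>& known,
    const std::set<std::string>& reported)
{
  std::size_t removed = 0;

  for (auto it = known.begin(); it != known.end();) {
    if (reported.count(it->first) == 0) {
      it = known.erase(it); // `erase` returns the next valid iterator.
      ++removed;
    } else {
      ++it;
    }
  }

  return removed;
}
```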


Diffs (updated)
-

  src/master/master.cpp 49b6e5c7d257bd9304215d47da13c9406c723cd8 
  src/messages/messages.proto 41e6a8a2eab0ae7c2878c1d3286c5dea0eb68ed7 


Diff: https://reviews.apache.org/r/69337/diff/4/

Changes: https://reviews.apache.org/r/69337/diff/3-4/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 69662: Displayed resource provider information in the Mesos webui.

2019-01-10 Thread Benjamin Bannier


> On Jan. 9, 2019, 11:19 p.m., Chun-Hung Hsiao wrote:
> > src/webui/app/controllers.js
> > Lines 697 (patched)
> > 
> >
> > Instead of setting `agent.resource_providers` with 
> > `state.resource_providers` and mutating each item, how about initializing 
> > it to `{}` and constructing each item one by one, like what we do for 
> > `agent.frameworks`, for consistency?

Done. I didn't go for an extra function since IMO there is little gain ATM.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69662/#review211802
---


On Jan. 10, 2019, 10:19 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69662/
> ---
> 
> (Updated Jan. 10, 2019, 10:19 a.m.)
> 
> 
> Review request for mesos, Armand Grillet, Benjamin Mahler, and Chun-Hung 
> Hsiao.
> 
> 
> Bugs: MESOS-8380
> https://issues.apache.org/jira/browse/MESOS-8380
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Displayed resource provider information in the Mesos webui.
> 
> 
> Diffs
> -
> 
>   src/webui/app/agents/agent.html a101a93dcdb95f257fe0ee967c92d2cdc1c84f84 
>   src/webui/app/controllers.js 8049cf611895edea7c54b3c58d71e00d823a1fd3 
> 
> 
> Diff: https://reviews.apache.org/r/69662/diff/5/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Ran a local test with a `./src/test-csi-plugin`.
> 
> 
> File Attachments
> 
> 
> Screenshot Agent screen
>   
> https://reviews.apache.org/media/uploaded/files/2019/01/04/ed920e7b-4072-49be-8801-3b875d529fad__Screen_Shot_2019-01-04_at_11.11.51_AM.png
> Screenshot Agent screen
>   
> https://reviews.apache.org/media/uploaded/files/2019/01/07/8f494c0f-1c76-4734-9aae-7fb899589120__Screen_Shot_2019-01-07_at_9.42.51_PM.png
> Screenshot Agent screen
>   
> https://reviews.apache.org/media/uploaded/files/2019/01/07/8a850bcd-dd30-4d25-bb37-60cd872ddd62__Screen_Shot_2019-01-07_at_11.27.49_PM.png
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 69662: Displayed resource provider information in the Mesos webui.

2019-01-10 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69662/
---

(Updated Jan. 10, 2019, 10:19 a.m.)


Review request for mesos, Armand Grillet, Benjamin Mahler, and Chun-Hung Hsiao.


Changes
---

Addressed remaining issues from bmahler and chhsia0.


Bugs: MESOS-8380
https://issues.apache.org/jira/browse/MESOS-8380


Repository: mesos


Description
---

Displayed resource provider information in the Mesos webui.


Diffs (updated)
-

  src/webui/app/agents/agent.html a101a93dcdb95f257fe0ee967c92d2cdc1c84f84 
  src/webui/app/controllers.js 8049cf611895edea7c54b3c58d71e00d823a1fd3 


Diff: https://reviews.apache.org/r/69662/diff/5/

Changes: https://reviews.apache.org/r/69662/diff/4-5/


Testing
---

`make check`

Ran a local test with a `./src/test-csi-plugin`.


File Attachments


Screenshot Agent screen
  
https://reviews.apache.org/media/uploaded/files/2019/01/04/ed920e7b-4072-49be-8801-3b875d529fad__Screen_Shot_2019-01-04_at_11.11.51_AM.png
Screenshot Agent screen
  
https://reviews.apache.org/media/uploaded/files/2019/01/07/8f494c0f-1c76-4734-9aae-7fb899589120__Screen_Shot_2019-01-07_at_9.42.51_PM.png
Screenshot Agent screen
  
https://reviews.apache.org/media/uploaded/files/2019/01/07/8a850bcd-dd30-4d25-bb37-60cd872ddd62__Screen_Shot_2019-01-07_at_11.27.49_PM.png


Thanks,

Benjamin Bannier