Re: Review Request 69681: Fixed the FD leak if containerizer::_launch() failed or discarded.

2019-01-09 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69681/#review211792
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['69681']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2730/mesos-review-69681

Relevant logs:

- 
[mesos-tests-cmake.log](http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2730/mesos-review-69681/logs/mesos-tests-cmake.log):

```
 
d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\zookeeper.c(3501):
 warning C4996: 'sprintf': This function or variable may be unsafe. Consider 
using sprintf_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. 
See online help for details. 
[D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\zookeeper.vcxproj]
 [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
 
d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\zookeeper.c(3479):
 warning C4101: 'addrstr': unreferenced local variable 
[D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\zookeeper.vcxproj]
 [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
 
d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\recordio.c(170):
 warning C4267: '=': conversion from 'size_t' to 'int32_t', possible loss of 
data 
[D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\zookeeper.vcxproj]
 [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
 
d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\mt_adaptor.c(496):
 warning C4244: '=': conversion from 'time_t' to 'int32_t', possible loss of 
data 
[D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\zookeeper.vcxproj]
 [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
 
d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\winport.c(256):
 warning C4090: 'function': different 'const' qualifiers 
[D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\zookeeper.vcxproj]
 [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
 
d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\winport.c(166):
 warning C4716: 'pthread_cond_broadcast': must return a value 
[D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\zookeeper.vcxproj]
 [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
 
d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\winport.c(205):
 warning C4716: 'pthread_cond_wait': must return a value 
[D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\zookeeper.vcxproj]
 [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
 
d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\cli.c(124):
 warning C4996: 'fopen': This function or variable may be unsafe. Consider 
using fopen_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See 
online help for details. 
[D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\cli.vcxproj] 
[D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
 
d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\cli.c(128):
 warning C4267: 'initializing': conversion from 'size_t' to 'int', possible 
loss of data 
[D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\cli.vcxproj] 
[D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
 
d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\cli.c(279):
 warning C4267: 'function': conversion from 'size_t' to 'int', possible loss of 
data 
[D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\cli.vcxproj] 
[D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
 
d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\cli.c(301):
 warning C4267: 'initializing': conversion from 'size_t' to 'int', possible 
loss of data 
[D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\cli.vcxproj] 
[D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
 
d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\cli.c(368):
 warning C4267: 'function': conversion from 'size_t' to 'int', possible loss of 
data 
[D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\cli.vcxproj] 
[D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
 
d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\cli.c(372):
 warning C4267: 'function': conversion from 'size_t' to 'int', possible loss of 
data 
[D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\cli.vcxproj] 
[D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
 
d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\cli.c(512):
 warning C4267: 'function': conversion from 'size_t' to 'int

Re: Review Request 69681: Fixed the FD leak if containerizer::_launch() failed or discarded.

2019-01-07 Thread Andrei Budnik

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69681/#review211726
---




src/slave/containerizer/mesos/containerizer.cpp
Lines 1375 (patched)


This will *copy* the value of `_containerIO`, hence it is *always* empty.



src/slave/containerizer/mesos/containerizer.cpp
Lines 1377 (patched)


This might fail in case of discard is called *before* calling `.then` 
handler for `prepare()`.

So, it should be better to:
```
if (_containerIO.isSome()) {
  closeContainerIoFds(_containerIO.get());
}
```



src/slave/containerizer/mesos/containerizer.cpp
Lines 1378 (patched)


Will this lead to problem with a double `close` of the same FDs? We close 
FDs here, but these FDs will be closed in destructor when the IOSB terminates 
and then removes the same IO item from its hashmap.


- Andrei Budnik


On Jan. 7, 2019, 12:22 p.m., Gilbert Song wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69681/
> ---
> 
> (Updated Jan. 7, 2019, 12:22 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Jie Yu, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6632
> https://issues.apache.org/jira/browse/MESOS-6632
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> For the period between IOSB server is up and container process exec,
> if the containerizer launch returns failure or discarded, the FD used
> for signaling from the container process to the IOSB finish redirect
> will be leaked, which would cause the IOSB stuck at io::redirect
> forever. It would block the containerizer from cleaning up this
> container at IOSB.
> 
> This issue could be exposed if there are frequent check containers
> launch on an agent with heavy loads.
> 
> 
> Diffs
> -
> 
>   include/mesos/slave/containerizer.hpp 
> c34f1296586ea5d26619605f8f6a5ae9444f1a96 
>   src/slave/containerizer/mesos/containerizer.cpp 
> a5cf2da55c046c5c45e0c2ca3400f64de12de62b 
>   src/slave/containerizer/mesos/io/switchboard.cpp 
> c445a8f09d7671d5763281bac9881489b3cc9c39 
>   src/slave/containerizer/mesos/utils.hpp 
> bfd07e28c78fc140e395ffccd11d65545bf007fc 
>   src/slave/containerizer/mesos/utils.cpp 
> d9964f0b0e8ad8a41e5f5490a1c28bba9a972ae0 
> 
> 
> Diff: https://reviews.apache.org/r/69681/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Gilbert Song
> 
>



Re: Review Request 69681: Fixed the FD leak if containerizer::_launch() failed or discarded.

2019-01-07 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69681/#review211724
---




include/mesos/slave/containerizer.hpp
Lines 85 (patched)


s/Option()/Optionnone()/



include/mesos/slave/containerizer.hpp
Lines 90 (patched)


Ditto.



src/slave/containerizer/mesos/containerizer.cpp
Lines 1366-1371 (original), 1368-1373 (patched)


Can we do the `repair` inside this `then()`? Something like:
```
  .then(defer(self(), [=](const Option& containerIO) {
return _launch(containerId, containerIO, environment, 
pidCheckpointPath)
  .repair([&containerIO](
  const Future& result) {
CHECK(containerIO.isSome());
closeContainerIoFds(containerIO.get());
return result;
  });
  }));
```
In this way we do not need the local variable `_containerIO`.



src/slave/containerizer/mesos/utils.cpp
Lines 22 (patched)


Why do we need this header?



src/slave/containerizer/mesos/utils.cpp
Lines 153-154 (patched)


We could merge these two lines into one.


- Qian Zhang


On Jan. 7, 2019, 8:22 p.m., Gilbert Song wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69681/
> ---
> 
> (Updated Jan. 7, 2019, 8:22 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Jie Yu, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6632
> https://issues.apache.org/jira/browse/MESOS-6632
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> For the period between IOSB server is up and container process exec,
> if the containerizer launch returns failure or discarded, the FD used
> for signaling from the container process to the IOSB finish redirect
> will be leaked, which would cause the IOSB stuck at io::redirect
> forever. It would block the containerizer from cleaning up this
> container at IOSB.
> 
> This issue could be exposed if there are frequent check containers
> launch on an agent with heavy loads.
> 
> 
> Diffs
> -
> 
>   include/mesos/slave/containerizer.hpp 
> c34f1296586ea5d26619605f8f6a5ae9444f1a96 
>   src/slave/containerizer/mesos/containerizer.cpp 
> a5cf2da55c046c5c45e0c2ca3400f64de12de62b 
>   src/slave/containerizer/mesos/io/switchboard.cpp 
> c445a8f09d7671d5763281bac9881489b3cc9c39 
>   src/slave/containerizer/mesos/utils.hpp 
> bfd07e28c78fc140e395ffccd11d65545bf007fc 
>   src/slave/containerizer/mesos/utils.cpp 
> d9964f0b0e8ad8a41e5f5490a1c28bba9a972ae0 
> 
> 
> Diff: https://reviews.apache.org/r/69681/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Gilbert Song
> 
>



Review Request 69681: Fixed the FD leak if containerizer::_launch() failed or discarded.

2019-01-07 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69681/
---

Review request for mesos, Andrei Budnik, Jie Yu, Qian Zhang, and Vinod Kone.


Bugs: MESOS-6632
https://issues.apache.org/jira/browse/MESOS-6632


Repository: mesos


Description
---

For the period between IOSB server is up and container process exec,
if the containerizer launch returns failure or discarded, the FD used
for signaling from the container process to the IOSB finish redirect
will be leaked, which would cause the IOSB stuck at io::redirect
forever. It would block the containerizer from cleaning up this
container at IOSB.

This issue could be exposed if there are frequent check containers
launch on an agent with heavy loads.


Diffs
-

  include/mesos/slave/containerizer.hpp 
c34f1296586ea5d26619605f8f6a5ae9444f1a96 
  src/slave/containerizer/mesos/containerizer.cpp 
a5cf2da55c046c5c45e0c2ca3400f64de12de62b 
  src/slave/containerizer/mesos/io/switchboard.cpp 
c445a8f09d7671d5763281bac9881489b3cc9c39 
  src/slave/containerizer/mesos/utils.hpp 
bfd07e28c78fc140e395ffccd11d65545bf007fc 
  src/slave/containerizer/mesos/utils.cpp 
d9964f0b0e8ad8a41e5f5490a1c28bba9a972ae0 


Diff: https://reviews.apache.org/r/69681/diff/1/


Testing
---

make check


Thanks,

Gilbert Song