Re: Review Request 69667: Sent SIGKILL to I/O switchboard server as a safeguard.

2019-01-07 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69667/#review211750
---


Ship it!




Ship It!

- Gilbert Song


On Jan. 4, 2019, 5:25 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69667/
> ---
> 
> (Updated Jan. 4, 2019, 5:25 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, and Vinod Kone.
> 
> 
> Bugs: MESOS-7042
> https://issues.apache.org/jira/browse/MESOS-7042
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Sent SIGKILL to I/O switchboard server as a safeguard.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/io/switchboard.cpp 
> c445a8f09d7671d5763281bac9881489b3cc9c39 
> 
> 
> Diff: https://reviews.apache.org/r/69667/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 69689: Fixed a flaky master volume authorization failure test.

2019-01-07 Thread Meng Zhu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69689/#review211749
---



Thanks for fixing this! Can you describe what made the test flaky and how the 
patch fixes the issue (similar to your comment in the code)?


src/tests/master_tests.cpp
Lines 10336-10341 (patched)


While we are here, can you pause the clock at the beginning and advance the 
clock to trigger the agent registration and etc.


- Meng Zhu


On Jan. 7, 2019, 10:52 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69689/
> ---
> 
> (Updated Jan. 7, 2019, 10:52 a.m.)
> 
> 
> Review request for mesos and Meng Zhu.
> 
> 
> Bugs: MESOS-9495
> https://issues.apache.org/jira/browse/MESOS-9495
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> See summary.
> 
> 
> Diffs
> -
> 
>   src/tests/master_tests.cpp 51fcbe8d72fce1e56790d22ab7deced4fdb3bf0e 
> 
> 
> Diff: https://reviews.apache.org/r/69689/diff/1/
> 
> 
> Testing
> ---
> 
> Ran in repetition before (could trigger the failure) and after (no failures 
> yet).
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 69662: Displayed resource provider information in the Mesos webui.

2019-01-07 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69662/
---

(Updated Jan. 7, 2019, 11:33 p.m.)


Review request for mesos, Armand Grillet, Benjamin Mahler, and Chun-Hung Hsiao.


Changes
---

Fixed attribute typo.


Bugs: MESOS-8380
https://issues.apache.org/jira/browse/MESOS-8380


Repository: mesos


Description
---

Displayed resource provider information in the Mesos webui.


Diffs (updated)
-

  src/webui/app/agents/agent.html a101a93dcdb95f257fe0ee967c92d2cdc1c84f84 
  src/webui/app/controllers.js 8049cf611895edea7c54b3c58d71e00d823a1fd3 


Diff: https://reviews.apache.org/r/69662/diff/4/

Changes: https://reviews.apache.org/r/69662/diff/3-4/


Testing
---

`make check`

Ran a local test with a `./src/test-csi-plugin`.


File Attachments (updated)


Screenshot Agent screen
  
https://reviews.apache.org/media/uploaded/files/2019/01/04/ed920e7b-4072-49be-8801-3b875d529fad__Screen_Shot_2019-01-04_at_11.11.51_AM.png
Screenshot Agent screen
  
https://reviews.apache.org/media/uploaded/files/2019/01/07/8f494c0f-1c76-4734-9aae-7fb899589120__Screen_Shot_2019-01-07_at_9.42.51_PM.png
Screenshot Agent screen
  
https://reviews.apache.org/media/uploaded/files/2019/01/07/8a850bcd-dd30-4d25-bb37-60cd872ddd62__Screen_Shot_2019-01-07_at_11.27.49_PM.png


Thanks,

Benjamin Bannier



Re: Review Request 69662: Displayed resource provider information in the Mesos webui.

2019-01-07 Thread Benjamin Bannier


> On Jan. 7, 2019, 10:52 p.m., Benjamin Mahler wrote:
> > Hm.. it looks like the rows are bigger in this table? (I measured in the 
> > screenshot and it appears to be the case). is it missing some attributes 
> > that the reservation table has?

Very good catch, thanks! There was a typo in one of the table attributes 
(`table-triped` instead of `table-striped`). For some reason this lead to 
bootstrap not applying the properties for `table-condensed` (i.e., `padding: 
8px` instead of `padding: 5px`). This should be fixed now, screenshot is 
updated. Note that this table is modelled after the agents table in 
`agents.html`, but it should still look similar to the resources table below.


To debug this without even starting a resource provider (`test-csi-plugin` is 
not supported on macos), I hardcoded the RP information during development and 
confirmed with a real RP,
```
diff --git a/src/webui/app/controllers.js b/src/webui/app/controllers.js
index 08c9f1c0b..57491c6bd 100644
--- a/src/webui/app/controllers.js
+++ b/src/webui/app/controllers.js
@@ -694,7 +694,97 @@
   $scope.agent = {};
   $scope.agent.frameworks = {};
   $scope.agent.completed_frameworks = {};
-  $scope.agent.resource_providers = $scope.state.resource_providers;
+  $scope.agent.resource_providers = [{
+"resource_provider_info": {
+  "id": {
+"value": "bea4ab63-5a61-45ad-a00b-1a3345f46ca2"
+  },
+  "type": "org.apache.mesos.rp.local.storage",
+  "name": "test_slrp",
+  "default_reservations": [
+{
+  "type": "DYNAMIC",
+  "role": "test-role"
+}
+  ],
+  "storage": {
+"plugin": {
+  "type": "org.apache.mesos.csi.test",
+  "name": "test_plugin",
+  "containers": [
+{
+  "services": [
+"CONTROLLER_SERVICE",
+"NODE_SERVICE"
+  ],
+  "command": {
+"uris": [
+  {
+"value": 
"/home/bbannier/src/mesos/_/src/test-csi-plugin",
+"executable": true,
+"extract": true
+  }
+],
+"shell": true,
+"value": "./test-csi-plugin --available_capacity=2GB 
--work_dir=workdir --volumes='volume1:1GB;volume2:2GB'"
+  },
+  "resources": [
+{
+  "name": "cpus",
+  "type": "SCALAR",
+  "scalar": {
+"value": 0.1
+  }
+},
+{
+  "name": "mem",
+  "type": "SCALAR",
+  "scalar": {
+"value": 200
+  }
+}
+  ]
+}
+  ]
+}
+  }
+},
+"total_resources": [
+  {
+"provider_id": {
+  "value": "bea4ab63-5a61-45ad-a00b-1a3345f46ca2"
+},
+"name": "disk",
+"type": "SCALAR",
+"scalar": {
+  "value": 1024
+},
+"reservations": [
+  {
+"type": "DYNAMIC",
+"role": "test-role"
+  }
+],
+"disk": {
+  "source": {
+"type": "RAW",
+"vendor": "org.apache.mesos.csi.test.test_plugin",
+"id": "volume1",
+"metadata": {
+  "labels": [
+{
+  "key": "path",
+  "value": "workdir/1GB-volume1"
+}
+  ]
+}
+  }
+}
+  }
+]
+  }
+  ];
+
   $scope.agent.url_prefix = agentURLPrefix(agent, false);
 
   // The agent attaches a "/slave/log" file when either
```


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69662/#review211741
---


On Jan. 7, 2019, 11:33 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69662/
> 

Re: Review Request 69662: Displayed resource provider information in the Mesos webui.

2019-01-07 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69662/#review211741
---



Hm.. it looks like the rows are bigger in this table? (I measured in the 
screenshot and it appears to be the case). is it missing some attributes that 
the reservation table has?

- Benjamin Mahler


On Jan. 7, 2019, 9:02 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69662/
> ---
> 
> (Updated Jan. 7, 2019, 9:02 p.m.)
> 
> 
> Review request for mesos, Armand Grillet, Benjamin Mahler, and Chun-Hung 
> Hsiao.
> 
> 
> Bugs: MESOS-8380
> https://issues.apache.org/jira/browse/MESOS-8380
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Displayed resource provider information in the Mesos webui.
> 
> 
> Diffs
> -
> 
>   src/webui/app/agents/agent.html a101a93dcdb95f257fe0ee967c92d2cdc1c84f84 
>   src/webui/app/controllers.js 8049cf611895edea7c54b3c58d71e00d823a1fd3 
> 
> 
> Diff: https://reviews.apache.org/r/69662/diff/3/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Ran a local test with a `./src/test-csi-plugin`.
> 
> 
> File Attachments
> 
> 
> Screenshot Agent screen
>   
> https://reviews.apache.org/media/uploaded/files/2019/01/04/ed920e7b-4072-49be-8801-3b875d529fad__Screen_Shot_2019-01-04_at_11.11.51_AM.png
> Screenshot Agent screen
>   
> https://reviews.apache.org/media/uploaded/files/2019/01/07/8f494c0f-1c76-4734-9aae-7fb899589120__Screen_Shot_2019-01-07_at_9.42.51_PM.png
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 69690: Changed the gRPC make variables for protoc detection.

2019-01-07 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69690/#review211738
---


Ship it!




Ship It!

- Jie Yu


On Jan. 7, 2019, 8:19 p.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69690/
> ---
> 
> (Updated Jan. 7, 2019, 8:19 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Since gRPC's Makefile use `HAS_XXX` as public-facing third-party
> library control flags, we should use them to detect protoc instead of
> setting the `NO_PROTOC` internal variable.
> 
> 
> Diffs
> -
> 
>   3rdparty/Makefile.am a14216cf98f6638da06aa3dfc49e6b319fea7f87 
> 
> 
> Diff: https://reviews.apache.org/r/69690/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 69662: Displayed resource provider information in the Mesos webui.

2019-01-07 Thread Benjamin Bannier


> On Jan. 7, 2019, 7:14 p.m., Benjamin Mahler wrote:
> > src/webui/app/agents/agent.html
> > Lines 176-179 (patched)
> > 
> >
> > Is this hidden if there are no providers? Can you show a screenshot of 
> > the entire agent UI rather than just a cropped version? One where there are 
> > no providers would be helpful as well (my inclination is to hide it in that 
> > case)

Hiding the table now if no resource providers are present.

During the transition period this might be confusing to users since it is hard 
to tell whether their UI does not display any resource providers, or if no 
resource providers are present, but hiding it is the right long-term solution 
and I believe acceptable for an experimental feature.


> On Jan. 7, 2019, 7:14 p.m., Benjamin Mahler wrote:
> > src/webui/app/agents/agent.html
> > Lines 184-186 (patched)
> > 
> >
> > Do we actually expect resource providers to be providing cpus and mem? 
> > I would suggest removing these to simplify the UI
> > 
> > Maybe at some point we want to show other resource names being 
> > provided? Maybe the column should just be 'Resources' and a stringified 
> > version is shown?

re:stringification, I am hesitant to put a stringified resource string into 
`/state`, and we currently do not provide a resource stringification function 
in the webui javascript code. As writing one would be a bigger undertaking 
(e.g., requiring extensive testing to ensure the javascript implementation 
stays in sync with what we use in Mesos), I'd prefer to punt on that for now; I 
could also imagine that we would eventually want a resource representation 
which exposes more low-level information (for SLRPs resources e.g., the 
individual disks and their metadata).

I am just exposing aggregated `disk` resources in the UI now since only storage 
resource providers are currently implemented (which I gratuitously assert only 
provide `disk` resources). I tried to add a `console.warn` to the controller 
code so that it warns should other resource kinds show up, but this is not 
allowed by the linter (and maybe not a good idea, either).


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69662/#review211732
---


On Jan. 7, 2019, 10:02 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69662/
> ---
> 
> (Updated Jan. 7, 2019, 10:02 p.m.)
> 
> 
> Review request for mesos, Armand Grillet, Benjamin Mahler, and Chun-Hung 
> Hsiao.
> 
> 
> Bugs: MESOS-8380
> https://issues.apache.org/jira/browse/MESOS-8380
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Displayed resource provider information in the Mesos webui.
> 
> 
> Diffs
> -
> 
>   src/webui/app/agents/agent.html a101a93dcdb95f257fe0ee967c92d2cdc1c84f84 
>   src/webui/app/controllers.js 8049cf611895edea7c54b3c58d71e00d823a1fd3 
> 
> 
> Diff: https://reviews.apache.org/r/69662/diff/3/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Ran a local test with a `./src/test-csi-plugin`.
> 
> 
> File Attachments
> 
> 
> Screenshot Agent screen
>   
> https://reviews.apache.org/media/uploaded/files/2019/01/04/ed920e7b-4072-49be-8801-3b875d529fad__Screen_Shot_2019-01-04_at_11.11.51_AM.png
> Screenshot Agent screen
>   
> https://reviews.apache.org/media/uploaded/files/2019/01/07/8f494c0f-1c76-4734-9aae-7fb899589120__Screen_Shot_2019-01-07_at_9.42.51_PM.png
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 69662: Displayed resource provider information in the Mesos webui.

2019-01-07 Thread Benjamin Bannier


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69662/#review211732
---


On Jan. 7, 2019, 10:02 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69662/
> ---
> 
> (Updated Jan. 7, 2019, 10:02 p.m.)
> 
> 
> Review request for mesos, Armand Grillet, Benjamin Mahler, and Chun-Hung 
> Hsiao.
> 
> 
> Bugs: MESOS-8380
> https://issues.apache.org/jira/browse/MESOS-8380
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Displayed resource provider information in the Mesos webui.
> 
> 
> Diffs
> -
> 
>   src/webui/app/agents/agent.html a101a93dcdb95f257fe0ee967c92d2cdc1c84f84 
>   src/webui/app/controllers.js 8049cf611895edea7c54b3c58d71e00d823a1fd3 
> 
> 
> Diff: https://reviews.apache.org/r/69662/diff/3/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Ran a local test with a `./src/test-csi-plugin`.
> 
> 
> File Attachments
> 
> 
> Screenshot Agent screen
>   
> https://reviews.apache.org/media/uploaded/files/2019/01/04/ed920e7b-4072-49be-8801-3b875d529fad__Screen_Shot_2019-01-04_at_11.11.51_AM.png
> Screenshot Agent screen
>   
> https://reviews.apache.org/media/uploaded/files/2019/01/07/8f494c0f-1c76-4734-9aae-7fb899589120__Screen_Shot_2019-01-07_at_9.42.51_PM.png
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 69662: Displayed resource provider information in the Mesos webui.

2019-01-07 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69662/
---

(Updated Jan. 7, 2019, 10:02 p.m.)


Review request for mesos, Armand Grillet, Benjamin Mahler, and Chun-Hung Hsiao.


Changes
---

Addressed issues raised by bmahler.


Bugs: MESOS-8380
https://issues.apache.org/jira/browse/MESOS-8380


Repository: mesos


Description
---

Displayed resource provider information in the Mesos webui.


Diffs (updated)
-

  src/webui/app/agents/agent.html a101a93dcdb95f257fe0ee967c92d2cdc1c84f84 
  src/webui/app/controllers.js 8049cf611895edea7c54b3c58d71e00d823a1fd3 


Diff: https://reviews.apache.org/r/69662/diff/3/

Changes: https://reviews.apache.org/r/69662/diff/2-3/


Testing
---

`make check`

Ran a local test with a `./src/test-csi-plugin`.


File Attachments (updated)


Screenshot Agent screen
  
https://reviews.apache.org/media/uploaded/files/2019/01/04/ed920e7b-4072-49be-8801-3b875d529fad__Screen_Shot_2019-01-04_at_11.11.51_AM.png
Screenshot Agent screen
  
https://reviews.apache.org/media/uploaded/files/2019/01/07/8f494c0f-1c76-4734-9aae-7fb899589120__Screen_Shot_2019-01-07_at_9.42.51_PM.png


Thanks,

Benjamin Bannier



Re: Review Request 69615: Disable containerizer ptrace attach.

2019-01-07 Thread Xudong Ni via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69615/#review211735
---


Ship it!




Ship It!

- Xudong Ni


On Jan. 2, 2019, 5:15 p.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69615/
> ---
> 
> (Updated Jan. 2, 2019, 5:15 p.m.)
> 
> 
> Review request for mesos, Xudong Ni, Gilbert Song, Jie Yu, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-9349
> https://issues.apache.org/jira/browse/MESOS-9349
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Use `prctl(PR_SET_DUMPABLE)` to disable the ability to attach to
> the containerizer process(es) on Linux systems. This prevents
> unprivileged containerized processes from reading information
> about the containerizer process(es) from `/proc`. This gives an
> additional layer of protection against leaking information to
> untrusted container processes.
> 
> 
> Diffs
> -
> 
>   docs/configuration/agent.md 330283f4e3957075dd4310de4a841feac23de36c 
>   src/launcher/executor.cpp f962e800f23d5582b1bc04a263253893492a5054 
>   src/slave/containerizer/mesos/containerizer.cpp 
> a5cf2da55c046c5c45e0c2ca3400f64de12de62b 
>   src/slave/containerizer/mesos/launch.hpp 
> 0a6394d56321948ad760ac69c05456319a254842 
>   src/slave/containerizer/mesos/launch.cpp 
> 2f1c9e7a8748c9d7eab25bc8567ca68308e680f9 
>   src/slave/flags.hpp 494ae02ab5eb365e2cda5017be573691107c3f28 
>   src/slave/flags.cpp 6bac8e1409f04d639204c45eda8a90c098e3dbd0 
>   src/slave/slave.cpp ad3b693a716cf6103345a157bf28dd60a7b07d32 
>   src/tests/containerizer/mesos_containerizer_tests.cpp 
> 449928c10b897061642af8ad267f8b70695940e6 
>   src/tests/slave_tests.cpp 4aed5d68e9a408821880ffaede482937be1999f4 
> 
> 
> Diff: https://reviews.apache.org/r/69615/diff/2/
> 
> 
> Testing
> ---
> 
> make check (Fedora 29)
> 
> 
> Thanks,
> 
> James Peach
> 
>



Review Request 69690: Changed the gRPC make variables for protoc detection.

2019-01-07 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69690/
---

Review request for mesos and Jie Yu.


Repository: mesos


Description
---

Since gRPC's Makefile use `HAS_XXX` as public-facing third-party
library control flags, we should use them to detect protoc instead of
setting the `NO_PROTOC` internal variable.


Diffs
-

  3rdparty/Makefile.am a14216cf98f6638da06aa3dfc49e6b319fea7f87 


Diff: https://reviews.apache.org/r/69690/diff/1/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 69671: Always build gRPC with its embedded c-ares.

2019-01-07 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69671/
---

(Updated Jan. 7, 2019, 8:18 p.m.)


Review request for mesos and Jie Yu.


Changes
---

Addressed Jie's comment.


Bugs: MESOS-9505
https://issues.apache.org/jira/browse/MESOS-9505


Repository: mesos


Description
---

We compile gRPC with its embedded c-ares library to avoid a link error
because Mesos is not aware of the library.


Diffs (updated)
-

  3rdparty/Makefile.am a14216cf98f6638da06aa3dfc49e6b319fea7f87 


Diff: https://reviews.apache.org/r/69671/diff/2/

Changes: https://reviews.apache.org/r/69671/diff/1-2/


Testing
---

make check with c-ares pre-installed on mac and ubuntu.


Thanks,

Chun-Hung Hsiao



Re: Review Request 69667: Sent SIGKILL to I/O switchboard server as a safeguard.

2019-01-07 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69667/#review211733
---


Ship it!




Ship It!

- Vinod Kone


On Jan. 5, 2019, 1:25 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69667/
> ---
> 
> (Updated Jan. 5, 2019, 1:25 a.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, and Vinod Kone.
> 
> 
> Bugs: MESOS-7042
> https://issues.apache.org/jira/browse/MESOS-7042
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Sent SIGKILL to I/O switchboard server as a safeguard.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/io/switchboard.cpp 
> c445a8f09d7671d5763281bac9881489b3cc9c39 
> 
> 
> Diff: https://reviews.apache.org/r/69667/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Review Request 69689: Fixed a flaky master volume authorization failure test.

2019-01-07 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69689/
---

Review request for mesos and Meng Zhu.


Bugs: MESOS-9495
https://issues.apache.org/jira/browse/MESOS-9495


Repository: mesos


Description
---

See summary.


Diffs
-

  src/tests/master_tests.cpp 51fcbe8d72fce1e56790d22ab7deced4fdb3bf0e 


Diff: https://reviews.apache.org/r/69689/diff/1/


Testing
---

Ran in repetition before (could trigger the failure) and after (no failures 
yet).


Thanks,

Benjamin Mahler



Re: Review Request 69662: Displayed resource provider information in the Mesos webui.

2019-01-07 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69662/#review211732
---




src/webui/app/agents/agent.html
Lines 176-179 (patched)


Is this hidden if there are no providers? Can you show a screenshot of the 
entire agent UI rather than just a cropped version? One where there are no 
providers would be helpful as well (my inclination is to hide it in that case)



src/webui/app/agents/agent.html
Lines 184-186 (patched)


Do we actually expect resource providers to be providing cpus and mem? I 
would suggest removing these to simplify the UI

Maybe at some point we want to show other resource names being provided? 
Maybe the column should just be 'Resources' and a stringified version is shown?


- Benjamin Mahler


On Jan. 4, 2019, 10:14 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69662/
> ---
> 
> (Updated Jan. 4, 2019, 10:14 a.m.)
> 
> 
> Review request for mesos, Armand Grillet, Benjamin Mahler, and Chun-Hung 
> Hsiao.
> 
> 
> Bugs: MESOS-8380
> https://issues.apache.org/jira/browse/MESOS-8380
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Displayed resource provider information in the Mesos webui.
> 
> 
> Diffs
> -
> 
>   src/webui/app/agents/agent.html a101a93dcdb95f257fe0ee967c92d2cdc1c84f84 
>   src/webui/app/controllers.js 8049cf611895edea7c54b3c58d71e00d823a1fd3 
> 
> 
> Diff: https://reviews.apache.org/r/69662/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Ran a local test with a `./src/test-csi-plugin`.
> 
> 
> File Attachments
> 
> 
> Screenshot Agent screen
>   
> https://reviews.apache.org/media/uploaded/files/2019/01/04/ed920e7b-4072-49be-8801-3b875d529fad__Screen_Shot_2019-01-04_at_11.11.51_AM.png
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 69599: Pulled up the resource quantities class for more general use.

2019-01-07 Thread Benjamin Mahler


> On Jan. 7, 2019, 5:14 a.m., Benjamin Mahler wrote:
> > src/master/allocator/sorter/drf/sorter.cpp
> > Line 671 (original), 671 (patched)
> > 
> >
> > Why do you say to consider this, we actually don't do anything to 
> > ensure this AFAIK, so it would fail, no?
> 
> Meng Zhu wrote:
> This denotes total resources in the cluster, conceptually should never be 
> negative? I glanced the code, this invariant seems to hold (make check also 
> passes). But I removed it. Let's revisit it later.

> conceptually should never be negative?

Whoops, I initially read the check to be > 0 (and that's what my comment was 
referring to: 0 seems to be possible).


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69599/#review211714
---


On Jan. 7, 2019, 6:23 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69599/
> ---
> 
> (Updated Jan. 7, 2019, 6:23 a.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There are many places that we need to express the concept of
> resource quantities such as enforcing allocation quantities
> in the allocator and set guaranteed resource quantities with quota.
> However, the current code usually uses the complex Resources
> class which is unnecessary and inefficient.
> 
> This patch pulls class ScalarResourceQuantities in sorter.hpp
> up, aiming to use it for all resource quantity related usages.
> We mostly preserve the map interface and added other utilities such
> as parsing.
> 
> 
> Diffs
> -
> 
>   src/CMakeLists.txt bde070445b644e15d46c390d1c983caabaa1fec8 
>   src/Makefile.am 7a4904a3d67479267087fd2313a263d8218843fa 
>   src/common/resource_quantities.hpp PRE-CREATION 
>   src/common/resource_quantities.cpp PRE-CREATION 
>   src/master/allocator/sorter/drf/sorter.hpp 
> 084df82baef91eca5775b0bd17d943f3fb8df70b 
>   src/master/allocator/sorter/drf/sorter.cpp 
> a648fac7e922ab0aefdf9363624d7242f1fc6588 
>   src/master/allocator/sorter/random/sorter.hpp 
> 800b22c67126c2b14c5259d9d122d2e196cc80d8 
>   src/master/allocator/sorter/sorter.hpp 
> 68cf6988ef1a156cf16951e3164261234e9abeeb 
> 
> 
> Diff: https://reviews.apache.org/r/69599/diff/5/
> 
> 
> Testing
> ---
> 
> make check
> Dedicated test in r/69600
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 69599: Pulled up the resource quantities class for more general use.

2019-01-07 Thread Meng Zhu


> On Jan. 6, 2019, 9:14 p.m., Benjamin Mahler wrote:
> >

oops, forgot to publish the comment.


> On Jan. 6, 2019, 9:14 p.m., Benjamin Mahler wrote:
> > src/common/resource_quantities.hpp
> > Lines 81 (patched)
> > 
> >
> > This TODO looks like it belongs elsewhere, like at the top or after 
> > this function, or just remove it?

Removed.


> On Jan. 6, 2019, 9:14 p.m., Benjamin Mahler wrote:
> > src/master/allocator/sorter/drf/sorter.cpp
> > Line 671 (original), 671 (patched)
> > 
> >
> > Why do you say to consider this, we actually don't do anything to 
> > ensure this AFAIK, so it would fail, no?

This denotes total resources in the cluster, conceptually should never be 
negative? I glanced the code, this invariant seems to hold (make check also 
passes). But I removed it. Let's revisit it later.


> On Jan. 6, 2019, 9:14 p.m., Benjamin Mahler wrote:
> > src/master/allocator/sorter/drf/sorter.cpp
> > Lines 675-676 (original), 678-681 (patched)
> > 
> >
> > I find the if condition here adds unnecessary load on the reader, there 
> > isn't really anything special about 0 to warrant a special case?
> > 
> > E.g.
> > 
> > ```
> > Value::Scalar allocation = node->allocation.totals
> >   .get(resourceName)
> >   .getOrElse(Value::Scalar()); // Absent means zero.
> > 
> > share = std::max(share, allocation.value() / total.value());
> > ```
> > 
> > If this is a performance optimization, can you show it's helpful?
> > 
> > It looks like this comment was missed in the earlier review?

Sorry, I forgot to reply to the comment earlier. Yeah, I was mainly thinking of 
performance since this is a performance critical place. Took your suggestion, 
will do a profiling soon.


> On Jan. 6, 2019, 9:14 p.m., Benjamin Mahler wrote:
> > src/master/allocator/sorter/drf/sorter.cpp
> > Lines 680 (patched)
> > 
> >
> > As a general rule I would suggest using CHECK_NOTNONE in cases where we 
> > do unguarded retrieval, e.g.
> > 
> > ```
> > CHECK_NOTNONE(Resources::parse("cpus:1"))
> > ```
> > 
> > In cases where the logic around whether it's some is complex and we 
> > want to sanity check it, probably CHECK_SOME is better:
> > 
> > ```
> > // Many cases but should always lead to some
> > 
> > CHECK_SOME(v);
> > 
> > some code
> > v->foo();
> > ```
> > 
> > But in cases like this, where we're actually inside the guarded block, 
> > just use ->
> > 
> > ```
> > if (allocation.isSome()) { 
> >   share = std::max(share, allocation->value() / scalar.value());
> > ```

Thanks! Will keep this in mind.


- Meng


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69599/#review211714
---


On Jan. 6, 2019, 10:23 p.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69599/
> ---
> 
> (Updated Jan. 6, 2019, 10:23 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There are many places that we need to express the concept of
> resource quantities such as enforcing allocation quantities
> in the allocator and set guaranteed resource quantities with quota.
> However, the current code usually uses the complex Resources
> class which is unnecessary and inefficient.
> 
> This patch pulls class ScalarResourceQuantities in sorter.hpp
> up, aiming to use it for all resource quantity related usages.
> We mostly preserve the map interface and added other utilities such
> as parsing.
> 
> 
> Diffs
> -
> 
>   src/CMakeLists.txt bde070445b644e15d46c390d1c983caabaa1fec8 
>   src/Makefile.am 7a4904a3d67479267087fd2313a263d8218843fa 
>   src/common/resource_quantities.hpp PRE-CREATION 
>   src/common/resource_quantities.cpp PRE-CREATION 
>   src/master/allocator/sorter/drf/sorter.hpp 
> 084df82baef91eca5775b0bd17d943f3fb8df70b 
>   src/master/allocator/sorter/drf/sorter.cpp 
> a648fac7e922ab0aefdf9363624d7242f1fc6588 
>   src/master/allocator/sorter/random/sorter.hpp 
> 800b22c67126c2b14c5259d9d122d2e196cc80d8 
>   src/master/allocator/sorter/sorter.hpp 
> 68cf6988ef1a156cf16951e3164261234e9abeeb 
> 
> 
> Diff: https://reviews.apache.org/r/69599/diff/5/
> 
> 
> Testing
> ---
> 
> make check
> Dedicated test in r/69600
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 69599: Pulled up the resource quantities class for more general use.

2019-01-07 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69599/#review211727
---


Ship it!




Ship It!

- Benjamin Mahler


On Jan. 7, 2019, 6:23 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69599/
> ---
> 
> (Updated Jan. 7, 2019, 6:23 a.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There are many places that we need to express the concept of
> resource quantities such as enforcing allocation quantities
> in the allocator and set guaranteed resource quantities with quota.
> However, the current code usually uses the complex Resources
> class which is unnecessary and inefficient.
> 
> This patch pulls class ScalarResourceQuantities in sorter.hpp
> up, aiming to use it for all resource quantity related usages.
> We mostly preserve the map interface and added other utilities such
> as parsing.
> 
> 
> Diffs
> -
> 
>   src/CMakeLists.txt bde070445b644e15d46c390d1c983caabaa1fec8 
>   src/Makefile.am 7a4904a3d67479267087fd2313a263d8218843fa 
>   src/common/resource_quantities.hpp PRE-CREATION 
>   src/common/resource_quantities.cpp PRE-CREATION 
>   src/master/allocator/sorter/drf/sorter.hpp 
> 084df82baef91eca5775b0bd17d943f3fb8df70b 
>   src/master/allocator/sorter/drf/sorter.cpp 
> a648fac7e922ab0aefdf9363624d7242f1fc6588 
>   src/master/allocator/sorter/random/sorter.hpp 
> 800b22c67126c2b14c5259d9d122d2e196cc80d8 
>   src/master/allocator/sorter/sorter.hpp 
> 68cf6988ef1a156cf16951e3164261234e9abeeb 
> 
> 
> Diff: https://reviews.apache.org/r/69599/diff/5/
> 
> 
> Testing
> ---
> 
> make check
> Dedicated test in r/69600
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Review Request 69687: Fixed flakiness of resource provider ContainerTerminationMetric test.

2019-01-07 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69687/
---

Review request for mesos and Chun-Hung Hsiao.


Bugs: MESOS-9130
https://issues.apache.org/jira/browse/MESOS-9130


Repository: mesos


Description
---

Plugin restarts can currently fail if the resource provider is killed
before it has performed minimal reconciliation steps. This patch adds
additional synchronization to ensure the plugin container can safely be
killed to safely perform the desired test.


Diffs
-

  src/tests/storage_local_resource_provider_tests.cpp 
e8ed20f818ed7f1a3ce15758ea3c366520443377 


Diff: https://reviews.apache.org/r/69687/diff/1/


Testing
---

* `make check`
* ran test under stress for ~1000 iterations w/o issues


Thanks,

Benjamin Bannier



Re: Review Request 69681: Fixed the FD leak if containerizer::_launch() failed or discarded.

2019-01-07 Thread Andrei Budnik

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69681/#review211726
---




src/slave/containerizer/mesos/containerizer.cpp
Lines 1375 (patched)


This will *copy* the value of `_containerIO`, hence it is *always* empty.



src/slave/containerizer/mesos/containerizer.cpp
Lines 1377 (patched)


This might fail in case of discard is called *before* calling `.then` 
handler for `prepare()`.

So, it should be better to:
```
if (_containerIO.isSome()) {
  closeContainerIoFds(_containerIO.get());
}
```



src/slave/containerizer/mesos/containerizer.cpp
Lines 1378 (patched)


Will this lead to problem with a double `close` of the same FDs? We close 
FDs here, but these FDs will be closed in destructor when the IOSB terminates 
and then removes the same IO item from its hashmap.


- Andrei Budnik


On Jan. 7, 2019, 12:22 p.m., Gilbert Song wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69681/
> ---
> 
> (Updated Jan. 7, 2019, 12:22 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Jie Yu, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6632
> https://issues.apache.org/jira/browse/MESOS-6632
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> For the period between IOSB server is up and container process exec,
> if the containerizer launch returns failure or discarded, the FD used
> for signaling from the container process to the IOSB finish redirect
> will be leaked, which would cause the IOSB stuck at io::redirect
> forever. It would block the containerizer from cleaning up this
> container at IOSB.
> 
> This issue could be exposed if there are frequent check containers
> launch on an agent with heavy loads.
> 
> 
> Diffs
> -
> 
>   include/mesos/slave/containerizer.hpp 
> c34f1296586ea5d26619605f8f6a5ae9444f1a96 
>   src/slave/containerizer/mesos/containerizer.cpp 
> a5cf2da55c046c5c45e0c2ca3400f64de12de62b 
>   src/slave/containerizer/mesos/io/switchboard.cpp 
> c445a8f09d7671d5763281bac9881489b3cc9c39 
>   src/slave/containerizer/mesos/utils.hpp 
> bfd07e28c78fc140e395ffccd11d65545bf007fc 
>   src/slave/containerizer/mesos/utils.cpp 
> d9964f0b0e8ad8a41e5f5490a1c28bba9a972ae0 
> 
> 
> Diff: https://reviews.apache.org/r/69681/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Gilbert Song
> 
>



Review Request 69684: Fixed the FD leak if containerizer::_launch() failed or discarded.

2019-01-07 Thread Andrei Budnik

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69684/
---

Review request for mesos, Gilbert Song, Jie Yu, Qian Zhang, and Vinod Kone.


Bugs: MESOS-6632
https://issues.apache.org/jira/browse/MESOS-6632


Repository: mesos


Description
---

For the period between IOSB server is up and container process exec,
if the containerizer launch returns failure or discarded, the FD used
for signaling from the container process to the IOSB finish redirect
will be leaked, which would cause the IOSB stuck at `io::redirect`
forever. It would block the containerizer from cleaning up this
container at IOSB.

This issue could be exposed if there are frequent check containers
launch on an agent with heavy loads.


Diffs
-

  src/slave/containerizer/mesos/containerizer.cpp 
a5cf2da55c046c5c45e0c2ca3400f64de12de62b 


Diff: https://reviews.apache.org/r/69684/diff/1/


Testing
---

Manual testing:

Add `return Failure("Test!");` to `IOSwitchboard::_prepare()`: 
https://github.com/apache/mesos/blob/7a378597d6a4359de0b1eabb563d4c12cdb4290e/src/slave/containerizer/mesos/io/switchboard.cpp#L657

1) Compile without this patch applied
2) Run `src/mesos-tests --verbose 
--gtest_filter=ContentType/AgentAPITest.AttachContainerInputRepeat/0`
3) This test fails due to a hanging IOSB (15 seconds timeout) after trying to 
destroy IOSB on the HTTP error "Test!"

1) Compile with this patch applied
2) Run `src/mesos-tests --verbose 
--gtest_filter=ContentType/AgentAPITest.AttachContainerInputRepeat/0`
3) This test fails due to the HTTP error "Test!" and terminates immediately (as 
expected)


Thanks,

Andrei Budnik



Re: Review Request 69681: Fixed the FD leak if containerizer::_launch() failed or discarded.

2019-01-07 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69681/#review211724
---




include/mesos/slave/containerizer.hpp
Lines 85 (patched)


s/Option()/Optionnone()/



include/mesos/slave/containerizer.hpp
Lines 90 (patched)


Ditto.



src/slave/containerizer/mesos/containerizer.cpp
Lines 1366-1371 (original), 1368-1373 (patched)


Can we do the `repair` inside this `then()`? Something like:
```
  .then(defer(self(), [=](const Option& containerIO) {
return _launch(containerId, containerIO, environment, 
pidCheckpointPath)
  .repair([](
  const Future& result) {
CHECK(containerIO.isSome());
closeContainerIoFds(containerIO.get());
return result;
  });
  }));
```
In this way we do not need the local variable `_containerIO`.



src/slave/containerizer/mesos/utils.cpp
Lines 22 (patched)


Why do we need this header?



src/slave/containerizer/mesos/utils.cpp
Lines 153-154 (patched)


We could merge these two lines into one.


- Qian Zhang


On Jan. 7, 2019, 8:22 p.m., Gilbert Song wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69681/
> ---
> 
> (Updated Jan. 7, 2019, 8:22 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Jie Yu, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6632
> https://issues.apache.org/jira/browse/MESOS-6632
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> For the period between IOSB server is up and container process exec,
> if the containerizer launch returns failure or discarded, the FD used
> for signaling from the container process to the IOSB finish redirect
> will be leaked, which would cause the IOSB stuck at io::redirect
> forever. It would block the containerizer from cleaning up this
> container at IOSB.
> 
> This issue could be exposed if there are frequent check containers
> launch on an agent with heavy loads.
> 
> 
> Diffs
> -
> 
>   include/mesos/slave/containerizer.hpp 
> c34f1296586ea5d26619605f8f6a5ae9444f1a96 
>   src/slave/containerizer/mesos/containerizer.cpp 
> a5cf2da55c046c5c45e0c2ca3400f64de12de62b 
>   src/slave/containerizer/mesos/io/switchboard.cpp 
> c445a8f09d7671d5763281bac9881489b3cc9c39 
>   src/slave/containerizer/mesos/utils.hpp 
> bfd07e28c78fc140e395ffccd11d65545bf007fc 
>   src/slave/containerizer/mesos/utils.cpp 
> d9964f0b0e8ad8a41e5f5490a1c28bba9a972ae0 
> 
> 
> Diff: https://reviews.apache.org/r/69681/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Gilbert Song
> 
>



Re: Review Request 69597: Remove outstanding operations when removing agents.

2019-01-07 Thread Benno Evers

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69597/
---

(Updated Jan. 7, 2019, 12:49 p.m.)


Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and 
Joseph Wu.


Bugs: MESOS-9506
https://issues.apache.org/jira/browse/MESOS-9506


Repository: mesos


Description
---

Usually, offer operations are removed when the framework acknowledges
a terminal operation status update.

However, currently only operations on registered agents can be
acknowledged.

This commit explicitly deletes all outstanding operations from an agent
when it is removed.


Diffs
-

  src/master/master.cpp b4faf2b077a0288ba36195b7a21402932489d316 


Diff: https://reviews.apache.org/r/69597/diff/4/


Testing
---

Internal CI results:
```
Test Result
0 failures
38,673 tests

Configuration Name  Duration All Failed Skipped
FLAG=Plain,label=mesos-ec2-ubuntu-16.04 18 min  32020   0
FLAG=SSL,label=mac  9 min 27 sec26980   0
FLAG=CMake,label=mesos-ec2-centos-7 21 min  31460   0
FLAG=CMake,label=mesos-ec2-ubuntu-16.04 18 min  31550   0
FLAG=SSL,label=mesos-ec2-debian-8   21 min  32610   0
FLAG=SSL,label=mesos-ec2-centos-7   20 min  32560   0
FLAG=Plain,label=mesos-ec2-centos-7 20 min  31930   0
FLAG=SSL,label=mesos-ec2-ubuntu-16.04   18 min  32650   0
FLAG=SSL,label=mesos-ec2-ubuntu-14.04   19 min  32580   0
FLAG=Clang,label=mesos-ec2-ubuntu-16.04 18 min  32020   0
FLAG=SSL,label=mesos-ec2-debian-9   18 min  32630   0
FLAG=SSL,label=mesos-ec2-centos-6   15 min  31680   0
FLAG=BUILD_ISOLATORS,label=mesos-ec2-ubuntu-16.04   5.6 sec 606 0   0
```


Thanks,

Benno Evers



Review Request 69680: Have master acknowledge operation updates of completed frameworks.

2019-01-07 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69680/
---

Review request for mesos, Gastón Kleiman and Greg Mann.


Bugs: MESOS-9434
https://issues.apache.org/jira/browse/MESOS-9434


Repository: mesos


Description
---

After a framework was removed and has unacknowledged operations status
updates, it was impossible to remove terminal operations as nobody could
acknowledge them.

In this patch we make the master acknowledge operation status updates
for frameworks it knows are removed so that e.g., terminal operations
can be removed. Since masters do not persist completed frameworks this
is not reliable (e.g., an agent was partitioned for a long time and
still tracks a completed framework's `FrameworkInfo`, and comes back
only after the master knowing about the framework's completion has
failed over). We merely extend the existing master behavior (e.g., send
`ShutdownFrameworkMessage` to all currently registered agents) to
operations.


Diffs
-

  src/master/master.cpp b4faf2b077a0288ba36195b7a21402932489d316 
  src/tests/master_tests.cpp 51fcbe8d72fce1e56790d22ab7deced4fdb3bf0e 


Diff: https://reviews.apache.org/r/69680/diff/1/


Testing
---

* `make check`
* tested on a number of configurations in internal CI
* ran added test in repetition, both with and without additional stress


Thanks,

Benjamin Bannier



Review Request 69681: Fixed the FD leak if containerizer::_launch() failed or discarded.

2019-01-07 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69681/
---

Review request for mesos, Andrei Budnik, Jie Yu, Qian Zhang, and Vinod Kone.


Bugs: MESOS-6632
https://issues.apache.org/jira/browse/MESOS-6632


Repository: mesos


Description
---

For the period between IOSB server is up and container process exec,
if the containerizer launch returns failure or discarded, the FD used
for signaling from the container process to the IOSB finish redirect
will be leaked, which would cause the IOSB stuck at io::redirect
forever. It would block the containerizer from cleaning up this
container at IOSB.

This issue could be exposed if there are frequent check containers
launch on an agent with heavy loads.


Diffs
-

  include/mesos/slave/containerizer.hpp 
c34f1296586ea5d26619605f8f6a5ae9444f1a96 
  src/slave/containerizer/mesos/containerizer.cpp 
a5cf2da55c046c5c45e0c2ca3400f64de12de62b 
  src/slave/containerizer/mesos/io/switchboard.cpp 
c445a8f09d7671d5763281bac9881489b3cc9c39 
  src/slave/containerizer/mesos/utils.hpp 
bfd07e28c78fc140e395ffccd11d65545bf007fc 
  src/slave/containerizer/mesos/utils.cpp 
d9964f0b0e8ad8a41e5f5490a1c28bba9a972ae0 


Diff: https://reviews.apache.org/r/69681/diff/1/


Testing
---

make check


Thanks,

Gilbert Song