Re: Review Request 69615: Disable containerizer ptrace attach.

2019-02-13 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69615/#review212822
---




src/slave/containerizer/mesos/launch.cpp
Lines 556 (patched)


The main thing I wanted to confirm is what we chatted about offline but I 
didn't write down: the interpretation of this paragraph in 
http://man7.org/linux/man-pages/man2/prctl.2.html

```
PR_SET_DUMPABLE
...
  Normally, this flag is set to 1.  However, it is reset to the
  current value contained in the file /proc/sys/fs/suid_dumpable
  (which by default has the value 0), in the following
  circumstances:

  *  The process's effective user or group ID is changed.
...
```

This reads to me like, in a typical setup which runs Mesos agent as uid 0 
and the task with another uid (hence the call `os::setuid(uid.get());` below 
does change uid):

- If `/proc/sys/fs/suid_dumpable` is the default 0, then regardless of 
`--allow_ptrace` the result is that the launcher process becomes undumpable.
- If `/proc/sys/fs/suid_dumpable` is 1, then regardless of `--allow_ptrace` 
the result is that the launcher process becomes dumpable.

i.e., the `prctl()` call here is a noop.

Maybe I misunderstood the doc but perhaps we can tweak the test to be a 
`ROOT_*` test and change to a different task user in order to verify the 
behavior? If you have tested it manually that's fine too just wanted to double 
check :) 

FWIW, I checked a sample container that we run (without this patch) and see 
that

```
# ls -l /proc/1/
```

shows the files under it are all owned by root, which does appear to mean 
that the process' dumpable flag is not 1 according to 
http://man7.org/linux/man-pages/man5/proc.5.html?


- Jiang Yan Xu


On Feb. 8, 2019, 1:09 p.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69615/
> ---
> 
> (Updated Feb. 8, 2019, 1:09 p.m.)
> 
> 
> Review request for mesos, Xudong Ni, Gilbert Song, Jie Yu, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-9349
> https://issues.apache.org/jira/browse/MESOS-9349
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Use `prctl(PR_SET_DUMPABLE)` to disable the ability to attach to
> the containerizer process(es) on Linux systems. This prevents
> unprivileged containerized processes from reading information
> about the containerizer process(es) from `/proc`. This gives an
> additional layer of protection against leaking information to
> untrusted container processes.
> 
> 
> Diffs
> -
> 
>   docs/configuration/agent.md a4015c409d00a4c117df6c869d5ba5072bcfe58e 
>   src/launcher/default_executor.cpp 5837cfa4deba557cae43112092ff24b97137951f 
>   src/launcher/executor.cpp f962e800f23d5582b1bc04a263253893492a5054 
>   src/slave/containerizer/mesos/containerizer.cpp 
> 35f51ad33da53b3e6a8eec275fbf3e77782b0fba 
>   src/slave/containerizer/mesos/launch.hpp 
> 0a6394d56321948ad760ac69c05456319a254842 
>   src/slave/containerizer/mesos/launch.cpp 
> 7f401cdf481123b8c6cc500ac02bb7daf2613d2c 
>   src/slave/flags.hpp 7346ba5b711a8353a4bc1d7dcd2f6184b777ddd0 
>   src/slave/flags.cpp 066b84f528b4c888dde399e0b5d5fe5531934de6 
>   src/slave/slave.cpp 0182dd2ca326723e96eef8c072696ad3c873de0b 
>   src/tests/containerizer/mesos_containerizer_tests.cpp 
> 449928c10b897061642af8ad267f8b70695940e6 
>   src/tests/slave_tests.cpp 22a0295086ae4f4ec26df00a0e077eecfa27f1fb 
> 
> 
> Diff: https://reviews.apache.org/r/69615/diff/3/
> 
> 
> Testing
> ---
> 
> make check (Fedora 29)
> 
> 
> Thanks,
> 
> James Peach
> 
>



Re: Review Request 69980: Modified when master responds to operation status updates.

2019-02-13 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69980/#review212823
---



Patch looks great!

Reviews applied: [69968, 69960, 69961, 69962, 69963, 69967, 69980]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers 
--disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On Feb. 14, 2019, 12:23 a.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69980/
> ---
> 
> (Updated Feb. 14, 2019, 12:23 a.m.)
> 
> 
> Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-9542
> https://issues.apache.org/jira/browse/MESOS-9542
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When dealing with orphaned operation status updates, there are two
> cases the master must deal with:
> - The simple case is when the master knows the framework is completed.
>   These status updates can be acknowledged by the master.
> - However, a completed framework can be rotated out of the master's
>   memory.  In addition, after master failover, if an agent reregisters
>   before the framework, an operation can appear to be orphaned until
>   the framework reregisters.
> 
> This adds a fixed delay between agent reregistration and when the
> master acknowledges operation status updates from unknown frameworks.
> The delay should give frameworks ample time to reregister.
> 
> The delay is based on agent reregistration in order to mitigate the
> delay of acknowledging status updates of frameworks rotated out of
> the completed frameworks buffer.
> 
> 
> Diffs
> -
> 
>   src/master/constants.hpp b0ab9187b8c672180e2ffb8b63cb7349dbe43ac4 
>   src/master/master.cpp 014e0e053cdf5c53a5ef8d63300205a121bed319 
> 
> 
> Diff: https://reviews.apache.org/r/69980/diff/1/
> 
> 
> Testing
> ---
> 
> TODO: This case needs unit tests.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 69890: Added test for per-framework, per-role minimal allocatable resources.

2019-02-13 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69890/#review212820
---



Patch looks great!

Reviews applied: [69900, 69902, 69818, 69862, 69889, 69821, 69890]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers 
--disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On Feb. 13, 2019, 10:31 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69890/
> ---
> 
> (Updated Feb. 13, 2019, 10:31 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Meng Zhu.
> 
> 
> Bugs: MESOS-9523
> https://issues.apache.org/jira/browse/MESOS-9523
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds tests for the behavior of per-framework, per-role
> minimal allocatable resources. We validate the behavior of framework
> minimal allocatable resources below the globally configured limits and
> the behavior of empty filters or minimal resource quantities set up for
> a framework.
> 
> 
> Diffs
> -
> 
>   src/tests/hierarchical_allocator_tests.cpp 
> 1366cc51332484d7ff3b5e5b7ba767efb97172c0 
> 
> 
> Diff: https://reviews.apache.org/r/69890/diff/4/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 69957: Updated master operation handling to account for new agent capability.

2019-02-13 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69957/
---

(Updated Feb. 14, 2019, 4:09 a.m.)


Review request for mesos and Gastón Kleiman.


Bugs: MESOS-9535
https://issues.apache.org/jira/browse/MESOS-9535


Repository: mesos


Description
---

Since the new AGENT_OPERATION_FEEDBACK capability has been
made required for agent startup, we need to update the way
the master handles validation of incoming operations and
acknowledgement calls to account for cases where each of
the AGENT_OPERATION_FEEDBACK and RESOURCE_PROVIDER
capabilities are enabled/disabled.


Diffs (updated)
-

  src/master/master.cpp cf2210ec26642028d5e4fb7fc1841eb0a1ed3396 


Diff: https://reviews.apache.org/r/69957/diff/4/

Changes: https://reviews.apache.org/r/69957/diff/3-4/


Testing
---

Testing details at the end of this chain.


Thanks,

Greg Mann



Re: Review Request 69981: Fixed a flaky test `MasterQuotaTest.RemoveSingleQuota`.

2019-02-13 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69981/#review212819
---



PASS: Mesos patch 69981 was successfully built and tested.

Reviews applied: `['69981']`

All the build artifacts available at: 
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2888/mesos-review-69981

- Mesos Reviewbot Windows


On Feb. 14, 2019, 12:27 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69981/
> ---
> 
> (Updated Feb. 14, 2019, 12:27 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9143
> https://issues.apache.org/jira/browse/MESOS-9143
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The test is flaky due to a race between metrics update
> and metrics query.
> 
> This patch adds clock settle to ensure quota update and
> removal are fully processed (including metrics updates) before
> continuing with the metrics query.
> 
> 
> Diffs
> -
> 
>   src/tests/master_quota_tests.cpp 354a9e988c3e4cdf94c99170d2abb7cb17f9e11a 
> 
> 
> Diff: https://reviews.apache.org/r/69981/diff/2/
> 
> 
> Testing
> ---
> 
> Test ran continuously without failure
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 69978: Added garbage collection of terminated operations status update streams.

2019-02-13 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69978/#review212817
---



Patch looks great!

Reviews applied: [69977, 69978]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers 
--disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On Feb. 13, 2019, 3:15 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69978/
> ---
> 
> (Updated Feb. 13, 2019, 3:15 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-9574
> https://issues.apache.org/jira/browse/MESOS-9574
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Make the agent garbage collect an operation status update stream once
> a terminal status update is acknowledged.
> 
> This patch also improves the logging of acknowledgement failures and the
> readability of the `Slave::operationStatusAcknowledgement` method.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 
> 
> 
> Diff: https://reviews.apache.org/r/69978/diff/2/
> 
> 
> Testing
> ---
> 
> Manual testing + existing tests still pass.
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 69980: Modified when master responds to operation status updates.

2019-02-13 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69980/#review212818
---



PASS: Mesos patch 69980 was successfully built and tested.

Reviews applied: `['69960', '69961', '69962', '69963', '69967', '69980']`

All the build artifacts available at: 
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2887/mesos-review-69980

- Mesos Reviewbot Windows


On Feb. 13, 2019, 3:23 p.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69980/
> ---
> 
> (Updated Feb. 13, 2019, 3:23 p.m.)
> 
> 
> Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-9542
> https://issues.apache.org/jira/browse/MESOS-9542
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When dealing with orphaned operation status updates, there are two
> cases the master must deal with:
> - The simple case is when the master knows the framework is completed.
>   These status updates can be acknowledged by the master.
> - However, a completed framework can be rotated out of the master's
>   memory.  In addition, after master failover, if an agent reregisters
>   before the framework, an operation can appear to be orphaned until
>   the framework reregisters.
> 
> This adds a fixed delay between agent reregistration and when the
> master acknowledges operation status updates from unknown frameworks.
> The delay should give frameworks ample time to reregister.
> 
> The delay is based on agent reregistration in order to mitigate the
> delay of acknowledging status updates of frameworks rotated out of
> the completed frameworks buffer.
> 
> 
> Diffs
> -
> 
>   src/master/constants.hpp b0ab9187b8c672180e2ffb8b63cb7349dbe43ac4 
>   src/master/master.cpp 014e0e053cdf5c53a5ef8d63300205a121bed319 
> 
> 
> Diff: https://reviews.apache.org/r/69980/diff/1/
> 
> 
> Testing
> ---
> 
> TODO: This case needs unit tests.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 69978: Added garbage collection of terminated operations status update streams.

2019-02-13 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69978/#review212816
---



PASS: Mesos patch 69978 was successfully built and tested.

Reviews applied: `['69977', '69978']`

All the build artifacts available at: 
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2886/mesos-review-69978

- Mesos Reviewbot Windows


On Feb. 13, 2019, 3:15 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69978/
> ---
> 
> (Updated Feb. 13, 2019, 3:15 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-9574
> https://issues.apache.org/jira/browse/MESOS-9574
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Make the agent garbage collect an operation status update stream once
> a terminal status update is acknowledged.
> 
> This patch also improves the logging of acknowledgement failures and the
> readability of the `Slave::operationStatusAcknowledgement` method.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 
> 
> 
> Diff: https://reviews.apache.org/r/69978/diff/2/
> 
> 
> Testing
> ---
> 
> Manual testing + existing tests still pass.
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 69656: Fixed test `MesosContainerizerSlaveRecoveryTest.ResourceStatistics`.

2019-02-13 Thread Meng Zhu


> On Jan. 14, 2019, 11:41 a.m., Joseph Wu wrote:
> > src/tests/slave_recovery_tests.cpp
> > Lines 5152-5153 (original), 5153-5154 (patched)
> > 
> >
> > Not terribly impactful for the test, but since you are capturing the 
> > TASK_STARTING update, you could check the status before the TASK_RUNNING 
> > one.
> > ```
> >   AWAIT_READY(statusStarting);
> >   EXPECT_EQ(TASK_STARTING, statusStarting->state());
> > ```

Done.


- Meng


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69656/#review211962
---


On Jan. 2, 2019, 9:16 p.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69656/
> ---
> 
> (Updated Jan. 2, 2019, 9:16 p.m.)
> 
> 
> Review request for mesos, Joseph Wu and Till Toenshoff.
> 
> 
> Bugs: MESOS-5048
> https://issues.apache.org/jira/browse/MESOS-5048
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> `MesosContainerizerSlaveRecoveryTest.ResourceStatistics` is flaky
> due to a race between executor shutdown (due to never getting any
> tasks) and the test querying resource statistics. If the executor
> is shutdown before the statistics query, the test will fail.
> 
> This patch fixes the test by explicitly waiting for the task to
> be delivered and task status transition to `TASK_RUNNING` before
> restarting the agent. This way, the executor will not be shutdown
> after agent restart. Hence there will be no race.
> 
> 
> Diffs
> -
> 
>   src/tests/slave_recovery_tests.cpp 0eb47e2bdf6a46fc21b59bb85b4b89181087ccd3 
> 
> 
> Diff: https://reviews.apache.org/r/69656/diff/1/
> 
> 
> Testing
> ---
> 
> ran `MesosContainerizerSlaveRecoveryTest.ResourceStatistics` continuously 
> without failure.
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 69969: Updated docs related to agent capabilities.

2019-02-13 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69969/#review212814
---


Ship it!




Ship It!

- Gastón Kleiman


On Feb. 12, 2019, 6:46 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69969/
> ---
> 
> (Updated Feb. 12, 2019, 6:46 p.m.)
> 
> 
> Review request for mesos and Gastón Kleiman.
> 
> 
> Bugs: MESOS-9535
> https://issues.apache.org/jira/browse/MESOS-9535
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Updated docs related to agent capabilities.
> 
> 
> Diffs
> -
> 
>   docs/configuration/agent.md a4015c409d00a4c117df6c869d5ba5072bcfe58e 
>   include/mesos/mesos.proto 9d5c2b53164adb44eb4a6f12a507e009bd607940 
>   include/mesos/v1/mesos.proto aef31eb5b8781182d3f42d935b12470b319027ed 
>   src/slave/flags.cpp 066b84f528b4c888dde399e0b5d5fe5531934de6 
> 
> 
> Diff: https://reviews.apache.org/r/69969/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 69964: Added tests related to operation feedback agent capabilities.

2019-02-13 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69964/#review212813
---


Ship it!




Ship It!

- Gastón Kleiman


On Feb. 12, 2019, 6:39 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69964/
> ---
> 
> (Updated Feb. 12, 2019, 6:39 p.m.)
> 
> 
> Review request for mesos and Gastón Kleiman.
> 
> 
> Bugs: MESOS-9535
> https://issues.apache.org/jira/browse/MESOS-9535
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> These tests verify that master and agent behavior is correct
> during a downgrade of an agent to one which does not have the
> AGENT_OPERATION_FEEDBACK capability, and when a new agent's
> capabilities are altered such that it no longer has the
> RESOURCE_PROVIDER capability set.
> 
> 
> Diffs
> -
> 
>   src/tests/master_tests.cpp 9584fa38c6e9ea26b5b6937199fe4679325ef55e 
> 
> 
> Diff: https://reviews.apache.org/r/69964/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Review Request 69981: Fixed a flaky test `MasterQuotaTest.RemoveSingleQuota`.

2019-02-13 Thread Meng Zhu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69981/
---

Review request for mesos, Alexander Rukletsov and Benjamin Mahler.


Bugs: MESOS-9143
https://issues.apache.org/jira/browse/MESOS-9143


Repository: mesos


Description
---

The test is flaky due to a race between metrics update
and metrics query.

This patch adds clock settle to ensure quota update and
removal are fully processed (including metrics updates) before
continuing with the metrics query.


Diffs
-

  src/tests/master_quota_tests.cpp 354a9e988c3e4cdf94c99170d2abb7cb17f9e11a 


Diff: https://reviews.apache.org/r/69981/diff/1/


Testing
---

Test ran continuously without failure


Thanks,

Meng Zhu



Re: Review Request 69890: Added test for per-framework, per-role minimal allocatable resources.

2019-02-13 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69890/#review212811
---



PASS: Mesos patch 69890 was successfully built and tested.

Reviews applied: `['69818', '69862', '69889', '69821', '69890']`

All the build artifacts available at: 
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2885/mesos-review-69890

- Mesos Reviewbot Windows


On Feb. 13, 2019, 2:31 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69890/
> ---
> 
> (Updated Feb. 13, 2019, 2:31 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Meng Zhu.
> 
> 
> Bugs: MESOS-9523
> https://issues.apache.org/jira/browse/MESOS-9523
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds tests for the behavior of per-framework, per-role
> minimal allocatable resources. We validate the behavior of framework
> minimal allocatable resources below the globally configured limits and
> the behavior of empty filters or minimal resource quantities set up for
> a framework.
> 
> 
> Diffs
> -
> 
>   src/tests/hierarchical_allocator_tests.cpp 
> 1366cc51332484d7ff3b5e5b7ba767efb97172c0 
> 
> 
> Diff: https://reviews.apache.org/r/69890/diff/4/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 69963: Handled terminal operation status updates for orphans.

2019-02-13 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69963/
---

(Updated Feb. 13, 2019, 3:24 p.m.)


Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.


Changes
---

Removed AllocationInfo from resources before modifying TotalResources.


Bugs: MESOS-9542
https://issues.apache.org/jira/browse/MESOS-9542


Repository: mesos


Description
---

When an orphaned operation is transitioned from non-terminal to
terminal, the operation's resources must be added back to the agent's
total resources, while accounting for resource conversion if the
operation is successful.

There is an odd case where an orphan is transitioned to terminal
via an UpdateSlaveMessage (instead of UpdateOperationStatusMessage).
When this happens, the required resource math is actually done by
the agent.


Diffs (updated)
-

  src/master/master.cpp 014e0e053cdf5c53a5ef8d63300205a121bed319 


Diff: https://reviews.apache.org/r/69963/diff/2/

Changes: https://reviews.apache.org/r/69963/diff/1-2/


Testing
---

See last patch in chain.


Thanks,

Joseph Wu



Review Request 69980: Modified when master responds to operation status updates.

2019-02-13 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69980/
---

Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.


Bugs: MESOS-9542
https://issues.apache.org/jira/browse/MESOS-9542


Repository: mesos


Description
---

When dealing with orphaned operation status updates, there are two
cases the master must deal with:
- The simple case is when the master knows the framework is completed.
  These status updates can be acknowledged by the master.
- However, a completed framework can be rotated out of the master's
  memory.  In addition, after master failover, if an agent reregisters
  before the framework, an operation can appear to be orphaned until
  the framework reregisters.

This adds a fixed delay between agent reregistration and when the
master acknowledges operation status updates from unknown frameworks.
The delay should give frameworks ample time to reregister.

The delay is based on agent reregistration in order to mitigate the
delay of acknowledging status updates of frameworks rotated out of
the completed frameworks buffer.


Diffs
-

  src/master/constants.hpp b0ab9187b8c672180e2ffb8b63cb7349dbe43ac4 
  src/master/master.cpp 014e0e053cdf5c53a5ef8d63300205a121bed319 


Diff: https://reviews.apache.org/r/69980/diff/1/


Testing
---

TODO: This case needs unit tests.


Thanks,

Joseph Wu



Re: Review Request 69978: Added garbage collection of terminated operations status update streams.

2019-02-13 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69978/
---

(Updated Feb. 13, 2019, 3:15 p.m.)


Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Changes
---

Fixed typos in the commit message.


Bugs: MESOS-9574
https://issues.apache.org/jira/browse/MESOS-9574


Repository: mesos


Description (updated)
---

Make the agent garbage collect an operation status update stream once
a terminal status update is acknowledged.

This patch also improves the logging of acknowledgement failures and the
readability of the `Slave::operationStatusAcknowledgement` method.


Diffs (updated)
-

  src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 


Diff: https://reviews.apache.org/r/69978/diff/2/

Changes: https://reviews.apache.org/r/69978/diff/1-2/


Testing
---

Manual testing + existing tests still pass.


Thanks,

Gastón Kleiman



Review Request 69978: Added garbage collection of terminated operations status update streams.

2019-02-13 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69978/
---

Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Bugs: MESOS-9574
https://issues.apache.org/jira/browse/MESOS-9574


Repository: mesos


Description
---

Make the agent garbage collect an operation stauts update stream once
a terminal status update is acknowledged.

This patch also improves the logging of acknowledgmenet failures and the
readability of the `Slave::operationStatusAcknowledgement` method.


Diffs
-

  src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 


Diff: https://reviews.apache.org/r/69978/diff/1/


Testing
---

Manual testing + existing tests still pass.


Thanks,

Gastón Kleiman



Re: Review Request 69968: Copied operation removal logic in agent removal code path.

2019-02-13 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69968/
---

(Updated Feb. 13, 2019, 2:46 p.m.)


Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.


Changes
---

Did some local testing to commit this patch independently of the chain.


Bugs: MESOS-9557
https://issues.apache.org/jira/browse/MESOS-9557


Repository: mesos


Description
---

There are two code paths the master can follow to remove an agent
from its memory:
- When the agent sends an UnregisterSlaveMessage or tries to register
  as a new agent,
- Or, when an agent is marked unreachable/gone.

The first code path did not clean up operation state completely,
resulting in leaking some memory in the master's Framework structs.


Diffs
-

  src/master/master.cpp 014e0e053cdf5c53a5ef8d63300205a121bed319 


Diff: https://reviews.apache.org/r/69968/diff/1/


Testing (updated)
---

make check


Thanks,

Joseph Wu



Review Request 69977: Improved agent operation recovery process.

2019-02-13 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69977/
---

Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Bugs: MESOS-8054
https://issues.apache.org/jira/browse/MESOS-8054


Repository: mesos


Description
---

This patch makes the agent walk the operation status update streams
directories in order to generate the list of streams to recover, instead
of generating it from the checkpointed `ResourceState` message.

This prevents the agent from asking the operation status update manager
to recover streams that haven't been created yet.

The patch also makes the agent garbage collect operation status update
streams if no correspondng operation is present in the checkpointed
state. This can happen after the agent fails over while processing the
acknowledgement of a terminal operation status update.


Diffs
-

  src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 


Diff: https://reviews.apache.org/r/69977/diff/1/


Testing
---

Manual testing + existing tests still pass.


Thanks,

Gastón Kleiman



Re: Review Request 69890: Added test for per-framework, per-role minimal allocatable resources.

2019-02-13 Thread Benjamin Bannier


> On Feb. 8, 2019, 9:47 p.m., Benjamin Mahler wrote:
> > src/tests/hierarchical_allocator_tests.cpp
> > Lines 2357-2374 (patched)
> > 
> >
> > Both "empty set" and "NullResourcesAllocatable" don't seem to 
> > accurately describe what's done?
> > 
> >  // Add a framework which specifies minimum allocatable resources
> >   // with a single, empty resource.
> > 
> > Single empty seems more appropriate? Come to think of it, should we 
> > validate that all entries are non-empty? There doesn't seem to be any point 
> > in allowing an empty entry, since any single empty entry renders it 
> > equivalent to the empty set case?
> 
> Benjamin Bannier wrote:
> Fixed the test docstring.
> 
> I think there is some value in allowing this going forward. An empty set 
> of quantities maps onto an empty set of requirements while a set with a 
> single, empty quantity maps onto a requirement for anything. WDYT?
> 
> Dropping for now, please feel free to reopen.
> 
> Benjamin Mahler wrote:
> Well, what's different about those the two requirements? They sound the 
> same?

I updated the validation code in https://reviews.apache.org/r/69862/ to reject 
such empty quantity settings now, and removed the two test cases here.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69890/#review212678
---


On Feb. 13, 2019, 11:31 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69890/
> ---
> 
> (Updated Feb. 13, 2019, 11:31 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Meng Zhu.
> 
> 
> Bugs: MESOS-9523
> https://issues.apache.org/jira/browse/MESOS-9523
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds tests for the behavior of per-framework, per-role
> minimal allocatable resources. We validate the behavior of framework
> minimal allocatable resources below the globally configured limits and
> the behavior of empty filters or minimal resource quantities set up for
> a framework.
> 
> 
> Diffs
> -
> 
>   src/tests/hierarchical_allocator_tests.cpp 
> 1366cc51332484d7ff3b5e5b7ba767efb97172c0 
> 
> 
> Diff: https://reviews.apache.org/r/69890/diff/4/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 69890: Added test for per-framework, per-role minimal allocatable resources.

2019-02-13 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69890/
---

(Updated Feb. 13, 2019, 11:31 p.m.)


Review request for mesos, Benjamin Mahler and Meng Zhu.


Changes
---

Remove test cases which are not needed anymore


Bugs: MESOS-9523
https://issues.apache.org/jira/browse/MESOS-9523


Repository: mesos


Description
---

This patch adds tests for the behavior of per-framework, per-role
minimal allocatable resources. We validate the behavior of framework
minimal allocatable resources below the globally configured limits and
the behavior of empty filters or minimal resource quantities set up for
a framework.


Diffs (updated)
-

  src/tests/hierarchical_allocator_tests.cpp 
1366cc51332484d7ff3b5e5b7ba767efb97172c0 


Diff: https://reviews.apache.org/r/69890/diff/4/

Changes: https://reviews.apache.org/r/69890/diff/3-4/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 69862: Validated static framework offer filters.

2019-02-13 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69862/
---

(Updated Feb. 13, 2019, 11:31 p.m.)


Review request for mesos, Benjamin Mahler and Meng Zhu.


Changes
---

Disallow empty quantities in min. allocatable resources, or empty quantities 
itself, https://reviews.apache.org/r/69890/#comment298526


Bugs: MESOS-9523
https://issues.apache.org/jira/browse/MESOS-9523


Repository: mesos


Description
---

This patch adds validation to framework-specified static offer filters.
In particular we enforce that minimal resource quantities cannot be
negative.


Diffs (updated)
-

  src/common/validation.hpp 5929ac24fa7d7d2f7f08bc30c5071716e6cf46cb 
  src/common/validation.cpp 4a64fc8ac83fd741eb2cbc70772322b347dbcb4e 
  src/master/validation.hpp a5bdff6a9301631dac9970568a1346460939c861 
  src/master/validation.cpp 4a699f06a3e47ff50b356ce6ff470d1c4e13b02e 
  src/tests/common_validation_tests.cpp 
97466ab38f3156155655520c81b17cc6ec2cd779 
  src/tests/master_validation_tests.cpp 
b34e88f12a8959989aea5c180b15f8ae2933d33f 


Diff: https://reviews.apache.org/r/69862/diff/4/

Changes: https://reviews.apache.org/r/69862/diff/3-4/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 69904: Extracted common offer matching functions from SLRP tests.

2019-02-13 Thread Chun-Hung Hsiao


> On Feb. 12, 2019, 3:16 p.m., Benjamin Bannier wrote:
> > src/tests/storage_local_resource_provider_tests.cpp
> > Lines 334 (patched)
> > 
> >
> > Could you briefly document what these do?
> > 
> > We could just declare these outside of the class in this file instead 
> > of them being `static`.
> 
> Chun-Hung Hsiao wrote:
> Don't we usually put utilities in the test fixture?

Hmm yeah I see some local helpers declared as static global functions. Done.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69904/#review212744
---


On Feb. 12, 2019, 5:26 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69904/
> ---
> 
> (Updated Feb. 12, 2019, 5:26 a.m.)
> 
> 
> Review request for mesos and Benjamin Bannier.
> 
> 
> Bugs: MESOS-9565
> https://issues.apache.org/jira/browse/MESOS-9565
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch extracts lambda functions `isStoragePool`, `isMountDisk` and
> `isPreprovisionedVolume` from all tests and makes them common utitily
> functions in the test fixture.
> 
> 
> Diffs
> -
> 
>   src/tests/storage_local_resource_provider_tests.cpp 
> fb001aa8d32d1a0a03014a35772fe10b65ce8d9a 
> 
> 
> Diff: https://reviews.apache.org/r/69904/diff/2/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 69904: Extracted common offer matching functions from SLRP tests.

2019-02-13 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69904/
---

(Updated Feb. 13, 2019, 9:26 p.m.)


Review request for mesos and Benjamin Bannier.


Changes
---

Addressed Benjamin's comment.


Bugs: MESOS-9565
https://issues.apache.org/jira/browse/MESOS-9565


Repository: mesos


Description
---

This patch extracts lambda functions `isStoragePool`, `isMountDisk` and
`isPreprovisionedVolume` from all tests and makes them common utitily
functions in the test fixture.


Diffs (updated)
-

  src/tests/storage_local_resource_provider_tests.cpp 
fb001aa8d32d1a0a03014a35772fe10b65ce8d9a 


Diff: https://reviews.apache.org/r/69904/diff/3/

Changes: https://reviews.apache.org/r/69904/diff/2-3/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 69898: Updated `CreateDestroyDisk*` SLRP tests to test pipelined operations.

2019-02-13 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69898/
---

(Updated Feb. 13, 2019, 9:24 p.m.)


Review request for mesos and Benjamin Bannier.


Changes
---

Addressed Benjamin's comments and fixed a compilation error.


Bugs: MESOS-9565
https://issues.apache.org/jira/browse/MESOS-9565


Repository: mesos


Description
---

This patch extends the code coverage of the `CreateDestroyDisk` and
`CreateDestroyDiskRecovery` tests by testing pipelined `RESERVE`,
`CREATE`, `DESTROY` and `UNRESERVE` operations along with `CREATE_DISK`
and `DESTROY_DISK`. It also renames `CreateDestroyDiskRecovery` to
`CreateDestroyDiskWithRecovery` for consistency.


Diffs (updated)
-

  src/tests/storage_local_resource_provider_tests.cpp 
fb001aa8d32d1a0a03014a35772fe10b65ce8d9a 


Diff: https://reviews.apache.org/r/69898/diff/3/

Changes: https://reviews.apache.org/r/69898/diff/2-3/


Testing
---

`make check`

Run in repetition


Thanks,

Chun-Hung Hsiao



Re: Review Request 69897: Made SLRP `PublishResourcesRecovery` test to check volume cleanup.

2019-02-13 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69897/
---

(Updated Feb. 13, 2019, 9:22 p.m.)


Review request for mesos and Benjamin Bannier.


Changes
---

Addressed Benjamin's comments and fixed a compilation error.


Bugs: MESOS-9565
https://issues.apache.org/jira/browse/MESOS-9565


Repository: mesos


Description (updated)
---

This patch renames the `ROOT_PublishResourcesRecovery` test to
`ROOT_CreateDestroyPersistentMountVolumeWithRecovery` and makes it
verify that the persistent volume is cleaned up with `DESTROY` after
recovery.

NOTE: The `filesystem/linux` isolator has been removed from the test
because it is not necessary for the test. However, the root privilege is
still required by the test CSI plugin for bind-mounting.


Diffs (updated)
-

  src/tests/storage_local_resource_provider_tests.cpp 
fb001aa8d32d1a0a03014a35772fe10b65ce8d9a 


Diff: https://reviews.apache.org/r/69897/diff/3/

Changes: https://reviews.apache.org/r/69897/diff/2-3/


Testing
---

`make check`

Run in repetition with ~250 iterations. (It will fail with "too many open 
files" with more iterations because of 
https://issues.apache.org/jira/browse/MESOS-8428.


Thanks,

Chun-Hung Hsiao



Re: Review Request 69896: Made SLRP `PublishResourcesReboot` test to check persistent volume cleanup.

2019-02-13 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69896/
---

(Updated Feb. 13, 2019, 9:20 p.m.)


Review request for mesos and Benjamin Bannier.


Changes
---

Addressed Benjamin's comments and fixed a compilation error.


Bugs: MESOS-9565
https://issues.apache.org/jira/browse/MESOS-9565


Repository: mesos


Description (updated)
---

This patch renames the `ROOT_PublishResourcesReboot` test to
`ROOT_CreateDestroyPersistentMountVolumeWithReboot` and makes it verify
that the persistent volume is cleaned up with `DESTROY` after a reboot.

NOTE: The `filesystem/linux` isolator has been removed from the test
because it is not necessary for the test. However, the root privilege is
still required by the test CSI plugin for bind-mounting.


Diffs (updated)
-

  src/tests/storage_local_resource_provider_tests.cpp 
fb001aa8d32d1a0a03014a35772fe10b65ce8d9a 


Diff: https://reviews.apache.org/r/69896/diff/3/

Changes: https://reviews.apache.org/r/69896/diff/2-3/


Testing
---

`make check`

Run in repetition with ~200 repetitions.

Manually injected some delay in the SLRP profile reconciliation loop to test 
the behavior. Without waiting for the 3rd `UpdateSlaveMessage` the test flake 
would manifest, and with the trick the flake is gone.


Thanks,

Chun-Hung Hsiao



Re: Review Request 69895: Made SLRP `PublishResources` test to check persistent volume cleanup.

2019-02-13 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69895/
---

(Updated Feb. 13, 2019, 9:19 p.m.)


Review request for mesos and Benjamin Bannier.


Changes
---

Addressed Benjamin's comments.


Bugs: MESOS-9565
https://issues.apache.org/jira/browse/MESOS-9565


Repository: mesos


Description (updated)
---

This patch renames the `ROOT_PublishResources` test to
`ROOT_CreateDestroyPersistentMountVolume` and makes it verify that the
persistent volume is cleaned up after `DESTROY`.

NOTE: The `filesystem/linux` isolator has been removed from the test
because it is not necessary for the test. However, the root privilege is
still required by the test CSI plugin for bind-mounting.


Diffs (updated)
-

  src/tests/storage_local_resource_provider_tests.cpp 
fb001aa8d32d1a0a03014a35772fe10b65ce8d9a 


Diff: https://reviews.apache.org/r/69895/diff/3/

Changes: https://reviews.apache.org/r/69895/diff/2-3/


Testing
---

`make check`


Thanks,

Chun-Hung Hsiao



Re: Review Request 69968: Copied operation removal logic in agent removal code path.

2019-02-13 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69968/#review212806
---


Ship it!




Ship It!

- Greg Mann


On Feb. 13, 2019, 2:05 a.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69968/
> ---
> 
> (Updated Feb. 13, 2019, 2:05 a.m.)
> 
> 
> Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-9557
> https://issues.apache.org/jira/browse/MESOS-9557
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There are two code paths the master can follow to remove an agent
> from its memory:
> - When the agent sends an UnregisterSlaveMessage or tries to register
>   as a new agent,
> - Or, when an agent is marked unreachable/gone.
> 
> The first code path did not clean up operation state completely,
> resulting in leaking some memory in the master's Framework structs.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 014e0e053cdf5c53a5ef8d63300205a121bed319 
> 
> 
> Diff: https://reviews.apache.org/r/69968/diff/1/
> 
> 
> Testing
> ---
> 
> See last patch in chain.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 69821: Enforced minimal allocatable resources in the hierarchical allocator.

2019-02-13 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69821/#review212804
---


Ship it!




Ship It!

- Benjamin Mahler


On Feb. 13, 2019, 3:44 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69821/
> ---
> 
> (Updated Feb. 13, 2019, 3:44 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Meng Zhu.
> 
> 
> Bugs: MESOS-9523
> https://issues.apache.org/jira/browse/MESOS-9523
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch modifies the hierarchical allocator to take
> framework-specified minimal allocatable resources into account.
> 
> While previously the allocator was inspecting the minimal allocatable
> resources specified in its global options, it can now also inspects
> framework-specific resource requirements. With that frameworks can e.g.,
> configure resource requirements above the default minimal allocatable
> resource, or opt into receiving resources considered too small to be
> allocatable by the allocator in its default behavior.
> 
> For that we change the hierarchical allocator's `allocatable` function
> to be framework and role-specific. As that does in some places not allow
> us to abort iterating over candidate resource consumers like before an
> additional check on whether any resources are left in an allocation
> cycle is added as a break condition.
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.hpp 
> 1420c2638786d85f7b04379e5d79e59990c3e6cf 
>   src/master/allocator/mesos/hierarchical.cpp 
> 862dbb90bdfa39ead4b185104a308eabe249d734 
> 
> 
> Diff: https://reviews.apache.org/r/69821/diff/12/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> File Attachments
> 
> 
> Ratio new/old timings
>   
> https://reviews.apache.org/media/uploaded/files/2019/01/31/d76189de-8882-4aff-956b-090dab729358__new_over_old.png
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 69972: Skipped the container which has no checkpointed volumes during recovery.

2019-02-13 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69972/#review212803
---



Patch looks great!

Reviews applied: [69972]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers 
--disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On Feb. 13, 2019, 8:26 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69972/
> ---
> 
> (Updated Feb. 13, 2019, 8:26 a.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Gilbert Song.
> 
> 
> Bugs: MESOS-9507
> https://issues.apache.org/jira/browse/MESOS-9507
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There are two cases we need to handle:
>   1. The checkpointed docker volumes file does not exist.
>   2. The checkpointed docker volumes file is empty.
> For both of the two cases, in the recovery of `docker/volume` isolator,
> we should remove the container's checkpoint directory and then skip the
> container.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/docker/volume/isolator.cpp 
> a72fc84da6fb0f24d363dd4c635500510da675d8 
> 
> 
> Diff: https://reviews.apache.org/r/69972/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 69971: Removed an outdated comment in SLRP.

2019-02-13 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69971/#review212796
---



Patch looks great!

Reviews applied: [69970, 69971]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers 
--disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On Feb. 13, 2019, 5:55 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69971/
> ---
> 
> (Updated Feb. 13, 2019, 5:55 a.m.)
> 
> 
> Review request for mesos and Benjamin Bannier.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Removed an outdated comment in SLRP.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/storage/provider.cpp 
> 09a710d668a5a7460b6c4e4fa32d3829dca7ac55 
> 
> 
> Diff: https://reviews.apache.org/r/69971/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 69821: Enforced minimal allocatable resources in the hierarchical allocator.

2019-02-13 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69821/
---

(Updated Feb. 13, 2019, 4:44 p.m.)


Review request for mesos, Benjamin Mahler and Meng Zhu.


Changes
---

Fix rebased errors


Bugs: MESOS-9523
https://issues.apache.org/jira/browse/MESOS-9523


Repository: mesos


Description
---

This patch modifies the hierarchical allocator to take
framework-specified minimal allocatable resources into account.

While previously the allocator was inspecting the minimal allocatable
resources specified in its global options, it can now also inspects
framework-specific resource requirements. With that frameworks can e.g.,
configure resource requirements above the default minimal allocatable
resource, or opt into receiving resources considered too small to be
allocatable by the allocator in its default behavior.

For that we change the hierarchical allocator's `allocatable` function
to be framework and role-specific. As that does in some places not allow
us to abort iterating over candidate resource consumers like before an
additional check on whether any resources are left in an allocation
cycle is added as a break condition.


Diffs (updated)
-

  src/master/allocator/mesos/hierarchical.hpp 
1420c2638786d85f7b04379e5d79e59990c3e6cf 
  src/master/allocator/mesos/hierarchical.cpp 
862dbb90bdfa39ead4b185104a308eabe249d734 


Diff: https://reviews.apache.org/r/69821/diff/12/

Changes: https://reviews.apache.org/r/69821/diff/11-12/


Testing
---

`make check`


File Attachments


Ratio new/old timings
  
https://reviews.apache.org/media/uploaded/files/2019/01/31/d76189de-8882-4aff-956b-090dab729358__new_over_old.png


Thanks,

Benjamin Bannier



Re: Review Request 69972: Skipped the container which has no checkpointed volumes during recovery.

2019-02-13 Thread Andrei Budnik

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69972/#review212795
---



Thanks for the patch!
I think we should implement a test for this. Otherwise, it would be very 
dangerous to _refactor_ this part of code in the future.
If you have no chance to implement a test for this now, please feel free to 
file a ticket.


src/slave/containerizer/mesos/isolators/docker/volume/isolator.cpp
Lines 300 (patched)


Given that `state::checkpoint` is **atomic**, we can not end up in the 
state where the file is empty because the agent did not finish writing to it.

However, an empty file might occur in case of hard reboot of the agent's 
host. This happens because page cache is dumped every 20 seconds by default in 
Linux. There is a chance that the file is created, but data has not yet synced 
on disk.

As we have agreed with Gilbert, we need to ignore empty files **only** in 
case of orphan containers.


- Andrei Budnik


On Feb. 13, 2019, 8:26 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69972/
> ---
> 
> (Updated Feb. 13, 2019, 8:26 a.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Gilbert Song.
> 
> 
> Bugs: MESOS-9507
> https://issues.apache.org/jira/browse/MESOS-9507
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There are two cases we need to handle:
>   1. The checkpointed docker volumes file does not exist.
>   2. The checkpointed docker volumes file is empty.
> For both of the two cases, in the recovery of `docker/volume` isolator,
> we should remove the container's checkpoint directory and then skip the
> container.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/docker/volume/isolator.cpp 
> a72fc84da6fb0f24d363dd4c635500510da675d8 
> 
> 
> Diff: https://reviews.apache.org/r/69972/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 69972: Skipped the container which has no checkpointed volumes during recovery.

2019-02-13 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69972/#review212791
---



PASS: Mesos patch 69972 was successfully built and tested.

Reviews applied: `['69972']`

All the build artifacts available at: 
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2884/mesos-review-69972

- Mesos Reviewbot Windows


On Feb. 13, 2019, 8:26 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69972/
> ---
> 
> (Updated Feb. 13, 2019, 8:26 a.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Gilbert Song.
> 
> 
> Bugs: MESOS-9507
> https://issues.apache.org/jira/browse/MESOS-9507
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There are two cases we need to handle:
>   1. The checkpointed docker volumes file does not exist.
>   2. The checkpointed docker volumes file is empty.
> For both of the two cases, in the recovery of `docker/volume` isolator,
> we should remove the container's checkpoint directory and then skip the
> container.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/docker/volume/isolator.cpp 
> a72fc84da6fb0f24d363dd4c635500510da675d8 
> 
> 
> Diff: https://reviews.apache.org/r/69972/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 69969: Updated docs related to agent capabilities.

2019-02-13 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69969/#review212790
---



Patch looks great!

Reviews applied: [69958, 69957, 69876, 69964, 69969]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers 
--disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On Feb. 12, 2019, 6:46 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69969/
> ---
> 
> (Updated Feb. 12, 2019, 6:46 p.m.)
> 
> 
> Review request for mesos and Gastón Kleiman.
> 
> 
> Bugs: MESOS-9535
> https://issues.apache.org/jira/browse/MESOS-9535
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Updated docs related to agent capabilities.
> 
> 
> Diffs
> -
> 
>   docs/configuration/agent.md a4015c409d00a4c117df6c869d5ba5072bcfe58e 
>   include/mesos/mesos.proto 9d5c2b53164adb44eb4a6f12a507e009bd607940 
>   include/mesos/v1/mesos.proto aef31eb5b8781182d3f42d935b12470b319027ed 
>   src/slave/flags.cpp 066b84f528b4c888dde399e0b5d5fe5531934de6 
> 
> 
> Diff: https://reviews.apache.org/r/69969/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Review Request 69972: Skipped the container which has no checkpointed volumes during recovery.

2019-02-13 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69972/
---

Review request for mesos, Andrei Budnik and Gilbert Song.


Bugs: MESOS-9507
https://issues.apache.org/jira/browse/MESOS-9507


Repository: mesos


Description
---

There are two cases we need to handle:
  1. The checkpointed docker volumes file does not exist.
  2. The checkpointed docker volumes file is empty.
For both of the two cases, in the recovery of `docker/volume` isolator,
we should remove the container's checkpoint directory and then skip the
container.


Diffs
-

  src/slave/containerizer/mesos/isolators/docker/volume/isolator.cpp 
a72fc84da6fb0f24d363dd4c635500510da675d8 


Diff: https://reviews.apache.org/r/69972/diff/1/


Testing
---


Thanks,

Qian Zhang