Re: Review Request 55901: Added support for command health checks to the default executor.

2017-02-10 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review165230
---



Patch looks great!

Reviews applied: [55900, 56288, 55901]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On Feb. 10, 2017, 6:40 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> ---
> 
> (Updated Feb. 10, 2017, 6:40 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent 
> huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
> https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -
> 
>   src/checks/health_checker.hpp f1f2834b3429fb00cc49c179fa9a3de328f597b5 
>   src/checks/health_checker.cpp a5225ff1f4b071ed4182d41fa8ecc705fa4dbe00 
>   src/launcher/default_executor.cpp e63cf153831088851863d0956455a024e9bc172a 
>   src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> ---
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It 
> passes on Linux, but not on macOS, because of MESOS-7050.
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 56370: Added a test to ensure multi-role framework receiving offers.

2017-02-10 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56370/#review165229
---


Fix it, then Ship it!




Ship It!


src/tests/master_tests.cpp (line 6379)


s/->/.get().



src/tests/master_tests.cpp (line 6382)


ditto


- Guangya Liu


On 二月 9, 2017, 2:28 a.m., Jay Guo wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56370/
> ---
> 
> (Updated 二月 9, 2017, 2:28 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Guangya Liu, and 
> Michael Park.
> 
> 
> Bugs: MESOS-7062
> https://issues.apache.org/jira/browse/MESOS-7062
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This test ensures that multi-role framework should receive offers for
> each of its roles.
> 
> 
> Diffs
> -
> 
>   src/tests/master_tests.cpp 3b4123b49ee32c902a5d2a01fcc7026da21fdd18 
> 
> Diff: https://reviews.apache.org/r/56370/diff/
> 
> 
> Testing
> ---
> 
> Added a test.
> 
> make check GTEST_FILTER="MasterTest.MultiRoleFrameworkReceivesOffers"
> 
> 
> Thanks,
> 
> Jay Guo
> 
>



Re: Review Request 56551: Updated CHANGELOG with still experimental features from 1.1.0.

2017-02-10 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56551/#review165228
---



Patch looks great!

Reviews applied: [56551]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On Feb. 10, 2017, 4:48 p.m., Alexander Rukletsov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56551/
> ---
> 
> (Updated Feb. 10, 2017, 4:48 p.m.)
> 
> 
> Review request for mesos and Adam B.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> See summary.
> 
> 
> Diffs
> -
> 
>   CHANGELOG 1478ad4dd3b20fc768fd679f81d803acf0995ad2 
> 
> Diff: https://reviews.apache.org/r/56551/diff/
> 
> 
> Testing
> ---
> 
> None: not a functional change.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>



Re: Review Request 56362: Windows: Create the Windows container launcher.

2017-02-10 Thread Andrew Schwartzmeyer

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56362/
---

(Updated Feb. 11, 2017, 2:04 a.m.)


Review request for mesos, Alex Clemmer and Joseph Wu.


Changes
---

Requested changes.


Bugs: MESOS-6892
https://issues.apache.org/jira/browse/MESOS-6892


Repository: mesos


Description
---

Instead of deriving `WindowsLauncher` from `PosixLauncher`,
this commit implements a separate `WindowsLauncher` derived
from `Launcher` parallel to the `PosixLauncher`.

This launcher is then refactored to use Windows' "Job Objects," which
are similar to Linux's cgroups, and enable us to reason about a group
of processes associated with a task/container as a "Job Object"
instead of a root PID and the tree containing its children. The latter
is not a reasonable approach on Windows, and has been the source of
subtle bugs.

The Job Object approach creates a named job with a one-to-one mapping
to the containerizer, and assigns the first process started for the
task to the job object. After being assigned, the Windows kernel
ensures all spawned child processes
(specifically those made with the `CreateProcess` system call) are
also assigned to the named job object. Thus this job object can then
be used to query the process group's resource usage, kill the process
group, and set limits on the process group.

So instead of seeing a process group as a tree and referring to it by
the root process's PID, the `WindowsLauncher` sees a process group as
a named Job Object, the same way the Windows kernel sees it. However,
the containerizer code which interacts with the launcher still refers
to a task group by the singular PID, and so we have a sort of shim
which maps the initial PID to the name of the job object. This is a an
unfortunate consequence of the shared containerizer code being
originally written for POSIX-like systems.

This abstraction sets us up for implementing resource usage limits on
the process group as a whole.


Diffs (updated)
-

  src/CMakeLists.txt 3a4ace9c8011ac8eec5067cd085fa7fe4166b9ee 
  src/slave/containerizer/mesos/containerizer.cpp 
d2b4f75a55dbe4746bc2dfc180335fa831a554ef 
  src/slave/containerizer/mesos/launcher.hpp 
79f6eea0ee8e564e90b36208672df150dbc5d540 
  src/slave/containerizer/mesos/launcher.cpp 
5114c130efbfb252dde1e85c081f5174e66f57af 
  src/slave/containerizer/mesos/windows_launcher.hpp PRE-CREATION 
  src/slave/containerizer/mesos/windows_launcher.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/56362/diff/


Testing
---


Thanks,

Andrew Schwartzmeyer



Re: Review Request 56569: Made ignoring logging use WARNING in master.

2017-02-10 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56569/
---

(Updated 二月 11, 2017, 1:52 a.m.)


Review request for mesos and Benjamin Mahler.


Changes
---

Updated commit message and also more ignore log message use WARNING.


Summary (updated)
-

Made ignoring logging use WARNING in master.


Repository: mesos


Description (updated)
---

Made ignoring logging use WARNING in master.


Diffs (updated)
-

  src/master/master.cpp 0cf81adeb1d087f298a9c70cfb40179ad457bed2 

Diff: https://reviews.apache.org/r/56569/diff/


Testing
---

make
make check


Thanks,

Guangya Liu



Re: Review Request 56052: Added the 'Secret' protobuf message.

2017-02-10 Thread Vinod Kone


> On Feb. 9, 2017, 6:31 p.m., Jie Yu wrote:
> > include/mesos/mesos.proto, line 1994
> > 
> >
> > Actually, one thought here. Given that environment varaible's source 
> > cannot be a byte stream. Do we want to have explicit typing: REFERENCE, 
> > TEXT, BYTES for secret so that the validation for environment source can be 
> > easier and more explicit?
> 
> Greg Mann wrote:
> When a user specifies a REFERENCE-type Secret, would they need to be able 
> to specify text vs. bytes in that case? If so, then we might want two 
> different sets of type information. I don't know if a user's module 
> implementation might need to know the encoding (or lack thereof) of a 
> REFERENCE Secret when fetching it?
> 
> If we don't think the REFERENCE-type secret needs any encoding 
> information, then we could use {REFERENCE, TEXT} for the enum types in this 
> patch, so that we could add a BYTES type at the top level later. Another 
> option would be to include type information inside the Value message, but 
> this adds a bit more complexity to the message.
> 
> Jie Yu wrote:
> Yes, you're right that this also applies to REFERENCE type secret. What 
> happen in the future that we start to support REFERENCE type through modules. 
> What the agent should do for environment variable. Does the agent need to do 
> base64 encoding (as you mentioned, how does the agent know that the content 
> received is a string or byte stream, and how does the user know?), or leave 
> it as a plain string?
> 
> Looks like k8s secret always do base64 encoding? Maybe that's how they 
> solve the issue?
> 
> Let's do our due diligence to evaluate all the options thoroughly because 
> this is a very important API.
> 
> Greg Mann wrote:
> Yea it looks like k8s solves this by always base64-encoding the secret 
> data. Since k8s provides its own secret-store, they can establish this 
> constraint easily. In our case, if we want to support arbitrary back-ends, we 
> can't necessarily make this assumption. For example, see the API section of 
> the docs for the Hashicorp Vault PKI backend: 
> https://www.vaultproject.io/docs/secrets/pki/. Some of these endpoints return 
> raw binary data. If we require that a secret fetcher module be used to 
> populate REFERENCE type secrets, then we could still require base64-encoding, 
> and the secret fetcher could be responsible for encoding fetched secrets 
> appropriately.
> 
> Let's consider a concrete example: a cluster using a Hashicorp Vault 
> secret store which has two back-ends loaded: 'generic' (the default) and 
> 'pki'. Given the current `Secret` design in this patch, a fetcher module 
> could infer the correct back-end to use via the `reference.name`. For 
> example, the module could map these names directly onto the Vault API: a name 
> of '/pki/ca' would grab the PKI back-end's CA, and a name of 
> '/pki/cert/' could grab a particular certificate from the PKI 
> back-end. A name of '/generic/my_secret' could fetch the secret named 
> "my_secret" in the generic back-end.
> 
> In any case, if we wanted to mandate only base64-encoded secrets within 
> Mesos, the secret fetcher could be aware of the type of data received from 
> the back-end's API and encode it appropriately, if necessary. Or, we could 
> allow binary secrets and include explicit type information to distinguish 
> between text and binary. If we do this, however, it's not clear to me that 
> the user launching a task should need to specify whether the secret is text 
> or bytes - perhaps the secret fetcher could determine this from a particular 
> secret name, or perhaps some secret stores may return both text and binary 
> secrets from the same API call? I'm not sure, need to do some more research.
> 
> If we want to provide a more generic HTTP-based secret-fetching model in 
> which a user could specify an arbitrary endpoint that would return the 
> desired secret, then it makes sense for that user to specify text vs. bytes. 
> However, in that case it's not clear to me where the generic fetcher gets the 
> credentials needed to fetch the secret itself.
> 
> This brings me to another important issue regarding secret fetching: how 
> does the fetcher establish that the executor or task in question is 
> authorized to fetch the specified secret? With this proto, we're attempting 
> to provide a first-class object permitting the secure fetching of secrets; do 
> we also want to provide a first-class identity associated with the 
> executor/task for the purpose of authorizing the secret-fetching?

Instead of first-classing bytes, I think it is simpler to only allow string in 
the API and let the user do encoding/decoding while doing storing/using.


- Vinod


---
This is an automatically generated e-mail. To reply, visit:

Re: Review Request 56569: Updated `drop` log message from `ERROR` to `WARNING`.

2017-02-10 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56569/#review165226
---


Ship it!




I didn't do an audit but you should probably look to see if our existing 
message ignoring logging (not all of them use drop) use WARNING.

- Benjamin Mahler


On Feb. 11, 2017, 1:07 a.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56569/
> ---
> 
> (Updated Feb. 11, 2017, 1:07 a.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Updated `drop` log message from `ERROR` to `WARNING`.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 80d481b2ea1435147cd213383008435f35112d92 
> 
> Diff: https://reviews.apache.org/r/56569/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Re: Review Request 56524: Added `drop` for overload to avoid custom logging.

2017-02-10 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56524/#review165225
---


Fix it, then Ship it!





src/master/master.cpp (line 3278)


-> instead of .get(). ?



src/master/master.cpp (line 4897)


ditto here.


- Benjamin Mahler


On Feb. 11, 2017, 1:02 a.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56524/
> ---
> 
> (Updated Feb. 11, 2017, 1:02 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Jay Guo.
> 
> 
> Bugs: MESOS-6638
> https://issues.apache.org/jira/browse/MESOS-6638
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added `drop` for overload to avoid custom logging.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 70f50f4264584be55312842b038c925687afb2cb 
>   src/master/master.cpp 80d481b2ea1435147cd213383008435f35112d92 
> 
> Diff: https://reviews.apache.org/r/56524/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Re: Review Request 56568: Modified the executor driver to always relink on agent failover.

2017-02-10 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56568/#review165224
---


Ship it!




Ship It!

- Vinod Kone


On Feb. 11, 2017, 12:45 a.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56568/
> ---
> 
> (Updated Feb. 11, 2017, 12:45 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-7057
> https://issues.apache.org/jira/browse/MESOS-7057
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> A relink is needed in cases where a netfilter module like iptables
> can terminate the connection without notifying the executor. This
> results in the executor still trying to reuse the stale "half-open"
> connection upon receiving the reconnect message from the executor
> leading to the erroneous behavior.
> 
> 
> Diffs
> -
> 
>   src/exec/exec.cpp 95c2e19f06c1778bd837247fc96d46d32b53c00b 
> 
> Diff: https://reviews.apache.org/r/56568/diff/
> 
> 
> Testing
> ---
> 
> Performed the steps to reproduce on MESOS-5332 and verified the problem is 
> fixed with the relink.
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Review Request 56569: Updated `drop` log message from `ERROR` to `WARNING`.

2017-02-10 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56569/
---

Review request for mesos and Benjamin Mahler.


Repository: mesos


Description
---

Updated `drop` log message from `ERROR` to `WARNING`.


Diffs
-

  src/master/master.cpp 80d481b2ea1435147cd213383008435f35112d92 

Diff: https://reviews.apache.org/r/56569/diff/


Testing
---

make
make check


Thanks,

Guangya Liu



Re: Review Request 56527: Disallowed special path components in IDs.

2017-02-10 Thread James Peach


> On Feb 10, 2017, at 5:02 PM, Jiang Yan Xu  wrote:
> 
> 
> This is an automatically generated e-mail. To reply, visit: 
> https://reviews.apache.org/r/56527/
> 
> On February 10th, 2017, 10:11 a.m. PST, James Peach wrote:
> 
> src/common/validation.cpp (Diff revision 1)
> namespace mesos {
> 42
>   id == string{os::HOME_DIRECTORY}) {
> '~' is a bad choice for a ID, but by itself it is not a security issue. You 
> ought to check for id[0] != '~' (or just ban it anywhere in the string).
> "~" is "by itself" as much as a security issue as ".." right? 

Yep you are right, I forgot that case :)


> but yeah I overlooked other forms of Tilde-Expansion. As jpeach pointed out 
> offline, perhaps instead of disallowing certain charaters, it's easier to 
> only allow certain chars. We should discuss with the community on that 
> though. I'll drop "~" for now.
> 
> - Jiang Yan
> 
> 
> On February 9th, 2017, 11:05 p.m. PST, Jiang Yan Xu wrote:
> 
> Review request for mesos and James Peach.
> By Jiang Yan Xu.
> Updated Feb. 9, 2017, 11:05 p.m.
> 
> Bugs: MESOS-7086
> Repository: mesos
> Description
> 
> - Such IDs should lead to surprising or even dangerous agent side
> directory structure.
> Testing
> 
> make check.
> Diffs
> 
> src/common/validation.cpp (0f1a02286d8431acfee6136e8ada49b0ac746897)
> src/tests/master_validation_tests.cpp 
> (0c2649089d7fd29eb021ac75c71e6a74368577dc)
> src/tests/slave_validation_tests.cpp 
> (3d17799ed04951fb56524db0f5d89347192300b2)
> View Diff


Re: Review Request 56524: Added `drop` for overload to avoid custom logging.

2017-02-10 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56524/
---

(Updated 二月 11, 2017, 1:02 a.m.)


Review request for mesos, Benjamin Mahler and Jay Guo.


Bugs: MESOS-6638
https://issues.apache.org/jira/browse/MESOS-6638


Repository: mesos


Description
---

Added `drop` for overload to avoid custom logging.


Diffs (updated)
-

  src/master/master.hpp 70f50f4264584be55312842b038c925687afb2cb 
  src/master/master.cpp 80d481b2ea1435147cd213383008435f35112d92 

Diff: https://reviews.apache.org/r/56524/diff/


Testing
---

make
make check


Thanks,

Guangya Liu



Re: Review Request 56527: Disallowed special path components in IDs.

2017-02-10 Thread Jiang Yan Xu


> On Feb. 10, 2017, 10:11 a.m., James Peach wrote:
> > src/common/validation.cpp, line 42
> > 
> >
> > '~' is a bad choice for a ID, but by itself it is not a security issue. 
> > You ought to check for `id[0] != '~'` (or just ban it anywhere in the 
> > string).

"~" is "by itself" as much as a security issue as ".." right? but yeah I 
overlooked other forms of 
[Tilde-Expansion](https://www.gnu.org/software/bash/manual/html_node/Tilde-Expansion.html).
 As jpeach pointed out offline, perhaps instead of disallowing certain 
charaters, it's easier to only allow certain chars. We should discuss with the 
community on that though. I'll drop "~" for now.


- Jiang Yan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56527/#review165141
---


On Feb. 9, 2017, 11:05 p.m., Jiang Yan Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56527/
> ---
> 
> (Updated Feb. 9, 2017, 11:05 p.m.)
> 
> 
> Review request for mesos and James Peach.
> 
> 
> Bugs: MESOS-7086
> https://issues.apache.org/jira/browse/MESOS-7086
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> - Such IDs should lead to surprising or even dangerous agent side
> directory structure.
> 
> 
> Diffs
> -
> 
>   src/common/validation.cpp 0f1a02286d8431acfee6136e8ada49b0ac746897 
>   src/tests/master_validation_tests.cpp 
> 0c2649089d7fd29eb021ac75c71e6a74368577dc 
>   src/tests/slave_validation_tests.cpp 
> 3d17799ed04951fb56524db0f5d89347192300b2 
> 
> Diff: https://reviews.apache.org/r/56527/diff/
> 
> 
> Testing
> ---
> 
> make check.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>



Re: Review Request 56524: Added `drop` for overload to avoid custom logging.

2017-02-10 Thread Guangya Liu


> On 二月 10, 2017, 9:27 p.m., Benjamin Mahler wrote:
> > src/master/master.cpp, lines 2246-2247
> > 
> >
> > Can you follow the same format at the drop two lines above?
> > 
> > ```
> >   LOG(ERROR) << "Dropping " << call.type() << " call"
> >  << " from framework " << *framework
> >  << ": " << message;
> > ```
> > 
> > Not yours, but just as an aside it seems like all of the drop logging 
> > should probably be at the warning instead of error level. Feel free to 
> > leave as is for now.

Added a `TODO` here to follow up the `WARNING` message.

```
// TODO(gyliu513): Update all `drop` logging to `WARNING`.
LOG(ERROR) << "Dropping " << call.type() << " call"
   << " from framework " << *framework
   << ": " << message;
```


- Guangya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56524/#review165176
---


On 二月 10, 2017, 1:42 a.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56524/
> ---
> 
> (Updated 二月 10, 2017, 1:42 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Jay Guo.
> 
> 
> Bugs: MESOS-6638
> https://issues.apache.org/jira/browse/MESOS-6638
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added `drop` for overload to avoid custom logging.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 767ac48f2a65514a19c4a3f8ca7b35f0259f9aeb 
>   src/master/master.cpp 620919ecfe85367b5c1281afc5216cc20e5e2e3c 
> 
> Diff: https://reviews.apache.org/r/56524/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Re: Review Request 55710: Added agent capabilities to /state endpoint of master.

2017-02-10 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55710/#review165217
---



Looks good, the main issue is that it seems error-prone to store both the 
booleans and the raw capabilities within the struct in a non-const manner, 
since they can go out of sync. Left some suggestions below.


include/mesos/master/master.proto (line 310)


I don't think you need the `agent_` prefix, given we're within `Agent` 
here, any reason you added it?



src/common/protobuf_utils.hpp (lines 164 - 165)


This is unfortunate since the bool and the repeated field need to be kept 
consistent but there's nothing enforcing this.

How about a function that generates the repeated field?

```
google::protobuf::RepeatedPtrField toRepeatedField() 
const
{
  ...;
}
```

Alternatively, store the capabilities as you did here, but make the 
booleans functions that loop over the capabilities:

```
bool multiRole() const
{
  ...;
}
```



src/common/protobuf_utils.cpp (lines 654 - 658)


Can't you just do a single CopyFrom here?

```
agent.mutable_agent_capabilities()->CopyFrom(
slave.capabilities.capabilities);
```



src/master/http.cpp (lines 136 - 137)


Can you put each argument on its own line?

```
static void json(
JSON::StringWriter* writer, 
const SlaveInfo::Capability& capability)
```



src/tests/master_tests.cpp (line 4087)


We can just use the `->` operator instead of `.get().`, note that a lot of 
the existing tests still need to be swept to clean them up.


- Benjamin Mahler


On Feb. 9, 2017, 7:22 a.m., Jay Guo wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55710/
> ---
> 
> (Updated Feb. 9, 2017, 7:22 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Guangya Liu, and 
> Michael Park.
> 
> 
> Bugs: MESOS-6902
> https://issues.apache.org/jira/browse/MESOS-6902
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master should be able to reflect agent capabilities via  `/state`(v0)
> and `getState`(v1) endpoints.
> 
> 
> Diffs
> -
> 
>   include/mesos/master/master.proto a2228db8902c297f137b8106a7b3d2babbc35017 
>   include/mesos/v1/master/master.proto 
> cfdca7400d98233c6320eebd8784e4f51c43ebbd 
>   src/common/protobuf_utils.hpp 3ba689f30ae080ea9b2a0af4d819dd3e377c7bf5 
>   src/common/protobuf_utils.cpp ed84e9ae083c2dc6cd8b49d9ce8dc624a8607b8a 
>   src/master/http.cpp a598488296d4616c0126aa3bd4d1d7e7a6e439fe 
>   src/tests/master_tests.cpp 3b4123b49ee32c902a5d2a01fcc7026da21fdd18 
> 
> Diff: https://reviews.apache.org/r/55710/diff/
> 
> 
> Testing
> ---
> 
> make check GTEST_FILTER="MasterTest.StateEndpointAgentCapabilities"
> [   OK ] MasterTest.StateEndpointAgentCapabilities (85 ms)
> [--] 1 test from MasterTest (94 ms total)
> 
> [--] Global test environment tear-down
> [==] 1 test from 1 test case ran. (122 ms total)
> 
> make check on Ubuntu 14.04
> 
> 
> Thanks,
> 
> Jay Guo
> 
>



Re: Review Request 56568: Modified the executor driver to always relink on agent failover.

2017-02-10 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56568/#review165219
---


Ship it!




+1 for fewer executor deaths during agent recovery (the agent gives 2 seconds 
for the executor to respond to the reconnect message before killing the 
executor).

- Joseph Wu


On Feb. 10, 2017, 4:45 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56568/
> ---
> 
> (Updated Feb. 10, 2017, 4:45 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-7057
> https://issues.apache.org/jira/browse/MESOS-7057
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> A relink is needed in cases where a netfilter module like iptables
> can terminate the connection without notifying the executor. This
> results in the executor still trying to reuse the stale "half-open"
> connection upon receiving the reconnect message from the executor
> leading to the erroneous behavior.
> 
> 
> Diffs
> -
> 
>   src/exec/exec.cpp 95c2e19f06c1778bd837247fc96d46d32b53c00b 
> 
> Diff: https://reviews.apache.org/r/56568/diff/
> 
> 
> Testing
> ---
> 
> Performed the steps to reproduce on MESOS-5332 and verified the problem is 
> fixed with the relink.
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 56476: Enabled loading multiple HTTP authenticators in Mesos.

2017-02-10 Thread Greg Mann


> On Feb. 9, 2017, 2:10 p.m., Jan Schlicht wrote:
> > src/common/http.cpp, line 910
> > 
> >
> > s/name/httpAuthenticatorName/
> > Just `name` feels too general.

I went with the compromise of `authenticatorName`.


- Greg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56476/#review164922
---


On Feb. 11, 2017, 12:46 a.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56476/
> ---
> 
> (Updated Feb. 11, 2017, 12:46 a.m.)
> 
> 
> Review request for mesos, Alexander Rojas, Jan Schlicht, Till Toenshoff, and 
> Vinod Kone.
> 
> 
> Bugs: MESOS-7004
> https://issues.apache.org/jira/browse/MESOS-7004
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch updates the Mesos code to allow master and agent
> to load multiple HTTP authenticator modules.
> 
> 
> Diffs
> -
> 
>   src/common/http.cpp abfbf7248beb2d4068be06b7f5f829d7617f943e 
> 
> Diff: https://reviews.apache.org/r/56476/diff/
> 
> 
> Testing
> ---
> 
> Still need to add tests which test the master and agent with multiple 
> authenticator modules loaded.
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 56476: Enabled loading multiple HTTP authenticators in Mesos.

2017-02-10 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56476/
---

(Updated Feb. 11, 2017, 12:46 a.m.)


Review request for mesos, Alexander Rojas, Jan Schlicht, Till Toenshoff, and 
Vinod Kone.


Changes
---

Renamed a variable.


Bugs: MESOS-7004
https://issues.apache.org/jira/browse/MESOS-7004


Repository: mesos


Description
---

This patch updates the Mesos code to allow master and agent
to load multiple HTTP authenticator modules.


Diffs (updated)
-

  src/common/http.cpp abfbf7248beb2d4068be06b7f5f829d7617f943e 

Diff: https://reviews.apache.org/r/56476/diff/


Testing
---

Still need to add tests which test the master and agent with multiple 
authenticator modules loaded.


Thanks,

Greg Mann



Review Request 56568: Modified the executor driver to always relink on agent failover.

2017-02-10 Thread Anand Mazumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56568/
---

Review request for mesos and Vinod Kone.


Bugs: MESOS-7057
https://issues.apache.org/jira/browse/MESOS-7057


Repository: mesos


Description
---

A relink is needed in cases where a netfilter module like iptables
can terminate the connection without notifying the executor. This
results in the executor still trying to reuse the stale "half-open"
connection upon receiving the reconnect message from the executor
leading to the erroneous behavior.


Diffs
-

  src/exec/exec.cpp 95c2e19f06c1778bd837247fc96d46d32b53c00b 

Diff: https://reviews.apache.org/r/56568/diff/


Testing
---

Performed the steps to reproduce on MESOS-5332 and verified the problem is 
fixed with the relink.


Thanks,

Anand Mazumdar



Re: Review Request 56474: Added support for multiple authenticators to libprocess.

2017-02-10 Thread Greg Mann


> On Feb. 9, 2017, 3:03 p.m., Jan Schlicht wrote:
> > 3rdparty/libprocess/src/authenticator_manager.cpp, lines 168-204
> > 
> >
> > How about we make this a function instead of using a lambda?
> > Seems to do a lot and even contains another lambda in line 180.

The outer lambda needs to be something like a mutable lambda so that it is 
instantiated once when `loop` is invoked, allowing us to capture the `results` 
list once and push onto it as we iterate. I suppose we could accomplish this 
with a struct bearing this code in its `()` operator, which we instantiate 
here, but I'm not sure I like that more. Perhaps there's a way of doing this 
that I'm not thinking of?

It would be possible to use `std::bind` with a static member function instead 
of the inner lambda, but since we use the `loop`-related constructs `Break` and 
`Continue` within that block, I think it actually makes the code a bit more 
readable to keep it all in one place. Let me know what you think.


> On Feb. 9, 2017, 3:03 p.m., Jan Schlicht wrote:
> > 3rdparty/libprocess/src/authenticator_manager.cpp, line 160
> > 
> >
> > Could also be `vector 
> > results(authenticators_[realm].size());`. Of course, 
> > `combineFailedRequests` would need to be changed respectively.

Yep good point, that is another option. I think I'm going to do some 
benchmarking of this patch vs. the old authentication code, so perhaps I can 
benchmark the list vs. vector in this case as well.


- Greg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56474/#review164924
---


On Feb. 11, 2017, 12:44 a.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56474/
> ---
> 
> (Updated Feb. 11, 2017, 12:44 a.m.)
> 
> 
> Review request for mesos, Alexander Rojas, Jan Schlicht, Till Toenshoff, and 
> Vinod Kone.
> 
> 
> Bugs: MESOS-7004
> https://issues.apache.org/jira/browse/MESOS-7004
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch updates the `AuthenticatorManager` to allow
> multiple authenticators to be set for a single realm.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/src/authenticator_manager.cpp 
> a22acd026a001788dc39b8005a56577e33c6800b 
> 
> Diff: https://reviews.apache.org/r/56474/diff/
> 
> 
> Testing
> ---
> 
> Testing information can be found in the subsequent patch in this chain.
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 56474: Added support for multiple authenticators to libprocess.

2017-02-10 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56474/
---

(Updated Feb. 11, 2017, 12:44 a.m.)


Review request for mesos, Alexander Rojas, Jan Schlicht, Till Toenshoff, and 
Vinod Kone.


Changes
---

Addressed comments.


Bugs: MESOS-7004
https://issues.apache.org/jira/browse/MESOS-7004


Repository: mesos


Description
---

This patch updates the `AuthenticatorManager` to allow
multiple authenticators to be set for a single realm.


Diffs (updated)
-

  3rdparty/libprocess/src/authenticator_manager.cpp 
a22acd026a001788dc39b8005a56577e33c6800b 

Diff: https://reviews.apache.org/r/56474/diff/


Testing
---

Testing information can be found in the subsequent patch in this chain.


Thanks,

Greg Mann



Re: Review Request 56376: Updated allocator test to support create multi role framework.

2017-02-10 Thread Benjamin Mahler


> On Feb. 8, 2017, 6:48 a.m., Jay Guo wrote:
> > src/tests/hierarchical_allocator_tests.cpp, lines 203-205
> > 
> >
> > IMO, since this is for testing purpose, we probably don't need to 
> > rigidly require test writers to construct frameworkInfo with `MULTI_ROLE` 
> > capability. If `createFrameworkInfo` is called with multiple roles, we 
> > could simply add `MULTI_ROLE` implicitly. And one could still explicitly 
> > add `MULTI_ROLE` to capabilities with a single role. Then we don't need to 
> > change existing test cases and avoid future confusion. What do you think?
> 
> Guangya Liu wrote:
> Yes, but the only problem is that we cannot specfify multiple roles with 
> the first paramter here `const string& role`, so I have to update it to 
> `const set& roles`.
> 
> Jay Guo wrote:
> I agree that tests need to be updated to pass in `set`. However, 
> I feel return type `Try` is a bit complicated. In this form, 
> should we perform `isSome()` check? I would suggest to stick to 
> `FrameworkInfo createFrameworkInfo(...)`.
> 
> Guangya Liu wrote:
> As I added some error handling as folllowing, so I have to return `Try` 
> here, any comments? 
> 
> ```
> if (!multiRole && roles.size() > 1) {
>   return Error("Non multi-role framework support only one role");
> }
> ```
> 
> Jay Guo wrote:
> As I mentioned previously, we could add `MULTI_ROLE` implicitly for this 
> case. This method is for testing purpose and we could expect a test writer to 
> be aware of the implications, instead of writing unecessary validation code 
> in there test (to handle `Try<>`).

It seems to me we should just always inject the MULTI_ROLE capability since the 
allocator never looks at whether the framework is MULTI_ROLE capable. The logic 
between a non-MULTI_ROLE scheduler and a single role MULTI_ROLE scheduler is 
the same as far as the allocator is concerned. And let's document why we chose 
to do this. How does that sound?

Just as an aside, you can use `FAIL() << "XXX"` if the caller makes a 
programming error that you want to guard against. Unlike CHECK, it doesn't 
crash the program, rather it just fails the currently running test. But if we 
were to always inject AllocationInfo


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56376/#review164638
---


On Feb. 7, 2017, 10:10 a.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56376/
> ---
> 
> (Updated Feb. 7, 2017, 10:10 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Jay Guo.
> 
> 
> Bugs: MESOS-6638
> https://issues.apache.org/jira/browse/MESOS-6638
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Updated allocator test to support create multi role framework.
> 
> 
> Diffs
> -
> 
>   src/tests/hierarchical_allocator_tests.cpp 
> c681d03c3f94f7d071143366a5aad0421108ebec 
> 
> Diff: https://reviews.apache.org/r/56376/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Re: Review Request 56476: Enabled loading multiple HTTP authenticators in Mesos.

2017-02-10 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56476/
---

(Updated Feb. 11, 2017, 12:24 a.m.)


Review request for mesos, Alexander Rojas, Jan Schlicht, Till Toenshoff, and 
Vinod Kone.


Changes
---

Fixed indentation.


Bugs: MESOS-7004
https://issues.apache.org/jira/browse/MESOS-7004


Repository: mesos


Description
---

This patch updates the Mesos code to allow master and agent
to load multiple HTTP authenticator modules.


Diffs (updated)
-

  src/common/http.cpp abfbf7248beb2d4068be06b7f5f829d7617f943e 

Diff: https://reviews.apache.org/r/56476/diff/


Testing
---

Still need to add tests which test the master and agent with multiple 
authenticator modules loaded.


Thanks,

Greg Mann



Re: Review Request 56378: Added test case for suppress and revive with multi role framework.

2017-02-10 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56378/#review165214
---


Fix it, then Ship it!





src/tests/hierarchical_allocator_tests.cpp (line 3690)


Hm.. why did you need this?



src/tests/hierarchical_allocator_tests.cpp (line 3703)


Did you mean to do 1000.0 or 1000? How about using something like 
`Days(1).secs()`?



src/tests/hierarchical_allocator_tests.cpp (lines 3723 - 3724)


How about putting this above the revive call? E.g.

```
  // Revive offers for role1, after which the agent's resources
  // should be offered to it.
  allocator->reviveOffers(framework.id(), "role1");

  expected = Allocation(
  framework.id(),
  {{"role1", {{agent.id(), agent.resources());

  AWAIT_EXPECT_EQ(expected, allocation);
```


- Benjamin Mahler


On Feb. 7, 2017, 10:10 a.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56378/
> ---
> 
> (Updated Feb. 7, 2017, 10:10 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Jay Guo.
> 
> 
> Bugs: MESOS-6638
> https://issues.apache.org/jira/browse/MESOS-6638
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added test case for suppress and revive with multi role framework.
> 
> 
> Diffs
> -
> 
>   src/tests/hierarchical_allocator_tests.cpp 
> c681d03c3f94f7d071143366a5aad0421108ebec 
> 
> Diff: https://reviews.apache.org/r/56378/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
> 
> ```
> ./bin/mesos-tests.sh  
> --gtest_filter="HierarchicalAllocatorTest.SuppressAndReviveOffersWithMultiRole"
>  --gtest_repeat=100
> ```
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Re: Review Request 56537: Added a test to ensure framework can upgrade to support MULTI_ROLE.

2017-02-10 Thread Benjamin Mahler


> On Feb. 10, 2017, 9:33 a.m., Jay Guo wrote:
> > src/tests/master_tests.cpp, lines 6409-6411
> > 
> >
> > How to check the task is still running here?

If no status update is received we know it's still running. To check it more 
explicitly we can perform reconciliation.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56537/#review165087
---


On Feb. 10, 2017, 9:32 a.m., Jay Guo wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56537/
> ---
> 
> (Updated Feb. 10, 2017, 9:32 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Guangya Liu, and 
> Michael Park.
> 
> 
> Bugs: MESOS-7035
> https://issues.apache.org/jira/browse/MESOS-7035
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> A framework can be upgraded to be MULTI_ROLE capable even with task
> running, as long as it does not change role while upgrading.
> 
> 
> Diffs
> -
> 
>   src/tests/master_tests.cpp 3b4123b49ee32c902a5d2a01fcc7026da21fdd18 
> 
> Diff: https://reviews.apache.org/r/56537/diff/
> 
> 
> Testing
> ---
> 
> WIP
> 
> 
> Thanks,
> 
> Jay Guo
> 
>



Re: Review Request 56375: Added a test to ensure offers with different roles cannot be combined.

2017-02-10 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56375/#review165041
---




src/tests/master_tests.cpp (lines 1905 - 1909)


Do you want to add a TODO to test the other operations as well?



src/tests/master_tests.cpp (lines 1932 - 1936)


We should probably clear_role() here in order to not assume that 
DEFAULT_FRAMEWORK_INFO has not set the role field.



src/tests/master_tests.cpp (lines 1986 - 1987)


Is it possible to pause the clock for the whole test to ensure we're not 
depending on time?



src/tests/master_tests.cpp (line 1993)


You can use the -> operator (e.g. `offers2->size()` or `offers2->at(0)`), 
feel free to sweep this file in a later patch.



src/tests/master_tests.cpp (lines 1997 - 1998)


Does EXPECT_SOME_EQ not work here?



src/tests/master_tests.cpp (line 2032)


This is a little hard to figure out, can you document their allocations to 
describe how we know that fair sharing will make these offers to different 
roles? Also, it seems to me we don't know which of the two roles will get the 
initial offer?


- Benjamin Mahler


On Feb. 7, 2017, 7:34 a.m., Jay Guo wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56375/
> ---
> 
> (Updated Feb. 7, 2017, 7:34 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Guangya Liu.
> 
> 
> Bugs: MESOS-6637
> https://issues.apache.org/jira/browse/MESOS-6637
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Multi-role framework cannot combine offers allocated to different
> roles of that framework in a single launchTask call.
> 
> 
> Diffs
> -
> 
>   src/tests/master_tests.cpp 3b4123b49ee32c902a5d2a01fcc7026da21fdd18 
> 
> Diff: https://reviews.apache.org/r/56375/diff/
> 
> 
> Testing
> ---
> 
> Added a test
> make check GTEST_FILTER="MasterTest.LaunchDifferentRoleLost"
> 
> 
> Thanks,
> 
> Jay Guo
> 
>



Re: Review Request 56392: Tightened test expecations in reservation-related tests.

2017-02-10 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56392/#review165210
---



Looks good, you'll need to rebase this after you update your previous patch.


src/tests/master_validation_tests.cpp (line 428)


Not yours, but we should take the backticks out the log message in a follow 
up.


- Benjamin Mahler


On Feb. 10, 2017, 3:16 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56392/
> ---
> 
> (Updated Feb. 10, 2017, 3:16 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Jay Guo, and Guangya Liu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This change makes expected error strings in reservation
> validation-related tests explicit. This is to make sure that the
> observed errors match the expected ones.
> 
> 
> Diffs
> -
> 
>   src/tests/master_validation_tests.cpp 
> 0c2649089d7fd29eb021ac75c71e6a74368577dc 
> 
> Diff: https://reviews.apache.org/r/56392/diff/
> 
> 
> Testing
> ---
> 
> make check (OS X)
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 55462: Validate resource reservation against allocated role.

2017-02-10 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55462/#review165204
---




src/master/validation.cpp (lines 1612 - 1629)


Looks like this needs a rebase against master? This block was removed and 
the roles are no longer an option.



src/master/validation.cpp (lines 1659 - 1695)


See my response to the thread in the previous review comments about 
validating case (2) as well so that we can provide a better error message to 
the operators.



src/master/validation.cpp (lines 1665 - 1693)


Is it possible to get the open and close quotes on the same lines for these 
error messages? When on different lines we tend to forget to close them, e.g.

```
return Error(
"A reserve operation was attempted for a resource allocated to 
role"
" '" + resource.role() + "'"
", but the framework only has roles"
" '" + stringify(frameworkRoles.get()) + "'");
```



src/master/validation.cpp (lines 1667 - 1668)


How about:

A reserve operation was attempted on unallocated resources, but frameworks 
can only perform reservations on allocated resources

Or:

A reserve operation was attempted on unallocated resource 'XXX', but 
frameworks can only perform reservations on allocated resources



src/tests/master_validation_tests.cpp (lines 420 - 446)


Looks good, did you want to test the other cases as well?


- Benjamin Mahler


On Feb. 10, 2017, 3:16 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55462/
> ---
> 
> (Updated Feb. 10, 2017, 3:16 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Jay Guo, and Guangya Liu.
> 
> 
> Bugs: MESOS-6730
> https://issues.apache.org/jira/browse/MESOS-6730
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This change introduces validation of the 'AllocationInfo' of resources
> used in reservations.
> 
> 
> Diffs
> -
> 
>   src/master/validation.cpp 2bfca95713319a195e43e990cbc2dc5570b89c4e 
>   src/tests/master_validation_tests.cpp 
> 0c2649089d7fd29eb021ac75c71e6a74368577dc 
> 
> Diff: https://reviews.apache.org/r/55462/diff/
> 
> 
> Testing
> ---
> 
> N/A yet.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 55462: Validate resource reservation against allocated role.

2017-02-10 Thread Benjamin Mahler


> On Feb. 10, 2017, 2:39 a.m., Benjamin Mahler wrote:
> > src/master/validation.cpp, lines 1623-1630
> > 
> >
> > Just to clarify, this is an invariant in that if this fails it is a 
> > programming error in the master, right? It seems to me we have two 
> > invariants here:
> > 
> > (1) Coming from framework: framework info is set and resources are 
> > allocated (master enforces this when applying operations).
> > 
> > (2) Coming from operator: framework info is not set and resources are 
> > unallocated (master doesn't enforce this).
> > 
> > We should clarify this. Also, `!resource.allocation_info().has_role()` 
> > is sufficient here if you want to be more succinct.
> 
> Benjamin Bannier wrote:
> I am not sure these cases are invariants *for this function*, and it 
> seems us ensuring correct behavior with tests rather then fail hard here 
> would decrease coupling between validation logic and apply operations and be 
> just as good.
> 
> The case (1) we do explicitly handle here seems interesting since it 
> points to errors in framework code (e.g., incorrect handling of offered 
> resources). Case (2) seems less likely as it would mean that the operator 
> would set more information than needed or used by the master. In any case, 
> every caller of the validation function would need to normalize the resulting 
> operation in order to deal with the different possible inputs (e.g., (1) or 
> (2) here).
> 
> Does that make sense?

Yeah this was just to clarify we had the same understanding, and it sounds like 
we do. I'm also in favor of not failing hard here.
The other thing I would mention to be sure we both understand the same is that 
the operations won't apply later if case (1) or case (2) don't hold, even if we 
don't validate against these cases explicitly here.

I wonder if we should also be validating case (2) since we're validating case 
(1), since validating it here allows us to provide a more informative message. 
E.g. "A reserve operation was attempted on allocated resources, but operators 
can only reserve available resources". Feel free to add it in this patch or 
another patch.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55462/#review165056
---


On Feb. 10, 2017, 3:16 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55462/
> ---
> 
> (Updated Feb. 10, 2017, 3:16 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Jay Guo, and Guangya Liu.
> 
> 
> Bugs: MESOS-6730
> https://issues.apache.org/jira/browse/MESOS-6730
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This change introduces validation of the 'AllocationInfo' of resources
> used in reservations.
> 
> 
> Diffs
> -
> 
>   src/master/validation.cpp 2bfca95713319a195e43e990cbc2dc5570b89c4e 
>   src/tests/master_validation_tests.cpp 
> 0c2649089d7fd29eb021ac75c71e6a74368577dc 
> 
> Diff: https://reviews.apache.org/r/55462/diff/
> 
> 
> Testing
> ---
> 
> N/A yet.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 56392: Tightened test expecations in reservation-related tests.

2017-02-10 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56392/#review165203
---



Patch looks great!

Reviews applied: [55461, 56391, 55462, 56392]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On Feb. 10, 2017, 3:16 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56392/
> ---
> 
> (Updated Feb. 10, 2017, 3:16 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Jay Guo, and Guangya Liu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This change makes expected error strings in reservation
> validation-related tests explicit. This is to make sure that the
> observed errors match the expected ones.
> 
> 
> Diffs
> -
> 
>   src/tests/master_validation_tests.cpp 
> 0c2649089d7fd29eb021ac75c71e6a74368577dc 
> 
> Diff: https://reviews.apache.org/r/56392/diff/
> 
> 
> Testing
> ---
> 
> make check (OS X)
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 56391: Set AllocationInfo in some reservation-related tests.

2017-02-10 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56391/#review165202
---


Ship it!




Ship It!

- Benjamin Mahler


On Feb. 10, 2017, 2:57 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56391/
> ---
> 
> (Updated Feb. 10, 2017, 2:57 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Jay Guo, and Guangya Liu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> In a later change we will take a 'Resource''s 'AllocationInfo' into
> account. Update a number of tests related to resource validation to
> set 'AllocationInfo' so continue to trigger the originally expected
> errors.
> 
> 
> Diffs
> -
> 
>   src/tests/master_validation_tests.cpp 
> 0c2649089d7fd29eb021ac75c71e6a74368577dc 
> 
> Diff: https://reviews.apache.org/r/56391/diff/
> 
> 
> Testing
> ---
> 
> make check (OS X)
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 56525: Fixed a crash on the agent when handling the SIGUSR1 signal.

2017-02-10 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56525/#review165200
---


Ship it!




Ship It!

- Joseph Wu


On Feb. 9, 2017, 9:34 p.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56525/
> ---
> 
> (Updated Feb. 9, 2017, 9:34 p.m.)
> 
> 
> Review request for mesos, Greg Mann and Joseph Wu.
> 
> 
> Bugs: MESOS-7102
> https://issues.apache.org/jira/browse/MESOS-7102
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There were some actors that were not being destructed when
> `finalize()` was being invoked. Also fixed the order of the
> destruction of objects i.e., in the reverse order of their
> creation.
> 
> 
> Diffs
> -
> 
>   src/slave/main.cpp b9f9748d98775b2a511ded35bbf72099bfeeba64 
> 
> Diff: https://reviews.apache.org/r/56525/diff/
> 
> 
> Testing
> ---
> 
> make check, manually tested to ensure we don't crash on a `SIGUSR1` signal.
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 56362: Windows: Create the Windows container launcher.

2017-02-10 Thread Joseph Wu


> On Feb. 9, 2017, 4:24 p.m., Joseph Wu wrote:
> > src/slave/containerizer/mesos/windows_launcher.cpp, line 185
> > 
> >
> > s/container.get().handle/container->handle/
> 
> Andrew Schwartzmeyer wrote:
> Ah thanks, didn't know `Option` had `operator->()` defined this way. Why 
> don't we use this elsewhere?

The operator wasn't always available.  We first added it to `Try<>` and then to 
`Option<>`; for the sake of readability.

We do use it in a couple places, and we like new code to use it where 
appropriate.


- Joseph


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56362/#review165027
---


On Feb. 9, 2017, 10:50 a.m., Andrew Schwartzmeyer wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56362/
> ---
> 
> (Updated Feb. 9, 2017, 10:50 a.m.)
> 
> 
> Review request for mesos, Alex Clemmer and Joseph Wu.
> 
> 
> Bugs: MESOS-6892
> https://issues.apache.org/jira/browse/MESOS-6892
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Instead of deriving `WindowsLauncher` from `PosixLauncher`,
> this commit implements a separate `WindowsLauncher` derived
> from `Launcher` parallel to the `PosixLauncher`.
> 
> This launcher is then refactored to use Windows' "Job Objects," which
> are similar to Linux's cgroups, and enable us to reason about a group
> of processes associated with a task/container as a "Job Object"
> instead of a root PID and the tree containing its children. The latter
> is not a reasonable approach on Windows, and has been the source of
> subtle bugs.
> 
> The Job Object approach creates a named job with a one-to-one mapping
> to the containerizer, and assigns the first process started for the
> task to the job object. After being assigned, the Windows kernel
> ensures all spawned child processes
> (specifically those made with the `CreateProcess` system call) are
> also assigned to the named job object. Thus this job object can then
> be used to query the process group's resource usage, kill the process
> group, and set limits on the process group.
> 
> So instead of seeing a process group as a tree and referring to it by
> the root process's PID, the `WindowsLauncher` sees a process group as
> a named Job Object, the same way the Windows kernel sees it. However,
> the containerizer code which interacts with the launcher still refers
> to a task group by the singular PID, and so we have a sort of shim
> which maps the initial PID to the name of the job object. This is a an
> unfortunate consequence of the shared containerizer code being
> originally written for POSIX-like systems.
> 
> This abstraction sets us up for implementing resource usage limits on
> the process group as a whole.
> 
> 
> Diffs
> -
> 
>   src/CMakeLists.txt 3a4ace9c8011ac8eec5067cd085fa7fe4166b9ee 
>   src/slave/containerizer/mesos/containerizer.cpp 
> d2b4f75a55dbe4746bc2dfc180335fa831a554ef 
>   src/slave/containerizer/mesos/launcher.hpp 
> 79f6eea0ee8e564e90b36208672df150dbc5d540 
>   src/slave/containerizer/mesos/launcher.cpp 
> 5114c130efbfb252dde1e85c081f5174e66f57af 
>   src/slave/containerizer/mesos/windows_launcher.hpp PRE-CREATION 
>   src/slave/containerizer/mesos/windows_launcher.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/56362/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrew Schwartzmeyer
> 
>



Re: Review Request 56362: Windows: Create the Windows container launcher.

2017-02-10 Thread Andrew Schwartzmeyer


> On Feb. 10, 2017, 12:24 a.m., Joseph Wu wrote:
> > src/slave/containerizer/mesos/windows_launcher.cpp, line 185
> > 
> >
> > s/container.get().handle/container->handle/

Ah thanks, didn't know `Option` had `operator->()` defined this way. Why don't 
we use this elsewhere?


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56362/#review165027
---


On Feb. 9, 2017, 6:50 p.m., Andrew Schwartzmeyer wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56362/
> ---
> 
> (Updated Feb. 9, 2017, 6:50 p.m.)
> 
> 
> Review request for mesos, Alex Clemmer and Joseph Wu.
> 
> 
> Bugs: MESOS-6892
> https://issues.apache.org/jira/browse/MESOS-6892
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Instead of deriving `WindowsLauncher` from `PosixLauncher`,
> this commit implements a separate `WindowsLauncher` derived
> from `Launcher` parallel to the `PosixLauncher`.
> 
> This launcher is then refactored to use Windows' "Job Objects," which
> are similar to Linux's cgroups, and enable us to reason about a group
> of processes associated with a task/container as a "Job Object"
> instead of a root PID and the tree containing its children. The latter
> is not a reasonable approach on Windows, and has been the source of
> subtle bugs.
> 
> The Job Object approach creates a named job with a one-to-one mapping
> to the containerizer, and assigns the first process started for the
> task to the job object. After being assigned, the Windows kernel
> ensures all spawned child processes
> (specifically those made with the `CreateProcess` system call) are
> also assigned to the named job object. Thus this job object can then
> be used to query the process group's resource usage, kill the process
> group, and set limits on the process group.
> 
> So instead of seeing a process group as a tree and referring to it by
> the root process's PID, the `WindowsLauncher` sees a process group as
> a named Job Object, the same way the Windows kernel sees it. However,
> the containerizer code which interacts with the launcher still refers
> to a task group by the singular PID, and so we have a sort of shim
> which maps the initial PID to the name of the job object. This is a an
> unfortunate consequence of the shared containerizer code being
> originally written for POSIX-like systems.
> 
> This abstraction sets us up for implementing resource usage limits on
> the process group as a whole.
> 
> 
> Diffs
> -
> 
>   src/CMakeLists.txt 3a4ace9c8011ac8eec5067cd085fa7fe4166b9ee 
>   src/slave/containerizer/mesos/containerizer.cpp 
> d2b4f75a55dbe4746bc2dfc180335fa831a554ef 
>   src/slave/containerizer/mesos/launcher.hpp 
> 79f6eea0ee8e564e90b36208672df150dbc5d540 
>   src/slave/containerizer/mesos/launcher.cpp 
> 5114c130efbfb252dde1e85c081f5174e66f57af 
>   src/slave/containerizer/mesos/windows_launcher.hpp PRE-CREATION 
>   src/slave/containerizer/mesos/windows_launcher.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/56362/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrew Schwartzmeyer
> 
>



Re: Review Request 56052: Added the 'Secret' protobuf message.

2017-02-10 Thread Greg Mann


> On Feb. 9, 2017, 6:31 p.m., Jie Yu wrote:
> > include/mesos/mesos.proto, line 1994
> > 
> >
> > Actually, one thought here. Given that environment varaible's source 
> > cannot be a byte stream. Do we want to have explicit typing: REFERENCE, 
> > TEXT, BYTES for secret so that the validation for environment source can be 
> > easier and more explicit?
> 
> Greg Mann wrote:
> When a user specifies a REFERENCE-type Secret, would they need to be able 
> to specify text vs. bytes in that case? If so, then we might want two 
> different sets of type information. I don't know if a user's module 
> implementation might need to know the encoding (or lack thereof) of a 
> REFERENCE Secret when fetching it?
> 
> If we don't think the REFERENCE-type secret needs any encoding 
> information, then we could use {REFERENCE, TEXT} for the enum types in this 
> patch, so that we could add a BYTES type at the top level later. Another 
> option would be to include type information inside the Value message, but 
> this adds a bit more complexity to the message.
> 
> Jie Yu wrote:
> Yes, you're right that this also applies to REFERENCE type secret. What 
> happen in the future that we start to support REFERENCE type through modules. 
> What the agent should do for environment variable. Does the agent need to do 
> base64 encoding (as you mentioned, how does the agent know that the content 
> received is a string or byte stream, and how does the user know?), or leave 
> it as a plain string?
> 
> Looks like k8s secret always do base64 encoding? Maybe that's how they 
> solve the issue?
> 
> Let's do our due diligence to evaluate all the options thoroughly because 
> this is a very important API.

Yea it looks like k8s solves this by always base64-encoding the secret data. 
Since k8s provides its own secret-store, they can establish this constraint 
easily. In our case, if we want to support arbitrary back-ends, we can't 
necessarily make this assumption. For example, see the API section of the docs 
for the Hashicorp Vault PKI backend: 
https://www.vaultproject.io/docs/secrets/pki/. Some of these endpoints return 
raw binary data. If we require that a secret fetcher module be used to populate 
REFERENCE type secrets, then we could still require base64-encoding, and the 
secret fetcher could be responsible for encoding fetched secrets appropriately.

Let's consider a concrete example: a cluster using a Hashicorp Vault secret 
store which has two back-ends loaded: 'generic' (the default) and 'pki'. Given 
the current `Secret` design in this patch, a fetcher module could infer the 
correct back-end to use via the `reference.name`. For example, the module could 
map these names directly onto the Vault API: a name of '/pki/ca' would grab the 
PKI back-end's CA, and a name of '/pki/cert/' could grab a particular 
certificate from the PKI back-end. A name of '/generic/my_secret' could fetch 
the secret named "my_secret" in the generic back-end.

In any case, if we wanted to mandate only base64-encoded secrets within Mesos, 
the secret fetcher could be aware of the type of data received from the 
back-end's API and encode it appropriately, if necessary. Or, we could allow 
binary secrets and include explicit type information to distinguish between 
text and binary. If we do this, however, it's not clear to me that the user 
launching a task should need to specify whether the secret is text or bytes - 
perhaps the secret fetcher could determine this from a particular secret name, 
or perhaps some secret stores may return both text and binary secrets from the 
same API call? I'm not sure, need to do some more research.

If we want to provide a more generic HTTP-based secret-fetching model in which 
a user could specify an arbitrary endpoint that would return the desired 
secret, then it makes sense for that user to specify text vs. bytes. However, 
in that case it's not clear to me where the generic fetcher gets the 
credentials needed to fetch the secret itself.

This brings me to another important issue regarding secret fetching: how does 
the fetcher establish that the executor or task in question is authorized to 
fetch the specified secret? With this proto, we're attempting to provide a 
first-class object permitting the secure fetching of secrets; do we also want 
to provide a first-class identity associated with the executor/task for the 
purpose of authorizing the secret-fetching?


- Greg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56052/#review164960
---


On Feb. 9, 2017, 6:33 a.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> 

Re: Review Request 56362: Windows: Create the Windows container launcher.

2017-02-10 Thread Andrew Schwartzmeyer


> On Feb. 10, 2017, 12:24 a.m., Joseph Wu wrote:
> > src/slave/containerizer/mesos/windows_launcher.cpp, line 148
> > 
> >
> > Ditto about the constructor.

Ditto on the fix.


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56362/#review165027
---


On Feb. 9, 2017, 6:50 p.m., Andrew Schwartzmeyer wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56362/
> ---
> 
> (Updated Feb. 9, 2017, 6:50 p.m.)
> 
> 
> Review request for mesos, Alex Clemmer and Joseph Wu.
> 
> 
> Bugs: MESOS-6892
> https://issues.apache.org/jira/browse/MESOS-6892
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Instead of deriving `WindowsLauncher` from `PosixLauncher`,
> this commit implements a separate `WindowsLauncher` derived
> from `Launcher` parallel to the `PosixLauncher`.
> 
> This launcher is then refactored to use Windows' "Job Objects," which
> are similar to Linux's cgroups, and enable us to reason about a group
> of processes associated with a task/container as a "Job Object"
> instead of a root PID and the tree containing its children. The latter
> is not a reasonable approach on Windows, and has been the source of
> subtle bugs.
> 
> The Job Object approach creates a named job with a one-to-one mapping
> to the containerizer, and assigns the first process started for the
> task to the job object. After being assigned, the Windows kernel
> ensures all spawned child processes
> (specifically those made with the `CreateProcess` system call) are
> also assigned to the named job object. Thus this job object can then
> be used to query the process group's resource usage, kill the process
> group, and set limits on the process group.
> 
> So instead of seeing a process group as a tree and referring to it by
> the root process's PID, the `WindowsLauncher` sees a process group as
> a named Job Object, the same way the Windows kernel sees it. However,
> the containerizer code which interacts with the launcher still refers
> to a task group by the singular PID, and so we have a sort of shim
> which maps the initial PID to the name of the job object. This is a an
> unfortunate consequence of the shared containerizer code being
> originally written for POSIX-like systems.
> 
> This abstraction sets us up for implementing resource usage limits on
> the process group as a whole.
> 
> 
> Diffs
> -
> 
>   src/CMakeLists.txt 3a4ace9c8011ac8eec5067cd085fa7fe4166b9ee 
>   src/slave/containerizer/mesos/containerizer.cpp 
> d2b4f75a55dbe4746bc2dfc180335fa831a554ef 
>   src/slave/containerizer/mesos/launcher.hpp 
> 79f6eea0ee8e564e90b36208672df150dbc5d540 
>   src/slave/containerizer/mesos/launcher.cpp 
> 5114c130efbfb252dde1e85c081f5174e66f57af 
>   src/slave/containerizer/mesos/windows_launcher.hpp PRE-CREATION 
>   src/slave/containerizer/mesos/windows_launcher.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/56362/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrew Schwartzmeyer
> 
>



Re: Review Request 56362: Windows: Create the Windows container launcher.

2017-02-10 Thread Andrew Schwartzmeyer


> On Feb. 10, 2017, 12:24 a.m., Joseph Wu wrote:
> > src/slave/containerizer/mesos/windows_launcher.cpp, line 78
> > 
> >
> > Since you are assigning the `Container` is this way, you don't actually 
> > need to specify the constructor in the header.
> > 
> > So either remove the constructor, or use it in this file.
> 
> Andrew Schwartzmeyer wrote:
> I'm not sure about this. I didn't add the constructor until I compiled 
> and _couldn't_ brace initialize without it. That said, I have some default 
> member initializers. Jo, are we using C++11 or C++17?
> 
> Oh, I forgot I needed to click publish. This is the error you get with 
> MSVC if you use the brace initializer without a constructor:
> 
> ```
> 
> 17>C:\Users\andschwa\src\mesos\src\slave\containerizer\mesos\windows_launcher.cpp(78):
>  error C2440: 'initializing': cannot convert from 'initializer list' to 
> 'mesos::internal::slave::WindowsLauncher::Container'
> ```
> 
> Now trying without member initializers...

Yup. I guess we're strictly C++11, and not say C++14. If I'm understanding 
[this](http://en.cppreference.com/w/cpp/language/aggregate_initialization) 
correctly, until C++14 you can't aggregate initialize with default memeber 
initializers. Fixing.


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56362/#review165027
---


On Feb. 9, 2017, 6:50 p.m., Andrew Schwartzmeyer wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56362/
> ---
> 
> (Updated Feb. 9, 2017, 6:50 p.m.)
> 
> 
> Review request for mesos, Alex Clemmer and Joseph Wu.
> 
> 
> Bugs: MESOS-6892
> https://issues.apache.org/jira/browse/MESOS-6892
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Instead of deriving `WindowsLauncher` from `PosixLauncher`,
> this commit implements a separate `WindowsLauncher` derived
> from `Launcher` parallel to the `PosixLauncher`.
> 
> This launcher is then refactored to use Windows' "Job Objects," which
> are similar to Linux's cgroups, and enable us to reason about a group
> of processes associated with a task/container as a "Job Object"
> instead of a root PID and the tree containing its children. The latter
> is not a reasonable approach on Windows, and has been the source of
> subtle bugs.
> 
> The Job Object approach creates a named job with a one-to-one mapping
> to the containerizer, and assigns the first process started for the
> task to the job object. After being assigned, the Windows kernel
> ensures all spawned child processes
> (specifically those made with the `CreateProcess` system call) are
> also assigned to the named job object. Thus this job object can then
> be used to query the process group's resource usage, kill the process
> group, and set limits on the process group.
> 
> So instead of seeing a process group as a tree and referring to it by
> the root process's PID, the `WindowsLauncher` sees a process group as
> a named Job Object, the same way the Windows kernel sees it. However,
> the containerizer code which interacts with the launcher still refers
> to a task group by the singular PID, and so we have a sort of shim
> which maps the initial PID to the name of the job object. This is a an
> unfortunate consequence of the shared containerizer code being
> originally written for POSIX-like systems.
> 
> This abstraction sets us up for implementing resource usage limits on
> the process group as a whole.
> 
> 
> Diffs
> -
> 
>   src/CMakeLists.txt 3a4ace9c8011ac8eec5067cd085fa7fe4166b9ee 
>   src/slave/containerizer/mesos/containerizer.cpp 
> d2b4f75a55dbe4746bc2dfc180335fa831a554ef 
>   src/slave/containerizer/mesos/launcher.hpp 
> 79f6eea0ee8e564e90b36208672df150dbc5d540 
>   src/slave/containerizer/mesos/launcher.cpp 
> 5114c130efbfb252dde1e85c081f5174e66f57af 
>   src/slave/containerizer/mesos/windows_launcher.hpp PRE-CREATION 
>   src/slave/containerizer/mesos/windows_launcher.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/56362/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrew Schwartzmeyer
> 
>



Re: Review Request 56178: Enabled the authorizer to work with MULTI_ROLE frameworks.

2017-02-10 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56178/#review165182
---



Patch looks great!

Reviews applied: [56178]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On Feb. 10, 2017, 11:27 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56178/
> ---
> 
> (Updated Feb. 10, 2017, 11:27 a.m.)
> 
> 
> Review request for mesos, Adam B, Alexander Rojas, and Benjamin Mahler.
> 
> 
> Bugs: MESOS-7022
> https://issues.apache.org/jira/browse/MESOS-7022
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This updates the local authorizer so that MULTI_ROLE frameworks can be
> authorized.
> 
> For non-MULTI_ROLE frameworks we continue to support use of the
> deprecated 'value' field in the authorization request's 'Object';
> however for MULTI_ROLE frameworks the 'value' field will not be set,
> and authorizers still relying on it should be updated to instead use
> the object's 'framework_info' field to extract roles to authorize
> against from.
> 
> 
> Diffs
> -
> 
>   src/authorizer/local/authorizer.cpp 
> b98e1fcdf2ee5ec1f6ac0be6f8accdefaa390a09 
>   src/master/master.cpp 620919ecfe85367b5c1281afc5216cc20e5e2e3c 
> 
> Diff: https://reviews.apache.org/r/56178/diff/
> 
> 
> Testing
> ---
> 
> Tested on various configurations in internal CI.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 56362: Windows: Create the Windows container launcher.

2017-02-10 Thread Andrew Schwartzmeyer


> On Feb. 10, 2017, 12:24 a.m., Joseph Wu wrote:
> > src/slave/containerizer/mesos/windows_launcher.cpp, lines 59-60
> > 
> >
> > I recall some talk about dis-allowing certain `std::` helpers on the 
> > `hashmap` and `hashset` classes.  In which case, we'd need to rethink 
> > https://reviews.apache.org/r/56363
> > 
> > Anyway, for this method, I'd recommend constructing a `hashset` of PIDs 
> > and checking/updating this set while recovering.  Because O(N).
> 
> Andrew Schwartzmeyer wrote:
> Is this really a CPU bottleneck on the agent here? The `PosixLauncher` 
> does the [same O(n) 
> operation](https://github.com/andschwa/mesos/blob/master/src/slave/containerizer/mesos/launcher.cpp#L62)
>  (I point it out because this code is pretty much just a derivation of the 
> former).

(I don't mind a better operation; where `n` is `containers.size()` and `p` is 
number of PIDs to check, one time `O(n)` cache of container PIDs to a hashmap 
upfront, followed by `O(p)` lookups afterward for `O(p)` instead of `O(p*n)`. 
But then we should do this for the `PosixLauncher` too.)


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56362/#review165027
---


On Feb. 9, 2017, 6:50 p.m., Andrew Schwartzmeyer wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56362/
> ---
> 
> (Updated Feb. 9, 2017, 6:50 p.m.)
> 
> 
> Review request for mesos, Alex Clemmer and Joseph Wu.
> 
> 
> Bugs: MESOS-6892
> https://issues.apache.org/jira/browse/MESOS-6892
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Instead of deriving `WindowsLauncher` from `PosixLauncher`,
> this commit implements a separate `WindowsLauncher` derived
> from `Launcher` parallel to the `PosixLauncher`.
> 
> This launcher is then refactored to use Windows' "Job Objects," which
> are similar to Linux's cgroups, and enable us to reason about a group
> of processes associated with a task/container as a "Job Object"
> instead of a root PID and the tree containing its children. The latter
> is not a reasonable approach on Windows, and has been the source of
> subtle bugs.
> 
> The Job Object approach creates a named job with a one-to-one mapping
> to the containerizer, and assigns the first process started for the
> task to the job object. After being assigned, the Windows kernel
> ensures all spawned child processes
> (specifically those made with the `CreateProcess` system call) are
> also assigned to the named job object. Thus this job object can then
> be used to query the process group's resource usage, kill the process
> group, and set limits on the process group.
> 
> So instead of seeing a process group as a tree and referring to it by
> the root process's PID, the `WindowsLauncher` sees a process group as
> a named Job Object, the same way the Windows kernel sees it. However,
> the containerizer code which interacts with the launcher still refers
> to a task group by the singular PID, and so we have a sort of shim
> which maps the initial PID to the name of the job object. This is a an
> unfortunate consequence of the shared containerizer code being
> originally written for POSIX-like systems.
> 
> This abstraction sets us up for implementing resource usage limits on
> the process group as a whole.
> 
> 
> Diffs
> -
> 
>   src/CMakeLists.txt 3a4ace9c8011ac8eec5067cd085fa7fe4166b9ee 
>   src/slave/containerizer/mesos/containerizer.cpp 
> d2b4f75a55dbe4746bc2dfc180335fa831a554ef 
>   src/slave/containerizer/mesos/launcher.hpp 
> 79f6eea0ee8e564e90b36208672df150dbc5d540 
>   src/slave/containerizer/mesos/launcher.cpp 
> 5114c130efbfb252dde1e85c081f5174e66f57af 
>   src/slave/containerizer/mesos/windows_launcher.hpp PRE-CREATION 
>   src/slave/containerizer/mesos/windows_launcher.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/56362/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrew Schwartzmeyer
> 
>



Re: Review Request 56362: Windows: Create the Windows container launcher.

2017-02-10 Thread Andrew Schwartzmeyer


> On Feb. 10, 2017, 12:24 a.m., Joseph Wu wrote:
> > src/slave/containerizer/mesos/windows_launcher.cpp, lines 59-60
> > 
> >
> > I recall some talk about dis-allowing certain `std::` helpers on the 
> > `hashmap` and `hashset` classes.  In which case, we'd need to rethink 
> > https://reviews.apache.org/r/56363
> > 
> > Anyway, for this method, I'd recommend constructing a `hashset` of PIDs 
> > and checking/updating this set while recovering.  Because O(N).

Is this really a CPU bottleneck on the agent here? The `PosixLauncher` does the 
[same O(n) 
operation](https://github.com/andschwa/mesos/blob/master/src/slave/containerizer/mesos/launcher.cpp#L62)
 (I point it out because this code is pretty much just a derivation of the 
former).


> On Feb. 10, 2017, 12:24 a.m., Joseph Wu wrote:
> > src/slave/containerizer/mesos/windows_launcher.cpp, line 78
> > 
> >
> > Since you are assigning the `Container` is this way, you don't actually 
> > need to specify the constructor in the header.
> > 
> > So either remove the constructor, or use it in this file.

I'm not sure about this. I didn't add the constructor until I compiled and 
_couldn't_ brace initialize without it. That said, I have some default member 
initializers. Jo, are we using C++11 or C++17?

Oh, I forgot I needed to click publish. This is the error you get with MSVC if 
you use the brace initializer without a constructor:

```
17>C:\Users\andschwa\src\mesos\src\slave\containerizer\mesos\windows_launcher.cpp(78):
 error C2440: 'initializing': cannot convert from 'initializer list' to 
'mesos::internal::slave::WindowsLauncher::Container'
```

Now trying without member initializers...


> On Feb. 10, 2017, 12:24 a.m., Joseph Wu wrote:
> > src/slave/containerizer/mesos/windows_launcher.cpp, line 162
> > 
> >
> > Can you add a TODO here about how executors will die when the agent 
> > dies?

Sure, it has been a subject of much debate :D


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56362/#review165027
---


On Feb. 9, 2017, 6:50 p.m., Andrew Schwartzmeyer wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56362/
> ---
> 
> (Updated Feb. 9, 2017, 6:50 p.m.)
> 
> 
> Review request for mesos, Alex Clemmer and Joseph Wu.
> 
> 
> Bugs: MESOS-6892
> https://issues.apache.org/jira/browse/MESOS-6892
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Instead of deriving `WindowsLauncher` from `PosixLauncher`,
> this commit implements a separate `WindowsLauncher` derived
> from `Launcher` parallel to the `PosixLauncher`.
> 
> This launcher is then refactored to use Windows' "Job Objects," which
> are similar to Linux's cgroups, and enable us to reason about a group
> of processes associated with a task/container as a "Job Object"
> instead of a root PID and the tree containing its children. The latter
> is not a reasonable approach on Windows, and has been the source of
> subtle bugs.
> 
> The Job Object approach creates a named job with a one-to-one mapping
> to the containerizer, and assigns the first process started for the
> task to the job object. After being assigned, the Windows kernel
> ensures all spawned child processes
> (specifically those made with the `CreateProcess` system call) are
> also assigned to the named job object. Thus this job object can then
> be used to query the process group's resource usage, kill the process
> group, and set limits on the process group.
> 
> So instead of seeing a process group as a tree and referring to it by
> the root process's PID, the `WindowsLauncher` sees a process group as
> a named Job Object, the same way the Windows kernel sees it. However,
> the containerizer code which interacts with the launcher still refers
> to a task group by the singular PID, and so we have a sort of shim
> which maps the initial PID to the name of the job object. This is a an
> unfortunate consequence of the shared containerizer code being
> originally written for POSIX-like systems.
> 
> This abstraction sets us up for implementing resource usage limits on
> the process group as a whole.
> 
> 
> Diffs
> -
> 
>   src/CMakeLists.txt 3a4ace9c8011ac8eec5067cd085fa7fe4166b9ee 
>   src/slave/containerizer/mesos/containerizer.cpp 
> d2b4f75a55dbe4746bc2dfc180335fa831a554ef 
>   src/slave/containerizer/mesos/launcher.hpp 
> 79f6eea0ee8e564e90b36208672df150dbc5d540 
>   src/slave/containerizer/mesos/launcher.cpp 
> 

Re: Review Request 56524: Added `drop` for overload to avoid custom logging.

2017-02-10 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56524/#review165176
---




src/master/master.cpp (lines 2246 - 2247)


Can you follow the same format at the drop two lines above?

```
  LOG(ERROR) << "Dropping " << call.type() << " call"
 << " from framework " << *framework
 << ": " << message;
```

Not yours, but just as an aside it seems like all of the drop logging 
should probably be at the warning instead of error level. Feel free to leave as 
is for now.



src/master/master.cpp (lines 3248 - 3253)


I would suggest adding a drop overload for Suppress, so that you can just 
do:

```
drop(framework,
 suppress,
 "Suppression role is invalid: " + roleError->message);
```

This could do the `scheduler::Call` construction and call the version 
you've added in this patch.

As it stands the call-sites need to do the `scheduler::Call` which is error 
prone. For example, the current code in this patch doesn't fill in the suppress 
portion of the call:

```
  scheduler::Call call;
  call.set_type(scheduler::Call::SUPPRESS);
  call.mutable_suppress()->CopyFrom(suppress); // This is missed.
```


- Benjamin Mahler


On Feb. 10, 2017, 1:42 a.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56524/
> ---
> 
> (Updated Feb. 10, 2017, 1:42 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Jay Guo.
> 
> 
> Bugs: MESOS-6638
> https://issues.apache.org/jira/browse/MESOS-6638
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added `drop` for overload to avoid custom logging.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 767ac48f2a65514a19c4a3f8ca7b35f0259f9aeb 
>   src/master/master.cpp 620919ecfe85367b5c1281afc5216cc20e5e2e3c 
> 
> Diff: https://reviews.apache.org/r/56524/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Re: Review Request 56499: Added roles field to framework.

2017-02-10 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56499/#review165175
---


Ship it!




Ship It!

- Benjamin Mahler


On Feb. 10, 2017, 12:24 a.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56499/
> ---
> 
> (Updated Feb. 10, 2017, 12:24 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Jay Guo.
> 
> 
> Bugs: MESOS-6638
> https://issues.apache.org/jira/browse/MESOS-6638
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added roles field to framework.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 767ac48f2a65514a19c4a3f8ca7b35f0259f9aeb 
>   src/master/master.cpp 620919ecfe85367b5c1281afc5216cc20e5e2e3c 
> 
> Diff: https://reviews.apache.org/r/56499/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Re: Review Request 56499: Added roles field to framework.

2017-02-10 Thread Benjamin Mahler


> On Feb. 9, 2017, 9:38 p.m., Benjamin Mahler wrote:
> > src/master/master.hpp, lines 2425-2451
> > 
> >
> > We need to update the newly introduced `roles` field here. Also, 
> > oldRoles can be removed in favor of directly accessing `roles`.
> 
> Guangya Liu wrote:
> Since we do not allow updating `roles` for frameworks and the `roles` was 
> already initialized when construct the framework, so do we still need to 
> update the `roles` here?

Ah right, dropping this.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56499/#review165008
---


On Feb. 10, 2017, 12:24 a.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56499/
> ---
> 
> (Updated Feb. 10, 2017, 12:24 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Jay Guo.
> 
> 
> Bugs: MESOS-6638
> https://issues.apache.org/jira/browse/MESOS-6638
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added roles field to framework.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 767ac48f2a65514a19c4a3f8ca7b35f0259f9aeb 
>   src/master/master.cpp 620919ecfe85367b5c1281afc5216cc20e5e2e3c 
> 
> Diff: https://reviews.apache.org/r/56499/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Re: Review Request 56525: Fixed a crash on the agent when handling the SIGUSR1 signal.

2017-02-10 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56525/#review165161
---


Ship it!




I tested as well and the segfault is gone. Re-ordering of delete's LGTM.

- Greg Mann


On Feb. 10, 2017, 5:34 a.m., Anand Mazumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56525/
> ---
> 
> (Updated Feb. 10, 2017, 5:34 a.m.)
> 
> 
> Review request for mesos, Greg Mann and Joseph Wu.
> 
> 
> Bugs: MESOS-7102
> https://issues.apache.org/jira/browse/MESOS-7102
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There were some actors that were not being destructed when
> `finalize()` was being invoked. Also fixed the order of the
> destruction of objects i.e., in the reverse order of their
> creation.
> 
> 
> Diffs
> -
> 
>   src/slave/main.cpp b9f9748d98775b2a511ded35bbf72099bfeeba64 
> 
> Diff: https://reviews.apache.org/r/56525/diff/
> 
> 
> Testing
> ---
> 
> make check, manually tested to ensure we don't crash on a `SIGUSR1` signal.
> 
> 
> Thanks,
> 
> Anand Mazumdar
> 
>



Re: Review Request 56537: Added a test to ensure framework can upgrade to support MULTI_ROLE.

2017-02-10 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56537/#review165160
---



Patch looks great!

Reviews applied: [56537]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On Feb. 10, 2017, 9:32 a.m., Jay Guo wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56537/
> ---
> 
> (Updated Feb. 10, 2017, 9:32 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Guangya Liu, and 
> Michael Park.
> 
> 
> Bugs: MESOS-7035
> https://issues.apache.org/jira/browse/MESOS-7035
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> A framework can be upgraded to be MULTI_ROLE capable even with task
> running, as long as it does not change role while upgrading.
> 
> 
> Diffs
> -
> 
>   src/tests/master_tests.cpp 3b4123b49ee32c902a5d2a01fcc7026da21fdd18 
> 
> Diff: https://reviews.apache.org/r/56537/diff/
> 
> 
> Testing
> ---
> 
> WIP
> 
> 
> Thanks,
> 
> Jay Guo
> 
>



Re: Review Request 55901: Added support for command health checks to the default executor.

2017-02-10 Thread Gastón Kleiman


> On Feb. 9, 2017, 3:20 p.m., Alexander Rukletsov wrote:
> > src/checks/health_checker.cpp, lines 653-654
> > 
> >
> > Why not `defer`?

Added `defer` where it corresponds.


> On Feb. 9, 2017, 3:20 p.m., Alexander Rukletsov wrote:
> > src/checks/health_checker.cpp, line 635
> > 
> >
> > Could you please specify what this function returns? Status or exit 
> > code?

This depends on what containerizer the agent is using. In the case of the Mesos 
containerizer, it currently returns a status code.


> On Feb. 9, 2017, 3:20 p.m., Alexander Rukletsov wrote:
> > src/checks/health_checker.cpp, line 667
> > 
> >
> > What do you get from the agent call: status or exit code directly? It's 
> > probably not that important here, but it is for checks.

This depends on what containerizer the agent is using. In the case of the Mesos 
containerizer, it currently returns a status code.


> On Feb. 9, 2017, 3:20 p.m., Alexander Rukletsov wrote:
> > src/checks/health_checker.cpp, lines 565-567
> > 
> >
> > This should probably be indented, no?

Fixed.


> On Feb. 9, 2017, 3:20 p.m., Alexander Rukletsov wrote:
> > src/checks/health_checker.cpp, line 555
> > 
> >
> > s/until/until after/ ?

Fixed.


> On Feb. 9, 2017, 3:20 p.m., Alexander Rukletsov wrote:
> > src/checks/health_checker.cpp, line 615
> > 
> >
> > s/this future is completed//we finish processing current timeout./
> > 
> > It's not clear what "this future" means.

Fixed.


- Gastón


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review164919
---


On Feb. 10, 2017, 6:40 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> ---
> 
> (Updated Feb. 10, 2017, 6:40 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent 
> huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
> https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -
> 
>   src/checks/health_checker.hpp f1f2834b3429fb00cc49c179fa9a3de328f597b5 
>   src/checks/health_checker.cpp a5225ff1f4b071ed4182d41fa8ecc705fa4dbe00 
>   src/launcher/default_executor.cpp e63cf153831088851863d0956455a024e9bc172a 
>   src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> ---
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It 
> passes on Linux, but not on macOS, because of MESOS-7050.
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 55901: Added support for command health checks to the default executor.

2017-02-10 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/
---

(Updated Feb. 10, 2017, 6:40 p.m.)


Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, 
and Vinod Kone.


Changes
---

Split `waitNestedContainer` into two + use defer instead of plain lambdas.


Bugs: MESOS-6280
https://issues.apache.org/jira/browse/MESOS-6280


Repository: mesos


Description
---

Added support for command health checks to the default executor.


Diffs (updated)
-

  src/checks/health_checker.hpp f1f2834b3429fb00cc49c179fa9a3de328f597b5 
  src/checks/health_checker.cpp a5225ff1f4b071ed4182d41fa8ecc705fa4dbe00 
  src/launcher/default_executor.cpp e63cf153831088851863d0956455a024e9bc172a 
  src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 

Diff: https://reviews.apache.org/r/55901/diff/


Testing
---

Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It 
passes on Linux, but not on macOS, because of MESOS-7050.


Thanks,

Gastón Kleiman



Re: Review Request 56526: Added stout constants for special path components.

2017-02-10 Thread James Peach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56526/#review165143
---


Ship it!




Ship It!

- James Peach


On Feb. 10, 2017, 7:03 a.m., Jiang Yan Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56526/
> ---
> 
> (Updated Feb. 10, 2017, 7:03 a.m.)
> 
> 
> Review request for mesos and James Peach.
> 
> 
> Bugs: MESOS-7086
> https://issues.apache.org/jira/browse/MESOS-7086
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added stout constants for special path components.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/constants.hpp 
> c71d52e1bc0433f9922f26a6eb486235bb9880d4 
> 
> Diff: https://reviews.apache.org/r/56526/diff/
> 
> 
> Testing
> ---
> 
> make check.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>



Re: Review Request 56527: Disallowed special path components in IDs.

2017-02-10 Thread James Peach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56527/#review165141
---




src/common/validation.cpp (line 36)


Consider "ID must not be empty".



src/common/validation.cpp (line 42)


'~' is a bad choice for a ID, but by itself it is not a security issue. You 
ought to check for `id[0] != '~'` (or just ban it anywhere in the string).


- James Peach


On Feb. 10, 2017, 7:05 a.m., Jiang Yan Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56527/
> ---
> 
> (Updated Feb. 10, 2017, 7:05 a.m.)
> 
> 
> Review request for mesos and James Peach.
> 
> 
> Bugs: MESOS-7086
> https://issues.apache.org/jira/browse/MESOS-7086
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> - Such IDs should lead to surprising or even dangerous agent side
> directory structure.
> 
> 
> Diffs
> -
> 
>   src/common/validation.cpp 0f1a02286d8431acfee6136e8ada49b0ac746897 
>   src/tests/master_validation_tests.cpp 
> 0c2649089d7fd29eb021ac75c71e6a74368577dc 
>   src/tests/slave_validation_tests.cpp 
> 3d17799ed04951fb56524db0f5d89347192300b2 
> 
> Diff: https://reviews.apache.org/r/56527/diff/
> 
> 
> Testing
> ---
> 
> make check.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>



Re: Review Request 56527: Disallowed special path components in IDs.

2017-02-10 Thread James Peach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56527/#review165142
---


Ship it!




Ship It!

- James Peach


On Feb. 10, 2017, 7:05 a.m., Jiang Yan Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56527/
> ---
> 
> (Updated Feb. 10, 2017, 7:05 a.m.)
> 
> 
> Review request for mesos and James Peach.
> 
> 
> Bugs: MESOS-7086
> https://issues.apache.org/jira/browse/MESOS-7086
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> - Such IDs should lead to surprising or even dangerous agent side
> directory structure.
> 
> 
> Diffs
> -
> 
>   src/common/validation.cpp 0f1a02286d8431acfee6136e8ada49b0ac746897 
>   src/tests/master_validation_tests.cpp 
> 0c2649089d7fd29eb021ac75c71e6a74368577dc 
>   src/tests/slave_validation_tests.cpp 
> 3d17799ed04951fb56524db0f5d89347192300b2 
> 
> Diff: https://reviews.apache.org/r/56527/diff/
> 
> 
> Testing
> ---
> 
> make check.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>



Re: Review Request 53993: Updated quota doc to support quota update.

2017-02-10 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53993/#review165140
---



Patch looks great!

Reviews applied: [52284, 53679, 53691, 52103, 53991, 54135, 53993]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On Feb. 10, 2017, 7:57 a.m., Zhitao Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53993/
> ---
> 
> (Updated Feb. 10, 2017, 7:57 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Xiaojian Huang, and Neil 
> Conway.
> 
> 
> Bugs: MESOS-4941
> https://issues.apache.org/jira/browse/MESOS-4941
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Updated quota doc to support quota update.
> 
> 
> Diffs
> -
> 
>   docs/quota.md 36b4853b524ba7b28788fe1f3983dd43d95072d4 
> 
> Diff: https://reviews.apache.org/r/53993/diff/
> 
> 
> Testing
> ---
> 
> https://github.com/zhitaoli/mesos/blob/quota_update_sept/docs/quota.md
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>



Re: Review Request 56533: Fix the shell script to not use deprecated [..] in increment operation.

2017-02-10 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56533/#review165137
---


Ship it!




Ship It!

- Jiang Yan Xu


On Feb. 9, 2017, 11:26 p.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56533/
> ---
> 
> (Updated Feb. 9, 2017, 11:26 p.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7101
> https://issues.apache.org/jira/browse/MESOS-7101
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fix the shell script to not use deprecated [..] in increment operation.
> 
> 
> Diffs
> -
> 
>   src/examples/persistent_volume_framework.cpp 
> 702dba3e025923819874a6f6b082e5c78289fb0e 
> 
> Diff: https://reviews.apache.org/r/56533/diff/
> 
> 
> Testing
> ---
> 
> Tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Review Request 56551: Updated CHANGELOG with still experimental features from 1.1.0.

2017-02-10 Thread Alexander Rukletsov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56551/
---

Review request for mesos and Adam B.


Repository: mesos


Description
---

See summary.


Diffs
-

  CHANGELOG 1478ad4dd3b20fc768fd679f81d803acf0995ad2 

Diff: https://reviews.apache.org/r/56551/diff/


Testing
---

None: not a functional change.


Thanks,

Alexander Rukletsov



Re: Review Request 56533: Fix the shell script to not use deprecated [..] in increment operation.

2017-02-10 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56533/#review165129
---



Patch looks great!

Reviews applied: [56533]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On Feb. 10, 2017, 7:26 a.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56533/
> ---
> 
> (Updated Feb. 10, 2017, 7:26 a.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7101
> https://issues.apache.org/jira/browse/MESOS-7101
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fix the shell script to not use deprecated [..] in increment operation.
> 
> 
> Diffs
> -
> 
>   src/examples/persistent_volume_framework.cpp 
> 702dba3e025923819874a6f6b082e5c78289fb0e 
> 
> Diff: https://reviews.apache.org/r/56533/diff/
> 
> 
> Testing
> ---
> 
> Tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 55462: Validate resource reservation against allocated role.

2017-02-10 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55462/
---

(Updated Feb. 10, 2017, 4:16 p.m.)


Review request for mesos, Benjamin Mahler, Jay Guo, and Guangya Liu.


Changes
---

Added missing validation, include test.


Bugs: MESOS-6730
https://issues.apache.org/jira/browse/MESOS-6730


Repository: mesos


Description
---

This change introduces validation of the 'AllocationInfo' of resources
used in reservations.


Diffs (updated)
-

  src/master/validation.cpp 2bfca95713319a195e43e990cbc2dc5570b89c4e 
  src/tests/master_validation_tests.cpp 
0c2649089d7fd29eb021ac75c71e6a74368577dc 

Diff: https://reviews.apache.org/r/55462/diff/


Testing
---

N/A yet.


Thanks,

Benjamin Bannier



Re: Review Request 56392: Tightened test expecations in reservation-related tests.

2017-02-10 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56392/
---

(Updated Feb. 10, 2017, 4:16 p.m.)


Review request for mesos, Benjamin Mahler, Jay Guo, and Guangya Liu.


Changes
---

Rebased.


Repository: mesos


Description
---

This change makes expected error strings in reservation
validation-related tests explicit. This is to make sure that the
observed errors match the expected ones.


Diffs (updated)
-

  src/tests/master_validation_tests.cpp 
0c2649089d7fd29eb021ac75c71e6a74368577dc 

Diff: https://reviews.apache.org/r/56392/diff/


Testing
---

make check (OS X)


Thanks,

Benjamin Bannier



Re: Review Request 55462: Validate resource reservation against allocated role.

2017-02-10 Thread Benjamin Bannier


> On Feb. 10, 2017, 3:39 a.m., Benjamin Mahler wrote:
> > src/master/validation.cpp, lines 1632-1634
> > 
> >
> > Shouldn't we also be checking that the allocation role matches the 
> > reservation role?

Yes, this seems rather crucial to this patch. Added a check for this, also a 
test.


> On Feb. 10, 2017, 3:39 a.m., Benjamin Mahler wrote:
> > src/master/validation.cpp, line 1633
> > 
> >
> > Hm.. this case seems to warrant a different message?

Done.


> On Feb. 10, 2017, 3:39 a.m., Benjamin Mahler wrote:
> > src/master/validation.cpp, lines 1623-1630
> > 
> >
> > Just to clarify, this is an invariant in that if this fails it is a 
> > programming error in the master, right? It seems to me we have two 
> > invariants here:
> > 
> > (1) Coming from framework: framework info is set and resources are 
> > allocated (master enforces this when applying operations).
> > 
> > (2) Coming from operator: framework info is not set and resources are 
> > unallocated (master doesn't enforce this).
> > 
> > We should clarify this. Also, `!resource.allocation_info().has_role()` 
> > is sufficient here if you want to be more succinct.

I am not sure these cases are invariants *for this function*, and it seems us 
ensuring correct behavior with tests rather then fail hard here would decrease 
coupling between validation logic and apply operations and be just as good.

The case (1) we do explicitly handle here seems interesting since it points to 
errors in framework code (e.g., incorrect handling of offered resources). Case 
(2) seems less likely as it would mean that the operator would set more 
information than needed or used by the master. In any case, every caller of the 
validation function would need to normalize the resulting operation in order to 
deal with the different possible inputs (e.g., (1) or (2) here).

Does that make sense?


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55462/#review165056
---


On Feb. 10, 2017, 4:16 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55462/
> ---
> 
> (Updated Feb. 10, 2017, 4:16 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Jay Guo, and Guangya Liu.
> 
> 
> Bugs: MESOS-6730
> https://issues.apache.org/jira/browse/MESOS-6730
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This change introduces validation of the 'AllocationInfo' of resources
> used in reservations.
> 
> 
> Diffs
> -
> 
>   src/master/validation.cpp 2bfca95713319a195e43e990cbc2dc5570b89c4e 
>   src/tests/master_validation_tests.cpp 
> 0c2649089d7fd29eb021ac75c71e6a74368577dc 
> 
> Diff: https://reviews.apache.org/r/55462/diff/
> 
> 
> Testing
> ---
> 
> N/A yet.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 54639: Implemented the 'create()' method of OCI store.

2017-02-10 Thread Qian Zhang


> On Feb. 9, 2017, 7:36 a.m., Jie Yu wrote:
> > src/slave/containerizer/mesos/provisioner/oci/paths.hpp, lines 36-39
> > 
> >
> > Can you elaborate a bit more the layout here?
> > 
> > so `layer_id` is the digest? What is `config.json`? Does it store 
> > decompressed content, or the original content? what about image configs?
> > 
> > Given this 
> > https://github.com/opencontainers/image-spec/blob/master/image-layout.md, 
> > my hunch is that we should probably cache the original blob, as well as 
> > decompressed content, potentially in separate subdirectories.
> > 
> > ```
> > 
> >|--- blobs/  # this is origional CAS blobs.
> >|--- sha256/
> >|--- 
> >|--- 
> >|--- ...
> >|--- layers/ # this is decompressed filesystem change sets
> >|--- 
> >|--- ...
> > ```
> > 
> > Looks like we probably do not need `configs` or `manifests` directory 
> > for now because the only media type is json, and can be directly read from 
> > `blobs/`.
> > 
> > Another thing to consider is the conversion to overlayfs opaque file 
> > from whiteout files as you did before. So for `layers` directory, we 
> > probably need to have a `backend` in the path (e.g., 
> > `layers/overlayfs/...`).
> 
> Jie Yu wrote:
> Also, worth taking a look at how Docker (or other OCI system) organize 
> its cache.
> 
> Qian Zhang wrote:
> Please take a look at https://reviews.apache.org/r/56147/diff/1#4, I have 
> changed it to:
> ```
>  |--layers
> |--
> |-- rootfs
> |--
> |-- rootfs
> |--
> ```
> Here both `` and `` are digest. 
> `/rootfs/` stores decompressed content(filesystem) extracted by 
> `command::untar()` from the layer tar ball. `` stores the image 
> configuration.
> 
> Can you please let me know why we need to cache the original blobs? In my 
> current implementation, the orignal blobs will only be fetched by fetcher 
> into a temp directory under `staging` directory, `layers` directory will only 
> store the decompressed content, and that temp directory under `staging` 
> directory will be removed after fetching image is done. I do not think we 
> need the original blobs once the fetching image is done and we have cached 
> the image in the store.
> 
> And actually under `layers/`, there can be two sub-directories: 
> `rootfs` (for aufs/bind/copy) and `rootfs.overlay` (for overlay), this is 
> same with what I did for for Docker store. Do you think we need to change it 
> to `layers/non-overlay/` and `layers/overlay/` for OCI 
> store?

And I checked Docker today, it does not cache blobs as well. When pulling an 
image, Docker will download the blobs in a temp directory 
(`/var/lib/docker/tmp/`), and once the pulling is done, the blobs in that temp 
directory will be removed. And Docker will store the decompressed content of 
each image layer in `/var/lib/docker/aufs/diff/`.


- Qian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54639/#review164794
---


On Feb. 1, 2017, 9:17 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54639/
> ---
> 
> (Updated Feb. 1, 2017, 9:17 a.m.)
> 
> 
> Review request for mesos, Gilbert Song and Jie Yu.
> 
> 
> Bugs: MESOS-6681
> https://issues.apache.org/jira/browse/MESOS-6681
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Implemented the 'create()' method of OCI store.
> 
> 
> Diffs
> -
> 
>   src/CMakeLists.txt 09ef1aee680c6b91476092dbf61d4ca3a8d256bb 
>   src/Makefile.am 6c9be54fe0ce3faaa26e2f090773d1b77bb6e430 
>   src/slave/containerizer/mesos/provisioner/oci/paths.hpp PRE-CREATION 
>   src/slave/containerizer/mesos/provisioner/oci/paths.cpp PRE-CREATION 
>   src/slave/containerizer/mesos/provisioner/oci/store.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/54639/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 56391: Set AllocationInfo in some reservation-related tests.

2017-02-10 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56391/
---

(Updated Feb. 10, 2017, 3:57 p.m.)


Review request for mesos, Benjamin Mahler, Jay Guo, and Guangya Liu.


Changes
---

Implemented suggestion from bmahler.


Repository: mesos


Description
---

In a later change we will take a 'Resource''s 'AllocationInfo' into
account. Update a number of tests related to resource validation to
set 'AllocationInfo' so continue to trigger the originally expected
errors.


Diffs (updated)
-

  src/tests/master_validation_tests.cpp 
0c2649089d7fd29eb021ac75c71e6a74368577dc 

Diff: https://reviews.apache.org/r/56391/diff/


Testing
---

make check (OS X)


Thanks,

Benjamin Bannier



Re: Review Request 56178: Enabled the authorizer to work with MULTI_ROLE frameworks.

2017-02-10 Thread Alexander Rojas


> On Feb. 9, 2017, 10:59 p.m., Benjamin Mahler wrote:
> > src/master/master.cpp, lines 2533-2535
> > 
> >
> > Longer term, are there any thoughts on how we might be able to know 
> > which role is not authorized? E.g. getting the authorization message via 
> > `Future

Re: Review Request 55461: Made resource reservation validation multi-role aware.

2017-02-10 Thread Benjamin Bannier


> On Feb. 6, 2017, 10:28 p.m., Benjamin Mahler wrote:
> > src/master/validation.cpp, lines 1500-1503
> > 
> >
> > Hm.. do you know which call sites need to be updated? It seems like we 
> > need to do the sweep now to ensure the call-sites are correct, no?

This was an in process-`TODO`, the call sites should be updated now.


> On Feb. 6, 2017, 10:28 p.m., Benjamin Mahler wrote:
> > src/master/master.cpp, lines 3942-3947
> > 
> >
> > This isn't quite correct since we need to check the capability to 
> > determine which field to examine.
> > 
> > Per my suggestion below, can we pass the FrameworkInfo in rather than 
> > just the roles?
> > 
> > Also note that there is a `getRoles` helper in protobuf utils to 
> > simplify this. We should add a `set roles` field to the `Framework` 
> > struct in the master though.

Updated to pass an `Option`.


> On Feb. 6, 2017, 10:28 p.m., Benjamin Mahler wrote:
> > src/master/validation.cpp, lines 1509-1513
> > 
> >
> > This condition isn't quite the `*` role, this would be a framework with 
> > no roles. The `*` role case is now the 1 element `*` set, but it seems to 
> > me this check should be looking at the resource.role() rather than the 
> > framework's roles, no?

This definitely incorrectly handles single role frameworks in `{*}`. I updated 
the patch so we check for any both an empty roles set and `{*}` to handle both 
frameworks with no roles and frameworks in `{*}`. The cases where `*` is 
included in `roles` is supported, so framework roles `{*}` will remain a valid 
case even after `FrameworkInfo.role` is removed, right?


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55461/#review164410
---


On Feb. 8, 2017, 12:41 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55461/
> ---
> 
> (Updated Feb. 8, 2017, 12:41 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Jay Guo, and Guangya Liu.
> 
> 
> Bugs: MESOS-6730
> https://issues.apache.org/jira/browse/MESOS-6730
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This updates the resource reservation validation for frameworks which
> can have multiple roles. During a deprecation period 'FrameworkInfo'
> will have fields for both 'role' and 'roles', however the validation
> function works with just an optional set of roles. Here an empty set
> captures the previous semantics of either having an empty 'role' field
> or 'role' set as '*'. This forces the callers to properly construct a
> set of framework roles from the available information. An optional set
> is used in order to accommodate callers which have no information
> about the framework's roles, and ultimately disables validation taking
> that information into account.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 98c39b279e7b9830d02efc8ec6a4469afc15d62a 
>   src/master/validation.hpp fc1eaeebd0643b897f4e14aabc76e7d94f0de263 
>   src/master/validation.cpp 37d171512ee54f260aabb6e1071739bcc3769fb0 
>   src/tests/master_validation_tests.cpp 
> 51185031cb67e64cd69ec6ce1c8f722a0c349970 
> 
> Diff: https://reviews.apache.org/r/55461/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 56178: Enabled the authorizer to work with MULTI_ROLE frameworks.

2017-02-10 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56178/
---

(Updated Feb. 10, 2017, 12:27 p.m.)


Review request for mesos, Adam B, Alexander Rojas, and Benjamin Mahler.


Changes
---

Improved comment style, per bmahler.


Bugs: MESOS-7022
https://issues.apache.org/jira/browse/MESOS-7022


Repository: mesos


Description
---

This updates the local authorizer so that MULTI_ROLE frameworks can be
authorized.

For non-MULTI_ROLE frameworks we continue to support use of the
deprecated 'value' field in the authorization request's 'Object';
however for MULTI_ROLE frameworks the 'value' field will not be set,
and authorizers still relying on it should be updated to instead use
the object's 'framework_info' field to extract roles to authorize
against from.


Diffs (updated)
-

  src/authorizer/local/authorizer.cpp b98e1fcdf2ee5ec1f6ac0be6f8accdefaa390a09 
  src/master/master.cpp 620919ecfe85367b5c1281afc5216cc20e5e2e3c 

Diff: https://reviews.apache.org/r/56178/diff/


Testing
---

Tested on various configurations in internal CI.


Thanks,

Benjamin Bannier



Re: Review Request 55901: Added support for command health checks to the default executor.

2017-02-10 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/
---

(Updated Feb. 10, 2017, 11:25 a.m.)


Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, 
and Vinod Kone.


Changes
---

Addressed feedback.


Bugs: MESOS-6280
https://issues.apache.org/jira/browse/MESOS-6280


Repository: mesos


Description
---

Added support for command health checks to the default executor.


Diffs (updated)
-

  src/checks/health_checker.hpp f1f2834b3429fb00cc49c179fa9a3de328f597b5 
  src/checks/health_checker.cpp a5225ff1f4b071ed4182d41fa8ecc705fa4dbe00 
  src/launcher/default_executor.cpp e63cf153831088851863d0956455a024e9bc172a 
  src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 

Diff: https://reviews.apache.org/r/55901/diff/


Testing (updated)
---

Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It 
passes on Linux, but not on macOS, because of MESOS-7050.


Thanks,

Gastón Kleiman



Re: Review Request 56288: Improved the wording of what's logged on command health check timeouts.

2017-02-10 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56288/
---

(Updated Feb. 10, 2017, 11:17 a.m.)


Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, 
and Vinod Kone.


Changes
---

Updated another logging message.


Bugs: MESOS-6280
https://issues.apache.org/jira/browse/MESOS-6280


Repository: mesos


Description
---

Improved the wording of what's logged on command health check timeouts.


Diffs (updated)
-

  src/checks/health_checker.cpp a5225ff1f4b071ed4182d41fa8ecc705fa4dbe00 

Diff: https://reviews.apache.org/r/56288/diff/


Testing
---

None, not a functional change.


Thanks,

Gastón Kleiman



Re: Review Request 55901: Added support for command health checks to the default executor.

2017-02-10 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review164951
---




src/checks/health_checker.cpp (line 657)


I wanted to make the errors here similar to/consistent with the errors 
logged by the default executor:

https://github.com/apache/mesos/blob/c2388a511c775dd6f392961b06fd7738bf051dbc/src/launcher/default_executor.cpp#L618-L624



src/checks/health_checker.cpp (line 667)


It depends on what containerizer is being used. With the 
`MesosContainerizer`, we get the exit status as returned by `waitpid`. I added 
a comment in the header file.


- Gastón Kleiman


On Feb. 8, 2017, 1:27 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> ---
> 
> (Updated Feb. 8, 2017, 1:27 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent 
> huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
> https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
>   src/launcher/default_executor.cpp e63cf153831088851863d0956455a024e9bc172a 
>   src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> ---
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It 
> passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 56178: Enabled the authorizer to work with MULTI_ROLE frameworks.

2017-02-10 Thread Adam B


> On Feb. 9, 2017, 1:59 p.m., Benjamin Mahler wrote:
> > src/authorizer/local/authorizer.cpp, line 248
> > 
> >
> > Not yours, but I find it rather confusing as to what the object value 
> > is, looking at the other code, is it the role? It would be nice to clarify 
> > how one figures out what `value` represents.

That's part of the reason why we're moving away from 'value' to more explicit 
FrameworkInfo/FooInfos, from which the authorizer can authorize based on 
any/many fields.

Until then, the best documentation is in authorizer.proto:
```
  // `REGISTER_FRAMEWORK` will have an object with `FrameworkInfo` set.
  // The `_WITH_ROLE` alias is deprecated and will be removed after
  // Mesos 1.2's deprecation cycle ends. The `value` field will continue
  // to be set until that time.
  REGISTER_FRAMEWORK = 1;
  REGISTER_FRAMEWORK_WITH_ROLE = 1;
```


- Adam


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56178/#review165014
---


On Feb. 9, 2017, 1:26 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56178/
> ---
> 
> (Updated Feb. 9, 2017, 1:26 a.m.)
> 
> 
> Review request for mesos, Adam B, Alexander Rojas, and Benjamin Mahler.
> 
> 
> Bugs: MESOS-7022
> https://issues.apache.org/jira/browse/MESOS-7022
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This updates the local authorizer so that MULTI_ROLE frameworks can be
> authorized.
> 
> For non-MULTI_ROLE frameworks we continue to support use of the
> deprecated 'value' field in the authorization request's 'Object';
> however for MULTI_ROLE frameworks the 'value' field will not be set,
> and authorizers still relying on it should be updated to instead use
> the object's 'framework_info' field to extract roles to authorize
> against from.
> 
> 
> Diffs
> -
> 
>   src/authorizer/local/authorizer.cpp 
> b98e1fcdf2ee5ec1f6ac0be6f8accdefaa390a09 
>   src/master/master.cpp 620919ecfe85367b5c1281afc5216cc20e5e2e3c 
> 
> Diff: https://reviews.apache.org/r/56178/diff/
> 
> 
> Testing
> ---
> 
> Tested on various configurations in internal CI.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 56527: Disallowed special path components in IDs.

2017-02-10 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56527/#review165099
---



Patch looks great!

Reviews applied: [56526, 56527]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On Feb. 10, 2017, 7:05 a.m., Jiang Yan Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56527/
> ---
> 
> (Updated Feb. 10, 2017, 7:05 a.m.)
> 
> 
> Review request for mesos and James Peach.
> 
> 
> Bugs: MESOS-7086
> https://issues.apache.org/jira/browse/MESOS-7086
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> - Such IDs should lead to surprising or even dangerous agent side
> directory structure.
> 
> 
> Diffs
> -
> 
>   src/common/validation.cpp 0f1a02286d8431acfee6136e8ada49b0ac746897 
>   src/tests/master_validation_tests.cpp 
> 0c2649089d7fd29eb021ac75c71e6a74368577dc 
>   src/tests/slave_validation_tests.cpp 
> 3d17799ed04951fb56524db0f5d89347192300b2 
> 
> Diff: https://reviews.apache.org/r/56527/diff/
> 
> 
> Testing
> ---
> 
> make check.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>



Re: Review Request 56537: Added a test to ensure framework can upgrade to support MULTI_ROLE.

2017-02-10 Thread Jay Guo

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56537/#review165087
---




src/tests/master_tests.cpp (lines 6409 - 6411)


How to check the task is still running here?


- Jay Guo


On Feb. 10, 2017, 5:32 p.m., Jay Guo wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56537/
> ---
> 
> (Updated Feb. 10, 2017, 5:32 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Guangya Liu, and 
> Michael Park.
> 
> 
> Bugs: MESOS-7035
> https://issues.apache.org/jira/browse/MESOS-7035
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> A framework can be upgraded to be MULTI_ROLE capable even with task
> running, as long as it does not change role while upgrading.
> 
> 
> Diffs
> -
> 
>   src/tests/master_tests.cpp 3b4123b49ee32c902a5d2a01fcc7026da21fdd18 
> 
> Diff: https://reviews.apache.org/r/56537/diff/
> 
> 
> Testing
> ---
> 
> WIP
> 
> 
> Thanks,
> 
> Jay Guo
> 
>



Review Request 56537: Added a test to ensure framework can upgrade to support MULTI_ROLE.

2017-02-10 Thread Jay Guo

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56537/
---

Review request for mesos, Benjamin Bannier, Benjamin Mahler, Guangya Liu, and 
Michael Park.


Bugs: MESOS-7035
https://issues.apache.org/jira/browse/MESOS-7035


Repository: mesos


Description
---

A framework can be upgraded to be MULTI_ROLE capable even with task
running, as long as it does not change role while upgrading.


Diffs
-

  src/tests/master_tests.cpp 3b4123b49ee32c902a5d2a01fcc7026da21fdd18 

Diff: https://reviews.apache.org/r/56537/diff/


Testing
---

WIP


Thanks,

Jay Guo



Re: Review Request 56530: Prevent offers for old agents being sent to MULTI_ROLE frameworks.

2017-02-10 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56530/#review165070
---



Patch looks great!

Reviews applied: [56530]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On Feb. 10, 2017, 6:10 a.m., Jay Guo wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56530/
> ---
> 
> (Updated Feb. 10, 2017, 6:10 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Guangya Liu, and 
> Michael Park.
> 
> 
> Bugs: MESOS-6940
> https://issues.apache.org/jira/browse/MESOS-6940
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> In a mixed cluster, when an old agent (not MULTI_ROLE capable)
> registers with new master (MULTI_ROLE capable), it cannot correctly
> handle tasks from a MULTI_ROLE framework. Therefore, we should
> disallow offers from these agents being sent to MULTI_ROLE frameworks.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 620919ecfe85367b5c1281afc5216cc20e5e2e3c 
>   src/tests/master_tests.cpp 3b4123b49ee32c902a5d2a01fcc7026da21fdd18 
> 
> Diff: https://reviews.apache.org/r/56530/diff/
> 
> 
> Testing
> ---
> 
> WIP
> 
> ./bin/mesos-tests.sh 
> --gtest_filter="MasterTest.MultiRoleFrameworkDoesNotReceiveOfferFromOldAgent" 
> --gtest_break_on_failure --gtest_repeat=-1
> 
> 
> Thanks,
> 
> Jay Guo
> 
>