Re: Review Request 65856: Added `--fetcher_stall_timeout` to abort stalled artifact fetching.

2018-02-28 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65856/#review198434
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['65855', '65856']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65856

Relevant logs:

- 
[mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65856/logs/mesos-tests-stdout.log):

```
[   OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (206 ms)
[--] 9 tests from Endpoint/SlaveEndpointTest (1251 ms total)

[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (39 
ms)
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (45 
ms)
[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (86 ms 
total)

[--] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN  ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[   OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2506 ms)
[--] 1 test from IsolationFlag/CpuIsolatorTest (2531 ms total)

[--] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN  ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2468 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (2492 ms total)

[--] Global test environment tear-down
[==] 915 tests from 90 test cases ran. (477671 ms total)
[  PASSED  ] 914 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] CommandExecutorCheckTest.CommandCheckTimeout

 1 FAILED TEST
  YOU HAVE 211 DISABLED TESTS

```

- 
[mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65856/logs/mesos-tests-stderr.log):

```
I0301 06:19:01.867746  1316 slave.cpp:3879] Shutting down framework 
ed234448-e169-4c04-93bc-5ddd2de8997a-
I0301 06:19:01.867746  6516 master.cpp:10258] Updating the state of task 
9d020280-c221-4c61-b7ad-1afee8382366 of framework 
ed234448-e169-4c04-93bc-5ddd2de8997a- (latest state: TASK_KILLED, status 
update state: TASK_KILLED)
I0301 06:19:01.867746  1316 slave.cpp:6586] Shutting down executor 
'9d020280-c221-4c61-b7ad-1afee8382366' of framework 
ed234448-e169-4c04-93bc-5ddd2de8997a- at executor(1)@10.3.1.5:55976
I0301 06:19:01.869748  1316 slave.cpp:922] Agent terminating
W0301 06:19:01.869748  1316 slave.cpp:3875] Ignoring shutdown framework 
ed234448-e169-4c04-93bc-5ddd2de8997a- because it is terminating
I0301 06:19:01.869748  6516 master.cpp:10357] Removing task 
9d020280-c221-4c61-b7ad-1afee8382366 with resources cpus(allocated: *):4; 
mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: 
*):[31000-32000] of framework ed234448-e169-4c04-93bc-5ddd2de8997a- on 
agent ed234448-e169-4c04-93bc-5ddd2de8997a-S0 at slave(398)@10.3.1.5:55955 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0301 06:19:01.872721  6516 master.cpp:1306] Agent 
ed234448-e169-4c04-93bc-5ddd2de8997a-S0 at slave(398)@10.3.1.5:55955 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I0301 06:19:01.872721  6516 master.cpp:3276] Disconnecting agent 
ed234448-e169-4c04-93bc-5ddd2de8997a-S0 aI0301 06:19:01.169744  6488 
exec.cpp:162] Version: 1.6.0
I0301 06:19:01.198719  2512 exec.cpp:236] Executor registered on agent 
ed234448-e169-4c04-93bc-5ddd2de8997a-S0
I0301 06:19:01.202718  1240 executor.cpp:176] Received SUBSCRIBED event
I0301 06:19:01.207742  1240 executor.cpp:180] Subscribed executor on 
build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I0301 06:19:01.208744  1240 executor.cpp:176] Received LAUNCH event
I0301 06:19:01.212718  1240 executor.cpp:648] Starting task 
9d020280-c221-4c61-b7ad-1afee8382366
I0301 06:19:01.296744  1240 executor.cpp:483] Running 
'D:\DCOS\mesos\src\mesos-containerizer.exe launch '
I0301 06:19:01.835724  1240 executor.cpp:661] Forked command at 3796
I0301 06:19:01.873726  7912 exec.cpp:445] Executor asked to shutdown
I0301 06:19:01.874722  1240 executor.cpp:176] Received SHUTDOWN event
I0301 06:19:01.874722  1240 executor.cpp:758] Shutting down
I0301 06:19:01.874722  1240 executor.cpp:868] Sending SIGTERM to process tree 
at pid 3t slave(398)@10.3.1.5:55955 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0301 06:19:01.873726  1132 hierarchical.cpp:344] Removed framework 
ed234448-e169-4c04-93bc-5ddd2de8997a-
I0301 06:19:01.872721  6752 containerizer.cpp:2338] Destroying container 
5d36381f-2b59-464b-ab48-1139dc873e9d in RUNNING state
I0301 

Re: Review Request 65842: Changed `os::spawn` to return `Option` instead of `int`.

2018-02-28 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65842/#review198433
---



Patch looks great!

Reviews applied: [65839, 65840, 65841, 65842]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 
MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Feb. 28, 2018, 6:15 p.m., Akash Gupta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65842/
> ---
> 
> (Updated Feb. 28, 2018, 6:15 p.m.)
> 
> 
> Review request for mesos, Andrew Schwartzmeyer, Joseph Wu, and Michael Park.
> 
> 
> Bugs: MESOS-4549
> https://issues.apache.org/jira/browse/MESOS-4549
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Similar to `os::system`, `os::spawn` returned -1 on error, which is a
> valid exit code on Windows. Using the same solution for `os::system`,
> now when `os::spawn` fails, `None` will be returned. Otherwise, the
> process exit code will be returned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/posix/copyfile.hpp 
> 0d9ed5babbff05a84740df3dbe1594f0e3012360 
>   3rdparty/stout/include/stout/os/posix/shell.hpp 
> b878718e137e4c8b599e0f4dac67672d8df27f7d 
>   3rdparty/stout/include/stout/os/windows/shell.hpp 
> 1696d084c8453fa0eb9ef41e0a4128e547f7f53c 
>   src/linux/fs.cpp dae094224321c0974c705023daf076409049de51 
>   src/slave/containerizer/mesos/isolators/network/cni/cni.cpp 
> c60c23f74f2abf6bef8dd32cc2e47e33bf666169 
>   src/tests/containerizer/perf_tests.cpp 
> d8aab08eb131f974821fb85662cbc6cc685d2f3e 
> 
> 
> Diff: https://reviews.apache.org/r/65842/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Akash Gupta
> 
>



Review Request 65856: Added `--fetcher_stall_timeout` to abort stalled artifact fetching.

2018-02-28 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65856/
---

Review request for mesos and Gilbert Song.


Bugs: MESOS-8620
https://issues.apache.org/jira/browse/MESOS-8620


Repository: mesos


Description
---

This flag specifies a timeout for `mesos-fetcher` to wait before
aborting if the download speed keeps below 1 bytes/sec. This would avoid
containers to get stuck at FETCHING. The default value is 1 minute.


Diffs
-

  include/mesos/fetcher/fetcher.proto 6a5d807221681853444ffd3ab42a23827604e550 
  src/launcher/fetcher.cpp 2f42fa221a42fdc131a8a97ded4e9433ce51ea77 
  src/slave/constants.hpp 030fb05186f7f360010bb7e5b4948faac69771cc 
  src/slave/containerizer/fetcher.cpp a49411b7bac2d5a50a75d0b802842c2f61fe58c6 
  src/slave/flags.hpp 0c67bf214ceb93ae7ff088bec2648fa26ddac59e 
  src/slave/flags.cpp 943aaaf58b5f36555f0902019b8c5c6522ab7afc 


Diff: https://reviews.apache.org/r/65856/diff/1/


Testing
---

sudo make check

Manually tested with Nginx servers that sleeps for 59 seconds and 1 mintue 
before serving artifacts.


Thanks,

Chun-Hung Hsiao



Review Request 65855: Added the `stall_timeout` parameter to `net::download()`.

2018-02-28 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65855/
---

Review request for mesos and Gilbert Song.


Bugs: MESOS-8620
https://issues.apache.org/jira/browse/MESOS-8620


Repository: mesos


Description
---

When the `stall_timeout` is given, the download would be aborted if the
download speed keeps below 1 byte per second during the timeout.


Diffs
-

  3rdparty/stout/include/stout/net.hpp abb01446d631a67c5182514e3d1368377a0575fc 


Diff: https://reviews.apache.org/r/65855/diff/1/


Testing
---

sudo make test


Thanks,

Chun-Hung Hsiao



Re: Review Request 59987: Added protobuf map support.

2018-02-28 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59987/#review198429
---



Chun and I went over this together, so feel free to reach out to either of us 
for discussion! Just some suggestions to clean up the code below.

In the description, could we mention that this is for stout's json <-> protobuf 
conversion? E.g.

```
Added protobuf map support to stout JSON<->protobuf conversion.

Map is a feature of proto2 syntax, but it can only be compiled
with 3.0.0+ protobuf compiler, see the following discussion for
details:
  
  https://groups.google.com/forum/#!topic/protobuf/p4WxcplrlA4

We have already upgraded the compiler from 2.6.1 to 3.3.0 in
MESOS-7228. This patch adds map support in the json <-> protobuf
conversion in stout.
```


3rdparty/stout/include/stout/protobuf.hpp
Lines 32 (patched)


FWICT, we're just supposed to include message.h for reflection:


https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.message#Reflection

And this reflection.h header seems internal and already included by 
message.h? It only seems to define some pieces of the reflection code:


https://github.com/google/protobuf/blob/v3.5.0/src/google/protobuf/reflection.h



3rdparty/stout/include/stout/protobuf.hpp
Lines 388 (patched)


s/name/key/?



3rdparty/stout/include/stout/protobuf.hpp
Lines 391-487 (patched)


It looks like we can avoid this logic since protobuf's JSON conversion 
allows the protobuf key types (bool, integers, strings) to be converted from 
JSON strings. This means we could just recurse for both key and value here:

```
// Some comment explaining what map is equivalent to with
// a reference to the google documentation.
google::protobuf::Message* entry =
  reflection->AddMessage(message, field);

const google::protobuf::FieldDescriptor* key_field =
  entry->GetDescriptor()->FindFieldByNumber(1);
  
// Maybe passing just 'key' works due to implicit construction?
// TODO(...): This copies the key, consider avoiding this.
Try apply =
  boost::apply_visitor(Parser(entry, key_field), 
JSON::String(key));

if (apply.isError()) {
  return Error(apply.error());
}
  
const google::protobuf::FieldDescriptor* value_field =
  entry->GetDescriptor()->FindFieldByNumber(2);
  
Try apply =
  boost::apply_visitor(Parser(entry, value_field), value);

if (apply.isError()) {
  return Error(apply.error());
}
```

Now, to make this simplification work, we need to update our JSON 
conversion in a separate patch to allow bools and integers to be parsed from 
JSON strings to match google's logic for conversion:


https://github.com/google/protobuf/blob/v3.5.1/src/google/protobuf/util/internal/datapiece.cc#L203

Documentation here: 
https://developers.google.com/protocol-buffers/docs/proto3#json



3rdparty/stout/include/stout/protobuf.hpp
Lines 394-398 (patched)


Can we reference the documentation here and mention that a map is 
equivalent to:

```
message MapFieldEntry {
  optional key_type key = 1;
  optional value_type value = 2;
}

repeated MapFieldEntry map_field = N;
```



3rdparty/stout/include/stout/protobuf.hpp
Lines 400-401 (patched)


Can you move this out of the loop?



3rdparty/stout/include/stout/protobuf.hpp
Lines 1036-1037 (patched)


Google converts 64 bit integers into strings, it's pretty badly broken 
right now that we don't do this. I think we're going to have to switch to 
string and just deal with breaking any clients that assumed strings were not 
coming out. I don't know if you want to help fix this, but if you do I would be 
happy to review / discuss!



3rdparty/stout/include/stout/protobuf.hpp
Lines 1039-1043 (patched)


Is there a way to abstract out this logic into a lambda here and re-use it? 
E.g.

```
template 
JSON::Value value_for_field(
Message* entry,
google::protobuf::Reflection* reflection,
google::protobuf::FieldDescriptor* field)
{
  switch (type) {
case ...:
case TYPE_INT64:
  

Re: Review Request 65853: Updated createQuotaInfo helper to handle quota limit.

2018-02-28 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65853/#review198432
---



PASS: Mesos patch 65853 was successfully built and tested.

Reviews applied: `['65334', '65851', '65852', '65780', '65781', '65782', 
'65783', '65784', '65785', '65853']`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65853

- Mesos Reviewbot Windows


On March 1, 2018, 2:34 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65853/
> ---
> 
> (Updated March 1, 2018, 2:34 a.m.)
> 
> 
> Review request for mesos, Michael Park and Meng Zhu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Note that this has no effect on the existing code paths since the
> limit will be empty.
> 
> 
> Diffs
> -
> 
>   src/master/quota.hpp 7f43996d9aa3d3f462a6ed750733a55d4ef770c8 
>   src/master/quota.cpp 58bab6a678bac9e41a7994ba0b7cc1ed069a8a18 
> 
> 
> Diff: https://reviews.apache.org/r/65853/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 65847: Fixed allocator test `QuotaAbsentFramework`.

2018-02-28 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65847/#review198430
---



Patch looks great!

Reviews applied: [65661, 65819, 65820, 65821, 65844, 65845, 65847]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 
MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Feb. 28, 2018, 9 p.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65847/
> ---
> 
> (Updated Feb. 28, 2018, 9 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Michael Park.
> 
> 
> Bugs: MESOS-8456
> https://issues.apache.org/jira/browse/MESOS-8456
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> `QuotaAbsentFramework` assumes the coarse grained nature
> of second stage resource allocation. This is no longer
> true given #65821. This patch fixes the test by asserting
> the allocation in terms of quantity.
> 
> 
> Diffs
> -
> 
>   src/tests/hierarchical_allocator_tests.cpp 
> c97b2ba0884a7ded867c2d80e4749de54c89b5e4 
> 
> 
> Diff: https://reviews.apache.org/r/65847/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Review Request 65853: Updated createQuotaInfo helper to handle quota limit.

2018-02-28 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65853/
---

Review request for mesos, Michael Park and Meng Zhu.


Repository: mesos


Description
---

Note that this has no effect on the existing code paths since the
limit will be empty.


Diffs
-

  src/master/quota.hpp 7f43996d9aa3d3f462a6ed750733a55d4ef770c8 
  src/master/quota.cpp 58bab6a678bac9e41a7994ba0b7cc1ed069a8a18 


Diff: https://reviews.apache.org/r/65853/diff/1/


Testing
---

make check


Thanks,

Benjamin Mahler



Review Request 65851: Disallow limit in /quota requests and SET_QUOTA calls.

2018-02-28 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65851/
---

Review request for mesos, Michael Park and Meng Zhu.


Repository: mesos


Description
---

This ensures that after the protobuf changes to introduce limit
are committed, users cannot set limit in the existing API.


Diffs
-

  src/master/quota.cpp 58bab6a678bac9e41a7994ba0b7cc1ed069a8a18 


Diff: https://reviews.apache.org/r/65851/diff/1/


Testing
---

Test added in the subsequent patch.


Thanks,

Benjamin Mahler



Re: Review Request 65780: Removed an unnecessary argument from master::Call validation.

2018-02-28 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65780/
---

(Updated March 1, 2018, 2:31 a.m.)


Review request for mesos, Michael Park and Meng Zhu.


Changes
---

Update dependency.


Repository: mesos


Description
---

The principal is not needed by the validation function.


Diffs (updated)
-

  src/master/http.cpp acf10a0f396234ec9f444c9b41515d657379d6ad 
  src/master/validation.hpp 7c129ceb929596acbb64d37025e055661277e6bf 
  src/master/validation.cpp 9254cd22e67d9aca53135701575fb0857ff6f7d6 


Diff: https://reviews.apache.org/r/65780/diff/2/

Changes: https://reviews.apache.org/r/65780/diff/1-2/


Testing
---

make check


Thanks,

Benjamin Mahler



Review Request 65852: Added a test to ensure /quota and SET_QUOTA disallow limit.

2018-02-28 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65852/
---

Review request for mesos, Michael Park and Meng Zhu.


Repository: mesos


Description
---

We will only allow limit to be set using UPDATE_QUOTA, this
tests that users cannot specify a limit in the old APIs.


Diffs
-

  src/tests/master_quota_tests.cpp ddb2903c0b60f2929e39d6aa23dc924aec454c69 


Diff: https://reviews.apache.org/r/65852/diff/1/


Testing
---

make check


Thanks,

Benjamin Mahler



Re: Review Request 65842: Changed `os::spawn` to return `Option` instead of `int`.

2018-02-28 Thread Andrew Schwartzmeyer

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65842/#review198424
---



Ditto, can you split this review per-project?

I _highly_ recommend sticking to WIP commits on Windows and committing on Linux 
so you can utilize the very helpful hooks. We really should get those working 
on Windows...

- Andrew Schwartzmeyer


On Feb. 28, 2018, 10:15 a.m., Akash Gupta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65842/
> ---
> 
> (Updated Feb. 28, 2018, 10:15 a.m.)
> 
> 
> Review request for mesos, Andrew Schwartzmeyer, Joseph Wu, and Michael Park.
> 
> 
> Bugs: MESOS-4549
> https://issues.apache.org/jira/browse/MESOS-4549
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Similar to `os::system`, `os::spawn` returned -1 on error, which is a
> valid exit code on Windows. Using the same solution for `os::system`,
> now when `os::spawn` fails, `None` will be returned. Otherwise, the
> process exit code will be returned.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/posix/copyfile.hpp 
> 0d9ed5babbff05a84740df3dbe1594f0e3012360 
>   3rdparty/stout/include/stout/os/posix/shell.hpp 
> b878718e137e4c8b599e0f4dac67672d8df27f7d 
>   3rdparty/stout/include/stout/os/windows/shell.hpp 
> 1696d084c8453fa0eb9ef41e0a4128e547f7f53c 
>   src/linux/fs.cpp dae094224321c0974c705023daf076409049de51 
>   src/slave/containerizer/mesos/isolators/network/cni/cni.cpp 
> c60c23f74f2abf6bef8dd32cc2e47e33bf666169 
>   src/tests/containerizer/perf_tests.cpp 
> d8aab08eb131f974821fb85662cbc6cc685d2f3e 
> 
> 
> Diff: https://reviews.apache.org/r/65842/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Akash Gupta
> 
>



Re: Review Request 65841: Changed `os::system` to return `Option` instead of `int`.

2018-02-28 Thread Andrew Schwartzmeyer


> On Feb. 28, 2018, 4:32 p.m., Andrew Schwartzmeyer wrote:
> > Before I review this, it needs to be split for stout and mesos changes.

(Otherwise I'll leave a bunch of comments on code that'll get deleted from this 
review ;)


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65841/#review198422
---


On Feb. 28, 2018, 10:12 a.m., Akash Gupta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65841/
> ---
> 
> (Updated Feb. 28, 2018, 10:12 a.m.)
> 
> 
> Review request for mesos, Andrew Schwartzmeyer, Joseph Wu, and Michael Park.
> 
> 
> Bugs: MESOS-4549
> https://issues.apache.org/jira/browse/MESOS-4549
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> `os::system` returned -1 on error, which is a valid error code on
> Windows, e.g., `os::system("exit -1")`, so it was impossible to
> distinguish a failure from a process returning -1. With `Option`,
> failures will return as `None`..
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/posix/shell.hpp 
> b878718e137e4c8b599e0f4dac67672d8df27f7d 
>   3rdparty/stout/include/stout/os/windows/shell.hpp 
> 1696d084c8453fa0eb9ef41e0a4128e547f7f53c 
>   3rdparty/stout/tests/os/rmdir_tests.cpp 
> 2fd24a825068019423674f74fb71cc61d3c504b5 
>   3rdparty/stout/tests/os_tests.cpp d04c3b95b24a519eb21ff07fa51da7c2cefe2b61 
>   src/slave/containerizer/mesos/launch.cpp 
> 75b7eaf9cd62d6b5f02896175168b651f4517e12 
>   src/tests/containerizer/cgroups_isolator_tests.cpp 
> 59b23be78efb4025bfef53a19ab97d9873ab8dc9 
>   src/tests/containerizer/memory_pressure_tests.cpp 
> 0c3e738ce05553a2ee5c38c6748d6d28a1eb93d3 
>   src/tests/environment.cpp 1cba274e0e684b123ce1e4d9cd296f428022fcdc 
> 
> 
> Diff: https://reviews.apache.org/r/65841/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Akash Gupta
> 
>



Re: Review Request 65841: Changed `os::system` to return `Option` instead of `int`.

2018-02-28 Thread Andrew Schwartzmeyer

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65841/#review198422
---



Before I review this, it needs to be split for stout and mesos changes.

- Andrew Schwartzmeyer


On Feb. 28, 2018, 10:12 a.m., Akash Gupta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65841/
> ---
> 
> (Updated Feb. 28, 2018, 10:12 a.m.)
> 
> 
> Review request for mesos, Andrew Schwartzmeyer, Joseph Wu, and Michael Park.
> 
> 
> Bugs: MESOS-4549
> https://issues.apache.org/jira/browse/MESOS-4549
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> `os::system` returned -1 on error, which is a valid error code on
> Windows, e.g., `os::system("exit -1")`, so it was impossible to
> distinguish a failure from a process returning -1. With `Option`,
> failures will return as `None`..
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/posix/shell.hpp 
> b878718e137e4c8b599e0f4dac67672d8df27f7d 
>   3rdparty/stout/include/stout/os/windows/shell.hpp 
> 1696d084c8453fa0eb9ef41e0a4128e547f7f53c 
>   3rdparty/stout/tests/os/rmdir_tests.cpp 
> 2fd24a825068019423674f74fb71cc61d3c504b5 
>   3rdparty/stout/tests/os_tests.cpp d04c3b95b24a519eb21ff07fa51da7c2cefe2b61 
>   src/slave/containerizer/mesos/launch.cpp 
> 75b7eaf9cd62d6b5f02896175168b651f4517e12 
>   src/tests/containerizer/cgroups_isolator_tests.cpp 
> 59b23be78efb4025bfef53a19ab97d9873ab8dc9 
>   src/tests/containerizer/memory_pressure_tests.cpp 
> 0c3e738ce05553a2ee5c38c6748d6d28a1eb93d3 
>   src/tests/environment.cpp 1cba274e0e684b123ce1e4d9cd296f428022fcdc 
> 
> 
> Diff: https://reviews.apache.org/r/65841/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Akash Gupta
> 
>



Re: Review Request 65840: Windows: Fixed remaining W* macros in `windows.hpp`.

2018-02-28 Thread Andrew Schwartzmeyer

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65840/#review198421
---




3rdparty/stout/include/stout/windows.hpp
Line 367 (original), 367 (patched)


This seems to contradict the statement above that `-1` is a valid exit code 
on Windows.


- Andrew Schwartzmeyer


On Feb. 28, 2018, 10:10 a.m., Akash Gupta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65840/
> ---
> 
> (Updated Feb. 28, 2018, 10:10 a.m.)
> 
> 
> Review request for mesos, Andrew Schwartzmeyer and Joseph Wu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The `WIFEXITED` and `WIFSIGNALED` were incorrectly checking if the
> exit code was not -1 to determine if the process exited or was signaled.
> However, -1 is a valid return code on Windows, so places in the Mesos
> codebase that called `CHECK(WIFEXITED(status)|| WIFSIGNALED(status))`
> would end up aborting the agent or executor.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/windows.hpp 
> b35e6b94ba6709254450be9429b6f48f2d276689 
> 
> 
> Diff: https://reviews.apache.org/r/65840/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Akash Gupta
> 
>



Re: Review Request 65839: Windows: Removed W* test-only macros in `windows.hpp`.

2018-02-28 Thread Andrew Schwartzmeyer

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65839/#review198420
---



Changes LGTM but this needs to be split into three commits as you're touching 
stout, libprocess, and mesos. Do you have the Git hooks installed? They should 
have warned you.

- Andrew Schwartzmeyer


On Feb. 28, 2018, 10:09 a.m., Akash Gupta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65839/
> ---
> 
> (Updated Feb. 28, 2018, 10:09 a.m.)
> 
> 
> Review request for mesos, Andrew Schwartzmeyer and Joseph Wu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> `windows.hpp` defined a few macros for the status value of `waitpid`
> that were either unused in the code base or were only used in gtest.
> These macros handled signal logic, so they were removed since they
> aren't used on Windows.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/include/process/gtest.hpp 
> dd4c9bdf7e6e0e45db347108fde3b41960974be6 
>   3rdparty/stout/include/stout/gtest.hpp 
> ce6e4de6a9f69ea121ff0ca6048316d678c8c857 
>   3rdparty/stout/include/stout/windows.hpp 
> b35e6b94ba6709254450be9429b6f48f2d276689 
>   src/checks/checker_process.cpp cf9ec053946e620eb36e92d647ab864c4e88d506 
> 
> 
> Diff: https://reviews.apache.org/r/65839/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Akash Gupta
> 
>



Re: Review Request 65683: Updated discard handling in `Docker::inspect()`.

2018-02-28 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65683/
---

(Updated March 1, 2018, 12:07 a.m.)


Review request for mesos, Andrei Budnik, Gilbert Song, Michael Park, and Vinod 
Kone.


Changes
---

Added a missing CHECK guard.


Summary (updated)
-

Updated discard handling in `Docker::inspect()`.


Bugs: MESOS-8576
https://issues.apache.org/jira/browse/MESOS-8576


Repository: mesos


Description (updated)
---

Previously, discards of the `Future` returned by `Docker::inspect()`
were only handled at the beginning of each asynchronous continuation
in the library function's call chain. This meant that if a Docker
CLI command became stuck in between async calls, discarding the
`Future` would have no effect.

This patch adds an `onDiscard` callback to the `Future` to ensure
that any discards have the desired effect: cleanup of any spawned
subprocess, and a transition of the `Future` to the discarded state.

Since the Docker library is not a libprocess process, we must
implement this with a `shared_ptr` and a mutex, to protect against
concurrent access to the `onDiscard` callback, which must be updated
when retries are performed.


Diffs (updated)
-

  src/docker/docker.hpp 0116ca8f28303905d80cd6b3a33a7a7feacc8f8a 
  src/docker/docker.cpp f7a4466292fcedd7e97740698475da2c616ca1d0 


Diff: https://reviews.apache.org/r/65683/diff/9/

Changes: https://reviews.apache.org/r/65683/diff/8-9/


Testing (updated)
---

sudo bin/mesos-tests.sh --gtest_filter="*DOCKER*"


Thanks,

Greg Mann



Re: Review Request 65683: Updated discard handling in 'Docker::inspect()'.

2018-02-28 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65683/
---

(Updated Feb. 28, 2018, 11:48 p.m.)


Review request for mesos, Andrei Budnik, Gilbert Song, Michael Park, and Vinod 
Kone.


Bugs: MESOS-8575
https://issues.apache.org/jira/browse/MESOS-8575


Repository: mesos


Description (updated)
---

Updated discard handling in `Docker::inspect()`.

Previously, discards of the `Future` returned by `Docker::inspect()`
were only handled at the beginning of each asynchronous continuation
in the library function's call chain. This meant that if a Docker
CLI command became stuck in between async calls, discarding the
`Future` would have no effect.

This patch adds an `onDiscard` callback to the `Future` to ensure
that any discards have the desired effect: cleanup of any spawned
subprocess, and a transition of the `Future` to the discarded state.

Since the Docker library is not a libprocess process, we must
implement this with a `shared_ptr` and a mutex, to protect against
concurrent access to the `onDiscard` callback, which must be updated
when retries are performed.


Diffs
-

  src/docker/docker.hpp 0116ca8f28303905d80cd6b3a33a7a7feacc8f8a 
  src/docker/docker.cpp f7a4466292fcedd7e97740698475da2c616ca1d0 


Diff: https://reviews.apache.org/r/65683/diff/8/


Testing (updated)
---

sudo bin/mesos-tests.sh --gtest_filter="*DOCKER*"


Thanks,

Greg Mann



Re: Review Request 65683: Updated discard handling in 'Docker::inspect()'.

2018-02-28 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65683/
---

(Updated Feb. 28, 2018, 11:49 p.m.)


Review request for mesos, Andrei Budnik, Gilbert Song, Michael Park, and Vinod 
Kone.


Bugs: MESOS-8576
https://issues.apache.org/jira/browse/MESOS-8576


Repository: mesos


Description
---

Updated discard handling in `Docker::inspect()`.

Previously, discards of the `Future` returned by `Docker::inspect()`
were only handled at the beginning of each asynchronous continuation
in the library function's call chain. This meant that if a Docker
CLI command became stuck in between async calls, discarding the
`Future` would have no effect.

This patch adds an `onDiscard` callback to the `Future` to ensure
that any discards have the desired effect: cleanup of any spawned
subprocess, and a transition of the `Future` to the discarded state.

Since the Docker library is not a libprocess process, we must
implement this with a `shared_ptr` and a mutex, to protect against
concurrent access to the `onDiscard` callback, which must be updated
when retries are performed.


Diffs
-

  src/docker/docker.hpp 0116ca8f28303905d80cd6b3a33a7a7feacc8f8a 
  src/docker/docker.cpp f7a4466292fcedd7e97740698475da2c616ca1d0 


Diff: https://reviews.apache.org/r/65683/diff/8/


Testing
---

sudo bin/mesos-tests.sh --gtest_filter="*DOCKER*"


Thanks,

Greg Mann



Re: Review Request 65799: Ported mesos-master to Windows.

2018-02-28 Thread Gaston Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65799/#review198418
---




src/master/main.cpp
Lines 212 (patched)


```
s/The '--ip.discovery_command'/`--ip_discovery_command`/
```


- Gaston Kleiman


On Feb. 24, 2018, 7:19 p.m., Sachin Paryani wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65799/
> ---
> 
> (Updated Feb. 24, 2018, 7:19 p.m.)
> 
> 
> Review request for mesos and Andrew Schwartzmeyer.
> 
> 
> Bugs: MESOS-5820
> https://issues.apache.org/jira/browse/MESOS-5820
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The change allows for the mesos-master executable to work on Windows.
> The change was to remove the part of the code for that utilizes the
> ip_discovery_command flag if the build is Windows and replaced with
> an appropriate error message.
> 
> 
> Diffs
> -
> 
>   src/master/CMakeLists.txt ec552110509d17628107f4ad98f2d04675ed3ce3 
>   src/master/main.cpp 8cb52d0c452ab3c328eea80aa56f1c984de372e9 
> 
> 
> Diff: https://reviews.apache.org/r/65799/diff/1/
> 
> 
> Testing
> ---
> 
> Successfully built the master and then started the mesos-master executable.  
> The following command was used:
> ./src/mesos-master --registry='in_memory' --webui_dir=../src/webui
> 
> 
> Thanks,
> 
> Sachin Paryani
> 
>



Re: Review Request 65795: Adding warning to `apply-reviews.py` for previously submitted patches.

2018-02-28 Thread Gaston Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65795/#review198417
---


Fix it, then Ship it!




Thanks for your contribution!


support/apply-reviews.py
Lines 113-114 (patched)


The space between "already" and "been" should be moved to line 114 to be 
consistent with the style of the other messages.


- Gaston Kleiman


On Feb. 24, 2018, 5:36 p.m., Shubham Kumar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65795/
> ---
> 
> (Updated Feb. 24, 2018, 5:36 p.m.)
> 
> 
> Review request for mesos and Andrew Schwartzmeyer.
> 
> 
> Bugs: MESOS-8586
> https://issues.apache.org/jira/browse/MESOS-8586
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This commit emits a warning when a chain skips submitted patches so
> that the user is aware they were ignored.
> 
> 
> Diffs
> -
> 
>   support/apply-reviews.py 9fc1579f732609dac1b19ff5862f374e71e6cb53 
> 
> 
> Diff: https://reviews.apache.org/r/65795/diff/1/
> 
> 
> Testing
> ---
> 
> Test done.
> 
> command: ./support/apply-reviews.py -c -r 65397
> output: Warning: Review 65397 has already been applied
> 
> 
> Thanks,
> 
> Shubham Kumar
> 
>



Re: Review Request 65759: Added inspect retries to the Docker executor.

2018-02-28 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65759/#review198414
---


Ship it!




Ship It!

- Greg Mann


On Feb. 28, 2018, 6:05 p.m., Andrei Budnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65759/
> ---
> 
> (Updated Feb. 28, 2018, 6:05 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gilbert Song, Greg Mann, and 
> Michael Park.
> 
> 
> Bugs: MESOS-8574
> https://issues.apache.org/jira/browse/MESOS-8574
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds retries for `inspect` command to workaround docker
> daemon hangs. We assume that the docker daemon can be temporarily
> unresponsive. If it's unresponsive, then any started docker cli
> command hangs. To address the issue, we retry `inspect` in the loop.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp 93c3e1d1e86814e34cbe5b045f6e61911266c535 
> 
> 
> Diff: https://reviews.apache.org/r/65759/diff/6/
> 
> 
> Testing
> ---
> 
> internal CI
> 
> Manually, described in /r/65713
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>



Re: Review Request 65847: Fixed allocator test `QuotaAbsentFramework`.

2018-02-28 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65847/#review198412
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['65661', '65819', '65820', '65821', '65844', '65845', 
'65847']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65847

Relevant logs:

- 
[mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65847/logs/mesos-tests-stdout.log):

```
[   OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (133 ms)
[--] 9 tests from Endpoint/SlaveEndpointTest (1215 ms total)

[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (38 
ms)
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (42 
ms)
[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (82 ms 
total)

[--] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN  ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[   OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2495 ms)
[--] 1 test from IsolationFlag/CpuIsolatorTest (2525 ms total)

[--] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN  ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2559 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (2584 ms total)

[--] Global test environment tear-down
[==] 915 tests from 90 test cases ran. (498283 ms total)
[  PASSED  ] 914 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] CommandExecutorCheckTest.CommandCheckTimeout

 1 FAILED TEST
  YOU HAVE 211 DISABLED TESTS

```

- 
[mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65847/logs/mesos-tests-stderr.log):

```
I0228 22:10:39.230201  9160 slave.cpp:3879] Shutting down framework 
4cb7c9f6-982b-44eb-bf6a-cdbf9bd88bbb-
I0228 22:10:39.230201 10216 master.cpp:10258] Updating the state of task 
385eeb95-6621-493f-86e2-e1305ac9e884 of framework 
4cb7c9f6-982b-44eb-bf6a-cdbf9bd88bbb- (latest state: TASK_KILLED, status 
update state: TASK_KILLED)
I0228 22:10:39.231204  9160 slave.cpp:6586] Shutting down executor 
'385eeb95-6621-493f-86e2-e1305ac9e884' of framework 
4cb7c9f6-982b-44eb-bf6a-cdbf9bd88bbb- at executor(1)@10.3.1.5:52835
I0228 22:10:39.232201  9160 slave.cpp:922] Agent terminating
W0228 22:10:39.233201  9160 slave.cpp:3875] Ignoring shutdown framework 
4cb7c9f6-982b-44eb-bf6a-cdbf9bd88bbb- because it is terminating
I0228 22:10:39.234201 10216 master.cpp:10357] Removing task 
385eeb95-6621-493f-86e2-e1305ac9e884 with resources ports(allocated: 
*):[31000-32000]; disk(allocated: *):1024; cpus(allocated: *):4; mem(allocated: 
*):2048 of framework 4cb7c9f6-982b-44eb-bf6a-cdbf9bd88bbb- on agent 
4cb7c9f6-982b-44eb-bf6a-cdbf9bd88bbb-S0 at slave(398)@10.3.1.5:52814 
(build-srv-04.zq4gs31qjI0228 22:10:38.525193  5524 exec.cpp:162] Version: 1.6.0
I0228 22:10:38.553197  7144 exec.cpp:236] Executor registered on agent 
4cb7c9f6-982b-44eb-bf6a-cdbf9bd88bbb-S0
I0228 22:10:38.558202  7452 executor.cpp:176] Received SUBSCRIBED event
I0228 22:10:38.563199  7452 executor.cpp:180] Subscribed executor on 
build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I0228 22:10:38.564200  7452 executor.cpp:176] Received LAUNCH event
I0228 22:10:38.569198  7452 executor.cpp:648] Starting task 
385eeb95-6621-493f-86e2-e1305ac9e884
I0228 22:10:38.652200  7452 executor.cpp:483] Running 
'D:\DCOS\mesos\src\mesos-containerizer.exe launch '
I0228 22:10:39.198199  7452 executor.cpp:661] Forked command at 1656
I0228 22:10:39.233201  1292 exec.cpp:445] Executor asked to shutdown
I0228 22:10:39.234201  7452 executor.cpp:176] Received SHUTDOWN event
I0228 22:10:39.235203  7452 executor.cpp:758] Shutting down
I0228 22:10:39.235203  7452 executor.cpp:868] Sending SIGTERM to process tree 
at pid 1diunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0228 22:10:39.236205  9160 containerizer.cpp:2338] Destroying container 
e8ed2a08-2f14-45fb-b6c7-a6065d1f636a in RUNNING state
I0228 22:10:39.236205  9160 containerizer.cpp:2952] Transitioning the state of 
container e8ed2a08-2f14-45fb-b6c7-a6065d1f636a from RUNNING to DESTROYING
I0228 22:10:39.237200 10216 master.cpp:1306] Agent 
4cb7c9f6-982b-44eb-bf6a-cdbf9bd88bbb-S0 at slave(398)@10.3.1.5:52814 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I0228 22:10:39.237200 10216 master.cpp:3276] Disconnecting agent 
4cb7c9f6-982b-44eb-bf6a-cdbf9bd88bbb-S0 at 

Re: Review Request 63368: Added MemoryProfiler class to Libprocess.

2018-02-28 Thread Alexander Rukletsov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63368/#review198409
---




3rdparty/libprocess/src/memory_profiler.cpp
Lines 305 (patched)


This is a speculation, `result` should be informative enough. A likely 
reason is that `jeprof` is absent on the machine, in which case this line is 
confusing.



3rdparty/libprocess/src/memory_profiler.cpp
Lines 312-322 (patched)


Ideally we would respond with a simply formatted data, e.g., JSON, if a 
request comes from `curl`/`httpie` and alike, and a "proper" html if it comes 
from a browser.

Our help endpoints try to detect `curl` and `httpie` and respond 
accordingly. Another approach would be to introduce a parameter to the 
endpoints, e.g., `format=json` that allows to control the format of the 
response.



3rdparty/libprocess/src/memory_profiler.cpp
Lines 675 (patched)


Can you please explain why you need default capture-by-reference here and 
below?



3rdparty/libprocess/src/memory_profiler.cpp
Lines 689 (patched)


s\in"\in "



3rdparty/libprocess/src/memory_profiler.cpp
Lines 689-692 (patched)


Technically, you do not redirect to the result, but you redirect to the 
`stop` endpoint if the browser is used. `curl` users will have to request 
`stop` manually:
```
alex@gru1: ~$ http 192.99.40.208:/memory-profiler/start?duration=1mins
HTTP/1.1 200 OK
Content-Length: 175
Content-Type: text/html
Date: Wed, 28 Feb 2018 21:32:42 GMT


Started new heap profiling run.You will be redirected to the result 
in60s.
```


- Alexander Rukletsov


On Feb. 21, 2018, 10:41 a.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63368/
> ---
> 
> (Updated Feb. 21, 2018, 10:41 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Benjamin Mahler.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This class exposes profiling functionality of jemalloc memory allocator
> when it is detected to be the memory allocator of the current process.
> 
> In particular, it gives developers an easy method to collect and
> access heap profiles which report which pieces of code were
> responsible for allocating memory.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/Makefile.am c895d3ac7b9cc5ffd6c8b57ff24def866eb0213d 
>   3rdparty/libprocess/include/process/memory_profiler.hpp PRE-CREATION 
>   3rdparty/libprocess/include/process/process.hpp 
> 8661706cb058efb26f5bfbcc84972f9930d3670f 
>   3rdparty/libprocess/src/CMakeLists.txt 
> 0ce7dac5deea94623530820d173ce3ffe5b42ea4 
>   3rdparty/libprocess/src/memory_profiler.cpp PRE-CREATION 
>   3rdparty/libprocess/src/process.cpp 
> 42e7adf740b234ebf23d2dcb71440749c0ed87ec 
> 
> 
> Diff: https://reviews.apache.org/r/63368/diff/6/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 63372: Added documentation for memory profiling.

2018-02-28 Thread Alexander Rukletsov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63372/#review198404
---




docs/memory-profiling.md
Lines 22 (patched)


Ain't it called `--enable-jemalloc-allocator`?



docs/memory-profiling.md
Lines 49 (patched)


This endpoint spits out a huge JSON file even on a fresh master without any 
load. Maybe it makes sense to say what might be interesting / what this JSON 
includes; including JSON schema would also be great.



docs/memory-profiling.md
Lines 95-97 (patched)


How a user gets a compatible `jeprof` installed? Do we expect them to 
figure out the bundled version of jemalloc and then figure out how to get and 
install jeprof onto the machine? Can we do it ourselves during the configure 
step? Or maybe better have memory-profiler use the bundled jeprof?


- Alexander Rukletsov


On Jan. 24, 2018, 4:40 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63372/
> ---
> 
> (Updated Jan. 24, 2018, 4:40 p.m.)
> 
> 
> Review request for mesos and Alexander Rukletsov.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added documentation for memory profiling.
> 
> 
> Diffs
> -
> 
>   docs/memory-profiling.md PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63372/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 65661: Updated comment and variables to decouple quota limit and guarantee.

2018-02-28 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65661/#review198410
---



Patch looks great!

Reviews applied: [65661]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 
MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Feb. 15, 2018, 10:42 p.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65661/
> ---
> 
> (Updated Feb. 15, 2018, 10:42 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Kapil Arya, Joseph Wu, Michael 
> Park, and Till Toenshoff.
> 
> 
> Bugs: MESOS-8456
> https://issues.apache.org/jira/browse/MESOS-8456
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> We currently do not differentiate between quota guarantee and limit
> i.e. quota currently is set for both guarantee and limit. As a
> first step to introduce quota limit, comments and related variable
> names are updated to reflect the difference between quota guarantee
> and limit.
> 
> Also fixed a few typos and removed some outdated comments.
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.hpp 
> 37d2df58d374f118697fb5848c36f78b3e65128d 
>   src/master/allocator/mesos/hierarchical.cpp 
> 58aa83ffdb194bbb9bb518a6b77e23fa49d340f4 
> 
> 
> Diff: https://reviews.apache.org/r/65661/diff/3/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 65661: Updated comment and variables to decouple quota limit and guarantee.

2018-02-28 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65661/#review198407
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['65661']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65661

Relevant logs:

- 
[mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65661/logs/mesos-tests-stdout.log):

```
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (39 
ms)
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (46 
ms)
[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (87 ms 
total)

[--] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN  ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[   OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2453 ms)
[--] 1 test from IsolationFlag/CpuIsolatorTest (2477 ms total)

[--] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN  ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2462 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (2486 ms total)

[--] Global test environment tear-down
D:\DCOS\mesos\mesos\src\tests\environment.cpp(950): error: Failed
Tests completed with child processes remaining:
-+- 7616 0092B2EFDFFC
 --- 1212 0092B2EFDFFC
[==] 915 tests from 90 test cases ran. (483191 ms total)
[  PASSED  ] 914 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] CommandExecutorCheckTest.CommandCheckTimeout

 1 FAILED TEST
  YOU HAVE 211 DISABLED TESTS

```

- 
[mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65661/logs/mesos-tests-stderr.log):

```
I0228 21:10:10.175643  1392 slave.cpp:3879] Shutting down framework 
063cb035-d536-49c9-aeb8-c1d9e597d682-
I0228 21:10:10.175643  4032 master.cpp:10258] Updating the state of task 
d04903f2-ef59-452d-a383-005e158b4c62 of framework 
063cb035-d536-49c9-aeb8-c1d9e597d682- (I0228 21:10:09.460642  9668 
exec.cpp:162] Version: 1.6.0
I0228 21:10:09.495640  6748 exec.cpp:236] Executor registered on agent 
063cb035-d536-49c9-aeb8-c1d9e597d682-S0
I0228 21:10:09.499639  9808 executor.cpp:176] Received SUBSCRIBED event
I0228 21:10:09.504665  9808 executor.cpp:180] Subscribed executor on 
build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I0228 21:10:09.504665  9808 executor.cpp:176] Received LAUNCH event
I0228 21:10:09.509665  9808 executor.cpp:648] Starting task 
d04903f2-ef59-452d-a383-005e158b4c62
I0228 21:10:09.591665  9808 executor.cpp:483] Running 
'D:\DCOS\mesos\src\mesos-containerizer.exe launch '
I0228 21:10:10.136643  9808 executor.cpp:661] Forked command at 888
I0228 21:10:10.178644  9820 exec.cpp:445] Executor asked to shutdown
I0228 21:10:10.179644  9808 executor.cpp:176] Received SHUTDOWN event
I0228 21:10:10.179644  9808 executor.cpp:758] Shutting down
I0228 21:10:10.179644  9808 executor.cpp:868] Sending SIGTERM to process tree 
at pid 88latest state: TASK_KILLED, status update state: TASK_KILLED)
I0228 21:10:10.175643  1392 slave.cpp:6586] Shutting down executor 
'd04903f2-ef59-452d-a383-005e158b4c62' of framework 
063cb035-d536-49c9-aeb8-c1d9e597d682- at executor(1)@10.3.1.5:50637
I0228 21:10:10.177644  1392 slave.cpp:922] Agent terminating
W0228 21:10:10.177644  1392 slave.cpp:3875] Ignoring shutdown framework 
063cb035-d536-49c9-aeb8-c1d9e597d682- because it is terminating
I0228 21:10:10.180647  4032 master.cpp:10357] Removing task 
d04903f2-ef59-452d-a383-005e158b4c62 with resources cpus(allocated: *):4; 
mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: 
*):[31000-32000] of framework 063cb035-d536-49c9-aeb8-c1d9e597d682- on 
agent 063cb035-d536-49c9-aeb8-c1d9e597d682-S0 at slave(398)@10.3.1.5:50616 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0228 21:10:10.180647  9468 containerizer.cpp:2338] Destroying container 
ea80a9a4-db8e-4057-a2e4-489d5915cf5a in RUNNING state
I0228 21:10:10.180647  9468 containerizer.cpp:2952] Transitioning the state of 
container ea80a9a4-db8e-4057-a2e4-489d5915cf5a from RUNNING to DESTROYING
I0228 21:10:10.182643  9468 launcher.cpp:156] Asked to destroy container 
ea80a9a4-db8e-4057-a2e4-489d5915cf5a
I0228 21:10:10.183642  4032 master.cpp:1306] Agent 
063cb035-d536-49c9-aeb8-c1d9e597d682-S0 at slave(398)@10.3.1.5:50616 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I0228 21:10:10.183642  4032 master.cpp:3276] Disconnecting agent 
063cb035-d536-49c9-aeb8-c1d9e597d682-S0 

Review Request 65847: Fixed allocator test `QuotaAbsentFramework`.

2018-02-28 Thread Meng Zhu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65847/
---

Review request for mesos, Benjamin Mahler and Michael Park.


Bugs: MESOS-8456
https://issues.apache.org/jira/browse/MESOS-8456


Repository: mesos


Description
---

`QuotaAbsentFramework` assumes the coarse grained nature
of second stage resource allocation. This is no longer
true given #65821. This patch fixes the test by asserting
the allocation in terms of quantity.


Diffs
-

  src/tests/hierarchical_allocator_tests.cpp 
c97b2ba0884a7ded867c2d80e4749de54c89b5e4 


Diff: https://reviews.apache.org/r/65847/diff/1/


Testing
---

make check


Thanks,

Meng Zhu



Review Request 65845: Fixed broken test `SlavesEndpointFullResources`.

2018-02-28 Thread Meng Zhu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65845/
---

Review request for mesos, Benjamin Mahler and Michael Park.


Bugs: MESOS-8456
https://issues.apache.org/jira/browse/MESOS-8456


Repository: mesos


Description
---

The test relies on the order of the JSON array output and is broken
after the recent allocator changes in #65821.
This patch fixes this test by parsing the JSON array to resources,
so that the comparison is order agnostic.


Diffs
-

  src/tests/persistent_volume_endpoints_tests.cpp 
94b8caac0c298d0e660b9bfe523b02a919f81594 


Diff: https://reviews.apache.org/r/65845/diff/1/


Testing
---

make check


Thanks,

Meng Zhu



Review Request 65844: Added `upgradeResource` to some resource parsing functions.

2018-02-28 Thread Meng Zhu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65844/
---

Review request for mesos, Benjamin Mahler and Michael Park.


Repository: mesos


Description
---

Currently only `Resources::parse()` calls `upgradeResource()`,
but the actual parsing functions `Resources::fromJSON()` and
`Resources::fromSimpleString()` which are also public utilities
do not, this may produce invalid resource objects.

This patch moves the `upgradeResource()` to the two internal
parsing functions, so that all resource objects parsed are
valid.


Diffs
-

  src/common/resources.cpp a0c29246b3e3b377aabd0eb00ba16aa6bbd804e3 


Diff: https://reviews.apache.org/r/65844/diff/1/


Testing
---

make check


Thanks,

Meng Zhu



Review Request 65821: Made second resource allocation stage burst-able (below limit).

2018-02-28 Thread Meng Zhu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65821/
---

Review request for mesos, Benjamin Mahler and Michael Park.


Bugs: MESOS-8456
https://issues.apache.org/jira/browse/MESOS-8456


Repository: mesos


Description
---

The second allocation stage was only for non-quota role allocation.
This patch makes the second allocation stage a burst stage for
all roles to get resource allocations above their guarantee but
below limit.


Diffs
-

  src/master/allocator/mesos/hierarchical.cpp 
58aa83ffdb194bbb9bb518a6b77e23fa49d340f4 


Diff: https://reviews.apache.org/r/65821/diff/1/


Testing
---

make check


Thanks,

Meng Zhu



Review Request 65820: Added an utility to filter resources reserved by ancestor roles.

2018-02-28 Thread Meng Zhu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65820/
---

Review request for mesos, Benjamin Mahler and Michael Park.


Repository: mesos


Description
---

This resource utility returns resources reserved by a role's
ancestors (but not by the role itself).


Diffs
-

  include/mesos/resources.hpp 7afe0d8090a5075faed22bd408310ae1bb464bce 
  include/mesos/v1/resources.hpp c7120575d0dbfb9f8c9bb6b0cf9a4cea3d9f6932 
  src/common/resources.cpp a0c29246b3e3b377aabd0eb00ba16aa6bbd804e3 
  src/v1/resources.cpp 7fc6cd803982ea3c28bf0983d868f14fc6355cc9 


Diff: https://reviews.apache.org/r/65820/diff/1/


Testing
---

make check


Thanks,

Meng Zhu



Review Request 65819: Enforced quota limit in the first resource allocation stage.

2018-02-28 Thread Meng Zhu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65819/
---

Review request for mesos, Benjamin Mahler and Michael Park.


Bugs: MESOS-8456
https://issues.apache.org/jira/browse/MESOS-8456


Repository: mesos


Description
---

In the first allocation stage where we allocate resources to
roles with unsatisfied quota guarantee, we used to also allocate
resources that the role has no guarantee for as long as it does
not break the global headroom. With the introduction of quota
limit, when we make these additional "add-on" allocations, we
should also make sure that these won't break the role's own
quota limit.


Diffs
-

  src/master/allocator/mesos/hierarchical.cpp 
58aa83ffdb194bbb9bb518a6b77e23fa49d340f4 


Diff: https://reviews.apache.org/r/65819/diff/1/


Testing
---

make check


Thanks,

Meng Zhu



Re: Review Request 65842: Changed `os::spawn` to return `Option` instead of `int`.

2018-02-28 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65842/#review198403
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['65839', '65840', '65841', '65842']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65842

Relevant logs:

- 
[mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65842/logs/mesos-tests-stdout.log):

```
[   OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (125 ms)
[--] 9 tests from Endpoint/SlaveEndpointTest (1208 ms total)

[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (37 
ms)
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (49 
ms)
[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (88 ms 
total)

[--] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN  ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[   OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2406 ms)
[--] 1 test from IsolationFlag/CpuIsolatorTest (2430 ms total)

[--] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN  ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2563 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (2588 ms total)

[--] Global test environment tear-down
[==] 915 tests from 90 test cases ran. (482668 ms total)
[  PASSED  ] 914 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] CommandExecutorCheckTest.CommandCheckTimeout

 1 FAILED TEST
  YOU HAVE 211 DISABLED TESTS

```

- 
[mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65842/logs/mesos-tests-stderr.log):

```
I0228 20:09:13.533985   460 slave.cpp:3879] Shutting down framework 
cac8df1c-78d2-4a08-9812-815f9f3ebaee-
I0228 20:09:13.533985  9548 master.cpp:10258] Updating the state of task 
d015bdc4-f2fc-4bd2-9d84-3b83d8741674 of framework 
cac8df1c-78d2-4a08-9812-815f9f3ebaee- (latest state: TASK_KILLED, status 
update state: TASK_KILLED)
I0228 20:09:13.533985   460 slave.cpp:6586] Shutting down executor 
'd015bdc4-f2fc-4bd2-9d84-3b83d8741674' of framework 
cac8df1c-78d2-4a08-9812-815f9f3ebaee- at executor(1)@10.3.1.5:64832
I0228 20:09:13.535985   460 slave.cpp:922] Agent terminating
W0228 20:09:13.535985   460 slave.cpp:3875] Ignoring shutdown framework 
cac8df1c-78d2-4a08-9812-815f9f3ebaee- becI0228 20:09:12.808979  1212 
exec.cpp:162] Version: 1.6.0
I0228 20:09:12.836979 10128 exec.cpp:236] Executor registered on agent 
cac8df1c-78d2-4a08-9812-815f9f3ebaee-S0
I0228 20:09:12.840981  7856 executor.cpp:176] Received SUBSCRIBED event
I0228 20:09:12.845983  7856 executor.cpp:180] Subscribed executor on 
build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I0228 20:09:12.846983  7856 executor.cpp:176] Received LAUNCH event
I0228 20:09:12.850985  7856 executor.cpp:648] Starting task 
d015bdc4-f2fc-4bd2-9d84-3b83d8741674
I0228 20:09:12.934008  7856 executor.cpp:483] Running 
'D:\DCOS\mesos\src\mesos-containerizer.exe launch '
I0228 20:09:13.494029  7856 executor.cpp:661] Forked command at 7116
I0228 20:09:13.536986  1696 exec.cpp:445] Executor asked to shutdown
I0228 20:09:13.537986  7856 executor.cpp:176] Received SHUTDOWN event
I0228 20:09:13.537986  7856 executor.cpp:758] Shutting down
I0228 20:09:13.537986  7856 executor.cpp:868] Sending SIGTERM to process tree 
at pid 7ause it is terminating
I0228 20:09:13.536986  9548 master.cpp:10357] Removing task 
d015bdc4-f2fc-4bd2-9d84-3b83d8741674 with resources cpus(allocated: *):4; 
mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: 
*):[31000-32000] of framework cac8df1c-78d2-4a08-9812-815f9f3ebaee- on 
agent cac8df1c-78d2-4a08-9812-815f9f3ebaee-S0 at slave(398)@10.3.1.5:64811 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0228 20:09:13.540984  7184 containerizer.cpp:2338] Destroying container 
f57486f2-bc7b-42bb-9cfd-72dbe2dad7b4 in RUNNING state
I0228 20:09:13.540984  7184 containerizer.cpp:2952] Transitioning the state of 
container f57486f2-bc7b-42bb-9cfd-72dbe2dad7b4 from RUNNING to DESTROYING
I0228 20:09:13.541985  9548 master.cpp:1306] Agent 
cac8df1c-78d2-4a08-9812-815f9f3ebaee-S0 at slave(398)@10.3.1.5:64811 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I0228 20:09:13.541985  7184 launcher.cpp:156] Asked to destroy container 
f57486f2-bc7b-42bb-9cfd-72dbe2dad7b4
I0228 20:09:13.542986  9548 

Re: Review Request 63366: Added jemalloc release tarball and build rules.

2018-02-28 Thread Benno Evers

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63366/
---

(Updated Feb. 28, 2018, 7:56 p.m.)


Review request for mesos, Alexander Rukletsov and Benjamin Mahler.


Summary (updated)
-

Added jemalloc release tarball and build rules.


Repository: mesos


Description (updated)
---

Added jemalloc release tarball and build rules.


Diffs
-

  3rdparty/CMakeLists.txt da605707b89bbe9b3db9e60bc0b0a26dac46e56e 
  3rdparty/Makefile.am f4cb731198e271b967b920235ed2785e9296f9e1 
  3rdparty/cmake/Versions.cmake 94b0d8048412e00e2480f45e7ce4e6494da4bd5d 
  3rdparty/jemalloc-5.0.1.tar.bz2 PRE-CREATION 
  3rdparty/versions.am 3e008d5a7bdc029de23950463c295b968b3280c8 
  cmake/CompilationConfigure.cmake 50cddf9476c8c5196c4824a7b060c2680a96b277 
  configure.ac 30fbadc32d1d96f719d45fa8067f975283c25507 
  src/Makefile.am e444b8981093ba6f4e94acb7a060b998f75b542a 
  src/master/CMakeLists.txt ec552110509d17628107f4ad98f2d04675ed3ce3 
  src/slave/CMakeLists.txt 943e8f523f867e3dd4030f78eca7830500578faf 


Diff: https://reviews.apache.org/r/63366/diff/4/


Testing
---


Thanks,

Benno Evers



Re: Review Request 65462: Used the new 'route()' overload from libprocess.

2018-02-28 Thread Benno Evers

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65462/
---

(Updated Feb. 28, 2018, 7:56 p.m.)


Review request for mesos, Alexander Rukletsov and Benjamin Mahler.


Summary (updated)
-

Used the new 'route()' overload from libprocess.


Repository: mesos


Description
---

Make use of the new `ProcessBase::route()` overload introduced in the
previous commit to reduce the amount of duplicated logic.


Diffs
-

  src/files/files.cpp 324b4bcf0ef53931fa7ef9b7a891b160564705ff 
  src/master/registrar.cpp aabce22e4b06826e4452c61c27ebdd702b1e47bc 


Diff: https://reviews.apache.org/r/65462/diff/1/


Testing
---


Thanks,

Benno Evers



Re: Review Request 65461: Provided new overload of 'ProcessBase::route()'.

2018-02-28 Thread Benno Evers

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65461/
---

(Updated Feb. 28, 2018, 7:55 p.m.)


Review request for mesos, Alexander Rukletsov and Benjamin Mahler.


Summary (updated)
-

Provided new overload of 'ProcessBase::route()'.


Repository: mesos


Description (updated)
---

Provided new overload of 'ProcessBase::route()'.


Diffs
-

  3rdparty/libprocess/include/process/logging.hpp 
f9997677d69d54f5723d4fc0a495008d3ce11cc5 
  3rdparty/libprocess/include/process/process.hpp 
8661706cb058efb26f5bfbcc84972f9930d3670f 
  3rdparty/libprocess/include/process/profiler.hpp 
2991dd2033d68802a813de91babb47679c807aa0 
  3rdparty/libprocess/src/metrics/metrics.cpp 
f79541853b4c826014ee969633345c3d51520ecf 


Diff: https://reviews.apache.org/r/65461/diff/3/


Testing
---

It is a common pattern in libprocess to have a single function serving as 
endpoint that handles both authenticated and non-authenticated requests, 
providing `None()` as a default parameter in non-authenticated contexts.

This patch adds an overload to `ProcessBase::route()` implementing that so that 
it does not need to be re-implemented in each individual process.


Thanks,

Benno Evers



Re: Review Request 65838: Updated validation of 'RAW' and 'BLOCK' disk resources.

2018-02-28 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65838/#review198402
---



Patch looks great!

Reviews applied: [65838]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 
MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Feb. 28, 2018, 8:13 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65838/
> ---
> 
> (Updated Feb. 28, 2018, 8:13 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Jie Yu, and Jan Schlicht.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Updated validation of 'RAW' and 'BLOCK' disk resources.
> 
> 
> Diffs
> -
> 
>   src/common/resources.cpp a0c29246b3e3b377aabd0eb00ba16aa6bbd804e3 
> 
> 
> Diff: https://reviews.apache.org/r/65838/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> `sudo make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 65840: Windows: Fixed remaining W* macros in `windows.hpp`.

2018-02-28 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65840/#review198400
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['65839', '65840']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65840

Relevant logs:

- 
[mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65840/logs/mesos-tests-stdout.log):

```
[   OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (127 ms)
[--] 9 tests from Endpoint/SlaveEndpointTest (1178 ms total)

[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (39 
ms)
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (45 
ms)
[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (85 ms 
total)

[--] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN  ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[   OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2428 ms)
[--] 1 test from IsolationFlag/CpuIsolatorTest (2453 ms total)

[--] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN  ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2575 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (2601 ms total)

[--] Global test environment tear-down
[==] 915 tests from 90 test cases ran. (486138 ms total)
[  PASSED  ] 914 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] CommandExecutorCheckTest.CommandCheckTimeout

 1 FAILED TEST
  YOU HAVE 211 DISABLED TESTS

```

- 
[mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65840/logs/mesos-tests-stderr.log):

```
I0228 19:10:05.489377  7064 slave.cpp:3879] Shutting I0228 19:10:04.783398  
6664 exec.cpp:162] Version: 1.6.0
I0228 19:10:04.812382   920 exec.cpp:236] Executor registered on agent 
84a13f3e-8581-4cd6-b054-cddf5a6d8a65-S0
I0228 19:10:04.816375   232 executor.cpp:176] Received SUBSCRIBED event
I0228 19:10:04.821398   232 executor.cpp:180] Subscribed executor on 
build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I0228 19:10:04.821398   232 executor.cpp:176] Received LAUNCH event
I0228 19:10:04.826400   232 executor.cpp:648] Starting task 
a92b0ef1-c792-4f76-bc87-ea2280b8bde6
I0228 19:10:04.909375   232 executor.cpp:483] Running 
'D:\DCOS\mesos\src\mesos-containerizer.exe launch '
I0228 19:10:05.449395   232 executor.cpp:661] Forked command at 3040
I0228 19:10:05.493376  6596 exec.cpp:445] Executor asked to shutdown
I0228 19:10:05.493376   232 executor.cpp:176] Received SHUTDOWN event
I0228 19:10:05.493376   232 executor.cpp:758] Shutting down
I0228 19:10:05.494376   232 executor.cpp:868] Sending SIGTERM to process tree 
at pid 3down framework 84a13f3e-8581-4cd6-b054-cddf5a6d8a65-
I0228 19:10:05.489377  5492 master.cpp:10258] Updating the state of task 
a92b0ef1-c792-4f76-bc87-ea2280b8bde6 of framework 
84a13f3e-8581-4cd6-b054-cddf5a6d8a65- (latest state: TASK_KILLED, status 
update state: TASK_KILLED)
I0228 19:10:05.490375  7064 slave.cpp:6586] Shutting down executor 
'a92b0ef1-c792-4f76-bc87-ea2280b8bde6' of framework 
84a13f3e-8581-4cd6-b054-cddf5a6d8a65- at executor(1)@10.3.1.5:62662
I0228 19:10:05.492377  7064 slave.cpp:922] Agent terminating
W0228 19:10:05.493376  7064 slave.cpp:3875] Ignoring shutdown framework 
84a13f3e-8581-4cd6-b054-cddf5a6d8a65- because it is terminating
I0228 19:10:05.495380  5492 master.cpp:10357] Removing task 
a92b0ef1-c792-4f76-bc87-ea2280b8bde6 with resources cpus(allocated: *):4; 
mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: 
*):[31000-32000] of framework 84a13f3e-8581-4cd6-b054-cddf5a6d8a65- on 
agent 84a13f3e-8581-4cd6-b054-cddf5a6d8a65-S0 at slave(398)@10.3.1.5:62641 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0228 19:10:05.496383  1488 containerizer.cpp:2338] Destroying container 
04a7e434-34b1-4688-a572-5a2d5dcb7fc0 in RUNNING state
I0228 19:10:05.497377  1488 containerizer.cpp:2952] Transitioning the state of 
container 04a7e434-34b1-4688-a572-5a2d5dcb7fc0 from RUNNING to DESTROYING
I0228 19:10:05.498376  1488 launcher.cpp:156] Asked to destroy container 
04a7e434-34b1-4688-a572-5a2d5dcb7fc0
I0228 19:10:05.498376  5492 master.cpp:1306] Agent 
84a13f3e-8581-4cd6-b054-cddf5a6d8a65-S0 at slave(398)@10.3.1.5:62641 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I0228 19:10:05.498376  5492 master.cpp:3276] 

Review Request 65842: Changed `os::spawn` to return `Option` instead of `int`.

2018-02-28 Thread Akash Gupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65842/
---

Review request for mesos, Andrew Schwartzmeyer, Joseph Wu, and Michael Park.


Bugs: MESOS-4549
https://issues.apache.org/jira/browse/MESOS-4549


Repository: mesos


Description
---

Similar to `os::system`, `os::spawn` returned -1 on error, which is a
valid exit code on Windows. Using the same solution for `os::system`,
now when `os::spawn` fails, `None` will be returned. Otherwise, the
process exit code will be returned.


Diffs
-

  3rdparty/stout/include/stout/os/posix/copyfile.hpp 
0d9ed5babbff05a84740df3dbe1594f0e3012360 
  3rdparty/stout/include/stout/os/posix/shell.hpp 
b878718e137e4c8b599e0f4dac67672d8df27f7d 
  3rdparty/stout/include/stout/os/windows/shell.hpp 
1696d084c8453fa0eb9ef41e0a4128e547f7f53c 
  src/linux/fs.cpp dae094224321c0974c705023daf076409049de51 
  src/slave/containerizer/mesos/isolators/network/cni/cni.cpp 
c60c23f74f2abf6bef8dd32cc2e47e33bf666169 
  src/tests/containerizer/perf_tests.cpp 
d8aab08eb131f974821fb85662cbc6cc685d2f3e 


Diff: https://reviews.apache.org/r/65842/diff/1/


Testing
---


Thanks,

Akash Gupta



Review Request 65841: Changed `os::system` to return `Option` instead of `int`.

2018-02-28 Thread Akash Gupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65841/
---

Review request for mesos, Andrew Schwartzmeyer, Joseph Wu, and Michael Park.


Bugs: MESOS-4549
https://issues.apache.org/jira/browse/MESOS-4549


Repository: mesos


Description
---

`os::system` returned -1 on error, which is a valid error code on
Windows, e.g., `os::system("exit -1")`, so it was impossible to
distinguish a failure from a process returning -1. With `Option`,
failures will return as `None`..


Diffs
-

  3rdparty/stout/include/stout/os/posix/shell.hpp 
b878718e137e4c8b599e0f4dac67672d8df27f7d 
  3rdparty/stout/include/stout/os/windows/shell.hpp 
1696d084c8453fa0eb9ef41e0a4128e547f7f53c 
  3rdparty/stout/tests/os/rmdir_tests.cpp 
2fd24a825068019423674f74fb71cc61d3c504b5 
  3rdparty/stout/tests/os_tests.cpp d04c3b95b24a519eb21ff07fa51da7c2cefe2b61 
  src/slave/containerizer/mesos/launch.cpp 
75b7eaf9cd62d6b5f02896175168b651f4517e12 
  src/tests/containerizer/cgroups_isolator_tests.cpp 
59b23be78efb4025bfef53a19ab97d9873ab8dc9 
  src/tests/containerizer/memory_pressure_tests.cpp 
0c3e738ce05553a2ee5c38c6748d6d28a1eb93d3 
  src/tests/environment.cpp 1cba274e0e684b123ce1e4d9cd296f428022fcdc 


Diff: https://reviews.apache.org/r/65841/diff/1/


Testing
---


Thanks,

Akash Gupta



Review Request 65840: Windows: Fixed remaining W* macros in `windows.hpp`.

2018-02-28 Thread Akash Gupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65840/
---

Review request for mesos, Andrew Schwartzmeyer and Joseph Wu.


Repository: mesos


Description
---

The `WIFEXITED` and `WIFSIGNALED` were incorrectly checking if the
exit code was not -1 to determine if the process exited or was signaled.
However, -1 is a valid return code on Windows, so places in the Mesos
codebase that called `CHECK(WIFEXITED(status)|| WIFSIGNALED(status))`
would end up aborting the agent or executor.


Diffs
-

  3rdparty/stout/include/stout/windows.hpp 
b35e6b94ba6709254450be9429b6f48f2d276689 


Diff: https://reviews.apache.org/r/65840/diff/1/


Testing
---


Thanks,

Akash Gupta



Review Request 65839: Windows: Removed W* test-only macros in `windows.hpp`.

2018-02-28 Thread Akash Gupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65839/
---

Review request for mesos, Andrew Schwartzmeyer and Joseph Wu.


Repository: mesos


Description
---

`windows.hpp` defined a few macros for the status value of `waitpid`
that were either unused in the code base or were only used in gtest.
These macros handled signal logic, so they were removed since they
aren't used on Windows.


Diffs
-

  3rdparty/libprocess/include/process/gtest.hpp 
dd4c9bdf7e6e0e45db347108fde3b41960974be6 
  3rdparty/stout/include/stout/gtest.hpp 
ce6e4de6a9f69ea121ff0ca6048316d678c8c857 
  3rdparty/stout/include/stout/windows.hpp 
b35e6b94ba6709254450be9429b6f48f2d276689 
  src/checks/checker_process.cpp cf9ec053946e620eb36e92d647ab864c4e88d506 


Diff: https://reviews.apache.org/r/65839/diff/1/


Testing
---


Thanks,

Akash Gupta



Re: Review Request 65759: Added inspect retries to the Docker executor.

2018-02-28 Thread Andrei Budnik

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65759/
---

(Updated Feb. 28, 2018, 6:05 p.m.)


Review request for mesos, Alexander Rukletsov, Gilbert Song, Greg Mann, and 
Michael Park.


Bugs: MESOS-8574
https://issues.apache.org/jira/browse/MESOS-8574


Repository: mesos


Description
---

This patch adds retries for `inspect` command to workaround docker
daemon hangs. We assume that the docker daemon can be temporarily
unresponsive. If it's unresponsive, then any started docker cli
command hangs. To address the issue, we retry `inspect` in the loop.


Diffs (updated)
-

  src/docker/executor.cpp 93c3e1d1e86814e34cbe5b045f6e61911266c535 


Diff: https://reviews.apache.org/r/65759/diff/6/

Changes: https://reviews.apache.org/r/65759/diff/5-6/


Testing
---

internal CI

Manually, described in /r/65713


Thanks,

Andrei Budnik



Re: Review Request 65838: Updated validation of 'RAW' and 'BLOCK' disk resources.

2018-02-28 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65838/#review198397
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['65838']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65838

Relevant logs:

- 
[mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65838/logs/mesos-tests-stdout.log):

```
[   OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (124 ms)
[--] 9 tests from Endpoint/SlaveEndpointTest (1159 ms total)

[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (37 
ms)
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (42 
ms)
[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (83 ms 
total)

[--] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN  ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[   OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2464 ms)
[--] 1 test from IsolationFlag/CpuIsolatorTest (2490 ms total)

[--] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN  ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2457 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (2482 ms total)

[--] Global test environment tear-down
[==] 915 tests from 90 test cases ran. (481882 ms total)
[  PASSED  ] 914 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] CommandExecutorCheckTest.CommandCheckTimeout

 1 FAILED TEST
  YOU HAVE 211 DISABLED TESTS

```

- 
[mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65838/logs/mesos-tests-stderr.log):

```
I0228 17:21:07.602572  9088 master.cpp:10258] Updating the state of task 
ad4c1906-4af3-4297-8c8e-3a48d4b76c0e of framework 
7c635dbc-0475-4a94-90ee-67d4ee21ccb1- (latest state: TASK_KILLED, status 
update state: TASK_KILLED)
I0228 17:21:07.602572  8792 slave.cpp:3879] Shutting down framework 
7c635dbc-0475-4a94-90ee-67d4ee21ccb1-
I0228 17:21:07.602572  8792 slave.cpp:6586] Shutting down executor 
'ad4c1906-4af3-4297-8c8e-3a48d4b76c0e' of framework 
7c635dbc-0475-4a94-90ee-67d4ee21ccb1- at executor(1)@10.3.1.5:60453
I0228 17:21:07.603581  8792 slave.cpp:922] Agent terminating
W0228 17:21:07.603581  8792 slave.cpp:3875] Ignoring shutdown framework 
7c635dbc-0475-4a94-90ee-67d4ee21ccb1- because it is terminating
I0228 17:21:0I0228 17:21:06.900574  3544 exec.cpp:162] Version: 1.6.0
I0228 17:21:06.929574  1036 exec.cpp:236] Executor registered on agent 
7c635dbc-0475-4a94-90ee-67d4ee21ccb1-S0
I0228 17:21:06.934571  8392 executor.cpp:176] Received SUBSCRIBED event
I0228 17:21:06.939574  8392 executor.cpp:180] Subscribed executor on 
build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I0228 17:21:06.939574  8392 executor.cpp:176] Received LAUNCH event
I0228 17:21:06.944573  8392 executor.cpp:648] Starting task 
ad4c1906-4af3-4297-8c8e-3a48d4b76c0e
I0228 17:21:07.025573  8392 executor.cpp:483] Running 
'D:\DCOS\mesos\src\mesos-containerizer.exe launch '
I0228 17:21:07.566575  8392 executor.cpp:661] Forked command at 9672
I0228 17:21:07.604604  8900 exec.cpp:445] Executor asked to shutdown
I0228 17:21:07.605573  8392 executor.cpp:176] Received SHUTDOWN event
I0228 17:21:07.606578  8392 executor.cpp:758] Shutting down
I0228 17:21:07.606578  8392 executor.cpp:868] Sending SIGTERM to process tree 
at pid 97.605573  9088 master.cpp:10357] Removing task 
ad4c1906-4af3-4297-8c8e-3a48d4b76c0e with resources cpus(allocated: *):4; 
mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: 
*):[31000-32000] of framework 7c635dbc-0475-4a94-90ee-67d4ee21ccb1- on 
agent 7c635dbc-0475-4a94-90ee-67d4ee21ccb1-S0 at slave(398)@10.3.1.5:60432 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0228 17:21:07.606578  7144 containerizer.cpp:2338] Destroying container 
a7769d0f-ccba-4b0a-8f25-bbb8f6f75331 in RUNNING state
I0228 17:21:07.607573  7144 containerizer.cpp:2952] Transitioning the state of 
container a7769d0f-ccba-4b0a-8f25-bbb8f6f75331 from RUNNING to DESTROYING
I0228 17:21:07.607573  7144 launcher.cpp:156] Asked to destroy container 
a7769d0f-ccba-4b0a-8f25-bbb8f6f75331
I0228 17:21:07.608572  9088 master.cpp:1306] Agent 
7c635dbc-0475-4a94-90ee-67d4ee21ccb1-S0 at slave(398)@10.3.1.5:60432 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I0228 17:21:07.608572  9088 master.cpp:3276] Disconnecting agent 

Re: Review Request 65833: Displayed resource provider resources in GET_AGENTS response.

2018-02-28 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65833/#review198394
---



Patch looks great!

Reviews applied: [65587, 65588, 65589, 65590, 65591, 65832, 65833]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 
MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Feb. 28, 2018, 12:55 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65833/
> ---
> 
> (Updated Feb. 28, 2018, 12:55 p.m.)
> 
> 
> Review request for mesos, Jie Yu and Jan Schlicht.
> 
> 
> Bugs: MESOS-8388
> https://issues.apache.org/jira/browse/MESOS-8388
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Displayed resource provider resources in GET_AGENTS response.
> 
> 
> Diffs
> -
> 
>   include/mesos/master/master.proto f40caa2a6338ff58a325444cfb72cd24f2c99472 
>   include/mesos/v1/master/master.proto 
> 67c9560eaa7bd6b20b73af07f7666f0d98c4453d 
>   src/common/protobuf_utils.cpp d2ada356be4c26290692bc99e7ec5121039bda4e 
>   src/tests/api_tests.cpp 9c172f751b11b3b50f23253d212c2eb603da9c2c 
> 
> 
> Diff: https://reviews.apache.org/r/65833/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 65759: Added inspect retries to the Docker executor.

2018-02-28 Thread Greg Mann


> On Feb. 28, 2018, 10:04 a.m., Greg Mann wrote:
> > src/docker/executor.cpp
> > Lines 242-247 (patched)
> > 
> >
> > Doesn't this render the `onFailed` callback registered on L357 useless? 
> > i.e., the `inspect` future will never transition to the failed state?
> > 
> > If so, we should probably either remove the `onFailed` callback, or do 
> > this instead:
> > ```
> > if (!future.hasDiscard()) {
> >   return Break(future);
> > }
> > ```
> > 
> > I guess the question is: if the inspect call actually fails, rather 
> > than hanging, do we want to retry? It looks like there are several cases in 
> > `Docker::inspect` which will result in a failure (failed to create 
> > subprocess, failed to read stdout, etc..), and it looks to me like we could 
> > probably just retry in those cases. WDYT?
> 
> Andrei Budnik wrote:
> Good point! If a docker daemon returns non-zero, the docker libray will 
> retry `inspect`, then we'll get a message kile: 
> `I0228 17:28:13.275115  3248 docker.cpp:1369] Retrying inspect with 
> non-zero status code. cmd: 'docker -H unix:///var/run/docker.sock inspect 
> mesos-210b988c-c808-47e5-af65-75f40269755b', interval: 500ms`
> 
> But if the docker library returns a failure itself due to some severe bug 
> (failed to create subprocess, failed to read stdout, etc...), then IMO we 
> should stop retrying `inspect`:
> 
> ```
>[](const Future& future)
>   -> Future {
>   if (future.isReady()) {
> return Break(future.get());
>   }
>   if (future.isFailed()) {
> return Failure(future.failure());
>   }
>   return Continue();
> });
> ```

SGTM!


- Greg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65759/#review198382
---


On Feb. 22, 2018, 8:32 p.m., Andrei Budnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65759/
> ---
> 
> (Updated Feb. 22, 2018, 8:32 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gilbert Song, Greg Mann, and 
> Michael Park.
> 
> 
> Bugs: MESOS-8574
> https://issues.apache.org/jira/browse/MESOS-8574
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds retries for `inspect` command to workaround docker
> daemon hangs. We assume that the docker daemon can be temporarily
> unresponsive. If it's unresponsive, then any started docker cli
> command hangs. To address the issue, we retry `inspect` in the loop.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp 93c3e1d1e86814e34cbe5b045f6e61911266c535 
> 
> 
> Diff: https://reviews.apache.org/r/65759/diff/4/
> 
> 
> Testing
> ---
> 
> internal CI
> 
> Manually, described in /r/65713
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>



Re: Review Request 65759: Added inspect retries to the Docker executor.

2018-02-28 Thread Andrei Budnik


> On Feb. 28, 2018, 10:04 a.m., Greg Mann wrote:
> > src/docker/executor.cpp
> > Lines 242-247 (patched)
> > 
> >
> > Doesn't this render the `onFailed` callback registered on L357 useless? 
> > i.e., the `inspect` future will never transition to the failed state?
> > 
> > If so, we should probably either remove the `onFailed` callback, or do 
> > this instead:
> > ```
> > if (!future.hasDiscard()) {
> >   return Break(future);
> > }
> > ```
> > 
> > I guess the question is: if the inspect call actually fails, rather 
> > than hanging, do we want to retry? It looks like there are several cases in 
> > `Docker::inspect` which will result in a failure (failed to create 
> > subprocess, failed to read stdout, etc..), and it looks to me like we could 
> > probably just retry in those cases. WDYT?

Good point! If a docker daemon returns non-zero, the docker libray will retry 
`inspect`, then we'll get a message kile: 
`I0228 17:28:13.275115  3248 docker.cpp:1369] Retrying inspect with non-zero 
status code. cmd: 'docker -H unix:///var/run/docker.sock inspect 
mesos-210b988c-c808-47e5-af65-75f40269755b', interval: 500ms`

But if the docker library returns a failure itself due to some severe bug 
(failed to create subprocess, failed to read stdout, etc...), then IMO we 
should stop retrying `inspect`:

```
   [](const Future& future)
  -> Future {
  if (future.isReady()) {
return Break(future.get());
  }
  if (future.isFailed()) {
return Failure(future.failure());
  }
  return Continue();
});
```


- Andrei


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65759/#review198382
---


On Feb. 22, 2018, 8:32 p.m., Andrei Budnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65759/
> ---
> 
> (Updated Feb. 22, 2018, 8:32 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gilbert Song, Greg Mann, and 
> Michael Park.
> 
> 
> Bugs: MESOS-8574
> https://issues.apache.org/jira/browse/MESOS-8574
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds retries for `inspect` command to workaround docker
> daemon hangs. We assume that the docker daemon can be temporarily
> unresponsive. If it's unresponsive, then any started docker cli
> command hangs. To address the issue, we retry `inspect` in the loop.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp 93c3e1d1e86814e34cbe5b045f6e61911266c535 
> 
> 
> Diff: https://reviews.apache.org/r/65759/diff/4/
> 
> 
> Testing
> ---
> 
> internal CI
> 
> Manually, described in /r/65713
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>



Review Request 65838: Updated validation of 'RAW' and 'BLOCK' disk resources.

2018-02-28 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65838/
---

Review request for mesos, Chun-Hung Hsiao, Jie Yu, and Jan Schlicht.


Repository: mesos


Description
---

Updated validation of 'RAW' and 'BLOCK' disk resources.


Diffs
-

  src/common/resources.cpp a0c29246b3e3b377aabd0eb00ba16aa6bbd804e3 


Diff: https://reviews.apache.org/r/65838/diff/1/


Testing
---

`make check`
`sudo make check`


Thanks,

Benjamin Bannier



Re: Review Request 65758: Fix flakyness in MasterTests.RegistryUpdateAfterReconfiguration.

2018-02-28 Thread Alexander Rukletsov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65758/#review198390
---


Ship it!




Ship It!

- Alexander Rukletsov


On Feb. 22, 2018, 4:25 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65758/
> ---
> 
> (Updated Feb. 22, 2018, 4:25 p.m.)
> 
> 
> Review request for mesos and Alexander Rukletsov.
> 
> 
> Bugs: MESOS-8336
> https://issues.apache.org/jira/browse/MESOS-8336
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> It could happen that an agent re-sends its `RegisterSlaveMessage`
> and with the correct timing, the master would assume that it
> comes from a new agent with the same pid and store its information
> in the registry. See MESOS-8336.
> 
> 
> Diffs
> -
> 
>   src/tests/master_tests.cpp 3705fa7cc3c81a77dc431d5337a7d4a061fa84b9 
> 
> 
> Diff: https://reviews.apache.org/r/65758/diff/1/
> 
> 
> Testing
> ---
> 
> `./mesos-tests --gtest_filter="*AfterReconfiguration*" --gtest_repeat=1000 
> --gtest_throw_on_failure`
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 65836: WIP: Random source in mesos.

2018-02-28 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65836/#review198389
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['65834', '65835', '65836']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65836

Relevant logs:

- 
[mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65836/logs/mesos-tests-stdout.log):

```
[   OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (136 ms)
[--] 9 tests from Endpoint/SlaveEndpointTest (1280 ms total)

[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (53 
ms)
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (46 
ms)
[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (101 ms 
total)

[--] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN  ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[   OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2456 ms)
[--] 1 test from IsolationFlag/CpuIsolatorTest (2486 ms total)

[--] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN  ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2587 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (2618 ms total)

[--] Global test environment tear-down
[==] 915 tests from 90 test cases ran. (495952 ms total)
[  PASSED  ] 914 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] CommandExecutorCheckTest.CommandCheckTimeout

 1 FAILED TEST
  YOU HAVE 211 DISABLED TESTS

```

- 
[mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65836/logs/mesos-tests-stderr.log):

```
I0228 15:01:19.916844 10156 master.cpp:10258] Updating the state of task 
de8ea35e-d93e-4364-b5fd-5cc477c26c35 of framework 
f36cdc6f-32a8-4df9-b422-92c0b309b053- (latest state: TASK_KILLED, status 
update state: TASK_KILLED)
I0228 15:01:19.916844  5552 slave.cpp:3879] Shutting down framework 
f36cdc6f-32a8-4df9-b422-92c0b309b053-
I0228 15:01:19.917840  5552 slave.cpp:6586] Shutting down executor 
'de8ea35e-d93e-4364-b5fd-5cc477c26c35' of framework 
f36cdc6f-32a8-4df9-b422-92c0b309b053- at executor(1)@10.3.1.5:58210
I0228 15:01:19.919838  5552 slave.cpp:922] Agent terminating
W0228 15:01:19.919838  5552 slave.cpp:3875] Ignoring shutdown framework 
f36cdc6f-32a8-4df9-b422-92c0b309b053- because it is terminating
I0228 15:01:19.920837 10156 master.cpp:10357] Removing task 
de8ea35e-d93e-4364-b5fd-5cc477c26c35 with resources cpus(allocated: *):4; 
mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: 
*):[31000-32000] of framework f36cdc6f-32a8-4df9-b422-92c0b309b053- on 
agent f36cdc6f-32a8-4df9-b422-92c0b309b053-S0 at slave(398)@10.3.1.5:58189 
(build-srv-04.zq4gsI0228 15:01:19.205835  8060 exec.cpp:162] Version: 1.6.0
I0228 15:01:19.234833  4996 exec.cpp:236] Executor registered on agent 
f36cdc6f-32a8-4df9-b422-92c0b309b053-S0
I0228 15:01:19.238837  9160 executor.cpp:176] Received SUBSCRIBED event
I0228 15:01:19.243836  9160 executor.cpp:180] Subscribed executor on 
build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I0228 15:01:19.244840  9160 executor.cpp:176] Received LAUNCH event
I0228 15:01:19.249845  9160 executor.cpp:648] Starting task 
de8ea35e-d93e-4364-b5fd-5cc477c26c35
I0228 15:01:19.337836  9160 executor.cpp:483] Running 
'D:\DCOS\mesos\src\mesos-containerizer.exe launch '
I0228 15:01:19.883836  9160 executor.cpp:661] Forked command at 7392
I0228 15:01:19.920837  5948 exec.cpp:445] Executor asked to shutdown
I0228 15:01:19.921836  9160 executor.cpp:176] Received SHUTDOWN event
I0228 15:01:19.921836  9160 executor.cpp:758] Shutting down
I0228 15:01:19.921836  9160 executor.cpp:868] Sending SIGTERM to process tree 
at pid 731qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0228 15:01:19.922837  4452 containerizer.cpp:2338] Destroying container 
4044b451-15f2-4985-af2c-aecce0b851b9 in RUNNING state
I0228 15:01:19.922837  4452 containerizer.cpp:2952] Transitioning the state of 
container 4044b451-15f2-4985-af2c-aecce0b851b9 from RUNNING to DESTROYING
I0228 15:01:19.923840  4452 launcher.cpp:156] Asked to destroy container 
4044b451-15f2-4985-af2c-aecce0b851b9
I0228 15:01:19.924841 10156 master.cpp:1306] Agent 
f36cdc6f-32a8-4df9-b422-92c0b309b053-S0 at slave(398)@10.3.1.5:58189 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I0228 15:01:19.924841 10156 master.cpp:3276] 

Re: Review Request 65836: WIP: Random source in mesos.

2018-02-28 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65836/#review198388
---



Bad review!

Reviews applied: []

Error:
No reviewers specified. Please find a reviewer by asking on JIRA or the mailing 
list.

- Mesos Reviewbot


On Feb. 28, 2018, 1:56 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65836/
> ---
> 
> (Updated Feb. 28, 2018, 1:56 p.m.)
> 
> 
> Review request for mesos.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> WIP: Random source in mesos.
> 
> 
> Diffs
> -
> 
>   src/common/random.hpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/65836/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 65833: Displayed resource provider resources in GET_AGENTS response.

2018-02-28 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65833/#review198387
---



PASS: Mesos patch 65833 was successfully built and tested.

Reviews applied: `['65587', '65588', '65589', '65590', '65591', '65832', 
'65833']`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65833

- Mesos Reviewbot Windows


On Feb. 28, 2018, 1:55 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65833/
> ---
> 
> (Updated Feb. 28, 2018, 1:55 p.m.)
> 
> 
> Review request for mesos, Jie Yu and Jan Schlicht.
> 
> 
> Bugs: MESOS-8388
> https://issues.apache.org/jira/browse/MESOS-8388
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Displayed resource provider resources in GET_AGENTS response.
> 
> 
> Diffs
> -
> 
>   include/mesos/master/master.proto f40caa2a6338ff58a325444cfb72cd24f2c99472 
>   include/mesos/v1/master/master.proto 
> 67c9560eaa7bd6b20b73af07f7666f0d98c4453d 
>   src/common/protobuf_utils.cpp d2ada356be4c26290692bc99e7ec5121039bda4e 
>   src/tests/api_tests.cpp 9c172f751b11b3b50f23253d212c2eb603da9c2c 
> 
> 
> Diff: https://reviews.apache.org/r/65833/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Review Request 65834: WIP: Random source in libprocess.

2018-02-28 Thread Benno Evers

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65834/
---

Review request for mesos.


Repository: mesos


Description
---

WIP: Random source in libprocess.


Diffs
-

  3rdparty/libprocess/include/process/random.hpp PRE-CREATION 
  3rdparty/libprocess/src/random.cpp PRE-CREATION 


Diff: https://reviews.apache.org/r/65834/diff/1/


Testing
---


Thanks,

Benno Evers



Review Request 65835: WIP: Random source in stout.

2018-02-28 Thread Benno Evers

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65835/
---

Review request for mesos.


Repository: mesos


Description
---

WIP: Random source in stout.


Diffs
-

  3rdparty/stout/include/stout/random.hpp PRE-CREATION 


Diff: https://reviews.apache.org/r/65835/diff/1/


Testing
---


Thanks,

Benno Evers



Review Request 65836: WIP: Random source in mesos.

2018-02-28 Thread Benno Evers

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65836/
---

Review request for mesos.


Repository: mesos


Description
---

WIP: Random source in mesos.


Diffs
-

  src/common/random.hpp PRE-CREATION 


Diff: https://reviews.apache.org/r/65836/diff/1/


Testing
---


Thanks,

Benno Evers



Re: Review Request 65683: Updated discard handling in 'Docker::inspect()'.

2018-02-28 Thread Andrei Budnik

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65683/#review198384
---


Ship it!




Ship It!

- Andrei Budnik


On Feb. 28, 2018, 8:49 a.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65683/
> ---
> 
> (Updated Feb. 28, 2018, 8:49 a.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Gilbert Song, Michael Park, and 
> Vinod Kone.
> 
> 
> Bugs: MESOS-8575
> https://issues.apache.org/jira/browse/MESOS-8575
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Updated discard handling in 'Docker::inspect()'.
> 
> 
> Diffs
> -
> 
>   src/docker/docker.hpp 0116ca8f28303905d80cd6b3a33a7a7feacc8f8a 
>   src/docker/docker.cpp f7a4466292fcedd7e97740698475da2c616ca1d0 
> 
> 
> Diff: https://reviews.apache.org/r/65683/diff/8/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Review Request 65832: Displayed resource provider resources in GET_RESOURCE_PROVIDER response.

2018-02-28 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65832/
---

Review request for mesos, Jie Yu and Jan Schlicht.


Bugs: MESOS-8388
https://issues.apache.org/jira/browse/MESOS-8388


Repository: mesos


Description
---

Displayed resource provider resources in GET_RESOURCE_PROVIDER response.


Diffs
-

  include/mesos/agent/agent.proto 7d92cb8e3dc533081d973e488bd140c7d5ea2bbf 
  include/mesos/v1/agent/agent.proto 59a9fd69df47c3605662529b5493cd3bf18c8397 
  src/slave/http.cpp 7d7fa2b4ec2e1f8f65c5264ce72590d0d8195b9b 
  src/tests/api_tests.cpp 9c172f751b11b3b50f23253d212c2eb603da9c2c 


Diff: https://reviews.apache.org/r/65832/diff/1/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Review Request 65833: Displayed resource provider resources in GET_AGENTS response.

2018-02-28 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65833/
---

Review request for mesos, Jie Yu and Jan Schlicht.


Bugs: MESOS-8388
https://issues.apache.org/jira/browse/MESOS-8388


Repository: mesos


Description
---

Displayed resource provider resources in GET_AGENTS response.


Diffs
-

  include/mesos/master/master.proto f40caa2a6338ff58a325444cfb72cd24f2c99472 
  include/mesos/v1/master/master.proto 67c9560eaa7bd6b20b73af07f7666f0d98c4453d 
  src/common/protobuf_utils.cpp d2ada356be4c26290692bc99e7ec5121039bda4e 
  src/tests/api_tests.cpp 9c172f751b11b3b50f23253d212c2eb603da9c2c 


Diff: https://reviews.apache.org/r/65833/diff/1/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 65759: Added inspect retries to the Docker executor.

2018-02-28 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65759/#review198382
---




src/docker/executor.cpp
Lines 242-247 (patched)


Doesn't this render the `onFailed` callback registered on L357 useless? 
i.e., the `inspect` future will never transition to the failed state?

If so, we should probably either remove the `onFailed` callback, or do this 
instead:
```
if (!future.hasDiscard()) {
  return Break(future);
}
```

I guess the question is: if the inspect call actually fails, rather than 
hanging, do we want to retry? It looks like there are several cases in 
`Docker::inspect` which will result in a failure (failed to create subprocess, 
failed to read stdout, etc..), and it looks to me like we could probably just 
retry in those cases. WDYT?



src/docker/executor.cpp
Lines 243 (patched)


Nit: indent two more spaces, to avoid identical indentation to the 
following line.


- Greg Mann


On Feb. 22, 2018, 8:32 p.m., Andrei Budnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65759/
> ---
> 
> (Updated Feb. 22, 2018, 8:32 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gilbert Song, Greg Mann, and 
> Michael Park.
> 
> 
> Bugs: MESOS-8574
> https://issues.apache.org/jira/browse/MESOS-8574
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds retries for `inspect` command to workaround docker
> daemon hangs. We assume that the docker daemon can be temporarily
> unresponsive. If it's unresponsive, then any started docker cli
> command hangs. To address the issue, we retry `inspect` in the loop.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp 93c3e1d1e86814e34cbe5b045f6e61911266c535 
> 
> 
> Diff: https://reviews.apache.org/r/65759/diff/3/
> 
> 
> Testing
> ---
> 
> internal CI
> 
> Manually, described in /r/65713
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>



Re: Review Request 65713: Handled hanging docker `stop`, `inspect` commands in docker executor.

2018-02-28 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65713/#review198381
---




src/docker/executor.cpp
Lines 516-517 (patched)


Nit:
s/docker/Docker/g
s/cli/CLI/


- Greg Mann


On Feb. 28, 2018, 1:37 a.m., Andrei Budnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65713/
> ---
> 
> (Updated Feb. 28, 2018, 1:37 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gilbert Song, Greg Mann, and 
> Vinod Kone.
> 
> 
> Bugs: MESOS-8574
> https://issues.apache.org/jira/browse/MESOS-8574
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Previosly, if `docker inspect` command hanged, the docker container
> ended up in an unkillable state. This patch adds a timeout for inspect
> command after receiving `killTask` analogically to `reaped` handler.
> In addition we've added a timeout for `docker stop` command. If docker
> `stop` or `inspect` command times out, we discard the related future,
> thus the docker library kills previously spawned docker cli subprocess.
> As a result, a scheduler can retry `killTask` operation to handle
> nasty docker bugs that lead to hanging docker cli.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp 93c3e1d1e86814e34cbe5b045f6e61911266c535 
> 
> 
> Diff: https://reviews.apache.org/r/65713/diff/4/
> 
> 
> Testing
> ---
> 
> internal CI
> 
> Manual testing:
> 1. Build docker from sources: 
> http://oyvindsk.com/writing/docker-build-from-source
> 2. Modify `ContainerInspect` function from `docker/inspect.go`:
> ```
>  func (daemon *Daemon) ContainerInspect(name string, size bool, version 
> string) (interface{}, error) {
> +   time.Sleep(10 * time.Second)
> ```
> 3. Modify `ContainerStop` function from `docker/stop.go`:
> ```
>  func (daemon *Daemon) ContainerStop(name string, seconds *int) error {
> +   rand.Seed(time.Now().UTC().UnixNano())
> +   if rand.Intn(2) == 0 {
> +   time.Sleep(20 * time.Second)
> +   }
> ```
> 4. Rebuild docker: `sudo make build && sudo make binary`
> 5. Stop system docker daemon: `sudo service docker stop`
> 6. Start modified docker daemon: `sudo ./bundles/binary-daemon/dockerd-dev`
> 7. Modify `src/cli/execute.cpp`:
>   a) Add `delay(Seconds(15), self(), ::retryKill, task->task_id(), 
> offer.agent_id());` after 
> https://github.com/apache/mesos/blob/072ea2787ffca6f2a6dcb2d636f68c51823d6665/src/cli/execute.cpp#L606
>   b) Add a new method `retryKill` to `CommandScheduler`:
> ```
>   void retryKill(const TaskID& taskId, const AgentID& agentId)
>   {
> killTask(taskId, agentId);
> delay(Seconds(6), self(), ::retryKill, taskId, agentId);
>   }
> ```
> 8. Rebuild mesos
> 9. Run mesos master: `./bin/mesos-master.sh --work_dir='var/master-1'`
> 10. Run mesos agent: `GLOG_v=1 ./bin/mesos-agent.sh 
> --resources="cpus:1;mem:100" 
> --work_dir='/home/abudnik/mesos/build/var/agent-1' 
> --containerizers="docker,mesos" --master="127.0.1.1:5050"`
> 11. Submit a task for the docker executor: `./src/mesos-execute 
> --master="127.0.1.1:5050" --name="a" --containerizer=docker 
> --docker_image="ubuntu:xenial" --command="sleep "`
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>



Re: Review Request 65713: Handled hanging docker `stop`, `inspect` commands in docker executor.

2018-02-28 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65713/#review198380
---


Fix it, then Ship it!





src/docker/executor.cpp
Lines 516 (patched)


Nit:
s/a future/this future/


- Greg Mann


On Feb. 28, 2018, 1:37 a.m., Andrei Budnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65713/
> ---
> 
> (Updated Feb. 28, 2018, 1:37 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gilbert Song, Greg Mann, and 
> Vinod Kone.
> 
> 
> Bugs: MESOS-8574
> https://issues.apache.org/jira/browse/MESOS-8574
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Previosly, if `docker inspect` command hanged, the docker container
> ended up in an unkillable state. This patch adds a timeout for inspect
> command after receiving `killTask` analogically to `reaped` handler.
> In addition we've added a timeout for `docker stop` command. If docker
> `stop` or `inspect` command times out, we discard the related future,
> thus the docker library kills previously spawned docker cli subprocess.
> As a result, a scheduler can retry `killTask` operation to handle
> nasty docker bugs that lead to hanging docker cli.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp 93c3e1d1e86814e34cbe5b045f6e61911266c535 
> 
> 
> Diff: https://reviews.apache.org/r/65713/diff/4/
> 
> 
> Testing
> ---
> 
> internal CI
> 
> Manual testing:
> 1. Build docker from sources: 
> http://oyvindsk.com/writing/docker-build-from-source
> 2. Modify `ContainerInspect` function from `docker/inspect.go`:
> ```
>  func (daemon *Daemon) ContainerInspect(name string, size bool, version 
> string) (interface{}, error) {
> +   time.Sleep(10 * time.Second)
> ```
> 3. Modify `ContainerStop` function from `docker/stop.go`:
> ```
>  func (daemon *Daemon) ContainerStop(name string, seconds *int) error {
> +   rand.Seed(time.Now().UTC().UnixNano())
> +   if rand.Intn(2) == 0 {
> +   time.Sleep(20 * time.Second)
> +   }
> ```
> 4. Rebuild docker: `sudo make build && sudo make binary`
> 5. Stop system docker daemon: `sudo service docker stop`
> 6. Start modified docker daemon: `sudo ./bundles/binary-daemon/dockerd-dev`
> 7. Modify `src/cli/execute.cpp`:
>   a) Add `delay(Seconds(15), self(), ::retryKill, task->task_id(), 
> offer.agent_id());` after 
> https://github.com/apache/mesos/blob/072ea2787ffca6f2a6dcb2d636f68c51823d6665/src/cli/execute.cpp#L606
>   b) Add a new method `retryKill` to `CommandScheduler`:
> ```
>   void retryKill(const TaskID& taskId, const AgentID& agentId)
>   {
> killTask(taskId, agentId);
> delay(Seconds(6), self(), ::retryKill, taskId, agentId);
>   }
> ```
> 8. Rebuild mesos
> 9. Run mesos master: `./bin/mesos-master.sh --work_dir='var/master-1'`
> 10. Run mesos agent: `GLOG_v=1 ./bin/mesos-agent.sh 
> --resources="cpus:1;mem:100" 
> --work_dir='/home/abudnik/mesos/build/var/agent-1' 
> --containerizers="docker,mesos" --master="127.0.1.1:5050"`
> 11. Submit a task for the docker executor: `./src/mesos-execute 
> --master="127.0.1.1:5050" --name="a" --containerizer=docker 
> --docker_image="ubuntu:xenial" --command="sleep "`
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>



Re: Review Request 65683: Updated discard handling in 'Docker::inspect()'.

2018-02-28 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65683/
---

(Updated Feb. 28, 2018, 8:49 a.m.)


Review request for mesos, Andrei Budnik, Gilbert Song, Michael Park, and Vinod 
Kone.


Changes
---

Addressed Andrei's comment.


Bugs: MESOS-8575
https://issues.apache.org/jira/browse/MESOS-8575


Repository: mesos


Description
---

Updated discard handling in 'Docker::inspect()'.


Diffs (updated)
-

  src/docker/docker.hpp 0116ca8f28303905d80cd6b3a33a7a7feacc8f8a 
  src/docker/docker.cpp f7a4466292fcedd7e97740698475da2c616ca1d0 


Diff: https://reviews.apache.org/r/65683/diff/8/

Changes: https://reviews.apache.org/r/65683/diff/7-8/


Testing
---


Thanks,

Greg Mann