Re: Review Request 65875: Improved logging for offers and inverse offers.

2018-03-01 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65875/#review198512
---



Patch looks great!

Reviews applied: [65875]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 
MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On March 1, 2018, 9:39 p.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65875/
> ---
> 
> (Updated March 1, 2018, 9:39 p.m.)
> 
> 
> Review request for mesos and Greg Mann.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Printed offer IDs and inverse offer IDs when sending out offers and
> inverse offers so it is easier to match them to their ACCEPT or DECLINE
> calls and removals.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp f2c2dbe6a4568027f11ad14121c2f9a1d1a43f80 
> 
> 
> Diff: https://reviews.apache.org/r/65875/diff/1/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 65876: Enabled `--fetch_stall_timeout` in curl-based URI fetcher plugins.

2018-03-01 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65876/#review198511
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['65855', '65856', '65876']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65876

Relevant logs:

- 
[mesos-tests-cmake-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65876/logs/mesos-tests-cmake-stdout.log):

```
 D:\DCOS\mesos\mesos\src\java\jni\convert.cpp(392): warning C4267: 
'argument': conversion from 'size_t' to 'jsize', possible loss of data 
[D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 D:\DCOS\mesos\mesos\src\java\jni\convert.cpp(414): warning C4267: 
'argument': conversion from 'size_t' to 'jsize', possible loss of data 
[D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 D:\DCOS\mesos\mesos\src\java\jni\convert.cpp(415): warning C4267: 
'argument': conversion from 'size_t' to 'jsize', possible loss of data 
[D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 D:\DCOS\mesos\mesos\src\java\jni\convert.cpp(437): warning C4267: 
'argument': conversion from 'size_t' to 'jsize', possible loss of data 
[D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 D:\DCOS\mesos\mesos\src\java\jni\convert.cpp(438): warning C4267: 
'argument': conversion from 'size_t' to 'jsize', possible loss of data 
[D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 D:\DCOS\mesos\mesos\src\java\jni\convert.cpp(460): warning C4267: 
'argument': conversion from 'size_t' to 'jsize', possible loss of data 
[D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 D:\DCOS\mesos\mesos\src\java\jni\convert.cpp(461): warning C4267: 
'argument': conversion from 'size_t' to 'jsize', possible loss of data 
[D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 D:\DCOS\mesos\mesos\src\java\jni\convert.cpp(500): warning C4267: 
'argument': conversion from 'size_t' to 'jsize', possible loss of data 
[D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 D:\DCOS\mesos\mesos\src\java\jni\convert.cpp(501): warning C4267: 
'argument': conversion from 'size_t' to 'jsize', possible loss of data 
[D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 
D:\DCOS\mesos\mesos\src\java\jni\org_apache_mesos_state_Variable.cpp(44): 
warning C4267: 'argument': conversion from 'size_t' to 'jsize', possible loss 
of data [D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 
D:\DCOS\mesos\mesos\src\java\jni\org_apache_mesos_state_Variable.cpp(45): 
warning C4267: 'argument': conversion from 'size_t' to 'jsize', possible loss 
of data [D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 D:\DCOS\mesos\mesos\src\jvm\jvm.cpp(95): warning C4267: '=': 
conversion from 'size_t' to 'jint', possible loss of data 
[D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 D:\DCOS\mesos\mesos\src\jvm\jvm.cpp(463): warning C4244: 
'initializing': conversion from 'jchar' to 'char', possible loss of data 
[D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 D:\DCOS\mesos\mesos\src\jvm\jvm.cpp(493): warning C4244: 
'initializing': conversion from 'jlong' to 'long', possible loss of data 
[D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 D:\DCOS\mesos\mesos\src\jvm\jvm.cpp(654): warning C4244: 
'initializing': conversion from 'jchar' to 'char', possible loss of data 
[D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 D:\DCOS\mesos\mesos\src\jvm\jvm.cpp(693): warning C4244: 
'initializing': conversion from 'jlong' to 'long', possible loss of data 
[D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 D:\DCOS\mesos\mesos\src\jvm\jvm.cpp(770): warning C4244: 
'initializing': conversion from 'jchar' to 'char', possible loss of data 
[D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 D:\DCOS\mesos\mesos\src\jvm\jvm.cpp(809): warning C4244: 
'initializing': conversion from 'jlong' to 'long', possible loss of data 
[D:\DCOS\mesos\src\java\mesos-java.vcxproj]


   "D:\DCOS\mesos\src\tests\mesos-tests.vcxproj" (default target) (1) ->
   "D:\DCOS\mesos\src\mesos.vcxproj" (default target) (16) ->
   (ClCompile target) -> 
 
D:\DCOS\mesos\mesos\src\slave\containerizer\mesos\provisioner\docker\store.cpp(146):
 error C2039: 'docker_stall_timeout': is not a member of 
'mesos::uri::fetcher::Flags' [D:\DCOS\mesos\src\mesos.vcxproj]

214 Warning(s)
1 Error(s)

Time Elapsed 00:17:09.74
```

- Mesos Reviewbot Windows


On March 2, 2018, 5:42 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65876/
> ---
> 
> (Updated March 2, 2018, 5:42 a.m.)
> 
> 
> Review request for mesos and Gilbert Song.
> 
> 
> Bugs: MESOS-8620
> 

Re: Review Request 65875: Improved logging for offers and inverse offers.

2018-03-01 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65875/#review198509
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['65875']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65875

Relevant logs:

- 
[mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65875/logs/mesos-tests-stdout.log):

```
[   OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (128 ms)
[--] 9 tests from Endpoint/SlaveEndpointTest (1202 ms total)

[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (40 
ms)
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (45 
ms)
[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (87 ms 
total)

[--] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN  ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[   OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2526 ms)
[--] 1 test from IsolationFlag/CpuIsolatorTest (2552 ms total)

[--] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN  ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2476 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (2500 ms total)

[--] Global test environment tear-down
[==] 915 tests from 90 test cases ran. (478865 ms total)
[  PASSED  ] 914 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] CommandExecutorCheckTest.CommandCheckTimeout

 1 FAILED TEST
  YOU HAVE 211 DISABLED TESTS

```

- 
[mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65875/logs/mesos-tests-stderr.log):

```
I0302 06:41:13.10   876 master.cpp:10267] Updating the state of task 
54924781-ce1a-4905-bfcf-d16dbb73e9b9 of framework 
fa008a6a-5d12-42e4-8909-167ec4823ffd- (latest state: TASK_KILLED, status 
update state: TASK_KILLED)
I0302 06:41:13.10  9388 slave.cpp:3879] Shutting down framework 
fa008a6a-5d12-42e4-8909-167ec4823ffd-
I0302 06:41:13.10  9388 slave.cpp:6586] Shutting down executor 
'54924781-ce1a-4905-bfcf-d16dbb73e9b9' of framework 
fa008a6a-5d12-42e4-8909-167ec4823ffd-000I0302 06:41:12.380908  3844 
exec.cpp:162] Version: 1.6.0
I0302 06:41:12.408885 10992 exec.cpp:236] Executor registered on agent 
fa008a6a-5d12-42e4-8909-167ec4823ffd-S0
I0302 06:41:12.412886  6928 executor.cpp:176] Received SUBSCRIBED event
I0302 06:41:12.418277  6928 executor.cpp:180] Subscribed executor on 
build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I0302 06:41:12.418884  6928 executor.cpp:176] Received LAUNCH event
I0302 06:41:12.422883  6928 executor.cpp:648] Starting task 
54924781-ce1a-4905-bfcf-d16dbb73e9b9
I0302 06:41:12.520910  6928 executor.cpp:483] Running 
'D:\DCOS\mesos\src\mesos-containerizer.exe launch '
I0302 06:41:13.073887  6928 executor.cpp:661] Forked command at 11132
I0302 06:41:13.111887  6120 exec.cpp:445] Executor asked to shutdown
I0302 06:41:13.112887  6928 executor.cpp:176] Received SHUTDOWN event
I0302 06:41:13.113890  6928 executor.cpp:758] Shutting down
I0302 06:41:13.113890  6928 executor.cpp:868] Sending SIGTERM to process tree 
at pid 0 at executor(1)@10.3.1.5:59118
I0302 06:41:13.110888  9388 slave.cpp:922] Agent terminating
W0302 06:41:13.111887  9388 slave.cpp:3875] Ignoring shutdown framework 
fa008a6a-5d12-42e4-8909-167ec4823ffd- because it is terminating
I0302 06:41:13.111887   876 master.cpp:10366] Removing task 
54924781-ce1a-4905-bfcf-d16dbb73e9b9 with resources cpus(allocated: *):4; 
mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: 
*):[31000-32000] of framework fa008a6a-5d12-42e4-8909-167ec4823ffd- on 
agent fa008a6a-5d12-42e4-8909-167ec4823ffd-S0 at slave(398)@10.3.1.5:59096 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0302 06:41:13.113890   876 master.cpp:1306] Agent 
fa008a6a-5d12-42e4-8909-167ec4823ffd-S0 at slave(398)@10.3.1.5:59096 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I0302 06:41:13.113890   876 master.cpp:3276] Disconnecting agent 
fa008a6a-5d12-42e4-8909-167ec4823ffd-S0 at slave(398)@10.3.1.5:59096 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0302 06:41:13.114889  1604 containerizer.cpp:2338] Destroying container 
a397f167-76f1-494f-942b-c9646b7fc3b1 in RUNNING state
I0302 06:41:13.114889  1604 containerizer.cpp:2952] Transitioning the state of 
container a397f167-76f1-494f-942b-c9646b7fc3b1 

Review Request 65876: Enabled `--fetch_stall_timeout` in curl-based URI fetcher plugins.

2018-03-01 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65876/
---

Review request for mesos and Gilbert Song.


Bugs: MESOS-8620
https://issues.apache.org/jira/browse/MESOS-8620


Repository: mesos


Description
---

This patch passes the `--fetch_stall_timeout` agent flag into
`DockerFetcherPlugin` (through setting flag `docker_stall_timeout` in
the Docker store) and `CurlFetcherPlugin` (through setting flag
`curl_stall_timeout` in the Appc store).


Diffs
-

  src/slave/containerizer/mesos/provisioner/appc/store.cpp 
c1f9661906eae019e43e1c3df7c28cccad3a6bb0 
  src/slave/containerizer/mesos/provisioner/docker/store.cpp 
c308bb04773fa81265d91a1247b3db3434b5746b 
  src/uri/fetchers/curl.hpp 909c2eb450630912c13831c3aa180fbccc8e9e4b 
  src/uri/fetchers/curl.cpp 43584aaa424dc57f00880d5c89cdb73f23685800 
  src/uri/fetchers/docker.hpp 6a89c0a69dd6f454cc22d025644900416e55ad50 
  src/uri/fetchers/docker.cpp 21f8eb15903d505ba1cabe1f77a8f5d7043964d5 


Diff: https://reviews.apache.org/r/65876/diff/1/


Testing
---

sudo make check


Thanks,

Chun-Hung Hsiao



Review Request 65875: Improved logging for offers and inverse offers.

2018-03-01 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65875/
---

Review request for mesos and Greg Mann.


Repository: mesos


Description
---

Printed offer IDs and inverse offer IDs when sending out offers and
inverse offers so it is easier to match them to their ACCEPT or DECLINE
calls and removals.


Diffs
-

  src/master/master.cpp f2c2dbe6a4568027f11ad14121c2f9a1d1a43f80 


Diff: https://reviews.apache.org/r/65875/diff/1/


Testing
---

sudo make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 65482: Improved handling of non-terminal operations after master failover.

2018-03-01 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65482/#review198470
---


Fix it, then Ship it!





src/master/master.cpp
Lines 7642 (patched)


s/ways/ways of/


- Greg Mann


On Feb. 16, 2018, 2:12 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65482/
> ---
> 
> (Updated Feb. 16, 2018, 2:12 p.m.)
> 
> 
> Review request for mesos, Greg Mann, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8536
> https://issues.apache.org/jira/browse/MESOS-8536
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch fixes the handling of non-terminal operations learned by a
> newly elected master after a master failover, so that only these
> operations are counted as using resources. Previously we did not count
> any operations as using resources which by accident produced expected
> behavior if the operation was already terminal when the master learned
> about them.
> 
> We do not address the issue of being unable to properly account for
> operations triggered by frameworks unknown to the master, see
> MESOS-8582. Instead we emit a warning for now since the master might
> continue to abort due to assertion failures due to incomplete resource
> accounting.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp b06d7a6e2fbbb81b97eaf537d5b6745c73dc867d 
> 
> 
> Diff: https://reviews.apache.org/r/65482/diff/4/
> 
> 
> Testing
> ---
> 
> `make check`, also tested with a version of the test added in r/65045 which 
> triggered this issue.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 59989: Added a test `ProtobufTest.JsonifyMap`.

2018-03-01 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59989/#review198502
---


Fix it, then Ship it!





3rdparty/stout/tests/protobuf_tests.cpp
Lines 637 (patched)


Extra new line here.



3rdparty/stout/tests/protobuf_tests.cpp
Lines 675-676 (patched)


Do you want to use a raw string literal here to avoid all the escaping? (6) 
from here: http://en.cppreference.com/w/cpp/language/string_literal


- Benjamin Mahler


On Feb. 28, 2018, 11:28 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59989/
> ---
> 
> (Updated Feb. 28, 2018, 11:28 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Benjamin Mahler, Chun-Hung Hsiao, 
> and Zhitao Li.
> 
> 
> Bugs: MESOS-7656
> https://issues.apache.org/jira/browse/MESOS-7656
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added a test `ProtobufTest.JsonifyMap`.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/tests/protobuf_tests.cpp 
> be35ad0d1e16501df67752a1818f79751419a43d 
> 
> 
> Diff: https://reviews.apache.org/r/59989/diff/4/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 59988: Added a new protobuf message `MapMessage` for protobuf tests.

2018-03-01 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59988/#review198501
---


Ship it!




Ship It!

- Benjamin Mahler


On Feb. 28, 2018, 11:28 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59988/
> ---
> 
> (Updated Feb. 28, 2018, 11:28 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Benjamin Mahler, Chun-Hung Hsiao, 
> and Zhitao Li.
> 
> 
> Bugs: MESOS-7656
> https://issues.apache.org/jira/browse/MESOS-7656
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added a new protobuf message `MapMessage` for protobuf tests.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/tests/protobuf_tests.proto 
> cf8aadc7c003b3165421edb5ff2e71b6abf526f4 
> 
> 
> Diff: https://reviews.apache.org/r/59988/diff/5/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 65844: Added `upgradeResource` to some resource parsing functions.

2018-03-01 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65844/#review198500
---


Ship it!




This looks like a test was missing? Can you add a test that would have caught 
the issue?

- Benjamin Mahler


On Feb. 28, 2018, 9 p.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65844/
> ---
> 
> (Updated Feb. 28, 2018, 9 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Michael Park.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently only `Resources::parse()` calls `upgradeResource()`,
> but the actual parsing functions `Resources::fromJSON()` and
> `Resources::fromSimpleString()` which are also public utilities
> do not, this may produce invalid resource objects.
> 
> This patch moves the `upgradeResource()` to the two internal
> parsing functions, so that all resource objects parsed are
> valid.
> 
> 
> Diffs
> -
> 
>   src/common/resources.cpp a0c29246b3e3b377aabd0eb00ba16aa6bbd804e3 
> 
> 
> Diff: https://reviews.apache.org/r/65844/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 65845: Fixed broken test `SlavesEndpointFullResources`.

2018-03-01 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65845/#review198497
---


Ship it!





src/tests/persistent_volume_endpoints_tests.cpp
Line 2352 (original), 2352-2356 (patched)


Let's use -> instead of .get(). and CHECK_NOTERROR where we assume the get 
will succeed:

```
  EXPECT_EQ(
  Resources(CHECK_NOTERROR(
  Resources::fromJSON(expectedUnreserved->as(,
  Resources(CHECK_NOTERROR(
  Resources::fromJSON(unreservedValue.as();
```


- Benjamin Mahler


On Feb. 28, 2018, 9 p.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65845/
> ---
> 
> (Updated Feb. 28, 2018, 9 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Michael Park.
> 
> 
> Bugs: MESOS-8456
> https://issues.apache.org/jira/browse/MESOS-8456
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The test relies on the order of the JSON array output and is broken
> after the recent allocator changes in #65821.
> This patch fixes this test by parsing the JSON array to resources,
> so that the comparison is order agnostic.
> 
> 
> Diffs
> -
> 
>   src/tests/persistent_volume_endpoints_tests.cpp 
> 94b8caac0c298d0e660b9bfe523b02a919f81594 
> 
> 
> Diff: https://reviews.apache.org/r/65845/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 65872: Windows: Fixed location of Docker's `config.json` file.

2018-03-01 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65872/#review198496
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['65872']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65872

Relevant logs:

- 
[mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65872/logs/mesos-tests-stdout.log):

```
[   OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (131 ms)
[--] 9 tests from Endpoint/SlaveEndpointTest (1244 ms total)

[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (39 
ms)
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (43 
ms)
[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (85 ms 
total)

[--] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN  ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[   OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2466 ms)
[--] 1 test from IsolationFlag/CpuIsolatorTest (2491 ms total)

[--] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN  ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2481 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (2506 ms total)

[--] Global test environment tear-down
[==] 915 tests from 90 test cases ran. (485182 ms total)
[  PASSED  ] 914 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] CommandExecutorCheckTest.CommandCheckTimeout

 1 FAILED TEST
  YOU HAVE 211 DISABLED TESTS

```

- 
[mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65872/logs/mesos-tests-stderr.log):

```
I0302 00:59:34.753571   112 master.cpp:10258] Updating the state of task 
05035c88-5e1e-4af4-8965-51d807731202 of framework 
94cf145a-4799-44b1-813a-5296c194619b- (latest state: TASK_KILLED, status 
update state: TASK_KILLED)
I0302 00:59:34.753571  7916 slave.cpp:3879] Shutting down framework 
94cf145a-4799-44b1-813a-5296c194619b-
I0302 00:59:34.754567  7916 slave.cpp:6586] Shutting down executor 
'05035c88-5e1e-4af4-8965-51d807731202' of framework 
94cf145a-4799-44b1-813a-5296c194619b- at executor(1)@10.3.1.5:54299
I0302 00:59:34.755568  7916 slave.cpp:922] Agent terminating
W0302 00:59:34.755568  7916 slave.cpp:3875] Ignoring shutdown framework 
94cf145a-4799-44b1-813a-5296c194619b- because it is terminating
I0302 00:59:34.756568   112 master.cpp:10357] Removing task 
05035c88-5e1e-4af4-8965-51d807731202 with resources cpus(allocated: *):4; 
mem(allocated: *):2048; dI0302 00:59:34.039572  6872 exec.cpp:162] Version: 
1.6.0
I0302 00:59:34.075577  3844 exec.cpp:236] Executor registered on agent 
94cf145a-4799-44b1-813a-5296c194619b-S0
I0302 00:59:34.080571  8968 executor.cpp:176] Received SUBSCRIBED event
I0302 00:59:34.085582  8968 executor.cpp:180] Subscribed executor on 
build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I0302 00:59:34.085582  8968 executor.cpp:176] Received LAUNCH event
I0302 00:59:34.090574  8968 executor.cpp:648] Starting task 
05035c88-5e1e-4af4-8965-51d807731202
I0302 00:59:34.171572  8968 executor.cpp:483] Running 
'D:\DCOS\mesos\src\mesos-containerizer.exe launch '
I0302 00:59:34.716565  8968 executor.cpp:661] Forked command at 7412
I0302 00:59:34.756568  7964 exec.cpp:445] Executor asked to shutdown
I0302 00:59:34.757567  8968 executor.cpp:176] Received SHUTDOWN event
I0302 00:59:34.757567  8968 executor.cpp:758] Shutting down
I0302 00:59:34.757567  8968 executor.cpp:868] Sending SIGTERM to process tree 
at pid 7isk(allocated: *):1024; ports(allocated: *):[31000-32000] of framework 
94cf145a-4799-44b1-813a-5296c194619b- on agent 
94cf145a-4799-44b1-813a-5296c194619b-S0 at slave(398)@10.3.1.5:54278 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0302 00:59:34.759573 10336 containerizer.cpp:2338] Destroying container 
ca3b3a86-6677-413c-993d-7dc72373c2f4 in RUNNING state
I0302 00:59:34.759573 10336 containerizer.cpp:2952] Transitioning the state of 
container ca3b3a86-6677-413c-993d-7dc72373c2f4 from RUNNING to DESTROYING
I0302 00:59:34.760567   112 master.cpp:1306] Agent 
94cf145a-4799-44b1-813a-5296c194619b-S0 at slave(398)@10.3.1.5:54278 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I0302 00:59:34.760567   112 master.cpp:3276] Disconnecting agent 
94cf145a-4799-44b1-813a-5296c194619b-S0 at slave(398)@10.3.1.5:54278 

Re: Review Request 63862: Windows: Ported docker_tests.cpp.

2018-03-01 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63862/#review198494
---



Patch looks great!

Reviews applied: [65839, 65861, 65862, 65840, 63862]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 
MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Jan. 5, 2018, 6:33 p.m., Akash Gupta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63862/
> ---
> 
> (Updated Jan. 5, 2018, 6:33 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Andrew Schwartzmeyer, Gaston 
> Kleiman, Jie Yu, John Kordich, Joseph Wu, and Michael Park.
> 
> 
> Bugs: MESOS-7342
> https://issues.apache.org/jira/browse/MESOS-7342
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Ported the disabled tests to run on Windows. The following tests
> could not be ported due to Windows platform limitations and remain
> diabled:
>   - ROOT_DOCKER_MountRelativeContainerPath (can't mount volumes inside
> sandbox).
>   - ROOT_DOCKER_NVIDIA_GPU_DeviceAllow (no GPU container support).
>   - ROOT_DOCKER_NVIDIA_GPU_InspectDevices (no GPU container support).
> 
> 
> Diffs
> -
> 
>   src/tests/containerizer/docker_common.hpp 
> 3d69e54647d34ba0f33930d84db6c052e46ebc89 
>   src/tests/containerizer/docker_tests.cpp 
> 0ac4a79e03d5e11ead5a57a9749e26c20a1ddd57 
> 
> 
> Diff: https://reviews.apache.org/r/63862/diff/7/
> 
> 
> Testing
> ---
> 
> Windows mesos-test:
> [==] 754 tests from 77 test cases ran. (270586 ms total)
> [  PASSED  ] 754 tests.
> 
> 
> Windows DockerTest only:
> [==] 11 tests from 1 test case ran. (85617 ms total)
> [  PASSED  ] 11 tests.
> 
> Linux DockerTest (only 12 tests instead of 14, because I don't have Nvidia 
> GPU):
> [==] 12 tests from 1 test case ran. (12413 ms total)
> [  PASSED  ] 12 tests.
> 
> 
> Thanks,
> 
> Akash Gupta
> 
>



Re: Review Request 65679: Removed direct unmock calls and added missing mock call expectations.

2018-03-01 Thread Meng Zhu


> On March 1, 2018, 11:19 a.m., Greg Mann wrote:
> > src/tests/slave_tests.cpp
> > Lines 5455 (patched)
> > 
> >
> > Is this necessary? Perhaps we could eliminate the `Future 
> > failure;`?
> 
> Meng Zhu wrote:
> We either need to wait explicitly or set up the expectation as 
> `atMost(1)`. I would prefer the explicit wait as it is unambiguous.
> 
> Greg Mann wrote:
> I think I prefer `AtMost(1)`, for two reasons:
> 1) It makes it clear to the reader that satisfaction of that expectation 
> is not essential to the test.
> 2) It's a pattern that we use elsewhere in similar situations.
> 
> I don't think I've seen other situations where we AWAIT on a future just 
> to make sure the expectation is satisfied, rather than using `AtMost(1)`, but 
> perhaps I'm wrong?

OK, sounds good.


- Meng


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65679/#review198462
---


On Feb. 15, 2018, 5:22 p.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65679/
> ---
> 
> (Updated Feb. 15, 2018, 5:22 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-8611
> https://issues.apache.org/jira/browse/MESOS-8611
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Directly invoking unmock calls in the test process can potentially
> cause races with the real mock slave process. It is more robust to
> dispatch the unmock calls to the real mock slave process.
> 
> Also added several mock expectations to avoid "uninteresting mock
> call" test warnings.
> 
> 
> Diffs
> -
> 
>   src/tests/slave_tests.cpp e253317a67019302f18afe11e2a314e716cec226 
> 
> 
> Diff: https://reviews.apache.org/r/65679/diff/4/
> 
> 
> Testing
> ---
> 
> `./bin/mesos-tests.sh --gtest_filter=*SlaveTest* --gtest_repeat=-1 
> --gtest_break_on_failure` runs forever :)
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 65346: Updated Web UI to show quota guarantee and limit for roles.

2018-03-01 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65346/#review198491
---


Ship it!




Looks great, thank you!

- Benjamin Mahler


On March 1, 2018, 10:50 p.m., Armand Grillet wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65346/
> ---
> 
> (Updated March 1, 2018, 10:50 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Kapil Arya, Michael Park, Meng 
> Zhu, and Till Toenshoff.
> 
> 
> Bugs: MESOS-8486
> https://issues.apache.org/jira/browse/MESOS-8486
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There is no limit field yet thus the limit displayed is currently
> the guarantee. We will access this field once it will be exposed.
> 
> 
> Diffs
> -
> 
>   src/webui/master/static/css/mesos.css 
> 3e3a3b13cfc4d63aea73d0a6572b22a2de94541e 
>   src/webui/master/static/index.html 46cb2843c2221ccebb7d811c0045be2c124afec1 
>   src/webui/master/static/js/app.js 8960f9d4421f7b96ba06deb623afafb06013b622 
>   src/webui/master/static/roles.html 1c84ad33a554e078d854e25b4d5ca311c8507f91 
> 
> 
> Diff: https://reviews.apache.org/r/65346/diff/5/
> 
> 
> Testing
> ---
> 
> New UI:
> ![New UI](https://i.imgur.com/8YFZnmH.png)
> The group columns cannot be ordered as they have sub-columns, this is 
> expected.
> 
> 
> Thanks,
> 
> Armand Grillet
> 
>



Re: Review Request 65679: Removed direct unmock calls and added missing mock call expectations.

2018-03-01 Thread Greg Mann


> On March 1, 2018, 7:19 p.m., Greg Mann wrote:
> > src/tests/slave_tests.cpp
> > Lines 5455 (patched)
> > 
> >
> > Is this necessary? Perhaps we could eliminate the `Future 
> > failure;`?
> 
> Meng Zhu wrote:
> We either need to wait explicitly or set up the expectation as 
> `atMost(1)`. I would prefer the explicit wait as it is unambiguous.

I think I prefer `AtMost(1)`, for two reasons:
1) It makes it clear to the reader that satisfaction of that expectation is not 
essential to the test.
2) It's a pattern that we use elsewhere in similar situations.

I don't think I've seen other situations where we AWAIT on a future just to 
make sure the expectation is satisfied, rather than using `AtMost(1)`, but 
perhaps I'm wrong?


- Greg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65679/#review198462
---


On Feb. 16, 2018, 1:22 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65679/
> ---
> 
> (Updated Feb. 16, 2018, 1:22 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-8611
> https://issues.apache.org/jira/browse/MESOS-8611
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Directly invoking unmock calls in the test process can potentially
> cause races with the real mock slave process. It is more robust to
> dispatch the unmock calls to the real mock slave process.
> 
> Also added several mock expectations to avoid "uninteresting mock
> call" test warnings.
> 
> 
> Diffs
> -
> 
>   src/tests/slave_tests.cpp e253317a67019302f18afe11e2a314e716cec226 
> 
> 
> Diff: https://reviews.apache.org/r/65679/diff/3/
> 
> 
> Testing
> ---
> 
> `./bin/mesos-tests.sh --gtest_filter=*SlaveTest* --gtest_repeat=-1 
> --gtest_break_on_failure` runs forever :)
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 65872: Windows: Fixed location of Docker's `config.json` file.

2018-03-01 Thread Akash Gupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65872/#review198489
---


Ship it!




Ship It!

- Akash Gupta


On March 1, 2018, 11:57 p.m., Andrew Schwartzmeyer wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65872/
> ---
> 
> (Updated March 1, 2018, 11:57 p.m.)
> 
> 
> Review request for mesos, Akash Gupta, Jeff Coffler, and Joseph Wu.
> 
> 
> Bugs: MESOS-8619
> https://issues.apache.org/jira/browse/MESOS-8619
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Per MESOS-8619, Docker checks `$USERPROFILE/.docker/config.json`
> instead of `$HOME`. Mesos overrides this environment variable in order
> to point Docker to a `config.json` file in another location, so we
> have to fix the assumption we made about Docker.
> 
> We do not add this constant to stout, because it is not consistent
> across Windows applications. This particular logic is specific to the
> implementation of Docker. Other applications might check `$HOME` or
> `$HOMEPATH` on Windows.
> 
> 
> Diffs
> -
> 
>   src/docker/docker.cpp 876dfffc2ee68e345ff336fefa6cf908c3d2a5c4 
> 
> 
> Diff: https://reviews.apache.org/r/65872/diff/1/
> 
> 
> Testing
> ---
> 
> Deployed a `docker.zip` consisting of `.docker/config.json` in "Linux-form" 
> i.e. with base64 encoded password (not `wincred`) using the following task:
> 
> ```
> {
> "name": "fetcher-test",
> "task_id": {"value" : "fetcher"},
> "agent_id": {"value" : ""},
> "resources": [
> {
> "name": "cpus",
> "type": "SCALAR",
> "scalar": {
> "value": 1
> }
> },
> {
> "name": "mem",
> "type": "SCALAR",
> "scalar": {
> "value": 512
> }
> }
> ],
> "command": {
> "uris": [ {"value": "file://C:/Users/andschwa/docker.zip"} ],
> "shell": false
> },
> "container": {
> "type": "DOCKER",
> "docker": {"image": "andschwa/nanoserver:1709"}
> }
> }
> ```
> 
> The `andschwa/nanoserver:1709` is a _private_ repo, and `docker logout` was 
> run (and confirmed that the machine could not pull the image manually).
> 
> With this patch and the `docker.zip` URI, it was fetched, unzipped, and found 
> by `docker pull`, enabling it to successfully pull the private image.
> 
> ```
> > docker images
> REPOSITORYTAG IMAGE IDCREATED 
> SIZE
> andschwa/nanoserver   1709816017814fa22 weeks ago 
> 312MB
> ```
> 
> This is difficult to unit-test because it requires private credentials, an 
> external private docker repo, and global side effects of Docker images (i.e. 
> you can't have it cached). But I am, of course, open to ideas of how to 
> programmatically test this.
> 
> 
> Thanks,
> 
> Andrew Schwartzmeyer
> 
>



Re: Review Request 65872: Windows: Fixed location of Docker's `config.json` file.

2018-03-01 Thread Jeff Coffler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65872/#review198488
---


Ship it!




Ship It!

- Jeff Coffler


On March 1, 2018, 11:57 p.m., Andrew Schwartzmeyer wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65872/
> ---
> 
> (Updated March 1, 2018, 11:57 p.m.)
> 
> 
> Review request for mesos, Akash Gupta, Jeff Coffler, and Joseph Wu.
> 
> 
> Bugs: MESOS-8619
> https://issues.apache.org/jira/browse/MESOS-8619
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Per MESOS-8619, Docker checks `$USERPROFILE/.docker/config.json`
> instead of `$HOME`. Mesos overrides this environment variable in order
> to point Docker to a `config.json` file in another location, so we
> have to fix the assumption we made about Docker.
> 
> We do not add this constant to stout, because it is not consistent
> across Windows applications. This particular logic is specific to the
> implementation of Docker. Other applications might check `$HOME` or
> `$HOMEPATH` on Windows.
> 
> 
> Diffs
> -
> 
>   src/docker/docker.cpp 876dfffc2ee68e345ff336fefa6cf908c3d2a5c4 
> 
> 
> Diff: https://reviews.apache.org/r/65872/diff/1/
> 
> 
> Testing
> ---
> 
> Deployed a `docker.zip` consisting of `.docker/config.json` in "Linux-form" 
> i.e. with base64 encoded password (not `wincred`) using the following task:
> 
> ```
> {
> "name": "fetcher-test",
> "task_id": {"value" : "fetcher"},
> "agent_id": {"value" : ""},
> "resources": [
> {
> "name": "cpus",
> "type": "SCALAR",
> "scalar": {
> "value": 1
> }
> },
> {
> "name": "mem",
> "type": "SCALAR",
> "scalar": {
> "value": 512
> }
> }
> ],
> "command": {
> "uris": [ {"value": "file://C:/Users/andschwa/docker.zip"} ],
> "shell": false
> },
> "container": {
> "type": "DOCKER",
> "docker": {"image": "andschwa/nanoserver:1709"}
> }
> }
> ```
> 
> The `andschwa/nanoserver:1709` is a _private_ repo, and `docker logout` was 
> run (and confirmed that the machine could not pull the image manually).
> 
> With this patch and the `docker.zip` URI, it was fetched, unzipped, and found 
> by `docker pull`, enabling it to successfully pull the private image.
> 
> ```
> > docker images
> REPOSITORYTAG IMAGE IDCREATED 
> SIZE
> andschwa/nanoserver   1709816017814fa22 weeks ago 
> 312MB
> ```
> 
> This is difficult to unit-test because it requires private credentials, an 
> external private docker repo, and global side effects of Docker images (i.e. 
> you can't have it cached). But I am, of course, open to ideas of how to 
> programmatically test this.
> 
> 
> Thanks,
> 
> Andrew Schwartzmeyer
> 
>



Review Request 65872: Windows: Fixed location of Docker's `config.json` file.

2018-03-01 Thread Andrew Schwartzmeyer

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65872/
---

Review request for mesos, Akash Gupta, Jeff Coffler, and Joseph Wu.


Bugs: MESOS-8619
https://issues.apache.org/jira/browse/MESOS-8619


Repository: mesos


Description
---

Per MESOS-8619, Docker checks `$USERPROFILE/.docker/config.json`
instead of `$HOME`. Mesos overrides this environment variable in order
to point Docker to a `config.json` file in another location, so we
have to fix the assumption we made about Docker.

We do not add this constant to stout, because it is not consistent
across Windows applications. This particular logic is specific to the
implementation of Docker. Other applications might check `$HOME` or
`$HOMEPATH` on Windows.


Diffs
-

  src/docker/docker.cpp 876dfffc2ee68e345ff336fefa6cf908c3d2a5c4 


Diff: https://reviews.apache.org/r/65872/diff/1/


Testing
---

Deployed a `docker.zip` consisting of `.docker/config.json` in "Linux-form" 
i.e. with base64 encoded password (not `wincred`) using the following task:

```
{
"name": "fetcher-test",
"task_id": {"value" : "fetcher"},
"agent_id": {"value" : ""},
"resources": [
{
"name": "cpus",
"type": "SCALAR",
"scalar": {
"value": 1
}
},
{
"name": "mem",
"type": "SCALAR",
"scalar": {
"value": 512
}
}
],
"command": {
"uris": [ {"value": "file://C:/Users/andschwa/docker.zip"} ],
"shell": false
},
"container": {
"type": "DOCKER",
"docker": {"image": "andschwa/nanoserver:1709"}
}
}
```

The `andschwa/nanoserver:1709` is a _private_ repo, and `docker logout` was run 
(and confirmed that the machine could not pull the image manually).

With this patch and the `docker.zip` URI, it was fetched, unzipped, and found 
by `docker pull`, enabling it to successfully pull the private image.

```
> docker images
REPOSITORYTAG IMAGE IDCREATED   
  SIZE
andschwa/nanoserver   1709816017814fa22 weeks ago   
  312MB
```

This is difficult to unit-test because it requires private credentials, an 
external private docker repo, and global side effects of Docker images (i.e. 
you can't have it cached). But I am, of course, open to ideas of how to 
programmatically test this.


Thanks,

Andrew Schwartzmeyer



Re: Review Request 65346: Updated Web UI to show quota guarantee and limit for roles.

2018-03-01 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65346/#review198487
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['65346']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65346

Relevant logs:

- 
[mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65346/logs/mesos-tests-stdout.log):

```
[   OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (131 ms)
[--] 9 tests from Endpoint/SlaveEndpointTest (1287 ms total)

[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (51 
ms)
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (47 
ms)
[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (101 ms 
total)

[--] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN  ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[   OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2445 ms)
[--] 1 test from IsolationFlag/CpuIsolatorTest (2475 ms total)

[--] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN  ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2451 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (2480 ms total)

[--] Global test environment tear-down
[==] 915 tests from 90 test cases ran. (493883 ms total)
[  PASSED  ] 914 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] CommandExecutorCheckTest.CommandCheckTimeout

 1 FAILED TEST
  YOU HAVE 211 DISABLED TESTS

```

- 
[mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65346/logs/mesos-tests-stderr.log):

```
I0301 23:51:14.249897  8856 slave.cpp:922] Agent terminating
I0301 23:51:14.253895 10112 hierarchical.cpp:405] Deactivated framework 
f5fd0edc-a903-4f4d-9c8c-4e4a8736e1d4-
I0301 23:51:14.253895  8856 slave.cpp:3879] Shutting down framework 
f5fd0edc-a903-4f4d-9c8c-4e4a8736e1d4-
I0301 23:51:14.253895  9660 master.cpp:10258] Updating the state of task 
2d18c18a-64ad-4ae3-801c-442ded53e4c1 of framework 
f5fd0edc-a903-4f4d-9c8c-4e4a8736e1d4- (latest state: TASK_KILLED, status 
update state: TASK_KILLED)
I0301 23:51:14.255899  9660 master.cpp:10357] Removing task 
2d18c18a-64ad-4ae3-801c-442ded53e4c1 with resources cpus(allocated: *):4; 
mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: 
*):[31000-32000] of framework f5fd0edc-a903-4f4d-9c8c-4e4a8736e1d4- on 
agent f5fd0edc-a903-4f4d-9c8c-4e4a8736e1d4-S0 at slave(398)@10.3.1.5:51925 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0301 23:51:14.253895  88I0301 23:51:13.541932  9992 exec.cpp:162] Version: 
1.6.0
I0301 23:51:13.570891 10808 exec.cpp:236] Executor registered on agent 
f5fd0edc-a903-4f4d-9c8c-4e4a8736e1d4-S0
I0301 23:51:13.574920  5376 executor.cpp:176] Received SUBSCRIBED event
I0301 23:51:13.579936  5376 executor.cpp:180] Subscribed executor on 
build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I0301 23:51:13.579936  5376 executor.cpp:176] Received LAUNCH event
I0301 23:51:13.584933  5376 executor.cpp:648] Starting task 
2d18c18a-64ad-4ae3-801c-442ded53e4c1
I0301 23:51:13.665928  5376 executor.cpp:483] Running 
'D:\DCOS\mesos\src\mesos-containerizer.exe launch '
I0301 23:51:14.213949  5376 executor.cpp:661] Forked command at 8744
I0301 23:51:14.259930  3636 exec.cpp:445] Executor asked to shutdown
I0301 23:51:14.260900  5376 executor.cpp:176] Received SHUTDOWN event
I0301 23:51:14.260900  5376 executor.cpp:758] Shutting down
I0301 23:51:14.260900  5376 executor.cpp:868] Sending SIGTERM to process tree 
at pid 856 slave.cpp:6586] Shutting down executor 
'2d18c18a-64ad-4ae3-801c-442ded53e4c1' of framework 
f5fd0edc-a903-4f4d-9c8c-4e4a8736e1d4- at executor(1)@10.3.1.5:51946
I0301 23:51:14.258903   hierarchical.cpp:344] Removed framework 
f5fd0edc-a903-4f4d-9c8c-4e4a8736e1d4-
I0301 23:51:14.259930  9416 master.cpp:1306] Agent 
f5fd0edc-a903-4f4d-9c8c-4e4a8736e1d4-S0 at slave(398)@10.3.1.5:51925 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I0301 23:51:14.259930  9416 master.cpp:3276] Disconnecting agent 
f5fd0edc-a903-4f4d-9c8c-4e4a8736e1d4-S0 at slave(398)@10.3.1.5:51925 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0301 23:51:14.259930  9416 master.cpp:3295] Deactivating agent 
f5fd0edc-a903-4f4d-9c8c-4e4a8736e1d4-S0 at slave(398)@10.3.1.5:51925 

Re: Review Request 65679: Removed direct unmock calls and added missing mock call expectations.

2018-03-01 Thread Meng Zhu


> On March 1, 2018, 11:19 a.m., Greg Mann wrote:
> > src/tests/slave_tests.cpp
> > Lines 4870-4878 (patched)
> > 
> >
> > To avoid flakiness, should we register these expectations before the 
> > call to `unmocked__run` above?

Sounds good. Updated.


> On March 1, 2018, 11:19 a.m., Greg Mann wrote:
> > src/tests/slave_tests.cpp
> > Lines 5455 (patched)
> > 
> >
> > Is this necessary? Perhaps we could eliminate the `Future 
> > failure;`?

We either need to wait explicitly or set up the expectation as `atMost(1)`. I 
would prefer the explicit wait as it is unambiguous.


- Meng


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65679/#review198462
---


On Feb. 15, 2018, 5:22 p.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65679/
> ---
> 
> (Updated Feb. 15, 2018, 5:22 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-8611
> https://issues.apache.org/jira/browse/MESOS-8611
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Directly invoking unmock calls in the test process can potentially
> cause races with the real mock slave process. It is more robust to
> dispatch the unmock calls to the real mock slave process.
> 
> Also added several mock expectations to avoid "uninteresting mock
> call" test warnings.
> 
> 
> Diffs
> -
> 
>   src/tests/slave_tests.cpp e253317a67019302f18afe11e2a314e716cec226 
> 
> 
> Diff: https://reviews.apache.org/r/65679/diff/3/
> 
> 
> Testing
> ---
> 
> `./bin/mesos-tests.sh --gtest_filter=*SlaveTest* --gtest_repeat=-1 
> --gtest_break_on_failure` runs forever :)
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 64211: Added options to build the Python CLI and run unit tests.

2018-03-01 Thread Kevin Klues

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64211/#review198482
---



Looks like this had errors on the Windows CI?

- Kevin Klues


On Feb. 21, 2018, 9:58 a.m., Armand Grillet wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64211/
> ---
> 
> (Updated Feb. 21, 2018, 9:58 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier and Kevin Klues.
> 
> 
> Bugs: MESOS-8240
> https://issues.apache.org/jira/browse/MESOS-8240
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> An update of the discarded review /r/52543.
> 
> Works with Autotools and CMake.
> 
> 
> Diffs
> -
> 
>   CMakeLists.txt 6702f02245e3867c06bbd9efbbf4e3b961a7d9aa 
>   cmake/CompilationConfigure.cmake 50cddf9476c8c5196c4824a7b060c2680a96b277 
>   configure.ac 30fbadc32d1d96f719d45fa8067f975283c25507 
>   docs/configuration/cmake.md 1e34657ea55fa324f65f865f7d0a67084c6719d9 
>   src/CMakeLists.txt 0c135034982a6a4844509cd0e9b2cc41804333ef 
>   src/Makefile.am a2ba1e6de0868c7a4fe58304ec479dce039bc288 
>   src/python/cli_new/CMakeLists.txt PRE-CREATION 
>   src/python/cli_new/tests/CMakeLists.txt PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/64211/diff/13/
> 
> 
> Testing
> ---
> 
> Testing done on Fedora 25.
> 
> For Autotools:
> ```
> $ ./bootstrap
> $ mkdir build
> $ cd build
> $ ../configure --enable-new-cli --disable-java --disable-python
> $ make check
> ```
> 
> For CMake:
> ```
> $ ./bootstrap
> $ mkdir build
> $ cd build
> $ cmake .. -DENABLE_NEW_CLI=1
> $ cmake --build . -- -j16
> $ ./src/mesos
> Mesos CLI
> 
> Usage:
>   mesos (-h | --help)
>   mesos --version
>   mesos  [...]
> 
> Options:
>   -h --help  Show this screen.
>   --version  Show version info.
> 
> Commands:
>   agent   Interacts with the Mesos agents
>   config  Interacts with the Mesos CLI configuration file
>   taskInteracts with the tasks running in a Mesos cluster
> 
> See 'mesos help ' for more information on a specific command.
> $ cmake --build . --target tests -- -j16
> $ ctest -R CLI
> Test project /home/agrillet/apache-mesos/build
> Start 4: CLITests
> 1/1 Test #4: CLITests .   Passed3.63 sec
> 
> 100% tests passed, 0 tests failed out of 1
> ```
> 
> Checked that the the CLI tests were run, that the content of the directory 
> `build/src/cli` was as expected, and that `build/src/mesos` was correctly 
> running.
> 
> 
> Thanks,
> 
> Armand Grillet
> 
>



Re: Review Request 64970: Use tox for linting and testing code living uder src/python.

2018-03-01 Thread Kevin Klues

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64970/#review198478
---




src/python/cli_new/tox.ini
Lines 1-22 (patched)


I would pull the addition of this file out to it's own separate commit 
right after the commit changing `lib/tox.ini`.



src/python/cli_new/tox.ini
Lines 22 (patched)


Let's make `tests` a module and add it in here as well. Also let's 
explicitly add the `.py` files under `bin` to this list as well. That way all 
relevant python files will be linted by default when invoking tox directly here 
instead of via `support/mesos-style.py`



src/python/lib/tox.ini
Line 1 (original), 1 (patched)


I would pull the changing of this file out to it's own separate commit at 
the beginning of the chain of commits for this RR.



src/python/lib/tox.ini
Lines 22 (patched)


Let's add `setup.py` to this list by default as well.



support/mesos-style.py
Line 354 (original), 354-356 (patched)


s/run/lint/g



support/mesos-style.py
Line 373 (original), 373 (patched)


If we decide to add `@staticmethod` here, we should do it on all functions 
that qualify for this in the entire file.



support/mesos-style.py
Line 374 (original), 375-378 (patched)


Thi docstring is incorrect.



support/mesos-style.py
Lines 379-394 (patched)


Let's change this to:
```
   tox_cmd = [os.path.dirname(__file__), '.virtualenv', 'bin', 'tox')]
   tox_cmd += ['-c', configfile]
   if tox_env is not None:
   tox_cmd += ['-e', tox_env]
   if recreate:
   tox_cmd += ['--recreate']
   tox_cmd += ['--'] + args
```

I understand that using a generator would make it immutable and doing so is 
likely preferable in many circumstances, but I think here is reduces 
readability and is inconsistent with the rest of the code base.



support/mesos-style.py
Line 380 (original), 400 (patched)


Let's consider doing a blanket addition of @staticmethod where appropriate 
in a separate commit so that the code always remains consistent.



support/mesos-style.py
Line 382 (original), 407 (patched)


Let's consider renaming this function. Since it doesn't actually take the 
same parameters as `run_lint()`, I think we can be more descriptive in its 
naming.



support/mesos-style.py
Lines 384-385 (original), 409-410 (patched)


I think that this doc string is incorrect now.

It should also indicate what the return value of this function means.



support/mesos-style.py
Lines 413-414 (patched)


This could probably be one line.



support/mesos-style.py
Line 398 (original), 432 (patched)


This should be in it's own isolated commit before this review request with 
a good explanation of what it's doing and why it's necessary.



support/mesos-style.py
Lines 447 (patched)


Let's consider renaming this function. Since it doesn't actually take the 
same parameters as `run_lint()`, I think we can be more descriptive in its 
naming.


- Kevin Klues


On March 1, 2018, 10:34 p.m., Eric Chung wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64970/
> ---
> 
> (Updated March 1, 2018, 10:34 p.m.)
> 
> 
> Review request for mesos, Armand Grillet, Jason Lai, and Kevin Klues.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Use tox for linting and testing code living uder src/python.
> 
> At the moment, all linting is done through the same `pylint`
> installation under support/.virtualenv, which requires ALL dependencies
> (i.e. pip-requirements.txt, requirements.in scattered in various
> directories) to be installed in the same virtualenv, making things
> really messy -- e.g. when I've changed some code under `src/python/lib`,
> but don't have the dev virtualenv activated, linting will fail since
> none of the dependencies under `src/python/lib` have been installed.
> 
> Using tox, we can solve this problem by distributing a "test spec"
> (tox.ini) in each of the python source directories which are aware 

Re: Review Request 65346: Updated Web UI to show quota guarantee and limit for roles.

2018-03-01 Thread Armand Grillet

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65346/
---

(Updated March 1, 2018, 10:50 p.m.)


Review request for mesos, Benjamin Mahler, Kapil Arya, Michael Park, Meng Zhu, 
and Till Toenshoff.


Changes
---

Removed `data-valign="middle"` attribute.


Bugs: MESOS-8486
https://issues.apache.org/jira/browse/MESOS-8486


Repository: mesos


Description
---

There is no limit field yet thus the limit displayed is currently
the guarantee. We will access this field once it will be exposed.


Diffs (updated)
-

  src/webui/master/static/css/mesos.css 
3e3a3b13cfc4d63aea73d0a6572b22a2de94541e 
  src/webui/master/static/index.html 46cb2843c2221ccebb7d811c0045be2c124afec1 
  src/webui/master/static/js/app.js 8960f9d4421f7b96ba06deb623afafb06013b622 
  src/webui/master/static/roles.html 1c84ad33a554e078d854e25b4d5ca311c8507f91 


Diff: https://reviews.apache.org/r/65346/diff/5/

Changes: https://reviews.apache.org/r/65346/diff/4-5/


Testing
---

New UI:
![New UI](https://i.imgur.com/8YFZnmH.png)
The group columns cannot be ordered as they have sub-columns, this is expected.


Thanks,

Armand Grillet



Re: Review Request 64970: Use tox for linting and testing code living uder src/python.

2018-03-01 Thread Eric Chung

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64970/
---

(Updated March 1, 2018, 10:34 p.m.)


Review request for mesos, Armand Grillet, Jason Lai, and Kevin Klues.


Summary (updated)
-

Use tox for linting and testing code living uder src/python.


Repository: mesos


Description
---

Use tox for linting and testing code living uder src/python.

At the moment, all linting is done through the same `pylint`
installation under support/.virtualenv, which requires ALL dependencies
(i.e. pip-requirements.txt, requirements.in scattered in various
directories) to be installed in the same virtualenv, making things
really messy -- e.g. when I've changed some code under `src/python/lib`,
but don't have the dev virtualenv activated, linting will fail since
none of the dependencies under `src/python/lib` have been installed.

Using tox, we can solve this problem by distributing a "test spec"
(tox.ini) in each of the python source directories which are aware of
its local dependencies only. To test or lint the code there would be as
simple as running `tox -e py27-lint `, and the corresponding
virtualenv and test dependencies would automatically be setup.

This patch modifies `support/mesos-style.py` to install `tox` in
`support/.virtualenv` and delegates linting to a `tox` call when it sees
python directories that have tox setup for it. Linting for all other
languages will not be effected.

Testing Done:
1. intentionally create a lint error, such as extra spaces before a
parens in a python file
2. run the pre-commit hook and see tox in action

Reviewed at https://reviews.apache.org/r/64970/


Diffs
-

  src/python/cli_new/tox.ini PRE-CREATION 
  src/python/lib/requirements-test.in b2b73aab65377d9310797203ea84c5150ae60805 
  src/python/lib/tox.ini fd5e89c77c8608fea21e9caad814c6e111ad57db 
  support/mesos-style.py 47ec36949010fa511d1b3974739c5ad5c03f6f7f 


Diff: https://reviews.apache.org/r/64970/diff/4/


Testing
---

1. intentionally create a lint error, such as extra spaces before a parens in a 
python file
2. run the pre-commit hook and see tox in action


Thanks,

Eric Chung



Re: Review Request 64970: Replace ad hoc venv under support/ with tox.

2018-03-01 Thread Armand Grillet

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64970/#review198474
---



There is still one issue with this patch, I have seen some verbose logs that 
are repeated at each run:

```
apache-mesos (tox) $ git commit -m "Test."
No C++ files to lint
No JavaScript files to lint
Checking 1 Python file
py27-lint installed: 
astroid==1.4.8,attrs==17.4.0,backports.functools-lru-cache==1.2.1,configparser==3.5.0,coverage==4.5.1,docopt==0.6.2,funcsigs==1.0.2,isort==4.2.5,kazoo==2.3.1,lazy-object-proxy==1.2.2,mccabe==0.5.2,mock==2.0.0,parse==1.8.0,pbr==3.1.1,pluggy==0.6.0,py==1.5.2,Pygments==2.1.3,PyInstaller==3.1.1,pylint==1.6.4,pytest==3.4.1,pytest-cov==2.5.1,six==1.10.0,termcolor==1.1.0,toml==0.9.2,tox==2.7.0,virtualenv==15.1.0,wrapt==1.10.8
py27-lint runtests: PYTHONHASHSEED='1333548182'
py27-lint runtests: commands[0] | pylint 
--rcfile=../../../support/pylint.config 
--rcfile=/Users/Armand/Code/apache-mesos/support/pylint.config 
lib/cli/plugins/config/main.py
* Module cli.plugins.config.main
W: 80,19: Unused argument 'argv' (unused-argument)
ERROR: InvocationError: 
'/Users/Armand/Code/apache-mesos/src/python/cli_new/.tox/py27-lint/bin/pylint 
--rcfile=../../../support/pylint.config 
--rcfile=/Users/Armand/Code/apache-mesos/support/pylint.config 
lib/cli/plugins/config/main.py'
___ summary 
ERROR:   py27-lint: commands failed
Total errors found: 1
apache-mesos (tox) $ vi src/python/cli_new/lib/cli/plugins/config/main.py
apache-mesos (tox) $ git add src/python/cli_new/lib/cli/plugins/config/main.py
apache-mesos (tox) $ git commit -m "Test."
No C++ files to lint
No JavaScript files to lint
Checking 1 Python file
py27-lint installed: 
astroid==1.4.8,attrs==17.4.0,backports.functools-lru-cache==1.2.1,configparser==3.5.0,coverage==4.5.1,docopt==0.6.2,funcsigs==1.0.2,isort==4.2.5,kazoo==2.3.1,lazy-object-proxy==1.2.2,mccabe==0.5.2,mock==2.0.0,parse==1.8.0,pbr==3.1.1,pluggy==0.6.0,py==1.5.2,Pygments==2.1.3,PyInstaller==3.1.1,pylint==1.6.4,pytest==3.4.1,pytest-cov==2.5.1,six==1.10.0,termcolor==1.1.0,toml==0.9.2,tox==2.7.0,virtualenv==15.1.0,wrapt==1.10.8
py27-lint runtests: PYTHONHASHSEED='87451174'
py27-lint runtests: commands[0] | pylint 
--rcfile=../../../support/pylint.config 
--rcfile=/Users/Armand/Code/apache-mesos/support/pylint.config 
lib/cli/plugins/config/main.py
___ summary 
  py27-lint: commands succeeded
  congratulations :)
Total errors found: 0
[tox 46cc7db60] Test.
 1 file changed, 1 insertion(+), 1 deletion(-)
```
I'm talking about the lines starting with `py27-lint installed:`. Can we 
improve that by not reinstalling `py27-lint` if it is already installed or just 
reducing the verbosity of the logs to hide the installation?


support/mesos-style.py
Lines 376 (patched)


s/`Activate`/`Activates` and s/`run`/`runs`.



support/mesos-style.py
Lines 401 (patched)


Not sure if adding this method is necessary (specially in that patch) but 
it does not hurt.


- Armand Grillet


On Feb. 23, 2018, 7:58 a.m., Eric Chung wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64970/
> ---
> 
> (Updated Feb. 23, 2018, 7:58 a.m.)
> 
> 
> Review request for mesos, Armand Grillet, Jason Lai, and Kevin Klues.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Use tox for linting and testing code living uder src/python.
> 
> At the moment, all linting is done through the same `pylint`
> installation under support/.virtualenv, which requires ALL dependencies
> (i.e. pip-requirements.txt, requirements.in scattered in various
> directories) to be installed in the same virtualenv, making things
> really messy -- e.g. when I've changed some code under `src/python/lib`,
> but don't have the dev virtualenv activated, linting will fail since
> none of the dependencies under `src/python/lib` have been installed.
> 
> Using tox, we can solve this problem by distributing a "test spec"
> (tox.ini) in each of the python source directories which are aware of
> its local dependencies only. To test or lint the code there would be as
> simple as running `tox -e py27-lint `, and the corresponding
> virtualenv and test dependencies would automatically be setup.
> 
> This patch modifies `support/mesos-style.py` to install `tox` in
> `support/.virtualenv` and delegates linting to a `tox` call when it sees
> python directories that have tox setup for it. Linting for all other
> languages will not be effected.
> 
> Testing Done:
> 1. intentionally create 

Re: Review Request 65869: Removed unnecessary warning in agent `statusUpdate()`.

2018-03-01 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65869/#review198475
---



Patch looks great!

Reviews applied: [65679, 65869]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 
MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On March 1, 2018, 6:32 p.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65869/
> ---
> 
> (Updated March 1, 2018, 6:32 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Greg Mann.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> If the task status update is generated by the agent
> before the executor creation (e.g. due to task authorization
> or unschedule GC failure), this "could not find executor"
> warning will be triggered. But this is expected.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp e2facb3c15a2f907f6497c58a36842ed707f2c70 
> 
> 
> Diff: https://reviews.apache.org/r/65869/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 65713: Handled hanging docker `stop`, `inspect` commands in docker executor.

2018-03-01 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65713/#review198473
---


Fix it, then Ship it!





src/docker/executor.cpp
Lines 515-516 (original), 535-536 (patched)


Add comments:

docker stop may or may not finish. Our behavior is to give the subprocess a 
chance to finish until next time _killtask() isinvoked. Otherwise, the 
subprocess will not be discarded forever.



src/docker/executor.cpp
Lines 516-518 (original), 536-546 (patched)


Related to my previous comment:

`KILL_RETRY_INTERVAL` is for health check. probably we should only do 
`stop.after()` if `killedByHealthCheck` is true.



src/docker/executor.cpp
Line 518 (original), 538-539 (patched)


this log is not true for normal task kills that are invoked. E.g., if a 
task is not killed by health check, after the time out we log "Docker stop 
timed out ...", but actually the docker stop command is timed out/discarded 
when the next killtask is invoked. And it is even not true if a scheduler never 
retry killtask(). So this log is not true for a normal killtask(). we should 
log it for `killedBy HealthCheck` case only.


- Gilbert Song


On Feb. 27, 2018, 5:37 p.m., Andrei Budnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65713/
> ---
> 
> (Updated Feb. 27, 2018, 5:37 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gilbert Song, Greg Mann, and 
> Vinod Kone.
> 
> 
> Bugs: MESOS-8574
> https://issues.apache.org/jira/browse/MESOS-8574
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Previosly, if `docker inspect` command hanged, the docker container
> ended up in an unkillable state. This patch adds a timeout for inspect
> command after receiving `killTask` analogically to `reaped` handler.
> In addition we've added a timeout for `docker stop` command. If docker
> `stop` or `inspect` command times out, we discard the related future,
> thus the docker library kills previously spawned docker cli subprocess.
> As a result, a scheduler can retry `killTask` operation to handle
> nasty docker bugs that lead to hanging docker cli.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp 93c3e1d1e86814e34cbe5b045f6e61911266c535 
> 
> 
> Diff: https://reviews.apache.org/r/65713/diff/5/
> 
> 
> Testing
> ---
> 
> internal CI
> 
> Manual testing:
> 1. Build docker from sources: 
> http://oyvindsk.com/writing/docker-build-from-source
> 2. Modify `ContainerInspect` function from `docker/inspect.go`:
> ```
>  func (daemon *Daemon) ContainerInspect(name string, size bool, version 
> string) (interface{}, error) {
> +   time.Sleep(10 * time.Second)
> ```
> 3. Modify `ContainerStop` function from `docker/stop.go`:
> ```
>  func (daemon *Daemon) ContainerStop(name string, seconds *int) error {
> +   rand.Seed(time.Now().UTC().UnixNano())
> +   if rand.Intn(2) == 0 {
> +   time.Sleep(20 * time.Second)
> +   }
> ```
> 4. Rebuild docker: `sudo make build && sudo make binary`
> 5. Stop system docker daemon: `sudo service docker stop`
> 6. Start modified docker daemon: `sudo ./bundles/binary-daemon/dockerd-dev`
> 7. Modify `src/cli/execute.cpp`:
>   a) Add `delay(Seconds(15), self(), ::retryKill, task->task_id(), 
> offer.agent_id());` after 
> https://github.com/apache/mesos/blob/072ea2787ffca6f2a6dcb2d636f68c51823d6665/src/cli/execute.cpp#L606
>   b) Add a new method `retryKill` to `CommandScheduler`:
> ```
>   void retryKill(const TaskID& taskId, const AgentID& agentId)
>   {
> killTask(taskId, agentId);
> delay(Seconds(6), self(), ::retryKill, taskId, agentId);
>   }
> ```
> 8. Rebuild mesos
> 9. Run mesos master: `./bin/mesos-master.sh --work_dir='var/master-1'`
> 10. Run mesos agent: `GLOG_v=1 ./bin/mesos-agent.sh 
> --resources="cpus:1;mem:100" 
> --work_dir='/home/abudnik/mesos/build/var/agent-1' 
> --containerizers="docker,mesos" --master="127.0.1.1:5050"`
> 11. Submit a task for the docker executor: `./src/mesos-execute 
> --master="127.0.1.1:5050" --name="a" --containerizer=docker 
> --docker_image="ubuntu:xenial" --command="sleep "`
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>



Re: Review Request 65482: Improved handling of non-terminal operations after master failover.

2018-03-01 Thread Greg Mann


> On Feb. 15, 2018, 11:55 p.m., Greg Mann wrote:
> > src/master/master.cpp
> > Lines 7647-7654 (patched)
> > 
> >
> > I'm sitting here trying to think of ways we might avoid crashing if the 
> > framework subscribes before the operation becomes terminal...
> > 
> > Would it be reasonable to add an `if (framework == nullptr)` check to 
> > `updateOperation()` so that we only recover resources if the framework is 
> > known to the master?
> 
> Greg Mann wrote:
> Er... wait that doesn't make sense :) I guess when we receive the 
> operation update, we have no way of knowing whether or not the framework had 
> subscribed when the master learned about the pending operation. As a 
> workaround for now, we could store in a set the operation UUIDs of operations 
> for which we do not track allocated resources (i.e., operations which hit 
> this block of code). Then, in `updateOperation` we could avoid recovering 
> resources if the operation's UUID is in the set?
> 
> Benjamin Bannier wrote:
> No matter what we do here, we will already have entered dangerous 
> territory with `CHECK` failures looming as soon as we added such an operation 
> to master state, since we cannot update the allocator on these used resources 
> (update the allocation to reflect operation results, or recover). We might 
> also end up unknowingly oversubscribing the resources used by the operation.
> 
> I am unsure whether working around this by e.g., no updating the 
> allocator when the operation becomes terminal is a good way forward since it 
> seems to delay the needed master failover for even longer. With the approach 
> I proposed we would failover when this operation gets terminal (i.e., when we 
> can safely reconcile this particular operation).
> 
> I added the framework ID to the log output since it might be useful for 
> operators debugging such scenarios.

I think I agree - let's merge this improvement for now and try to resolve 
MESOS-8582 for Mesos 1.6.


- Greg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65482/#review197641
---


On Feb. 16, 2018, 2:12 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65482/
> ---
> 
> (Updated Feb. 16, 2018, 2:12 p.m.)
> 
> 
> Review request for mesos, Greg Mann, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8536
> https://issues.apache.org/jira/browse/MESOS-8536
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch fixes the handling of non-terminal operations learned by a
> newly elected master after a master failover, so that only these
> operations are counted as using resources. Previously we did not count
> any operations as using resources which by accident produced expected
> behavior if the operation was already terminal when the master learned
> about them.
> 
> We do not address the issue of being unable to properly account for
> operations triggered by frameworks unknown to the master, see
> MESOS-8582. Instead we emit a warning for now since the master might
> continue to abort due to assertion failures due to incomplete resource
> accounting.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp b06d7a6e2fbbb81b97eaf537d5b6745c73dc867d 
> 
> 
> Diff: https://reviews.apache.org/r/65482/diff/4/
> 
> 
> Testing
> ---
> 
> `make check`, also tested with a version of the test added in r/65045 which 
> triggered this issue.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 63862: Windows: Ported docker_tests.cpp.

2018-03-01 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63862/#review198468
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['65839', '65861', '65862', '65840', '63862']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/63862

Relevant logs:

- 
[mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/63862/logs/mesos-tests-stdout.log):

```
[   OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (144 ms)
[--] 9 tests from Endpoint/SlaveEndpointTest (1246 ms total)

[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (40 
ms)
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (44 
ms)
[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (87 ms 
total)

[--] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN  ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[   OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2586 ms)
[--] 1 test from IsolationFlag/CpuIsolatorTest (2611 ms total)

[--] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN  ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2476 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (2501 ms total)

[--] Global test environment tear-down
[==] 916 tests from 91 test cases ran. (471473 ms total)
[  PASSED  ] 915 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] CommandExecutorCheckTest.CommandCheckTimeout

 1 FAILED TEST
  YOU HAVE 210 DISABLED TESTS

```

- 
[mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/63862/logs/mesos-tests-stderr.log):

```
I0301 20:24:08.754911  1120 master.cpp:10258] Updating the state of task 
3fe9fb67-1013-4cb8-963c-e2cba420dec4 of framework 
5141e1be-97ce-44d2-84a8-8183a8110499- (latest state: TASK_KILLED, status 
update state: TASK_KILLED)
I0301 20:24:08.754911  7984 slave.cpp:3879] Shutting down framework 
5141e1be-97ce-44d2-84a8-8183a8110499-
I0301 20:24:08.754911  7984 slave.cpp:6586] Shutting down executor 
'3fe9fb67-1013-4cb8-963c-e2cba420dec4' of framework 
5141e1be-97ce-44d2-84a8-8183a8110499- at executor(1)@10.3.1.11:53096
I0301 20:24:08.756911  7984 slave.cpp:922] Agent teI0301 20:24:08.048912  7596 
exec.cpp:162] Version: 1.6.0
I0301 20:24:08.078908  3588 exec.cpp:236] Executor registered on agent 
5141e1be-97ce-44d2-84a8-8183a8110499-S0
I0301 20:24:08.082911  9236 executor.cpp:176] Received SUBSCRIBED event
I0301 20:24:08.087911  9236 executor.cpp:180] Subscribed executor on 
build-srv-03.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I0301 20:24:08.088912  9236 executor.cpp:176] Received LAUNCH event
I0301 20:24:08.093912  9236 executor.cpp:648] Starting task 
3fe9fb67-1013-4cb8-963c-e2cba420dec4
I0301 20:24:08.176914  9236 executor.cpp:483] Running 
'D:\DCOS\mesos\src\mesos-containerizer.exe launch '
I0301 20:24:08.718909  9236 executor.cpp:661] Forked command at 8280
I0301 20:24:08.757910  6008 exec.cpp:445] Executor asked to shutdown
I0301 20:24:08.758909  9236 executor.cpp:176] Received SHUTDOWN event
I0301 20:24:08.758909  9236 executor.cpp:758] Shutting down
I0301 20:24:08.758909  9236 executor.cpp:868] Sending SIGTERM to process tree 
at pid 8rminating
W0301 20:24:08.757910  7984 slave.cpp:3875] Ignoring shutdown framework 
5141e1be-97ce-44d2-84a8-8183a8110499- because it is terminating
I0301 20:24:08.757910  1120 master.cpp:10357] Removing task 
3fe9fb67-1013-4cb8-963c-e2cba420dec4 with resources cpus(allocated: *):4; 
mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: 
*):[31000-32000] of framework 5141e1be-97ce-44d2-84a8-8183a8110499- on 
agent 5141e1be-97ce-44d2-84a8-8183a8110499-S0 at slave(398)@10.3.1.11:53075 
(build-srv-03.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0301 20:24:08.760911  1120 master.cpp:1306] Agent 
5141e1be-97ce-44d2-84a8-8183a8110499-S0 at slave(398)@10.3.1.11:53075 
(build-srv-03.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I0301 20:24:08.760911  1120 master.cpp:3276] Disconnecting agent 
5141e1be-97ce-44d2-84a8-8183a8110499-S0 at slave(398)@10.3.1.11:53075 
(build-srv-03.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0301 20:24:08.760911  1120 master.cpp:3295] Deactivating agent 
5141e1be-97ce-44d2-84a8-8183a8110499-S0 at slave(398)@10.3.1.11:53075 
(build-srv-03.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0301 

Re: Review Request 65869: Removed unnecessary warning in agent `statusUpdate()`.

2018-03-01 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65869/#review198467
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['65679', '65869']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65869

Relevant logs:

- 
[mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65869/logs/mesos-tests-stdout.log):

```
[   OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (123 ms)
[--] 9 tests from Endpoint/SlaveEndpointTest (1151 ms total)

[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (37 
ms)
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (44 
ms)
[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (83 ms 
total)

[--] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN  ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[   OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2496 ms)
[--] 1 test from IsolationFlag/CpuIsolatorTest (2522 ms total)

[--] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN  ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2473 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (2498 ms total)

[--] Global test environment tear-down
[==] 915 tests from 90 test cases ran. (486104 ms total)
[  PASSED  ] 914 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] CommandExecutorCheckTest.CommandCheckTimeout

 1 FAILED TEST
  YOU HAVE 211 DISABLED TESTS

```

- 
[mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65869/logs/mesos-tests-stderr.log):

```
I0301 20:23:42.865205  7096 master.cpp:10258] Updating the state of task 
70f6128e-de3d-4ee4-9044-aab88d13ec7e of framework 
f6582520-4ec6-4db0-a950-71d264a5995d- (latest state: TASK_KILLED, status 
update state: TASK_KILLED)
I0301 20:23:42.865205  3344 slave.cpp:3879] Shutting down framework 
f6582520-4ec6-4db0-a950-71d264a5995d-
I0301 20:23:42.865205  3344 slave.cpp:6584] Shutting down executor '7I0301 
20:23:42.163199  6344 exec.cpp:162] Version: 1.6.0
I0301 20:23:42.191201 10352 exec.cpp:236] Executor registered on agent 
f6582520-4ec6-4db0-a950-71d264a5995d-S0
I0301 20:23:42.196173 10860 executor.cpp:176] Received SUBSCRIBED event
I0301 20:23:42.200199 10860 executor.cpp:180] Subscribed executor on 
build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I0301 20:23:42.201202 10860 executor.cpp:176] Received LAUNCH event
I0301 20:23:42.206202 10860 executor.cpp:648] Starting task 
70f6128e-de3d-4ee4-9044-aab88d13ec7e
I0301 20:23:42.287200 10860 executor.cpp:483] Running 
'D:\DCOS\mesos\src\mesos-containerizer.exe launch '
I0301 20:23:42.825179 10860 executor.cpp:661] Forked command at 9992
I0301 20:23:42.869204  8096 exec.cpp:445] Executor asked to shutdown
I0301 20:23:42.869204 10860 executor.cpp:176] Received SHUTDOWN event
I0301 20:23:42.870178 10860 executor.cpp:758] Shutting down
I0301 20:23:42.870178 10860 executor.cpp:868] Sending SIGTERM to process tree 
at pid 90f6128e-de3d-4ee4-9044-aab88d13ec7e' of framework 
f6582520-4ec6-4db0-a950-71d264a5995d- at executor(1)@10.3.1.5:49266
I0301 20:23:42.867203  3344 slave.cpp:922] Agent terminating
W0301 20:23:42.868202  3344 slave.cpp:3875] Ignoring shutdown framework 
f6582520-4ec6-4db0-a950-71d264a5995d- because it is terminating
I0301 20:23:42.868202  7096 master.cpp:10357] Removing task 
70f6128e-de3d-4ee4-9044-aab88d13ec7e with resources cpus(allocated: *):4; 
mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: 
*):[31000-32000] of framework f6582520-4ec6-4db0-a950-71d264a5995d- on 
agent f6582520-4ec6-4db0-a950-71d264a5995d-S0 at slave(398)@10.3.1.5:49244 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0301 20:23:42.870178  7096 master.cpp:1306] Agent 
f6582520-4ec6-4db0-a950-71d264a5995d-S0 at slave(398)@10.3.1.5:49244 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I0301 20:23:42.870178  7096 master.cpp:3276] Disconnecting agent 
f6582520-4ec6-4db0-a950-71d264a5995d-S0 at slave(398)@10.3.1.5:49244 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0301 20:23:42.871181  9272 hierarchical.cpp:344] Removed framework 
f6582520-4ec6-4db0-a950-71d264a5995d-
I0301 20:23:42.871181  8028 containerizer.cpp:2338] Destroying container 
256dbc26-a591-46f4-9c83-d8c9d9b8d44f in RUNNING state
I0301 

Re: Review Request 65868: Added inspect retries to the docker containerizer in `update` method.

2018-03-01 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65868/#review198466
---



Patch looks great!

Reviews applied: [65786, 65787, 65751, 65750, 65683, 65713, 65759, 65868]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 
MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On March 1, 2018, 10:12 a.m., Andrei Budnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65868/
> ---
> 
> (Updated March 1, 2018, 10:12 a.m.)
> 
> 
> Review request for mesos, Gilbert Song and Greg Mann.
> 
> 
> Bugs: MESOS-8605
> https://issues.apache.org/jira/browse/MESOS-8605
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch fixes the bug when a terminal status update is never sent
> after termination of the docker executor, when the Docker daemon hangs
> for `inspect` command. A terminal status update is postponed until
> containerizer `update` completes that uses `inspect` command to get
> a PID of the docker container. To address the issue, we retry `inspect`
> in the loop.
> 
> 
> Diffs
> -
> 
>   src/slave/constants.hpp 030fb05186f7f360010bb7e5b4948faac69771cc 
>   src/slave/containerizer/docker.cpp 1f4eeb4628517707ab866014fcda9d9b15053d1f 
> 
> 
> Diff: https://reviews.apache.org/r/65868/diff/1/
> 
> 
> Testing
> ---
> 
> sudo make check (Fedora 25)
> internal CI
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>



Re: Review Request 65868: Added inspect retries to the docker containerizer in `update` method.

2018-03-01 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65868/#review198463
---



PASS: Mesos patch 65868 was successfully built and tested.

Reviews applied: `['65713', '65759', '65868']`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65868

- Mesos Reviewbot Windows


On March 1, 2018, 6:12 p.m., Andrei Budnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65868/
> ---
> 
> (Updated March 1, 2018, 6:12 p.m.)
> 
> 
> Review request for mesos, Gilbert Song and Greg Mann.
> 
> 
> Bugs: MESOS-8605
> https://issues.apache.org/jira/browse/MESOS-8605
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch fixes the bug when a terminal status update is never sent
> after termination of the docker executor, when the Docker daemon hangs
> for `inspect` command. A terminal status update is postponed until
> containerizer `update` completes that uses `inspect` command to get
> a PID of the docker container. To address the issue, we retry `inspect`
> in the loop.
> 
> 
> Diffs
> -
> 
>   src/slave/constants.hpp 030fb05186f7f360010bb7e5b4948faac69771cc 
>   src/slave/containerizer/docker.cpp 1f4eeb4628517707ab866014fcda9d9b15053d1f 
> 
> 
> Diff: https://reviews.apache.org/r/65868/diff/1/
> 
> 
> Testing
> ---
> 
> sudo make check (Fedora 25)
> internal CI
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>



Re: Review Request 65679: Removed direct unmock calls and added missing mock call expectations.

2018-03-01 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65679/#review198462
---




src/tests/slave_tests.cpp
Lines 4870-4878 (patched)


To avoid flakiness, should we register these expectations before the call 
to `unmocked__run` above?



src/tests/slave_tests.cpp
Lines 5455 (patched)


Is this necessary? Perhaps we could eliminate the `Future 
failure;`?


- Greg Mann


On Feb. 16, 2018, 1:22 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65679/
> ---
> 
> (Updated Feb. 16, 2018, 1:22 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-8611
> https://issues.apache.org/jira/browse/MESOS-8611
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Directly invoking unmock calls in the test process can potentially
> cause races with the real mock slave process. It is more robust to
> dispatch the unmock calls to the real mock slave process.
> 
> Also added several mock expectations to avoid "uninteresting mock
> call" test warnings.
> 
> 
> Diffs
> -
> 
>   src/tests/slave_tests.cpp e253317a67019302f18afe11e2a314e716cec226 
> 
> 
> Diff: https://reviews.apache.org/r/65679/diff/2/
> 
> 
> Testing
> ---
> 
> `./bin/mesos-tests.sh --gtest_filter=*SlaveTest* --gtest_repeat=-1 
> --gtest_break_on_failure` runs forever :)
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Review Request 65869: Removed unnecessary warning in agent `statusUpdate()`.

2018-03-01 Thread Meng Zhu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65869/
---

Review request for mesos, Alexander Rukletsov and Greg Mann.


Repository: mesos


Description
---

If the task status update is generated by the agent
before the executor creation (e.g. due to task authorization
or unschedule GC failure), this "could not find executor"
warning will be triggered. But this is expected.


Diffs
-

  src/slave/slave.cpp e2facb3c15a2f907f6497c58a36842ed707f2c70 


Diff: https://reviews.apache.org/r/65869/diff/1/


Testing
---

make check


Thanks,

Meng Zhu



Re: Review Request 65713: Handled hanging docker `stop`, `inspect` commands in docker executor.

2018-03-01 Thread Greg Mann


> On March 1, 2018, 8:14 a.m., Gilbert Song wrote:
> > src/docker/executor.cpp
> > Lines 540-544 (patched)
> > 
> >
> > Should we just call stop.discard() here and return stop?
> > 
> > If it is not killed by health check, we timed out the docker stop and 
> > we should discard it. If it is killed by health check, .discard() would 
> > trigger we do os::killtree on that docker stop subprocess and then return a 
> > failure, which invokes the onFailed callback below and retry.
> 
> Andrei Budnik wrote:
> >Should we just call stop.discard() here and return stop?
> 
> If we trigger `onFailed` callback, then next `_killTask` will be 
> scheduled after 5 more seconds via `delay`, so in case of hanging docker, we 
> will retry every `2 * KILL_RETRY_INTERVAL`, instead of `KILL_RETRY_INTERVAL`.
> 
> >.discard() would trigger we do os::killtree on that docker stop 
> subprocess and then return a failure, which invokes the onFailed callback
> 
> I think it will call `onDiscarded` rather than `onFailed`.

That's correct - I implemented the discard behavior such that calling 
`.discard()` on the `stop` future will almost always transition it to the 
discarded state (when the Subprocess exits with any status code after we do 
killtree). In a rare case where the `process::reap()` used within `Subprocess` 
to reap the subprocess PID actually fails, then the `Future` could transition 
to the failed state.


- Greg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65713/#review198436
---


On Feb. 28, 2018, 1:37 a.m., Andrei Budnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65713/
> ---
> 
> (Updated Feb. 28, 2018, 1:37 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gilbert Song, Greg Mann, and 
> Vinod Kone.
> 
> 
> Bugs: MESOS-8574
> https://issues.apache.org/jira/browse/MESOS-8574
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Previosly, if `docker inspect` command hanged, the docker container
> ended up in an unkillable state. This patch adds a timeout for inspect
> command after receiving `killTask` analogically to `reaped` handler.
> In addition we've added a timeout for `docker stop` command. If docker
> `stop` or `inspect` command times out, we discard the related future,
> thus the docker library kills previously spawned docker cli subprocess.
> As a result, a scheduler can retry `killTask` operation to handle
> nasty docker bugs that lead to hanging docker cli.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp 93c3e1d1e86814e34cbe5b045f6e61911266c535 
> 
> 
> Diff: https://reviews.apache.org/r/65713/diff/5/
> 
> 
> Testing
> ---
> 
> internal CI
> 
> Manual testing:
> 1. Build docker from sources: 
> http://oyvindsk.com/writing/docker-build-from-source
> 2. Modify `ContainerInspect` function from `docker/inspect.go`:
> ```
>  func (daemon *Daemon) ContainerInspect(name string, size bool, version 
> string) (interface{}, error) {
> +   time.Sleep(10 * time.Second)
> ```
> 3. Modify `ContainerStop` function from `docker/stop.go`:
> ```
>  func (daemon *Daemon) ContainerStop(name string, seconds *int) error {
> +   rand.Seed(time.Now().UTC().UnixNano())
> +   if rand.Intn(2) == 0 {
> +   time.Sleep(20 * time.Second)
> +   }
> ```
> 4. Rebuild docker: `sudo make build && sudo make binary`
> 5. Stop system docker daemon: `sudo service docker stop`
> 6. Start modified docker daemon: `sudo ./bundles/binary-daemon/dockerd-dev`
> 7. Modify `src/cli/execute.cpp`:
>   a) Add `delay(Seconds(15), self(), ::retryKill, task->task_id(), 
> offer.agent_id());` after 
> https://github.com/apache/mesos/blob/072ea2787ffca6f2a6dcb2d636f68c51823d6665/src/cli/execute.cpp#L606
>   b) Add a new method `retryKill` to `CommandScheduler`:
> ```
>   void retryKill(const TaskID& taskId, const AgentID& agentId)
>   {
> killTask(taskId, agentId);
> delay(Seconds(6), self(), ::retryKill, taskId, agentId);
>   }
> ```
> 8. Rebuild mesos
> 9. Run mesos master: `./bin/mesos-master.sh --work_dir='var/master-1'`
> 10. Run mesos agent: `GLOG_v=1 ./bin/mesos-agent.sh 
> --resources="cpus:1;mem:100" 
> --work_dir='/home/abudnik/mesos/build/var/agent-1' 
> --containerizers="docker,mesos" --master="127.0.1.1:5050"`
> 11. Submit a task for the docker executor: `./src/mesos-execute 
> --master="127.0.1.1:5050" --name="a" --containerizer=docker 
> --docker_image="ubuntu:xenial" --command="sleep "`
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>



Review Request 65868: Added inspect retries to the docker containerizer in `update` method.

2018-03-01 Thread Andrei Budnik

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65868/
---

Review request for mesos, Gilbert Song and Greg Mann.


Bugs: MESOS-8605
https://issues.apache.org/jira/browse/MESOS-8605


Repository: mesos


Description
---

This patch fixes the bug when a terminal status update is never sent
after termination of the docker executor, when the Docker daemon hangs
for `inspect` command. A terminal status update is postponed until
containerizer `update` completes that uses `inspect` command to get
a PID of the docker container. To address the issue, we retry `inspect`
in the loop.


Diffs
-

  src/slave/constants.hpp 030fb05186f7f360010bb7e5b4948faac69771cc 
  src/slave/containerizer/docker.cpp 1f4eeb4628517707ab866014fcda9d9b15053d1f 


Diff: https://reviews.apache.org/r/65868/diff/1/


Testing
---

sudo make check (Fedora 25)
internal CI


Thanks,

Andrei Budnik



Re: Review Request 65346: Updated Web UI to show quota guarantee and limit for roles.

2018-03-01 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65346/#review198455
---



Patch looks great!

Reviews applied: [65503, 65346]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 
MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On March 1, 2018, 3:23 p.m., Armand Grillet wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65346/
> ---
> 
> (Updated March 1, 2018, 3:23 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Kapil Arya, Michael Park, Meng 
> Zhu, and Till Toenshoff.
> 
> 
> Bugs: MESOS-8486
> https://issues.apache.org/jira/browse/MESOS-8486
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There is no limit field yet thus the limit displayed is currently
> the guarantee. We will access this field once it will be exposed.
> 
> 
> Diffs
> -
> 
>   src/webui/master/static/css/mesos.css 
> 3e3a3b13cfc4d63aea73d0a6572b22a2de94541e 
>   src/webui/master/static/index.html 46cb2843c2221ccebb7d811c0045be2c124afec1 
>   src/webui/master/static/js/app.js 8960f9d4421f7b96ba06deb623afafb06013b622 
>   src/webui/master/static/roles.html 1c84ad33a554e078d854e25b4d5ca311c8507f91 
> 
> 
> Diff: https://reviews.apache.org/r/65346/diff/4/
> 
> 
> Testing
> ---
> 
> New UI:
> ![New UI](https://i.imgur.com/8YFZnmH.png)
> The group columns cannot be ordered as they have sub-columns, this is 
> expected.
> 
> 
> Thanks,
> 
> Armand Grillet
> 
>



Re: Review Request 65346: Updated Web UI to show quota guarantee and limit for roles.

2018-03-01 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65346/#review198451
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['65346']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65346

Relevant logs:

- 
[mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65346/logs/mesos-tests-stdout.log):

```
[   OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (151 ms)
[--] 9 tests from Endpoint/SlaveEndpointTest (1245 ms total)

[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (37 
ms)
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (43 
ms)
[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (83 ms 
total)

[--] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN  ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[   OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2482 ms)
[--] 1 test from IsolationFlag/CpuIsolatorTest (2506 ms total)

[--] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN  ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2483 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (2508 ms total)

[--] Global test environment tear-down
[==] 915 tests from 90 test cases ran. (484849 ms total)
[  PASSED  ] 914 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] CommandExecutorCheckTest.CommandCheckTimeout

 1 FAILED TEST
  YOU HAVE 211 DISABLED TESTS

```

- 
[mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65346/logs/mesos-tests-stderr.log):

```
I0301 16:30:48.914669  9844 executor.cpp:176] Received SUBSCRIBED event
I0301 16:30:48.918663  9844 executor.cpp:180] Subscribed executor on 
build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I0301 16:30:48.919664  9844 executor.cpp:176] Received LAUNCH event
I0301 16:30:48.924666  9844 executor.cpp:648] Starting task 
923a8c79-580e-40e6-bdd2-ea3cd7646797
I0301 16:30:49.008666  9844 executor.cpp:483] Running 
'D:\DCOS\mesos\src\mesos-containerizer.exe launch '
I0301 16:30:49.547646  9844 executor.cpp:661] Forked command at 6720
I0301 16:30:49.585644 10348 exec.cpp:445] Executor asked to shutdown
I0301 16:30:49.586691  9844 executor.cpp:176] Received SHUTDOWN event
I0301 16:30:49.586691  9844 executor.cpp:758] Shutting down
I0301 16:30:49.586691  9844 executor.cpp:868] Sending SIGTERM to process tree 
at pid 67c2688e741- (default) at 
scheduler-65ea8e28-9415-4c1d-9486-f9e724645bd4@10.3.1.5:60866
I0301 16:30:49.582643  4024 master.cpp:3239] Deactivating framework 
f6b9ff81-a5f5-4e81-bc89-f27c2688e741- (default) at 
scheduler-65ea8e28-9415-4c1d-9486-f9e724645bd4@10.3.1.5:60866
I0301 16:30:49.582643  7356 slave.cpp:922] Agent terminating
I0301 16:30:49.582643  6932 hierarchical.cpp:405] Deactivated framework 
f6b9ff81-a5f5-4e81-bc89-f27c2688e741-
I0301 16:30:49.583644  7356 slave.cpp:3879] Shutting down framework 
f6b9ff81-a5f5-4e81-bc89-f27c2688e741-
I0301 16:30:49.583644  4024 master.cpp:10258] Updating the state of task 
923a8c79-580e-40e6-bdd2-ea3cd7646797 of framework 
f6b9ff81-a5f5-4e81-bc89-f27c2688e741- (latest state: TASK_KILLED, status 
update state: TASK_KILLED)
I0301 16:30:49.583644  7356 slave.cpp:6586] Shutting down executor 
'923a8c79-580e-40e6-bdd2-ea3cd7646797' of framework 
f6b9ff81-a5f5-4e81-bc89-f27c2688e741- at executor(1)@10.3.1.5:60887
I0301 16:30:49.588641  5596 containerizer.cpp:2338] Destroying container 
737ae9eb-ae05-4f54-b250-b255608c5fdb in RUNNING state
I0301 16:30:49.587708  4024 master.cpp:10357] Removing task 
923a8c79-580e-40e6-bdd2-ea3cd7646797 with resources cpus(allocated: *):4; 
mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: 
*):[31000-32000] of framework f6b9ff81-a5f5-4e81-bc89-f27c2688e741- on 
agent f6b9ff81-a5f5-4e81-bc89-f27c2688e741-S0 at slave(398)@10.3.1.5:60866 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0301 16:30:49.588641  5596 containerizer.cpp:2952] Transitioning the state of 
container 737ae9eb-ae05-4f54-b250-b255608c5fdb from RUNNING to DESTROYING
I0301 16:30:49.590642  5596 launcher.cpp:156] Asked to destroy container 
737ae9eb-ae05-4f54-b250-b255608c5fdb
I0301 16:30:49.590642  4024 master.cpp:1306] Agent 
f6b9ff81-a5f5-4e81-bc89-f27c2688e741-S0 at slave(398)@10.3.1.5:60866 

Re: Review Request 65346: Updated Web UI to show quota guarantee and limit for roles.

2018-03-01 Thread Armand Grillet

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65346/
---

(Updated March 1, 2018, 3:23 p.m.)


Review request for mesos, Benjamin Mahler, Kapil Arya, Michael Park, Meng Zhu, 
and Till Toenshoff.


Changes
---

Fixed issues.


Bugs: MESOS-8486
https://issues.apache.org/jira/browse/MESOS-8486


Repository: mesos


Description (updated)
---

There is no limit field yet thus the limit displayed is currently
the guarantee. We will access this field once it will be exposed.


Diffs (updated)
-

  src/webui/master/static/css/mesos.css 
3e3a3b13cfc4d63aea73d0a6572b22a2de94541e 
  src/webui/master/static/index.html 46cb2843c2221ccebb7d811c0045be2c124afec1 
  src/webui/master/static/js/app.js 8960f9d4421f7b96ba06deb623afafb06013b622 
  src/webui/master/static/roles.html 1c84ad33a554e078d854e25b4d5ca311c8507f91 


Diff: https://reviews.apache.org/r/65346/diff/3/

Changes: https://reviews.apache.org/r/65346/diff/2-3/


Testing (updated)
---

New UI:
![New UI](https://i.imgur.com/8YFZnmH.png)
The group columns cannot be ordered as they have sub-columns, this is expected.


Thanks,

Armand Grillet



Re: Review Request 65833: Displayed resource provider resources in GET_AGENTS response.

2018-03-01 Thread Jan Schlicht

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65833/#review198448
---


Fix it, then Ship it!




I'm seeing a display issue for `protobuf_utils.cpp` in RB UI. Checked that file 
in a diff viewer after applying the patch to current `master`.


src/tests/api_tests.cpp
Line 229 (original), 229 (patched)


As this represents a single disk resource, let's name it `resource` or 
`disk`. `disk` is what's used in similar test in `master_tests.cpp`.


- Jan Schlicht


On Feb. 28, 2018, 1:55 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65833/
> ---
> 
> (Updated Feb. 28, 2018, 1:55 p.m.)
> 
> 
> Review request for mesos, Jie Yu and Jan Schlicht.
> 
> 
> Bugs: MESOS-8388
> https://issues.apache.org/jira/browse/MESOS-8388
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Displayed resource provider resources in GET_AGENTS response.
> 
> 
> Diffs
> -
> 
>   include/mesos/master/master.proto f40caa2a6338ff58a325444cfb72cd24f2c99472 
>   include/mesos/v1/master/master.proto 
> 67c9560eaa7bd6b20b73af07f7666f0d98c4453d 
>   src/common/protobuf_utils.cpp d2ada356be4c26290692bc99e7ec5121039bda4e 
>   src/tests/api_tests.cpp 9c172f751b11b3b50f23253d212c2eb603da9c2c 
> 
> 
> Diff: https://reviews.apache.org/r/65833/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 65838: Updated validation of 'RAW' and 'BLOCK' disk resources.

2018-03-01 Thread Jan Schlicht

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65838/#review198447
---



Please add this to `v1/resources.cpp` as well.

- Jan Schlicht


On Feb. 28, 2018, 5:13 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65838/
> ---
> 
> (Updated Feb. 28, 2018, 5:13 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Jie Yu, and Jan Schlicht.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Updated validation of 'RAW' and 'BLOCK' disk resources.
> 
> 
> Diffs
> -
> 
>   src/common/resources.cpp a0c29246b3e3b377aabd0eb00ba16aa6bbd804e3 
> 
> 
> Diff: https://reviews.apache.org/r/65838/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> `sudo make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 65832: Displayed resource provider resources in GET_RESOURCE_PROVIDER response.

2018-03-01 Thread Jan Schlicht

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65832/#review198446
---


Ship it!




Ship It!

- Jan Schlicht


On Feb. 28, 2018, 1:55 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65832/
> ---
> 
> (Updated Feb. 28, 2018, 1:55 p.m.)
> 
> 
> Review request for mesos, Jie Yu and Jan Schlicht.
> 
> 
> Bugs: MESOS-8388
> https://issues.apache.org/jira/browse/MESOS-8388
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Displayed resource provider resources in GET_RESOURCE_PROVIDER response.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 7d92cb8e3dc533081d973e488bd140c7d5ea2bbf 
>   include/mesos/v1/agent/agent.proto 59a9fd69df47c3605662529b5493cd3bf18c8397 
>   src/slave/http.cpp 7d7fa2b4ec2e1f8f65c5264ce72590d0d8195b9b 
>   src/tests/api_tests.cpp 9c172f751b11b3b50f23253d212c2eb603da9c2c 
> 
> 
> Diff: https://reviews.apache.org/r/65832/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 65864: Updated mesos code with the new `os::spawn`.

2018-03-01 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65864/#review198445
---



Patch looks great!

Reviews applied: [65839, 65861, 65862, 65840, 65841, 65863, 65842, 65864]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 
MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On March 1, 2018, 9:48 a.m., Akash Gupta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65864/
> ---
> 
> (Updated March 1, 2018, 9:48 a.m.)
> 
> 
> Review request for mesos, Andrew Schwartzmeyer, Joseph Wu, and Michael Park.
> 
> 
> Bugs: MESOS-4549
> https://issues.apache.org/jira/browse/MESOS-4549
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> `os::spawn`'s return type was changed from `int` to `Option`, so
> the code in mesos was changed accordingly.
> 
> 
> Diffs
> -
> 
>   src/linux/fs.cpp dae094224321c0974c705023daf076409049de51 
>   src/slave/containerizer/mesos/isolators/network/cni/cni.cpp 
> c60c23f74f2abf6bef8dd32cc2e47e33bf666169 
>   src/tests/containerizer/perf_tests.cpp 
> d8aab08eb131f974821fb85662cbc6cc685d2f3e 
> 
> 
> Diff: https://reviews.apache.org/r/65864/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Akash Gupta
> 
>



Re: Review Request 65713: Handled hanging docker `stop`, `inspect` commands in docker executor.

2018-03-01 Thread Andrei Budnik


> On March 1, 2018, 8:14 a.m., Gilbert Song wrote:
> > src/docker/executor.cpp
> > Lines 467-468 (patched)
> > 
> >
> > In this case, should we log it?

We log inspect timeouts in the following patch inside inspect loop.


> On March 1, 2018, 8:14 a.m., Gilbert Song wrote:
> > src/docker/executor.cpp
> > Lines 540-544 (patched)
> > 
> >
> > Should we just call stop.discard() here and return stop?
> > 
> > If it is not killed by health check, we timed out the docker stop and 
> > we should discard it. If it is killed by health check, .discard() would 
> > trigger we do os::killtree on that docker stop subprocess and then return a 
> > failure, which invokes the onFailed callback below and retry.

>Should we just call stop.discard() here and return stop?

If we trigger `onFailed` callback, then next `_killTask` will be scheduled 
after 5 more seconds via `delay`, so in case of hanging docker, we will retry 
every `2 * KILL_RETRY_INTERVAL`, instead of `KILL_RETRY_INTERVAL`.

>.discard() would trigger we do os::killtree on that docker stop subprocess and 
>then return a failure, which invokes the onFailed callback

I think it will call `onDiscarded` rather than `onFailed`.


- Andrei


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65713/#review198436
---


On Feb. 28, 2018, 1:37 a.m., Andrei Budnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65713/
> ---
> 
> (Updated Feb. 28, 2018, 1:37 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gilbert Song, Greg Mann, and 
> Vinod Kone.
> 
> 
> Bugs: MESOS-8574
> https://issues.apache.org/jira/browse/MESOS-8574
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Previosly, if `docker inspect` command hanged, the docker container
> ended up in an unkillable state. This patch adds a timeout for inspect
> command after receiving `killTask` analogically to `reaped` handler.
> In addition we've added a timeout for `docker stop` command. If docker
> `stop` or `inspect` command times out, we discard the related future,
> thus the docker library kills previously spawned docker cli subprocess.
> As a result, a scheduler can retry `killTask` operation to handle
> nasty docker bugs that lead to hanging docker cli.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp 93c3e1d1e86814e34cbe5b045f6e61911266c535 
> 
> 
> Diff: https://reviews.apache.org/r/65713/diff/5/
> 
> 
> Testing
> ---
> 
> internal CI
> 
> Manual testing:
> 1. Build docker from sources: 
> http://oyvindsk.com/writing/docker-build-from-source
> 2. Modify `ContainerInspect` function from `docker/inspect.go`:
> ```
>  func (daemon *Daemon) ContainerInspect(name string, size bool, version 
> string) (interface{}, error) {
> +   time.Sleep(10 * time.Second)
> ```
> 3. Modify `ContainerStop` function from `docker/stop.go`:
> ```
>  func (daemon *Daemon) ContainerStop(name string, seconds *int) error {
> +   rand.Seed(time.Now().UTC().UnixNano())
> +   if rand.Intn(2) == 0 {
> +   time.Sleep(20 * time.Second)
> +   }
> ```
> 4. Rebuild docker: `sudo make build && sudo make binary`
> 5. Stop system docker daemon: `sudo service docker stop`
> 6. Start modified docker daemon: `sudo ./bundles/binary-daemon/dockerd-dev`
> 7. Modify `src/cli/execute.cpp`:
>   a) Add `delay(Seconds(15), self(), ::retryKill, task->task_id(), 
> offer.agent_id());` after 
> https://github.com/apache/mesos/blob/072ea2787ffca6f2a6dcb2d636f68c51823d6665/src/cli/execute.cpp#L606
>   b) Add a new method `retryKill` to `CommandScheduler`:
> ```
>   void retryKill(const TaskID& taskId, const AgentID& agentId)
>   {
> killTask(taskId, agentId);
> delay(Seconds(6), self(), ::retryKill, taskId, agentId);
>   }
> ```
> 8. Rebuild mesos
> 9. Run mesos master: `./bin/mesos-master.sh --work_dir='var/master-1'`
> 10. Run mesos agent: `GLOG_v=1 ./bin/mesos-agent.sh 
> --resources="cpus:1;mem:100" 
> --work_dir='/home/abudnik/mesos/build/var/agent-1' 
> --containerizers="docker,mesos" --master="127.0.1.1:5050"`
> 11. Submit a task for the docker executor: `./src/mesos-execute 
> --master="127.0.1.1:5050" --name="a" --containerizer=docker 
> --docker_image="ubuntu:xenial" --command="sleep "`
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>



Re: Review Request 65864: Updated mesos code with the new `os::spawn`.

2018-03-01 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65864/#review198443
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['65839', '65861', '65862', '65840', '65841', '65863', 
'65842', '65864']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65864

Relevant logs:

- 
[mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65864/logs/mesos-tests-stdout.log):

```
[   OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (122 ms)
[--] 9 tests from Endpoint/SlaveEndpointTest (1161 ms total)

[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (37 
ms)
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (42 
ms)
[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (82 ms 
total)

[--] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN  ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[   OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2452 ms)
[--] 1 test from IsolationFlag/CpuIsolatorTest (2477 ms total)

[--] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN  ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2471 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (2496 ms total)

[--] Global test environment tear-down
[==] 915 tests from 90 test cases ran. (483944 ms total)
[  PASSED  ] 914 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] CommandExecutorCheckTest.CommandCheckTimeout

 1 FAILED TEST
  YOU HAVE 211 DISABLED TESTS

```

- 
[mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65864/logs/mesos-tests-stderr.log):

```
I0301 11:13:00.392838  3904 master.cpp:10258] Updating the state of task 
c089b288-d808-4c8c-9442-65a71ba36ab6 of framework 
1c9f8c1a-f5a9-4cd2-8de8-e0edd0b575e3- (latest state: TASK_KILLED, status 
update state: TASK_KILLED)
I0301 11:13:00.392838  7792 slave.cpp:3879] Shutting down framework 
1c9f8c1a-f5a9-4cd2-8de8-e0edd0b575e3-
I0301 11:13:00.393837  I0301 11:12:59.687856  8596 exec.cpp:162] Version: 1.6.0
I0301 11:12:59.715831  5672 exec.cpp:236] Executor registered on agent 
1c9f8c1a-f5a9-4cd2-8de8-e0edd0b575e3-S0
I0301 11:12:59.719864  2348 executor.cpp:176] Received SUBSCRIBED event
I0301 11:12:59.724831  2348 executor.cpp:180] Subscribed executor on 
build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I0301 11:12:59.725832  2348 executor.cpp:176] Received LAUNCH event
I0301 11:12:59.730831  2348 executor.cpp:648] Starting task 
c089b288-d808-4c8c-9442-65a71ba36ab6
I0301 11:12:59.820832  2348 executor.cpp:483] Running 
'D:\DCOS\mesos\src\mesos-containerizer.exe launch '
I0301 11:13:00.362879  2348 executor.cpp:661] Forked command at 7744
I0301 11:13:00.396836  8640 exec.cpp:445] Executor asked to shutdown
I0301 11:13:00.397835  2348 executor.cpp:176] Received SHUTDOWN event
I0301 11:13:00.397835  2348 executor.cpp:758] Shutting down
I0301 11:13:00.397835  2348 executor.cpp:868] Sending SIGTERM to process tree 
at pid 77792 slave.cpp:6586] Shutting down executor 
'c089b288-d808-4c8c-9442-65a71ba36ab6' of framework 
1c9f8c1a-f5a9-4cd2-8de8-e0edd0b575e3- at executor(1)@10.3.1.5:58360
I0301 11:13:00.395835  7792 slave.cpp:922] Agent terminating
W0301 11:13:00.395835  7792 slave.cpp:3875] Ignoring shutdown framework 
1c9f8c1a-f5a9-4cd2-8de8-e0edd0b575e3- because it is terminating
I0301 11:13:00.395835  3904 master.cpp:10357] Removing task 
c089b288-d808-4c8c-9442-65a71ba36ab6 with resources cpus(allocated: *):4; 
mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: 
*):[31000-32000] of framework 1c9f8c1a-f5a9-4cd2-8de8-e0edd0b575e3- on 
agent 1c9f8c1a-f5a9-4cd2-8de8-e0edd0b575e3-S0 at slave(398)@10.3.1.5:58339 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I0301 11:13:00.398835  1376 containerizer.cpp:2338] Destroying container 
6250952d-c4ad-42e8-a115-0c87830b89eb in RUNNING state
I0301 11:13:00.398835  2876 hierarchical.cpp:344] Removed framework 
1c9f8c1a-f5a9-4cd2-8de8-e0edd0b575e3-
I0301 11:13:00.398835  1376 containerizer.cpp:2952] Transitioning the state of 
container 6250952d-c4ad-42e8-a115-0c87830b89eb from RUNNING to DESTROYING
I0301 11:13:00.398835  3904 master.cpp:1306] Agent 
1c9f8c1a-f5a9-4cd2-8de8-e0edd0b575e3-S0 at slave(398)@10.3.1.5:58339 
(build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected

Re: Review Request 65856: Added `--fetcher_stall_timeout` to abort stalled artifact fetching.

2018-03-01 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65856/#review198441
---



Patch looks great!

Reviews applied: [65855, 65856]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 
MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Feb. 28, 2018, 9:17 p.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65856/
> ---
> 
> (Updated Feb. 28, 2018, 9:17 p.m.)
> 
> 
> Review request for mesos and Gilbert Song.
> 
> 
> Bugs: MESOS-8620
> https://issues.apache.org/jira/browse/MESOS-8620
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This flag specifies a timeout for `mesos-fetcher` to wait before
> aborting if the download speed keeps below 1 bytes/sec. This would avoid
> containers to get stuck at FETCHING. The default value is 1 minute.
> 
> 
> Diffs
> -
> 
>   include/mesos/fetcher/fetcher.proto 
> 6a5d807221681853444ffd3ab42a23827604e550 
>   src/launcher/fetcher.cpp 2f42fa221a42fdc131a8a97ded4e9433ce51ea77 
>   src/slave/constants.hpp 030fb05186f7f360010bb7e5b4948faac69771cc 
>   src/slave/containerizer/fetcher.cpp 
> a49411b7bac2d5a50a75d0b802842c2f61fe58c6 
>   src/slave/flags.hpp 0c67bf214ceb93ae7ff088bec2648fa26ddac59e 
>   src/slave/flags.cpp 943aaaf58b5f36555f0902019b8c5c6522ab7afc 
> 
> 
> Diff: https://reviews.apache.org/r/65856/diff/1/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> Manually tested with Nginx servers that sleeps for 59 seconds and 1 mintue 
> before serving artifacts.
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 65841: Stout: Changed `os::system` to return `Option` instead of `int`.

2018-03-01 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65841/#review198440
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['65839', '65840', '65841']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65841

Relevant logs:

- 
[libprocess-tests-cmake-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65841/logs/libprocess-tests-cmake-stdout.log):

```


   "D:\DCOS\mesos\3rdparty\libprocess\src\tests\libprocess-tests.vcxproj" 
(default target) (1) ->
   "D:\DCOS\mesos\3rdparty\libprocess\src\process.vcxproj" (default target) 
(10) ->
   (ClCompile target) -> 
 D:\DCOS\mesos\mesos\3rdparty\libprocess\src\jwt.cpp(144): warning 
C4244: 'initializing': conversion from 'double' to 'int64_t', possible loss of 
data [D:\DCOS\mesos\3rdparty\libprocess\src\process.vcxproj]
 D:\DCOS\mesos\mesos\3rdparty\libprocess\src\jwt.cpp(144): warning 
C4244: 'initializing': conversion from 'double' to 'const int64_t', possible 
loss of data [D:\DCOS\mesos\3rdparty\libprocess\src\process.vcxproj]
 D:\DCOS\mesos\mesos\3rdparty\libprocess\src\ssl\utilities.cpp(360): 
warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of 
data [D:\DCOS\mesos\3rdparty\libprocess\src\process.vcxproj]
 
D:\DCOS\mesos\mesos\3rdparty\libprocess\src\libevent_ssl_socket.cpp(601): 
warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of 
data [D:\DCOS\mesos\3rdparty\libprocess\src\process.vcxproj]


   "D:\DCOS\mesos\3rdparty\libprocess\src\tests\libprocess-tests.vcxproj" 
(default target) (1) ->
   "D:\DCOS\mesos\3rdparty\libprocess\src\tests\ssl-client.vcxproj" 
(default target) (18) ->
   (ClCompile target) -> 
 
D:\DCOS\mesos\mesos\3rdparty\libprocess\include\process/gtest.hpp(731): error 
C3861: 'AssertSignaled': identifier not found 
[D:\DCOS\mesos\3rdparty\libprocess\src\tests\ssl-client.vcxproj]
 
D:\DCOS\mesos\mesos\3rdparty\libprocess\include\process/gtest.hpp(770): error 
C3861: 'AssertTermSigEq': identifier not found 
[D:\DCOS\mesos\3rdparty\libprocess\src\tests\ssl-client.vcxproj]
 
D:\DCOS\mesos\mesos\3rdparty\libprocess\include\process/gtest.hpp(817): error 
C3861: 'AssertTermSigNe': identifier not found 
[D:\DCOS\mesos\3rdparty\libprocess\src\tests\ssl-client.vcxproj]


   "D:\DCOS\mesos\3rdparty\libprocess\src\tests\libprocess-tests.vcxproj" 
(default target) (1) ->
   "D:\DCOS\mesos\3rdparty\libprocess\src\tests\benchmarks.vcxproj" 
(default target) (7) ->
 
D:\DCOS\mesos\mesos\3rdparty\libprocess\include\process/gtest.hpp(731): error 
C3861: 'AssertSignaled': identifier not found (compiling source file 
D:\DCOS\mesos\mesos\3rdparty\libprocess\src\tests\benchmarks.cpp) 
[D:\DCOS\mesos\3rdparty\libprocess\src\tests\benchmarks.vcxproj]
 
D:\DCOS\mesos\mesos\3rdparty\libprocess\include\process/gtest.hpp(770): error 
C3861: 'AssertTermSigEq': identifier not found (compiling source file 
D:\DCOS\mesos\mesos\3rdparty\libprocess\src\tests\benchmarks.cpp) 
[D:\DCOS\mesos\3rdparty\libprocess\src\tests\benchmarks.vcxproj]
 
D:\DCOS\mesos\mesos\3rdparty\libprocess\include\process/gtest.hpp(817): error 
C3861: 'AssertTermSigNe': identifier not found (compiling source file 
D:\DCOS\mesos\mesos\3rdparty\libprocess\src\tests\benchmarks.cpp) 
[D:\DCOS\mesos\3rdparty\libprocess\src\tests\benchmarks.vcxproj]

23 Warning(s)
6 Error(s)

Time Elapsed 00:06:02.26
```

- 
[mesos-tests-cmake-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65841/logs/mesos-tests-cmake-stdout.log):

```
 D:\DCOS\mesos\mesos\src\java\jni\convert.cpp(392): warning C4267: 
'argument': conversion from 'size_t' to 'jsize', possible loss of data 
[D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 D:\DCOS\mesos\mesos\src\java\jni\convert.cpp(414): warning C4267: 
'argument': conversion from 'size_t' to 'jsize', possible loss of data 
[D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 D:\DCOS\mesos\mesos\src\java\jni\convert.cpp(415): warning C4267: 
'argument': conversion from 'size_t' to 'jsize', possible loss of data 
[D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 D:\DCOS\mesos\mesos\src\java\jni\convert.cpp(437): warning C4267: 
'argument': conversion from 'size_t' to 'jsize', possible loss of data 
[D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 D:\DCOS\mesos\mesos\src\java\jni\convert.cpp(438): warning C4267: 
'argument': conversion from 'size_t' to 'jsize', possible loss of data 
[D:\DCOS\mesos\src\java\mesos-java.vcxproj]
 D:\DCOS\mesos\mesos\src\java\jni\convert.cpp(460): warning C4267: 
'argument': conversion from 'size_t' to 'jsize', possible loss of 

Review Request 65861: Windows: Removed use of W* signal macros in libprocess.

2018-03-01 Thread Akash Gupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65861/
---

Review request for mesos, Andrew Schwartzmeyer and Joseph Wu.


Repository: mesos


Description
---

Windows doesn't use signals, so all uses of the W* signal macros on
Windows has been #ifdef'd out.


Diffs
-

  3rdparty/libprocess/include/process/gtest.hpp 
dd4c9bdf7e6e0e45db347108fde3b41960974be6 


Diff: https://reviews.apache.org/r/65861/diff/1/


Testing
---


Thanks,

Akash Gupta



Review Request 65862: Windows: Removed signal macro in `checks/checker_process.cpp`.

2018-03-01 Thread Akash Gupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65862/
---

Review request for mesos, Andrew Schwartzmeyer and Joseph Wu.


Repository: mesos


Description
---

`checker_process.cpp` used signals to determine if the task has been
killed. This logic has been #ifdef'd out, since signals aren't
supported on Windows. A different implemention will be needed when
nested command checks work on Windows.


Diffs
-

  src/checks/checker_process.cpp cf9ec053946e620eb36e92d647ab864c4e88d506 


Diff: https://reviews.apache.org/r/65862/diff/1/


Testing
---


Thanks,

Akash Gupta



Review Request 65863: Updated mesos code with the new `os::system`.

2018-03-01 Thread Akash Gupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65863/
---

Review request for mesos, Andrew Schwartzmeyer, Joseph Wu, and Michael Park.


Bugs: MESOS-4549
https://issues.apache.org/jira/browse/MESOS-4549


Repository: mesos


Description
---

`os::system`'s return type was changed from `int` to `Option`, so
the code in mesos was changed accordingly.


Diffs
-

  src/slave/containerizer/mesos/launch.cpp 
75b7eaf9cd62d6b5f02896175168b651f4517e12 
  src/tests/containerizer/cgroups_isolator_tests.cpp 
59b23be78efb4025bfef53a19ab97d9873ab8dc9 
  src/tests/containerizer/memory_pressure_tests.cpp 
0c3e738ce05553a2ee5c38c6748d6d28a1eb93d3 
  src/tests/environment.cpp 1cba274e0e684b123ce1e4d9cd296f428022fcdc 


Diff: https://reviews.apache.org/r/65863/diff/1/


Testing
---


Thanks,

Akash Gupta



Re: Review Request 65842: Stout: Changed `os::spawn` to return `Option` instead of `int`.

2018-03-01 Thread Akash Gupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65842/
---

(Updated March 1, 2018, 9:50 a.m.)


Review request for mesos, Andrew Schwartzmeyer, Joseph Wu, and Michael Park.


Changes
---

split commit into 2.


Summary (updated)
-

Stout: Changed `os::spawn` to return `Option` instead of `int`.


Bugs: MESOS-4549
https://issues.apache.org/jira/browse/MESOS-4549


Repository: mesos


Description
---

Similar to `os::system`, `os::spawn` returned -1 on error, which is a
valid exit code on Windows. Using the same solution for `os::system`,
now when `os::spawn` fails, `None` will be returned. Otherwise, the
process exit code will be returned.


Diffs (updated)
-

  3rdparty/stout/include/stout/os/posix/copyfile.hpp 
0d9ed5babbff05a84740df3dbe1594f0e3012360 
  3rdparty/stout/include/stout/os/posix/shell.hpp 
b878718e137e4c8b599e0f4dac67672d8df27f7d 
  3rdparty/stout/include/stout/os/windows/shell.hpp 
1696d084c8453fa0eb9ef41e0a4128e547f7f53c 


Diff: https://reviews.apache.org/r/65842/diff/2/

Changes: https://reviews.apache.org/r/65842/diff/1-2/


Testing
---

make check


Thanks,

Akash Gupta



Review Request 65864: Updated mesos code with the new `os::spawn`.

2018-03-01 Thread Akash Gupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65864/
---

Review request for mesos.


Repository: mesos


Description
---

`os::spawn`'s return type was changed from `int` to `Option`, so
the code in mesos was changed accordingly.


Diffs
-

  src/linux/fs.cpp dae094224321c0974c705023daf076409049de51 
  src/slave/containerizer/mesos/isolators/network/cni/cni.cpp 
c60c23f74f2abf6bef8dd32cc2e47e33bf666169 
  src/tests/containerizer/perf_tests.cpp 
d8aab08eb131f974821fb85662cbc6cc685d2f3e 


Diff: https://reviews.apache.org/r/65864/diff/1/


Testing
---


Thanks,

Akash Gupta



Re: Review Request 65839: Windows: Removed stout W* signal macros in `windows.hpp`.

2018-03-01 Thread Akash Gupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65839/
---

(Updated March 1, 2018, 9:45 a.m.)


Review request for mesos, Andrew Schwartzmeyer and Joseph Wu.


Changes
---

split commit into three


Summary (updated)
-

Windows: Removed stout W* signal macros in `windows.hpp`.


Repository: mesos


Description
---

`windows.hpp` defined a few macros for the status value of `waitpid`
that were either unused in the code base or were only used in gtest.
These macros handled signal logic, so they were removed since they
aren't used on Windows.


Diffs (updated)
-

  3rdparty/stout/include/stout/gtest.hpp 
ce6e4de6a9f69ea121ff0ca6048316d678c8c857 
  3rdparty/stout/include/stout/windows.hpp 
b35e6b94ba6709254450be9429b6f48f2d276689 


Diff: https://reviews.apache.org/r/65839/diff/2/

Changes: https://reviews.apache.org/r/65839/diff/1-2/


Testing
---


Thanks,

Akash Gupta



Re: Review Request 65759: Added inspect retries to the Docker executor.

2018-03-01 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65759/#review198438
---


Ship it!




Ship It!

- Gilbert Song


On Feb. 28, 2018, 10:05 a.m., Andrei Budnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65759/
> ---
> 
> (Updated Feb. 28, 2018, 10:05 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gilbert Song, Greg Mann, and 
> Michael Park.
> 
> 
> Bugs: MESOS-8574
> https://issues.apache.org/jira/browse/MESOS-8574
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds retries for `inspect` command to workaround docker
> daemon hangs. We assume that the docker daemon can be temporarily
> unresponsive. If it's unresponsive, then any started docker cli
> command hangs. To address the issue, we retry `inspect` in the loop.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp 93c3e1d1e86814e34cbe5b045f6e61911266c535 
> 
> 
> Diff: https://reviews.apache.org/r/65759/diff/6/
> 
> 
> Testing
> ---
> 
> internal CI
> 
> Manually, described in /r/65713
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>



Re: Review Request 65713: Handled hanging docker `stop`, `inspect` commands in docker executor.

2018-03-01 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65713/#review198436
---




src/docker/executor.cpp
Lines 467-468 (patched)


In this case, should we log it?



src/docker/executor.cpp
Lines 540-544 (patched)


Should we just call stop.discard() here and return stop?

If it is not killed by health check, we timed out the docker stop and we 
should discard it. If it is killed by health check, .discard() would trigger we 
do os::killtree on that docker stop subprocess and then return a failure, which 
invokes the onFailed callback below and retry.


- Gilbert Song


On Feb. 27, 2018, 5:37 p.m., Andrei Budnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65713/
> ---
> 
> (Updated Feb. 27, 2018, 5:37 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gilbert Song, Greg Mann, and 
> Vinod Kone.
> 
> 
> Bugs: MESOS-8574
> https://issues.apache.org/jira/browse/MESOS-8574
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Previosly, if `docker inspect` command hanged, the docker container
> ended up in an unkillable state. This patch adds a timeout for inspect
> command after receiving `killTask` analogically to `reaped` handler.
> In addition we've added a timeout for `docker stop` command. If docker
> `stop` or `inspect` command times out, we discard the related future,
> thus the docker library kills previously spawned docker cli subprocess.
> As a result, a scheduler can retry `killTask` operation to handle
> nasty docker bugs that lead to hanging docker cli.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp 93c3e1d1e86814e34cbe5b045f6e61911266c535 
> 
> 
> Diff: https://reviews.apache.org/r/65713/diff/5/
> 
> 
> Testing
> ---
> 
> internal CI
> 
> Manual testing:
> 1. Build docker from sources: 
> http://oyvindsk.com/writing/docker-build-from-source
> 2. Modify `ContainerInspect` function from `docker/inspect.go`:
> ```
>  func (daemon *Daemon) ContainerInspect(name string, size bool, version 
> string) (interface{}, error) {
> +   time.Sleep(10 * time.Second)
> ```
> 3. Modify `ContainerStop` function from `docker/stop.go`:
> ```
>  func (daemon *Daemon) ContainerStop(name string, seconds *int) error {
> +   rand.Seed(time.Now().UTC().UnixNano())
> +   if rand.Intn(2) == 0 {
> +   time.Sleep(20 * time.Second)
> +   }
> ```
> 4. Rebuild docker: `sudo make build && sudo make binary`
> 5. Stop system docker daemon: `sudo service docker stop`
> 6. Start modified docker daemon: `sudo ./bundles/binary-daemon/dockerd-dev`
> 7. Modify `src/cli/execute.cpp`:
>   a) Add `delay(Seconds(15), self(), ::retryKill, task->task_id(), 
> offer.agent_id());` after 
> https://github.com/apache/mesos/blob/072ea2787ffca6f2a6dcb2d636f68c51823d6665/src/cli/execute.cpp#L606
>   b) Add a new method `retryKill` to `CommandScheduler`:
> ```
>   void retryKill(const TaskID& taskId, const AgentID& agentId)
>   {
> killTask(taskId, agentId);
> delay(Seconds(6), self(), ::retryKill, taskId, agentId);
>   }
> ```
> 8. Rebuild mesos
> 9. Run mesos master: `./bin/mesos-master.sh --work_dir='var/master-1'`
> 10. Run mesos agent: `GLOG_v=1 ./bin/mesos-agent.sh 
> --resources="cpus:1;mem:100" 
> --work_dir='/home/abudnik/mesos/build/var/agent-1' 
> --containerizers="docker,mesos" --master="127.0.1.1:5050"`
> 11. Submit a task for the docker executor: `./src/mesos-execute 
> --master="127.0.1.1:5050" --name="a" --containerizer=docker 
> --docker_image="ubuntu:xenial" --command="sleep "`
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>



Re: Review Request 65853: Updated createQuotaInfo helper to handle quota limit.

2018-03-01 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65853/#review198437
---



Patch looks great!

Reviews applied: [65334, 65851, 65852, 65780, 65781, 65782, 65783, 65784, 
65785, 65853]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 
MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Feb. 28, 2018, 6:34 p.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65853/
> ---
> 
> (Updated Feb. 28, 2018, 6:34 p.m.)
> 
> 
> Review request for mesos, Michael Park and Meng Zhu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Note that this has no effect on the existing code paths since the
> limit will be empty.
> 
> 
> Diffs
> -
> 
>   src/master/quota.hpp 7f43996d9aa3d3f462a6ed750733a55d4ef770c8 
>   src/master/quota.cpp 58bab6a678bac9e41a7994ba0b7cc1ed069a8a18 
> 
> 
> Diff: https://reviews.apache.org/r/65853/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>