Re: Review Request 64932: Added example framework converting disk resources.

2018-01-04 Thread Gaston Kleiman


> On Jan. 4, 2018, 12:49 p.m., Gaston Kleiman wrote:
> > src/examples/test_csi_user_framework.cpp
> > Lines 165-184 (patched)
> > 
> >
> > This framework will never launch a task, so this is probably not needed.
> > 
> > Furthermore it should probably crash if it receivres an executor 
> > failure...
> 
> Greg Mann wrote:
> FAILURE events can also signal that an agent was removed, so this could 
> happen.

Ok, that makes sense, I'd keep the `Agent failed` section and remove the other 
one.


- Gaston


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64932/#review194780
---


On Jan. 3, 2018, 2:50 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64932/
> ---
> 
> (Updated Jan. 3, 2018, 2:50 p.m.)
> 
> 
> Review request for mesos, Greg Mann and Jie Yu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch introduces an example HTTP framework which transforms
> 'RAW' disk resources from resource providers into 'MOUNT' volumes and
> subsequently unreserves them.
> 
> 
> Diffs
> -
> 
>   src/Makefile.am 30cd4d426e797e4c8ee556d1bc3de99830a5fe41 
>   src/examples/CMakeLists.txt d4f1af4f072efdc68fa0b722f42b1d8aa1779b6e 
>   src/examples/test_csi_user_framework.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/64932/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 64932: Added example framework converting disk resources.

2018-01-04 Thread Greg Mann


> On Jan. 4, 2018, 8:49 p.m., Gaston Kleiman wrote:
> > src/examples/test_csi_user_framework.cpp
> > Lines 149 (patched)
> > 
> >
> > This framework never starts tasks, so should we crash or log something 
> > saying that we don't expect to get task status updates?

+1


> On Jan. 4, 2018, 8:49 p.m., Gaston Kleiman wrote:
> > src/examples/test_csi_user_framework.cpp
> > Lines 165-184 (patched)
> > 
> >
> > This framework will never launch a task, so this is probably not needed.
> > 
> > Furthermore it should probably crash if it receivres an executor 
> > failure...

FAILURE events can also signal that an agent was removed, so this could happen.


> On Jan. 4, 2018, 8:49 p.m., Gaston Kleiman wrote:
> > src/examples/test_csi_user_framework.cpp
> > Lines 331-332 (patched)
> > 
> >
> > I'd phrase it:
> > 
> > If we didn't create an `ACCEPT` operation, create a `DELCINE` operation.

+1


> On Jan. 4, 2018, 8:49 p.m., Gaston Kleiman wrote:
> > src/examples/test_csi_user_framework.cpp
> > Lines 335 (patched)
> > 
> >
> > Remove this line, the if statement ensures that no accept call was 
> > created.

The accept field could be set but contain no operations, so I think this line 
should stay.


> On Jan. 4, 2018, 8:49 p.m., Gaston Kleiman wrote:
> > src/examples/test_csi_user_framework.cpp
> > Lines 380-386 (patched)
> > 
> >
> > This method doesn't seem to be used. `int main(...)` should probably 
> > call `Flags::setUsageMessage` instead of this.

Yea weird, we have these in a couple other example frameworks as well. Should 
probably clean them up when we have a chance.


- Greg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64932/#review194780
---


On Jan. 3, 2018, 10:50 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64932/
> ---
> 
> (Updated Jan. 3, 2018, 10:50 p.m.)
> 
> 
> Review request for mesos, Greg Mann and Jie Yu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch introduces an example HTTP framework which transforms
> 'RAW' disk resources from resource providers into 'MOUNT' volumes and
> subsequently unreserves them.
> 
> 
> Diffs
> -
> 
>   src/Makefile.am 30cd4d426e797e4c8ee556d1bc3de99830a5fe41 
>   src/examples/CMakeLists.txt d4f1af4f072efdc68fa0b722f42b1d8aa1779b6e 
>   src/examples/test_csi_user_framework.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/64932/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 64932: Added example framework converting disk resources.

2018-01-04 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64932/#review194830
---


Fix it, then Ship it!





src/examples/test_csi_user_framework.cpp
Lines 262 (patched)


s/single/single call/



src/examples/test_csi_user_framework.cpp
Lines 271 (patched)


Suggestion:

s/resourceType/resourcesByType/

here and below



src/examples/test_csi_user_framework.cpp
Lines 282 (patched)


Is it not possible that a single resource provider can offer both reserved 
and unreserved resources? Even if the current SLRP provides this guarantee, can 
we be sure that all RPs will honor it?



src/examples/test_csi_user_framework.cpp
Lines 318-319 (patched)


Nit: could you add a newline between these lines for consistency?


- Greg Mann


On Jan. 3, 2018, 10:50 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64932/
> ---
> 
> (Updated Jan. 3, 2018, 10:50 p.m.)
> 
> 
> Review request for mesos, Greg Mann and Jie Yu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch introduces an example HTTP framework which transforms
> 'RAW' disk resources from resource providers into 'MOUNT' volumes and
> subsequently unreserves them.
> 
> 
> Diffs
> -
> 
>   src/Makefile.am 30cd4d426e797e4c8ee556d1bc3de99830a5fe41 
>   src/examples/CMakeLists.txt d4f1af4f072efdc68fa0b722f42b1d8aa1779b6e 
>   src/examples/test_csi_user_framework.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/64932/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 64939: Abort libprocess when a Process throws an uncaught exception.

2018-01-04 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64939/#review194824
---



PASS: Mesos patch 64939 was successfully built and tested.

Reviews applied: `['64939']`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64939

- Mesos Reviewbot Windows


On Jan. 4, 2018, 1:30 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64939/
> ---
> 
> (Updated Jan. 4, 2018, 1:30 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Benjamin Hindman, Ilya Pronin, 
> Jie Yu, and Jiang Yan Xu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Previously, we logged the exception, terminated the throwing
> Process, and continued to run. However, currently there exists
> no known user-level code that I'm aware of that handles the
> unexpected termination due to an uncaught exception.
> 
> Generally, this means that when an exception is thrown (e.g.
> a bad call to std::map::at), the process terminates with a log
> message but things get "stuck" and the user has to debug what
> is wrong / kill the process.
> 
> Libprocess would likely need to provide some primitives to
> better support handling unexpected termination of a Process
> in order for us to provide a strategy where we continue running.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/src/process.cpp 
> 75cf1d3b6d3d257ba9bc81c68017a74a6511cebf 
> 
> 
> Diff: https://reviews.apache.org/r/64939/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Review Request 64968: Removed the inaccurate `slaves.transitioning` function in the master.

2018-01-04 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64968/
---

Review request for mesos, Vinod Kone and Jiang Yan Xu.


Bugs: MESOS-8390
https://issues.apache.org/jira/browse/MESOS-8390


Repository: mesos


Description
---

This function used to return whether the agent was transitioning
between states, however it was updated in e510813f93e253480005ce
to return only when the agent is recovered.

This patch removes the now confusing function in favor of just
directly checking the state we care about in the reconciliation
logic. Since there were no other usages, the function is removed.


Diffs
-

  src/master/master.hpp 130f6e28cc62a8912aac66ecfbf014fe1ee444e3 
  src/master/master.cpp 28d8be3a4769b418b61cff0b95845e4232135bc7 


Diff: https://reviews.apache.org/r/64968/diff/1/


Testing
---

make check


Thanks,

Benjamin Mahler



Re: Review Request 64940: Prevented a crash when an agent with terminal tasks is lost.

2018-01-04 Thread Vinod Kone


> On Jan. 5, 2018, 1:25 a.m., Vinod Kone wrote:
> > src/master/master.cpp
> > Lines 10037-10056 (original), 10039-10062 (patched)
> > 
> >
> > I think we shouldn't create a TASK_UNREACHABLE status update and call 
> > `updateTask` or `forward` for a terminal task at all. . Also, `forward` 
> > sends TASK_UNREACHABLE update for terminal task to the framework which 
> > looks incorrect.
> > 
> > 
> > Ideally, we want terminal but unacknowledged tasks to still be marked 
> > unreachable in some way, either via task state being TASK_UNREACHABLE or 
> > task being present in `unreachableTasks`. This allows, for example, the 
> > WebUI to not show sandbox links for unreachable tasks irrespective of 
> > whether they were terminal or not before going unreachable. 
> > 
> > But doing this is tricky for various reasons:
> > 
> > --> `updateTask()` doesn't allow a terminal state to be transitioned to 
> > TASK_UNREACHABLE. Right now when we call `updateTask` for a terminal task, 
> > it adds TASK_UNREACHABLE status to `Task.statuses` and also sends it to 
> > operator API stream subscribers which looks incorrect. The fact that 
> > `updateTask` internally deals with already terminal tasks is a bad design 
> > decision in retrospect. I think the callers shouldn't call it for terminal 
> > tasks instead.
> > 
> > --> It's not clear to our users what a `completed` task means. The 
> > intention was for this to hold a cache of terminal and acknowledged tasks 
> > for storing recent history. The users of the WebUI probably equate 
> > "Completed Tasks" to terminal tasks irrespective of their acknowledgement 
> > status, which is why it is confusing for them to see terminal but 
> > unacknowledged tasks in the "Active tasks" section in the WebUI.
> > 
> > --> When a framework reconciles the state of a task on an unreachable 
> > agent, master replies with TASK_UNREACHABLE irrespective of whether the 
> > task was in a non-terminal state or terminal but un-acknowledged state or 
> > terminal and acknowledged state when the agent went unreachable.  
> > 
> > I think the direction we want to go towards is
> > 
> > --> Completed tasks should consist of terminal unacknowledged and 
> > terminal acknowled tasks, likely in two different data structures.
> > --> Unreachable tasks should consist of all non-complete tasks on an 
> > unreachable agent.  All the tasks in this map should be in TASK_UNREACHABLE 
> > state.
> > 
> > 
> > Given all the above is a very involved change, I would recommend 
> > keeping what you have here but with a giant TODO (your current comment in 
> > #10058 doesn't go into enough detail about the complexity here) that talks 
> > about the above stuff. Your change at least keeps the parity with the 
> > (broken) semantics that we have in 1.4 and earlier so that's a bit better.

Ignore the first line. Forgot to delete it.


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64940/#review194794
---


On Jan. 4, 2018, 1:35 a.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64940/
> ---
> 
> (Updated Jan. 4, 2018, 1:35 a.m.)
> 
> 
> Review request for mesos, Gaston Kleiman, Jie Yu, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-8337
> https://issues.apache.org/jira/browse/MESOS-8337
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> If an agent is lost, we try to remove all the tasks that might
> have been lost. However, if a task is already terminal, it hasn't
> really been lost so we should not be tracking it in the framework's
> unreachable tasks list.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 130f6e28cc62a8912aac66ecfbf014fe1ee444e3 
>   src/master/master.cpp 28d8be3a4769b418b61cff0b95845e4232135bc7 
>   src/tests/partition_tests.cpp 3813139f576ea01db0197f0fe8a73597db1bb69a 
> 
> 
> Diff: https://reviews.apache.org/r/64940/diff/3/
> 
> 
> Testing
> ---
> 
> make check (Fedora 27)
> 
> 
> Thanks,
> 
> James Peach
> 
>



Re: Review Request 64940: Prevented a crash when an agent with terminal tasks is lost.

2018-01-04 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64940/#review194794
---




src/master/master.cpp
Lines 10037-10056 (original), 10039-10062 (patched)


I think we shouldn't create a TASK_UNREACHABLE status update and call 
`updateTask` or `forward` for a terminal task at all. . Also, `forward` sends 
TASK_UNREACHABLE update for terminal task to the framework which looks 
incorrect.

Ideally, we want terminal but unacknowledged tasks to still be marked 
unreachable in some way, either via task state being TASK_UNREACHABLE or task 
being present in `unreachableTasks`. This allows, for example, the WebUI to not 
show sandbox links for unreachable tasks irrespective of whether they were 
terminal or not before going unreachable. 

But doing this is tricky for various reasons:

--> `updateTask()` doesn't allow a terminal state to be transitioned to 
TASK_UNREACHABLE. Right now when we call `updateTask` for a terminal task, it 
adds TASK_UNREACHABLE status to `Task.statuses` and also sends it to operator 
API stream subscribers which looks incorrect. The fact that `updateTask` 
internally deals with already terminal tasks is a bad design decision in 
retrospect. I think the callers shouldn't call it for terminal tasks instead.

--> It's not clear to our users what a `completed` task means. The 
intention was for this to hold a cache of terminal and acknowledged tasks for 
storing recent history. The users of the WebUI probably equate "Completed 
Tasks" to terminal tasks irrespective of their acknowledgement status, which is 
why it is confusing for them to see terminal but unacknowledged tasks in the 
"Active tasks" section in the WebUI.

--> When a framework reconciles the state of a task on an unreachable 
agent, master replies with TASK_UNREACHABLE irrespective of whether the task 
was in a non-terminal state or terminal but un-acknowledged state or terminal 
and acknowledged state when the agent went unreachable.  

I think the direction we want to go towards is

--> Completed tasks should consist of terminal unacknowledged and terminal 
acknowled tasks, likely in two different data structures.
--> Unreachable tasks should consist of all non-complete tasks on an 
unreachable agent.  All the tasks in this map should be in TASK_UNREACHABLE 
state.

Given all the above is a very involved change, I would recommend keeping 
what you have here but with a giant TODO (your current comment in #10058 
doesn't go into enough detail about the complexity here) that talks about the 
above stuff. Your change at least keeps the parity with the (broken) semantics 
that we have in 1.4 and earlier so that's a bit better.



src/master/master.cpp
Line 10273 (original), 10279-10283 (patched)


I would kill this CHECK in favor of the CHECK that already exists in the 
`Framework::removeTask`. AFAICT `Master::removeTask` shouldn't care about it.



src/tests/master_tests.cpp
Lines 7601 (patched)


How about:

// This test verifies that when the master removes an agent any 
unacknowledged but terminal tasks
// on the agent are moved to `completed` but keep their original terminal 
state.



src/tests/master_tests.cpp
Lines 7630 (patched)


`the task`, which task? are both the tasks using the same executor?



src/tests/master_tests.cpp
Lines 7630-7638 (patched)


This seems a bit complicated. You could alternatively just start the 
scheduler driver with explicit acknowledgements and just not acknowledge 
terminal updates when you receive them.



src/tests/master_tests.cpp
Lines 7645-7653 (patched)


Why are you launching 2 tasks instead of just one? You just want to test 
the unacknowledged terminal task case right? Doesn't really matter if it is 
TASK_FINISHED or TASK_LOST right?



src/tests/master_tests.cpp
Lines 7692-7711 (patched)


Are you verifying that the latest state of the task has been preserved? I 
would recommend hitting "/tasks" endpoint instead of "state-summary" to be more 
direct.


- Vinod Kone


On Jan. 4, 2018, 1:35 a.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64940/
> ---
> 
> (Updated Jan. 4, 2018, 1:35 a.m.)
> 
> 
> Review request for mesos, Gaston Kleiman, Jie Yu, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-8337

Re: Review Request 63860: Windows: Mapped the Docker network info types.

2018-01-04 Thread Jie Yu


> On Jan. 4, 2018, 8:42 p.m., Jie Yu wrote:
> > I like the direction in the latest diff (platform specific defaults).
> > 
> > One question is: what's the difference between nat and bridge? are they the 
> > same? If yes, can we not have NAT in the API? In other words, BRIDGE on 
> > Windows will map to "nat".
> 
> Akash Gupta wrote:
> I don't remember if they are 100% the same, but they are very similar. We 
> can map BRIDGE to "nat" on Windows with a note in the doc that says that 
> BRIDGE="nat" on Windows.

Yes, please do that.


- Jie


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63860/#review194784
---


On Jan. 5, 2018, 12:29 a.m., Akash Gupta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63860/
> ---
> 
> (Updated Jan. 5, 2018, 12:29 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Andrew Schwartzmeyer, Gaston 
> Kleiman, Jie Yu, John Kordich, Joseph Wu, and Michael Park.
> 
> 
> Bugs: MESOS-7342
> https://issues.apache.org/jira/browse/MESOS-7342
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The current Network enum in DockerInfo is specific to Linux containers.
> Instead of supporting {host, bridge, none, user} networks, Windows
> docker supports {nat, none, user} networks. Now, the Network enum
> supports {host, bridge, nat, none, user}. If the network isn't
> specified, then the default is host on Linux and nat on Windows.
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto c677a8be07d0ef209d42622ae32056d36e55ff78 
>   src/docker/docker.cpp 722a54ad113fc4e2bb22a8f08e307ab38d5fbfed 
> 
> 
> Diff: https://reviews.apache.org/r/63860/diff/5/
> 
> 
> Testing
> ---
> 
> See https://reviews.apache.org/r/63862/ for test results.
> 
> 
> Thanks,
> 
> Akash Gupta
> 
>



Re: Review Request 64936: Improved the documentation of protos related to operation feedback.

2018-01-04 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64936/#review194815
---



PASS: Mesos patch 64936 was successfully built and tested.

Reviews applied: `['64936']`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64936

- Mesos Reviewbot Windows


On Jan. 3, 2018, 11:49 p.m., Gaston Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64936/
> ---
> 
> (Updated Jan. 3, 2018, 11:49 p.m.)
> 
> 
> Review request for mesos and Greg Mann.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Improved the documentation of protos related to operation feedback.
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto c677a8be07d0ef209d42622ae32056d36e55ff78 
>   include/mesos/scheduler/scheduler.proto 
> 4258fb38a412a2c9d977b1b38c97cc3ab9d5090e 
>   include/mesos/v1/mesos.proto da7b4587891c47c02345209e0fdca60585a36fdc 
>   include/mesos/v1/scheduler/scheduler.proto 
> 688ba55b11cf21bc71c15a711c2b2ac5d8655c9f 
> 
> 
> Diff: https://reviews.apache.org/r/64936/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>



Re: Review Request 64387: Windows: Ported docker health check tests.

2018-01-04 Thread Akash Gupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64387/
---

(Updated Jan. 5, 2018, 12:33 a.m.)


Review request for mesos, Alexander Rukletsov, Andrew Schwartzmeyer, Gaston 
Kleiman, Jie Yu, Joseph Wu, and Michael Park.


Changes
---

Rebased.


Repository: mesos


Description
---

The `HealthCheckTest.ROOT_DOCKER_*` and
`DockerContainerizerHealthCheckTest.*` tests now work on Windows.


Diffs (updated)
-

  src/tests/environment.cpp 72bd621f02f97ea5fd553f3dc0bd52adb8ddee8f 
  src/tests/health_check_tests.cpp 1893c85169f5e94e164434b93e6a24268224225d 
  src/tests/mesos.hpp 93913f2e01898c73e09de58a975aa467e714d882 


Diff: https://reviews.apache.org/r/64387/diff/4/

Changes: https://reviews.apache.org/r/64387/diff/3-4/


Testing
---

Windows Server:
[==] Running 5 tests from 2 test cases.
[--] Global test environment set-up.
[--] 2 tests from HealthCheckTest
[ RUN  ] HealthCheckTest.ROOT_DOCKER_DockerHealthyTask
[   OK ] HealthCheckTest.ROOT_DOCKER_DockerHealthyTask (21263 ms)
[ RUN  ] HealthCheckTest.ROOT_DOCKER_DockerHealthStatusChange
[   OK ] HealthCheckTest.ROOT_DOCKER_DockerHealthStatusChange (23512 ms)
[--] 2 tests from HealthCheckTest (44835 ms total)

[--] 3 tests from NetworkProtocol/DockerContainerizerHealthCheckTest
[ RUN  ] 
NetworkProtocol/DockerContainerizerHealthCheckTest.ROOT_DOCKER_USERNETWORK_NETNAMESPACE_HealthyTaskViaHTTP/0
[   OK ] 
NetworkProtocol/DockerContainerizerHealthCheckTest.ROOT_DOCKER_USERNETWORK_NETNAMESPACE_HealthyTaskViaHTTP/0
 (28487 ms)
[ RUN  ] 
NetworkProtocol/DockerContainerizerHealthCheckTest.ROOT_DOCKER_USERNETWORK_NETNAMESPACE_HealthyTaskViaHTTPS/0
[   OK ] 
NetworkProtocol/DockerContainerizerHealthCheckTest.ROOT_DOCKER_USERNETWORK_NETNAMESPACE_HealthyTaskViaHTTPS/0
 (26447 ms)
[ RUN  ] 
NetworkProtocol/DockerContainerizerHealthCheckTest.ROOT_DOCKER_USERNETWORK_NETNAMESPACE_HealthyTaskViaTCP/0
[   OK ] 
NetworkProtocol/DockerContainerizerHealthCheckTest.ROOT_DOCKER_USERNETWORK_NETNAMESPACE_HealthyTaskViaTCP/0
 (26264 ms)
[--] 3 tests from NetworkProtocol/DockerContainerizerHealthCheckTest 
(81268 ms total)

[--] Global test environment tear-down
[==] 5 tests from 2 test cases ran. (126559 ms total)
[  PASSED  ] 5 tests

Rest of tests pass.

Windows Client (Disabled network health checks):
Proof that network health checks are skipped on Windows Client.
C:\Program Files\Docker\Docker\Resources\bin\docker.exe: Error response from 
daemon: sharing of hyperv containers network is not supported.
356b087e7fa640f83fe27ebeb3396bfc7b2bbebd917aeaec0508b887b41d31f4
-
We cannot run any Docker health checks tests because:
Running in another container's namespace is not supported on this version of 
Windows.

Rest rests pass.

Linux:
make check passes


Thanks,

Akash Gupta



Re: Review Request 64570: Windows: Temporarily fixed the docker executor.

2018-01-04 Thread Akash Gupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64570/
---

(Updated Jan. 5, 2018, 12:32 a.m.)


Review request for mesos, Alexander Rukletsov, Andrew Schwartzmeyer, Gaston 
Kleiman, Jie Yu, and Michael Park.


Changes
---

Rebased.


Repository: mesos


Description
---

The docker executor was setting its `PATH` through
`os::host_default_path()`, so it could not find the docker executable.
Now, on Windows, the docker executor inherits the environment variables
from the agent. Further investigation needs to be done on why `PATH`
was this way and if the docker executor could simply inherit the
agent's environment.


Diffs (updated)
-

  src/slave/containerizer/docker.cpp 890cb2aba44fe76e891117833eac8ccca00b759b 


Diff: https://reviews.apache.org/r/64570/diff/2/

Changes: https://reviews.apache.org/r/64570/diff/1-2/


Testing
---

See https://reviews.apache.org/r/64386/


Thanks,

Akash Gupta



Re: Review Request 64604: Windows: Updated heath-checks.md with Windows implementation.

2018-01-04 Thread Akash Gupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64604/
---

(Updated Jan. 5, 2018, 12:32 a.m.)


Review request for mesos, Alexander Rukletsov, Andrew Schwartzmeyer, Gaston 
Kleiman, Jie Yu, and Michael Park.


Changes
---

Rebased.


Repository: mesos


Description
---

Updated health-checks.md with details of the Windows health check
implementations for the mesos and docker executors.


Diffs (updated)
-

  docs/health-checks.md b9b6327dea8c70dbc4aab204487a088f76c79479 


Diff: https://reviews.apache.org/r/64604/diff/2/

Changes: https://reviews.apache.org/r/64604/diff/1-2/


Testing
---


Thanks,

Akash Gupta



Re: Review Request 64386: Windows: Enabled docker health checks.

2018-01-04 Thread Akash Gupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64386/
---

(Updated Jan. 5, 2018, 12:31 a.m.)


Review request for mesos, Alexander Rukletsov, Andrew Schwartzmeyer, Gaston 
Kleiman, Jie Yu, Joseph Wu, and Michael Park.


Changes
---

Rebased.


Repository: mesos


Description
---

The network health checks called curl and then executed setns to
enter to container network namespace, which did not work on Windows.
To do the equivalent, Windows nows calls docker run with powershell's
curl equivalent (Invoke-WebRequest) and uses the
network=container: flag to enter the container's namespace.


Diffs (updated)
-

  src/checks/checker.hpp 93502270f31e80c5f7c94b5b456625e9cdea1837 
  src/checks/checker.cpp fff0aac504b4283a210f936e00c977fa60d88b3d 
  src/checks/checker_process.hpp 510f3b2e6e689faaf26595214ce377c2b5518f28 
  src/checks/checker_process.cpp ddb197b8cc2c503fef5ae20af32f5881fff9833f 
  src/checks/health_checker.hpp 019fbd791f250ecc28ff59d779f90e7ccbf0c685 
  src/checks/health_checker.cpp eaf9a18817eeeff7c29c7a4b9d1b183f398760a3 
  src/docker/docker.hpp d9e71f8841a868082170d28fc4f3d495e2eb1e61 
  src/docker/executor.cpp e4c53d558e414e50b1c429fba8e31e504c63744a 
  src/launcher/executor.cpp 794bf7ca07e68c7d83993546c134f85cac5a68e3 


Diff: https://reviews.apache.org/r/64386/diff/4/

Changes: https://reviews.apache.org/r/64386/diff/3-4/


Testing
---

See https://reviews.apache.org/r/64387/


Thanks,

Akash Gupta



Re: Review Request 63862: Windows: Ported docker_tests.cpp.

2018-01-04 Thread Akash Gupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63862/
---

(Updated Jan. 5, 2018, 12:30 a.m.)


Review request for mesos, Alexander Rukletsov, Andrew Schwartzmeyer, Gaston 
Kleiman, Jie Yu, John Kordich, Joseph Wu, and Michael Park.


Changes
---

Rebased.


Bugs: MESOS-7342
https://issues.apache.org/jira/browse/MESOS-7342


Repository: mesos


Description
---

Ported the disabled tests to run on Windows. The following tests
could not be ported due to Windows platform limitations and remain
diabled:
  - ROOT_DOCKER_MountRelativeContainerPath (can't mount volumes inside
sandbox).
  - ROOT_DOCKER_NVIDIA_GPU_DeviceAllow (no GPU container support).
  - ROOT_DOCKER_NVIDIA_GPU_InspectDevices (no GPU container support).


Diffs (updated)
-

  src/tests/containerizer/docker_tests.cpp 
0ac4a79e03d5e11ead5a57a9749e26c20a1ddd57 


Diff: https://reviews.apache.org/r/63862/diff/4/

Changes: https://reviews.apache.org/r/63862/diff/3-4/


Testing
---

Windows mesos-test:
[==] 754 tests from 77 test cases ran. (270586 ms total)
[  PASSED  ] 754 tests.


Windows DockerTest only:
[==] 11 tests from 1 test case ran. (85617 ms total)
[  PASSED  ] 11 tests.

Linux DockerTest (only 12 tests instead of 14, because I don't have Nvidia GPU):
[==] 12 tests from 1 test case ran. (12413 ms total)
[  PASSED  ] 12 tests.


Thanks,

Akash Gupta



Re: Review Request 63861: Windows: Updated networking doc.

2018-01-04 Thread Akash Gupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63861/
---

(Updated Jan. 5, 2018, 12:29 a.m.)


Review request for mesos, Alexander Rukletsov, Andrew Schwartzmeyer, Gaston 
Kleiman, Jie Yu, John Kordich, and Michael Park.


Changes
---

Rebased.


Bugs: MESOS-7342
https://issues.apache.org/jira/browse/MESOS-7342


Repository: mesos


Description
---

The networking docs now describe how the Docker network modes in the
`Network` enum work on Windows, since the enum only has Linux network
modes.


Diffs (updated)
-

  docs/networking.md bdd3a762435aae163fc659ccfea7da825983 


Diff: https://reviews.apache.org/r/63861/diff/6/

Changes: https://reviews.apache.org/r/63861/diff/5-6/


Testing
---


Thanks,

Akash Gupta



Re: Review Request 63860: Windows: Mapped the Docker network info types.

2018-01-04 Thread Akash Gupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63860/
---

(Updated Jan. 5, 2018, 12:29 a.m.)


Review request for mesos, Alexander Rukletsov, Andrew Schwartzmeyer, Gaston 
Kleiman, Jie Yu, John Kordich, Joseph Wu, and Michael Park.


Changes
---

Rebased.


Bugs: MESOS-7342
https://issues.apache.org/jira/browse/MESOS-7342


Repository: mesos


Description
---

The current Network enum in DockerInfo is specific to Linux containers.
Instead of supporting {host, bridge, none, user} networks, Windows
docker supports {nat, none, user} networks. Now, the Network enum
supports {host, bridge, nat, none, user}. If the network isn't
specified, then the default is host on Linux and nat on Windows.


Diffs (updated)
-

  include/mesos/mesos.proto c677a8be07d0ef209d42622ae32056d36e55ff78 
  src/docker/docker.cpp 722a54ad113fc4e2bb22a8f08e307ab38d5fbfed 


Diff: https://reviews.apache.org/r/63860/diff/5/

Changes: https://reviews.apache.org/r/63860/diff/4-5/


Testing
---

See https://reviews.apache.org/r/63862/ for test results.


Thanks,

Akash Gupta



Re: Review Request 63859: Windows: Fixed mock signal values in stout.

2018-01-04 Thread Akash Gupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63859/
---

(Updated Jan. 5, 2018, 12:28 a.m.)


Review request for mesos, Andrew Schwartzmeyer and John Kordich.


Changes
---

Rebased.


Bugs: MESOS-7342
https://issues.apache.org/jira/browse/MESOS-7342


Repository: mesos


Description
---

Also fixed the WEXITSTATUS macro to cast the exit code instead of
bit-masking it, since Windows exit codes are 32 bit unsigned ints.


Diffs (updated)
-

  3rdparty/stout/include/stout/windows.hpp 
7aa0ba72c4cc3b688ce6374b3308945ea8cb7572 


Diff: https://reviews.apache.org/r/63859/diff/5/

Changes: https://reviews.apache.org/r/63859/diff/4-5/


Testing
---

See https://reviews.apache.org/r/63862/ for test results.


Thanks,

Akash Gupta



Re: Review Request 63860: Windows: Mapped the Docker network info types.

2018-01-04 Thread Akash Gupta


> On Jan. 4, 2018, 8:42 p.m., Jie Yu wrote:
> > I like the direction in the latest diff (platform specific defaults).
> > 
> > One question is: what's the difference between nat and bridge? are they the 
> > same? If yes, can we not have NAT in the API? In other words, BRIDGE on 
> > Windows will map to "nat".

I don't remember if they are 100% the same, but they are very similar. We can 
map BRIDGE to "nat" on Windows with a note in the doc that says that 
BRIDGE="nat" on Windows.


- Akash


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63860/#review194784
---


On Jan. 4, 2018, 9:40 p.m., Akash Gupta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63860/
> ---
> 
> (Updated Jan. 4, 2018, 9:40 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Andrew Schwartzmeyer, Gaston 
> Kleiman, Jie Yu, John Kordich, Joseph Wu, and Michael Park.
> 
> 
> Bugs: MESOS-7342
> https://issues.apache.org/jira/browse/MESOS-7342
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The current Network enum in DockerInfo is specific to Linux containers.
> Instead of supporting {host, bridge, none, user} networks, Windows
> docker supports {nat, none, user} networks. Now, the Network enum
> supports {host, bridge, nat, none, user}. If the network isn't
> specified, then the default is host on Linux and nat on Windows.
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto 839ddb1cb41471d36423a2fc149acf90b973d413 
>   src/docker/docker.cpp 722a54ad113fc4e2bb22a8f08e307ab38d5fbfed 
> 
> 
> Diff: https://reviews.apache.org/r/63860/diff/4/
> 
> 
> Testing
> ---
> 
> See https://reviews.apache.org/r/63862/ for test results.
> 
> 
> Thanks,
> 
> Akash Gupta
> 
>



Re: Review Request 64964: Bumped the CSI spec bundle to 0.1.0.

2018-01-04 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64964/#review194809
---


Ship it!




Ship It!

- Jie Yu


On Jan. 4, 2018, 11:45 p.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64964/
> ---
> 
> (Updated Jan. 4, 2018, 11:45 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-8394
> https://issues.apache.org/jira/browse/MESOS-8394
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The bundle is generated as follows:
> 
> 1. Download the tarball from the following link:
>https://github.com/container-storage-interface/spec/releases
> 2. Run the following commands:
>tar zxf v0.1.0.tar.gz
>mv spec-0.1.0 csi-0.1.0
>tar zcf csi-0.1.0.tar.gz csi-0.1.0
> 
> 
> Diffs
> -
> 
>   3rdparty/csi-0.1.0.tar.gz d2ff83e1ae7eed09f6bbab24b2602c93ade00ed8 
> 
> 
> Diff: https://reviews.apache.org/r/64964/diff/1/
> 
> 
> Testing
> ---
> 
> sudo make check with `--enable-grpc`
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 64957: Updated README for the gRPC bundle.

2018-01-04 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64957/#review194808
---


Ship it!




Ship It!

- Jie Yu


On Jan. 4, 2018, 9:05 p.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64957/
> ---
> 
> (Updated Jan. 4, 2018, 9:05 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added the commands for generating the bundle.
> 
> 
> Diffs
> -
> 
>   3rdparty/Makefile.am 805c6e534555ca3ff4320d4cdb6bfb64a36c5600 
>   3rdparty/grpc-1.8.3.readme PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/64957/diff/1/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Review Request 64964: Bumped the CSI spec bundle to 0.1.0.

2018-01-04 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64964/
---

Review request for mesos and Jie Yu.


Bugs: MESOS-8394
https://issues.apache.org/jira/browse/MESOS-8394


Repository: mesos


Description
---

The bundle is generated as follows:

1. Download the tarball from the following link:
   https://github.com/container-storage-interface/spec/releases
2. Run the following commands:
   tar zxf v0.1.0.tar.gz
   mv spec-0.1.0 csi-0.1.0
   tar zcf csi-0.1.0.tar.gz csi-0.1.0


Diffs
-

  3rdparty/csi-0.1.0.tar.gz d2ff83e1ae7eed09f6bbab24b2602c93ade00ed8 


Diff: https://reviews.apache.org/r/64964/diff/1/


Testing
---

sudo make check with `--enable-grpc`


Thanks,

Chun-Hung Hsiao



Re: Review Request 64879: Updated fetcher cache tests for the default executor.

2018-01-04 Thread Gaston Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64879/#review194806
---


Ship it!




Ship It!

- Gaston Kleiman


On Dec. 30, 2017, 12:06 p.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64879/
> ---
> 
> (Updated Dec. 30, 2017, 12:06 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, and Jie Yu.
> 
> 
> Bugs: MESOS-8366
> https://issues.apache.org/jira/browse/MESOS-8366
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Updated the task launch helper to obtain the ContainerID of the
> launched task, which can be used to correctly construct the sandbox
> path for both the command and default executors.
> 
> Updated the test expectations in cases where the default executor
> container will invoke the fetcher more times than the test is
> expecting.
> 
> Updated various test helper functions to propagate more detailed
> error results that make test failure easier to debug.
> 
> 
> Diffs
> -
> 
>   src/tests/fetcher_cache_tests.cpp 7db7b7dcef27b1686ccae5a7408ff2811a3b9255 
> 
> 
> Diff: https://reviews.apache.org/r/64879/diff/2/
> 
> 
> Testing
> ---
> 
> make check (Fedora 27)
> 
> 
> Thanks,
> 
> James Peach
> 
>



Re: Review Request 64879: Updated fetcher cache tests for the default executor.

2018-01-04 Thread James Peach


> On Jan. 2, 2018, 7:23 p.m., Gaston Kleiman wrote:
> > src/tests/fetcher_cache_tests.cpp
> > Lines 391-393 (original), 393-399 (patched)
> > 
> >
> > Can't we just do `return os::access(path, X_OK);` here?

Good point!


> On Jan. 2, 2018, 7:23 p.m., Gaston Kleiman wrote:
> > src/tests/fetcher_cache_tests.cpp
> > Line 503 (original), 523 (patched)
> > 
> >
> > you're overwriting `_task.runDirectory` here, is that intentional?

Yep. I added a comment.


- James


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64879/#review194603
---


On Dec. 30, 2017, 8:06 p.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64879/
> ---
> 
> (Updated Dec. 30, 2017, 8:06 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, and Jie Yu.
> 
> 
> Bugs: MESOS-8366
> https://issues.apache.org/jira/browse/MESOS-8366
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Updated the task launch helper to obtain the ContainerID of the
> launched task, which can be used to correctly construct the sandbox
> path for both the command and default executors.
> 
> Updated the test expectations in cases where the default executor
> container will invoke the fetcher more times than the test is
> expecting.
> 
> Updated various test helper functions to propagate more detailed
> error results that make test failure easier to debug.
> 
> 
> Diffs
> -
> 
>   src/tests/fetcher_cache_tests.cpp 7db7b7dcef27b1686ccae5a7408ff2811a3b9255 
> 
> 
> Diff: https://reviews.apache.org/r/64879/diff/2/
> 
> 
> Testing
> ---
> 
> make check (Fedora 27)
> 
> 
> Thanks,
> 
> James Peach
> 
>



Re: Review Request 64963: Fixed typos in SRLP tests.

2018-01-04 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64963/#review194804
---


Ship it!




Ship It!

- Chun-Hung Hsiao


On Jan. 4, 2018, 11:16 p.m., Gaston Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64963/
> ---
> 
> (Updated Jan. 4, 2018, 11:16 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Greg Mann, and Jie Yu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed typos in SRLP tests.
> 
> 
> Diffs
> -
> 
>   src/tests/storage_local_resource_provider_tests.cpp 
> b522b406205ab4a8cb430bedd18ec00c9f437e8a 
> 
> 
> Diff: https://reviews.apache.org/r/64963/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>



Review Request 64963: Fixed typos in SRLP tests.

2018-01-04 Thread Gaston Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64963/
---

Review request for mesos, Chun-Hung Hsiao, Greg Mann, and Jie Yu.


Repository: mesos


Description
---

Fixed typos in SRLP tests.


Diffs
-

  src/tests/storage_local_resource_provider_tests.cpp 
b522b406205ab4a8cb430bedd18ec00c9f437e8a 


Diff: https://reviews.apache.org/r/64963/diff/1/


Testing
---


Thanks,

Gaston Kleiman



Re: Review Request 64940: Prevented a crash when an agent with terminal tasks is lost.

2018-01-04 Thread James Peach


> On Jan. 4, 2018, 10:09 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp
> > Lines 10061 (patched)
> > 
> >
> > Perhaps move all the logic around determining the unreachable task down 
> > here.
> > 
> > ```
> > // We need to inform `removeTask` whether the task became unreachable 
> > due to the agent being removed. Note that terminal tasks are not considered 
> > unreachable. We can not rely on the task's state because it may have been 
> > set to `TASK_LOST` for backwards-compatibility.
> > bool unreachable =
> >   unreachableTime.isSome() && !protobuf::isTerminalState(task->state());
> > ```
> > 
> > Inlining `unreachable` is probably fine too but separating it helps 
> > clarify it with comments?

You have to sample the task state before the update because it might transition 
to a terminal state at that time (and LOST is terminal).


- James


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64940/#review194777
---


On Jan. 4, 2018, 1:35 a.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64940/
> ---
> 
> (Updated Jan. 4, 2018, 1:35 a.m.)
> 
> 
> Review request for mesos, Gaston Kleiman, Jie Yu, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-8337
> https://issues.apache.org/jira/browse/MESOS-8337
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> If an agent is lost, we try to remove all the tasks that might
> have been lost. However, if a task is already terminal, it hasn't
> really been lost so we should not be tracking it in the framework's
> unreachable tasks list.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 130f6e28cc62a8912aac66ecfbf014fe1ee444e3 
>   src/master/master.cpp 28d8be3a4769b418b61cff0b95845e4232135bc7 
>   src/tests/master_tests.cpp 5546fd937d078c1f757964d5163449ffd993388e 
> 
> 
> Diff: https://reviews.apache.org/r/64940/diff/2/
> 
> 
> Testing
> ---
> 
> make check (Fedora 27)
> 
> 
> Thanks,
> 
> James Peach
> 
>



Re: Review Request 64940: Prevented a crash when an agent with terminal tasks is lost.

2018-01-04 Thread James Peach


> On Jan. 4, 2018, 10:09 p.m., Jiang Yan Xu wrote:
> > src/tests/master_tests.cpp
> > Lines 7637 (patched)
> > 
> >
> > You could just wait for the the status update (being droppped) instead 
> > of capaturing `Slave::executorTerminated` and risk the race condition of 
> > the master not receiving the status update before the agent is deemed 
> > partitioned?

Dealing with all the status updates was less reliable.


- James


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64940/#review194777
---


On Jan. 4, 2018, 1:35 a.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64940/
> ---
> 
> (Updated Jan. 4, 2018, 1:35 a.m.)
> 
> 
> Review request for mesos, Gaston Kleiman, Jie Yu, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-8337
> https://issues.apache.org/jira/browse/MESOS-8337
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> If an agent is lost, we try to remove all the tasks that might
> have been lost. However, if a task is already terminal, it hasn't
> really been lost so we should not be tracking it in the framework's
> unreachable tasks list.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 130f6e28cc62a8912aac66ecfbf014fe1ee444e3 
>   src/master/master.cpp 28d8be3a4769b418b61cff0b95845e4232135bc7 
>   src/tests/master_tests.cpp 5546fd937d078c1f757964d5163449ffd993388e 
> 
> 
> Diff: https://reviews.apache.org/r/64940/diff/2/
> 
> 
> Testing
> ---
> 
> make check (Fedora 27)
> 
> 
> Thanks,
> 
> James Peach
> 
>



Re: Review Request 64940: Prevented a crash when an agent with terminal tasks is lost.

2018-01-04 Thread James Peach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64940/#review194796
---




src/master/master.cpp
Lines 10061 (patched)


You have to sample the task state before the update because it might 
transition to a terminal state at that time (and LOST is terminal).


- James Peach


On Jan. 4, 2018, 1:35 a.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64940/
> ---
> 
> (Updated Jan. 4, 2018, 1:35 a.m.)
> 
> 
> Review request for mesos, Gaston Kleiman, Jie Yu, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-8337
> https://issues.apache.org/jira/browse/MESOS-8337
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> If an agent is lost, we try to remove all the tasks that might
> have been lost. However, if a task is already terminal, it hasn't
> really been lost so we should not be tracking it in the framework's
> unreachable tasks list.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 130f6e28cc62a8912aac66ecfbf014fe1ee444e3 
>   src/master/master.cpp 28d8be3a4769b418b61cff0b95845e4232135bc7 
>   src/tests/master_tests.cpp 5546fd937d078c1f757964d5163449ffd993388e 
> 
> 
> Diff: https://reviews.apache.org/r/64940/diff/2/
> 
> 
> Testing
> ---
> 
> make check (Fedora 27)
> 
> 
> Thanks,
> 
> James Peach
> 
>



Re: Review Request 64940: Prevented a crash when an agent with terminal tasks is lost.

2018-01-04 Thread James Peach


> On Jan. 4, 2018, 10:09 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp
> > Lines 10058-10060 (patched)
> > 
> >
> > Perhaps we just need to comment on the logic we use to determine if the 
> > task is `unreachable` and not what `removeTask` is going to do about it 
> > (because it's hidden here)? i.e., remove the comment here and use something 
> > like what I suggested below?

Sincce this is particularly obscure I think it's worth the comment.


> On Jan. 4, 2018, 10:09 p.m., Jiang Yan Xu wrote:
> > src/tests/master_tests.cpp
> > Lines 7686 (patched)
> > 
> >
> > No need to settle?

This is settling to ensure that events following from marking the agent down 
are fully processed.


- James


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64940/#review194777
---


On Jan. 4, 2018, 1:35 a.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64940/
> ---
> 
> (Updated Jan. 4, 2018, 1:35 a.m.)
> 
> 
> Review request for mesos, Gaston Kleiman, Jie Yu, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-8337
> https://issues.apache.org/jira/browse/MESOS-8337
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> If an agent is lost, we try to remove all the tasks that might
> have been lost. However, if a task is already terminal, it hasn't
> really been lost so we should not be tracking it in the framework's
> unreachable tasks list.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 130f6e28cc62a8912aac66ecfbf014fe1ee444e3 
>   src/master/master.cpp 28d8be3a4769b418b61cff0b95845e4232135bc7 
>   src/tests/master_tests.cpp 5546fd937d078c1f757964d5163449ffd993388e 
> 
> 
> Diff: https://reviews.apache.org/r/64940/diff/2/
> 
> 
> Testing
> ---
> 
> make check (Fedora 27)
> 
> 
> Thanks,
> 
> James Peach
> 
>



Re: Review Request 64940: Prevented a crash when an agent with terminal tasks is lost.

2018-01-04 Thread James Peach


> On Jan. 4, 2018, 7:06 p.m., Ilya Pronin wrote:
> > src/master/master.cpp
> > Lines 10061 (patched)
> > 
> >
> > Nit: The expression can be simplified to `!terminal && unreachable`.

I think it's a matter of taste which is more readable ;)


> On Jan. 4, 2018, 7:06 p.m., Ilya Pronin wrote:
> > src/tests/master_tests.cpp
> > Lines 7626 (patched)
> > 
> >
> > Just curious: why do you pause the clock here? To speed up the test 
> > (i.e. not wait for `allocation_interval`)?

Yeh that was weird, I removed it.


- James


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64940/#review194772
---


On Jan. 4, 2018, 1:35 a.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64940/
> ---
> 
> (Updated Jan. 4, 2018, 1:35 a.m.)
> 
> 
> Review request for mesos, Gaston Kleiman, Jie Yu, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-8337
> https://issues.apache.org/jira/browse/MESOS-8337
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> If an agent is lost, we try to remove all the tasks that might
> have been lost. However, if a task is already terminal, it hasn't
> really been lost so we should not be tracking it in the framework's
> unreachable tasks list.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 8fe9420dbe03ea2cefc6a40b0f64284aa9fe7915 
>   src/master/master.cpp 282fdf8ac38e3613c621c1c8e5c50f27bde9dafd 
>   src/tests/master_tests.cpp 5546fd937d078c1f757964d5163449ffd993388e 
> 
> 
> Diff: https://reviews.apache.org/r/64940/diff/1/
> 
> 
> Testing
> ---
> 
> make check (Fedora 27)
> 
> 
> Thanks,
> 
> James Peach
> 
>



Re: Review Request 64940: Prevented a crash when an agent with terminal tasks is lost.

2018-01-04 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64940/#review194777
---




src/master/master.cpp
Lines 10058-10060 (patched)


Perhaps we just need to comment on the logic we use to determine if the 
task is `unreachable` and not what `removeTask` is going to do about it 
(because it's hidden here)? i.e., remove the comment here and use something 
like what I suggested below?



src/master/master.cpp
Lines 10061 (patched)


Perhaps move all the logic around determining the unreachable task down 
here.

```
// We need to inform `removeTask` whether the task became unreachable due 
to the agent being removed. Note that terminal tasks are not considered 
unreachable. We can not rely on the task's state because it may have been set 
to `TASK_LOST` for backwards-compatibility.
bool unreachable =
  unreachableTime.isSome() && !protobuf::isTerminalState(task->state());
```

Inlining `unreachable` is probably fine too but separating it helps clarify 
it with comments?



src/master/master.cpp
Line 10273 (original), 10279-10280 (patched)


So we had to use a boolean because we cannot rely on the task *state*, so 
"any of the unreachable states" is a bit misleading?

I think the CHECK message is already clear enough for now? (Like you did in 
`removeTask`)

I suggest we don't comment further on the removable-ness of the task 
because we might want to change it: 
https://issues.apache.org/jira/browse/MESOS-8389



src/tests/master_tests.cpp
Lines 7601-7602 (patched)


This reads like "terminal tasks" are stored as "either finished or lost" 
when it's only the non-terminal task that is transitioned to lost.

Can we just say that we are verifying "the terminal task is stored in its 
original state when parition happens"?



src/tests/master_tests.cpp
Lines 7603 (patched)


Most agent partition related tests are put in partition_tests.cpp, can we 
put this one there as well for better grouping? 

In particular there is a test named `TaskCompletedOnPartitionedAgent` there 
which is somewhat related. We can name this one 
`AgentWithCompletedTaskPartitioned` or `TaskCompletedBeforeAgentPartitioned`

The main word I want to change is `Lost`. While I can understand you are 
using it interchangably with *partitioned*, in this context when it's call 
partitioned elsewhere and there's `TaskLost` to confuse with it, perhaps it's 
better to use the more precise word.



src/tests/master_tests.cpp
Lines 7621 (patched)


We typically name this `offers` since it's a vector. You'll be using the 
first and only one so later you should

```
  AWAIT_READY(offers);
  ASSERT_FALSE(offers->empty());
  const Offer& offer = offers->at(0);
```



src/tests/master_tests.cpp
Lines 7637 (patched)


You could just wait for the the status update (being droppped) instead of 
capaturing `Slave::executorTerminated` and risk the race condition of the 
master not receiving the status update before the agent is deemed partitioned?



src/tests/master_tests.cpp
Lines 7641 (patched)


Instead of settling it's more direct to just wait for the offer before 
using it (like I mentioned above).



src/tests/master_tests.cpp
Lines 7675 (patched)


"fast fast" to emphasize? :)



src/tests/master_tests.cpp
Lines 7681 (patched)


If you wait for the ping, you don't need to settle?



src/tests/master_tests.cpp
Lines 7686 (patched)


No need to settle?


- Jiang Yan Xu


On Jan. 3, 2018, 5:35 p.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64940/
> ---
> 
> (Updated Jan. 3, 2018, 5:35 p.m.)
> 
> 
> Review request for mesos, Gaston Kleiman, Jie Yu, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-8337
> https://issues.apache.org/jira/browse/MESOS-8337
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> If an agent is lost, we try to remove all the tasks that might
> have been lost. However, if a task is already terminal, it hasn't
> really been lost so we should not be tracking it in the framework's
> unreachable tasks list.
> 
> 
>

Re: Review Request 64743: Enabled function sections.

2018-01-04 Thread James Peach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64743/
---

(Updated Jan. 4, 2018, 9:48 p.m.)


Review request for mesos, Andrew Schwartzmeyer and Benjamin Bannier.


Changes
---

Added build configuration to enable this.


Bugs: MESOS-8348
https://issues.apache.org/jira/browse/MESOS-8348


Repository: mesos


Description
---

If we tell the compiler to place each function in a separate
section, this allows the linker to garbage collect unused
sections. This significantly decreases the size of the final
build artifacts and provides some modest improvements in build
times.


Diffs (updated)
-

  cmake/CompilationConfigure.cmake 0ae05047265e33aefab6eef67a4243b6b00f0318 
  configure.ac b6eb98bfe5539c68a416831c36c182ee907d9421 


Diff: https://reviews.apache.org/r/64743/diff/4/

Changes: https://reviews.apache.org/r/64743/diff/3-4/


Testing
---

make check


Thanks,

James Peach



Review Request 64957: Updated README for the gRPC bundle.

2018-01-04 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64957/
---

Review request for mesos and Jie Yu.


Repository: mesos


Description
---

Added the commands for generating the bundle.


Diffs
-

  3rdparty/Makefile.am 805c6e534555ca3ff4320d4cdb6bfb64a36c5600 
  3rdparty/grpc-1.8.3.readme PRE-CREATION 


Diff: https://reviews.apache.org/r/64957/diff/1/


Testing
---

N/A


Thanks,

Chun-Hung Hsiao



Re: Review Request 64932: Added example framework converting disk resources.

2018-01-04 Thread Gaston Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64932/#review194780
---



Should we add this new framework to `src/tests/examples_tests.cpp`?


src/examples/test_csi_user_framework.cpp
Lines 83-84 (patched)


I see that this is also indented wrong in `test_http_framework.cpp`, but 
according to the style guide, the proper indentation would e:

```
  HTTPScheduler(
  const FrameworkInfo& _framework,
  const string& _master)
```



src/examples/test_csi_user_framework.cpp
Lines 90-92 (patched)


Ditto indentation.



src/examples/test_csi_user_framework.cpp
Lines 149 (patched)


This framework never starts tasks, so should we crash or log something 
saying that we don't expect to get task status updates?



src/examples/test_csi_user_framework.cpp
Lines 165-184 (patched)


This framework will never launch a task, so this is probably not needed.

Furthermore it should probably crash if it receivres an executor failure...



src/examples/test_csi_user_framework.cpp
Lines 278 (patched)


I'd rephrase this:

```
Check whether the given resources are reserved, ensuring that all RAW/MOUNT 
disk resources offered by the resource provider are either reserved or 
unreserved.
```

Wouldn't this break if a resource provider offers both reserved and 
unreserved disk resources? I guess this will be usual once the agent's default 
disk resources (non-csi) are handled by an SLRP.



src/examples/test_csi_user_framework.cpp
Lines 279 (patched)


Why do you use an option here?

Wouldn't `CHECK(!resources.isEmpty())` be clearer?



src/examples/test_csi_user_framework.cpp
Lines 302 (patched)


Use `endl` instead of `'\n'`.



src/examples/test_csi_user_framework.cpp
Lines 318 (patched)


Use `endl` instead of `'\n'`.



src/examples/test_csi_user_framework.cpp
Lines 331-332 (patched)


I'd phrase it:

If we didn't create an `ACCEPT` operation, create a `DELCINE` operation.



src/examples/test_csi_user_framework.cpp
Lines 334 (patched)


Ditto `endl`.



src/examples/test_csi_user_framework.cpp
Lines 335 (patched)


Remove this line, the if statement ensures that no accept call was created.



src/examples/test_csi_user_framework.cpp
Lines 380-386 (patched)


This method doesn't seem to be used. `int main(...)` should probably call 
`Flags::setUsageMessage` instead of this.



src/examples/test_csi_user_framework.cpp
Lines 399 (patched)


Why don't we make this a `string` instead of an `Option`?

That way we could remove the: `if (flags.master.isNone())` block in the 
main function.



src/examples/test_csi_user_framework.cpp
Lines 444-447 (patched)


Shouldn't this be a flag like in `no_executor_framework.cpp` and in 
`long_lived_framework.cpp`?


- Gaston Kleiman


On Jan. 3, 2018, 2:50 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64932/
> ---
> 
> (Updated Jan. 3, 2018, 2:50 p.m.)
> 
> 
> Review request for mesos, Greg Mann and Jie Yu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch introduces an example HTTP framework which transforms
> 'RAW' disk resources from resource providers into 'MOUNT' volumes and
> subsequently unreserves them.
> 
> 
> Diffs
> -
> 
>   src/Makefile.am 30cd4d426e797e4c8ee556d1bc3de99830a5fe41 
>   src/examples/CMakeLists.txt d4f1af4f072efdc68fa0b722f42b1d8aa1779b6e 
>   src/examples/test_csi_user_framework.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/64932/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 63860: Windows: Mapped the Docker network info types.

2018-01-04 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63860/#review194784
---



I like the direction in the latest diff (platform specific defaults).

One question is: what's the difference between nat and bridge? are they the 
same? If yes, can we not have NAT in the API? In other words, BRIDGE on Windows 
will map to "nat".

- Jie Yu


On Dec. 7, 2017, 9:34 p.m., Akash Gupta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63860/
> ---
> 
> (Updated Dec. 7, 2017, 9:34 p.m.)
> 
> 
> Review request for mesos, Andrew Schwartzmeyer, Jie Yu, John Kordich, Joseph 
> Wu, and Michael Park.
> 
> 
> Bugs: MESOS-7342
> https://issues.apache.org/jira/browse/MESOS-7342
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The current Network enum in DockerInfo is specific to Linux containers.
> Instead of supporting {host, bridge, none, user} networks, Windows
> docker supports {nat, none, user} networks. Now, the Network enum
> supports {host, bridge, nat, none, user}. If the network isn't
> specified, then the default is host on Linux and nat on Windows.
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto 839ddb1cb41471d36423a2fc149acf90b973d413 
>   src/docker/docker.cpp 722a54ad113fc4e2bb22a8f08e307ab38d5fbfed 
> 
> 
> Diff: https://reviews.apache.org/r/63860/diff/4/
> 
> 
> Testing
> ---
> 
> See https://reviews.apache.org/r/63862/ for test results.
> 
> 
> Thanks,
> 
> Akash Gupta
> 
>



Re: Review Request 64940: Prevented a crash when an agent with terminal tasks is lost.

2018-01-04 Thread Ilya Pronin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64940/#review194772
---


Fix it, then Ship it!




Looks good to me!


src/master/master.cpp
Lines 10061 (patched)


Nit: The expression can be simplified to `!terminal && unreachable`.



src/tests/master_tests.cpp
Lines 7626 (patched)


Just curious: why do you pause the clock here? To speed up the test (i.e. 
not wait for `allocation_interval`)?


- Ilya Pronin


On Jan. 3, 2018, 5:35 p.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64940/
> ---
> 
> (Updated Jan. 3, 2018, 5:35 p.m.)
> 
> 
> Review request for mesos, Gaston Kleiman, Jie Yu, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-8337
> https://issues.apache.org/jira/browse/MESOS-8337
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> If an agent is lost, we try to remove all the tasks that might
> have been lost. However, if a task is already terminal, it hasn't
> really been lost so we should not be tracking it in the framework's
> unreachable tasks list.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 8fe9420dbe03ea2cefc6a40b0f64284aa9fe7915 
>   src/master/master.cpp 282fdf8ac38e3613c621c1c8e5c50f27bde9dafd 
>   src/tests/master_tests.cpp 5546fd937d078c1f757964d5163449ffd993388e 
> 
> 
> Diff: https://reviews.apache.org/r/64940/diff/1/
> 
> 
> Testing
> ---
> 
> make check (Fedora 27)
> 
> 
> Thanks,
> 
> James Peach
> 
>



Re: Review Request 64939: Abort libprocess when a Process throws an uncaught exception.

2018-01-04 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64939/#review194774
---


Ship it!




Ship It!

- Jiang Yan Xu


On Jan. 3, 2018, 5:30 p.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64939/
> ---
> 
> (Updated Jan. 3, 2018, 5:30 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Benjamin Hindman, Ilya Pronin, 
> Jie Yu, and Jiang Yan Xu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Previously, we logged the exception, terminated the throwing
> Process, and continued to run. However, currently there exists
> no known user-level code that I'm aware of that handles the
> unexpected termination due to an uncaught exception.
> 
> Generally, this means that when an exception is thrown (e.g.
> a bad call to std::map::at), the process terminates with a log
> message but things get "stuck" and the user has to debug what
> is wrong / kill the process.
> 
> Libprocess would likely need to provide some primitives to
> better support handling unexpected termination of a Process
> in order for us to provide a strategy where we continue running.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/src/process.cpp 
> 75cf1d3b6d3d257ba9bc81c68017a74a6511cebf 
> 
> 
> Diff: https://reviews.apache.org/r/64939/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 64938: Fixed RecoverTest.CatchupTruncated test flakiness.

2018-01-04 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64938/#review194771
---


Ship it!




Ship It!

- Jie Yu


On Jan. 4, 2018, 12:07 a.m., Ilya Pronin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64938/
> ---
> 
> (Updated Jan. 4, 2018, 12:07 a.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-8377
> https://issues.apache.org/jira/browse/MESOS-8377
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Most likely the "lock already held by process" LevelDB error was caused
> by a Shared still retained by one of the processes when the
> test tries to recreate it. This change makes sure that the test code is
> the only owner of the replica.
> 
> 
> Diffs
> -
> 
>   src/tests/log_tests.cpp 7554a1e281323222d3be12aa1ec6fd809799ff65 
> 
> 
> Diff: https://reviews.apache.org/r/64938/diff/1/
> 
> 
> Testing
> ---
> 
> Ran `./src/mesos-tests --gtest_filter='RecoverTest.CatchupTruncated' 
> --gtest_repeat=1000 --gtest_break_on_failure=1`.
> 
> 
> Thanks,
> 
> Ilya Pronin
> 
>



Review Request 64953: Added error message in tests for orphaned containers.

2018-01-04 Thread Andrei Budnik

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64953/
---

Review request for mesos, Alexander Rukletsov, Greg Mann, Joseph Wu, and Jan 
Schlicht.


Repository: mesos


Description
---

Added error message in tests for orphaned containers.


Diffs
-

  src/tests/cluster.cpp f964bf0cd0cf22374877e5748ba142dcb5fee133 


Diff: https://reviews.apache.org/r/64953/diff/1/


Testing
---

make check


Thanks,

Andrei Budnik



Re: Review Request 64891: Future-proofed use of agent capabilities in tests.

2018-01-04 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64891/
---

(Updated Jan. 4, 2018, 11:06 a.m.)


Review request for mesos, Benno Evers and Jie Yu.


Changes
---

Rebased.


Repository: mesos


Description
---

Even though currently the resource provider capability is the only
capability which can be toggled by users, when examining agent flags
we expect either no capabilities at all or a full set of capabilities
(including both toggleable and required capabilities).

This patch cleans up test code agent capabilities by delegating the
bulk of the work to a helper function providing a set of
default-enabled capabilities and only adds a single capability to that
set. This not only makes it clearer which exact capability a test
cares about, but also future-proofs the code for the case where we
extend the set of required capabilities in the future.


Diffs (updated)
-

  src/tests/agent_resource_provider_config_api_tests.cpp 
f2486cbf5cf9599ed3bd1caa3b1f660ddcb9d15f 
  src/tests/api_tests.cpp 9246f4222548f8d19a8f4f89fb3c22ecae5b58b0 
  src/tests/persistent_volume_tests.cpp 
372037f5121dfcf80ec02affdb6e9eee4029dedf 
  src/tests/reservation_tests.cpp 938306ed882f5bc94cd373653af30d09dd4b42d9 
  src/tests/resource_provider_manager_tests.cpp 
2a76ccf7cbfefbeb5cb3fcccd38829dcc407d515 
  src/tests/slave_tests.cpp fb49077d7cb81db450d9fa24f75dbd9c79ef645c 
  src/tests/storage_local_resource_provider_tests.cpp 
22b81a5c2fcca1e97a6e7d4bde790fc21022d362 


Diff: https://reviews.apache.org/r/64891/diff/2/

Changes: https://reviews.apache.org/r/64891/diff/1-2/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 64889: Fixed handling of checkpointed resources for RP-capable agents.

2018-01-04 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64889/
---

(Updated Jan. 4, 2018, 11:06 a.m.)


Review request for mesos, Benno Evers, Jie Yu, and Jan Schlicht.


Changes
---

Addressed comment from Jie.


Bugs: MESOS-8350
https://issues.apache.org/jira/browse/MESOS-8350


Repository: mesos


Description
---

The master will not resend checkpointed resources when a resource
provider-capable agent reregisters. Instead the checkpointed resources
sent as part of the agent reregistration should be evaluated by the
master and be used to update its state.

This patch fixes the handling of checkpointed resources sent as part
of the agent reregistration so that the resources are used to update
the master state.


Diffs (updated)
-

  src/master/master.hpp 8fe9420dbe03ea2cefc6a40b0f64284aa9fe7915 
  src/master/master.cpp 282fdf8ac38e3613c621c1c8e5c50f27bde9dafd 
  src/tests/slave_recovery_tests.cpp 387c2ff62d3ccfe918775b5c059eafb9d4fa8d87 


Diff: https://reviews.apache.org/r/64889/diff/3/

Changes: https://reviews.apache.org/r/64889/diff/2-3/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 64888: Added test of handling of checkpointed resources in reregistration.

2018-01-04 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64888/
---

(Updated Jan. 4, 2018, 11:06 a.m.)


Review request for mesos, Benno Evers, Jie Yu, and Jan Schlicht.


Changes
---

Addressed comments from Jie.


Summary (updated)
-

Added test of handling of checkpointed resources in reregistration.


Bugs: MESOS-8350
https://issues.apache.org/jira/browse/MESOS-8350


Repository: mesos


Description (updated)
---

This patch adds a test that confirms that the master resends
checkpointed resources to the agent on reregistration.


Diffs (updated)
-

  src/tests/slave_recovery_tests.cpp 387c2ff62d3ccfe918775b5c059eafb9d4fa8d87 


Diff: https://reviews.apache.org/r/64888/diff/3/

Changes: https://reviews.apache.org/r/64888/diff/2-3/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 64933: Fixed flaky PartitionedSlaveReregistrationMasterFailover test.

2018-01-04 Thread Alexander Rukletsov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64933/#review194744
---


Ship it!




Ship It!

- Alexander Rukletsov


On Jan. 3, 2018, 10:51 p.m., Jiang Yan Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64933/
> ---
> 
> (Updated Jan. 3, 2018, 10:51 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Megha Sharma.
> 
> 
> Bugs: MESOS-8334
> https://issues.apache.org/jira/browse/MESOS-8334
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The test was flaky due to a race condition that causes status updates
> to be dropped because the agent reregisters before the schedulers.
> 
> 
> Diffs
> -
> 
>   src/tests/partition_tests.cpp 7f4b9ed938fd478fdf750b464c531bf76de7d798 
> 
> 
> Diff: https://reviews.apache.org/r/64933/diff/1/
> 
> 
> Testing
> ---
> 
> make check. Ran the affected test for 1000 iterations.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>