Re: Review Request 61128: Improved log messages in master when adding/removing tasks/executors.

2018-11-15 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61128/
---

(Updated Nov. 15, 2018, 9:40 p.m.)


Review request for mesos, Benno Evers and Qian Zhang.


Repository: mesos


Description
---

Made the log messages and the calling sites consistent and also added
one for adding an executor.


Diffs (updated)
-

  src/master/master.cpp 1e326ec42a7f79a0835529a4655e7ec272f1cf40 


Diff: https://reviews.apache.org/r/61128/diff/3/

Changes: https://reviews.apache.org/r/61128/diff/2-3/


Testing
---

make check


Thanks,

Vinod Kone



Re: Review Request 69293: Disabled parallel test execution for reviewbot.

2018-11-08 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69293/#review210401
---


Ship it!




Ship It!

- Vinod Kone


On Nov. 7, 2018, 10:57 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69293/
> ---
> 
> (Updated Nov. 7, 2018, 10:57 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently no dedicated handling of XML reporting from multiple gtest
> shards is implemented in our gtest runner.
> 
> This patch disables the now default-enabled parallel test runner for
> any build & tests cycles under the ASF CI reviewbot job.
> 
> 
> Diffs
> -
> 
>   support/jenkins/reviewbot.sh 0dd81fd1c067dfe84c4e638ebcee87bc6d4d73a7 
> 
> 
> Diff: https://reviews.apache.org/r/69293/diff/1/
> 
> 
> Testing
> ---
> 
> NOTE: The flags for `Mesos-Buildbot` are configured directly in Jenkins and 
> already adjusted.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Review Request 69291: Simplified writing out test report xml files.

2018-11-07 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69291/
---

Review request for mesos and James Peach.


Repository: mesos


Description
---

Simplified writing out test report xml files.


Diffs
-

  support/mesos-build/entrypoint.sh 012f003b3cf22b49a442df293f1b6224c074e383 


Diff: https://reviews.apache.org/r/69291/diff/1/


Testing
---


Thanks,

Vinod Kone



Re: Review Request 69291: Simplified writing out test report xml files.

2018-11-07 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69291/
---

(Updated Nov. 7, 2018, 10:36 p.m.)


Review request for mesos and James Peach.


Changes
---

trailing slash.


Repository: mesos


Description
---

Simplified writing out test report xml files.


Diffs (updated)
-

  support/mesos-build/entrypoint.sh 012f003b3cf22b49a442df293f1b6224c074e383 


Diff: https://reviews.apache.org/r/69291/diff/2/

Changes: https://reviews.apache.org/r/69291/diff/1-2/


Testing
---


Thanks,

Vinod Kone



Review Request 69279: Updated Build bot to write out test report xml files.

2018-11-07 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69279/
---

Review request for mesos and James Peach.


Repository: mesos


Description
---

This can be used by Jenkins to display test trends.


Diffs
-

  support/mesos-build.sh d146cc104ea330bfdcbfed497267e6a1465db0bc 
  support/mesos-build/entrypoint.sh ec98cc8b1fdcd0a32ed32cfeb69dfb976f82b81d 


Diff: https://reviews.apache.org/r/69279/diff/1/


Testing
---

Tested manually with a custom jenkins job.


Thanks,

Vinod Kone



Re: Review Request 69193: Added filters info to the `DECLINE` call log message.

2018-10-26 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69193/#review210099
---


Ship it!




Ship It!

- Vinod Kone


On Oct. 26, 2018, 5:42 p.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69193/
> ---
> 
> (Updated Oct. 26, 2018, 5:42 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Gastón Kleiman.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This helps to investigate allocator allocation decisions
> without verbose logging.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 0c95c438975efd949cbf86f7d8bfea940c20a43a 
> 
> 
> Diff: https://reviews.apache.org/r/69193/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 69123: Fixed an early fd close in the cgroups event notifier.

2018-10-23 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69123/#review209921
---




src/linux/cgroups.cpp
Line 1124 (original), 1137 (patched)
<https://reviews.apache.org/r/69123/#comment294558>

Didn't quite follow why you had make this an Option?


- Vinod Kone


On Oct. 23, 2018, 3:08 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69123/
> ---
> 
> (Updated Oct. 23, 2018, 3:08 a.m.)
> 
> 
> Review request for mesos and Qian Zhang.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The cgroups event notifier was closing the eventfd while an
> `io::read()` operation may be in progress. This can lead to
> bugs where the fd gets re-used and read from a stale io::read.
> 
> 
> Diffs
> -
> 
>   src/linux/cgroups.cpp 95efc1def3318dd93d30d91e9c8a91376cc610b8 
> 
> 
> Diff: https://reviews.apache.org/r/69123/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 68951: Updated verify-reviews.py to use current interpreter in subprocesses.

2018-10-08 Thread Vinod Kone


> On Oct. 8, 2018, 6:12 p.m., Till Toenshoff wrote:
> > support/verify-reviews.py
> > Line 97 (original), 97 (patched)
> > <https://reviews.apache.org/r/68951/diff/1/?file=2095334#file2095334line97>
> >
> > IIUC, then `sys.executable` may be `None`. Shall we guard against that?
> 
> Armand Grillet wrote:
> As described in 
> https://docs.python.org/3/library/sys.html#sys.executable: "If Python is 
> unable to retrieve the real path to its executable, sys.executable will be an 
> empty string or None.".
> In the first case, "", the shebang in the first line of 
> `support/apply-reviews.py` 
> (https://github.com/apache/mesos/blob/8375e426d1c07f625ee18cd439e0dd4f1dc804c5/support/apply-reviews.py#L1)
>  will make us use Python 3 AFAIK. 
> 
> The reason why I set the current interpreter instead of nothing is that a 
> user might specify an interpreter when running `support/apply-reviews.py`. 
> This is not the first time we do this in our support scripts: 
> https://github.com/apache/mesos/blob/8375e426d1c07f625ee18cd439e0dd4f1dc804c5/support/mesos-style.py#L291

i'm dropping this and committing. spoke with @till offline.


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68951/#review209317
---


On Oct. 8, 2018, 6:06 p.m., Armand Grillet wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68951/
> ---
> 
> (Updated Oct. 8, 2018, 6:06 p.m.)
> 
> 
> Review request for mesos, Till Toenshoff and Vinod Kone.
> 
> 
> Bugs: MESOS-9253
> https://issues.apache.org/jira/browse/MESOS-9253
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes the command used in `support/verify-reviews.py` when
> running `support/apply-reviews.py` as a subprocess. It was previously
> `"python"`, which is generally Python 2, and is now `sys.executable`.
> 
> That way, if verify-reviews.py is run with Python 3 (as it should),
> apply-reviews.py will be run with the same Python 3 interpreter. This
> should fix the `ImportError` issues we have recently seen in our CI.
> 
> 
> Diffs
> -
> 
>   support/verify-reviews.py 56321ae65b38a1c62f5589b6a8aaa3993fa3dd5f 
> 
> 
> Diff: https://reviews.apache.org/r/68951/diff/1/
> 
> 
> Testing
> ---
> 
> I have created a simple `test.py` file to check that the interpreter was 
> correctly found:
> 
> ```
> import sys
> 
> 
> def apply_review(review_id):
> """Apply a review using the script apply-reviews.py."""
> print("Applying review %s" % review_id)
> print("%s support/apply-reviews.py -n -r %s" % (sys.executable, 
> review_id))
> 
> apply_review(1337)
> apply_review("1337")
> ```
> 
> In both cases, `python3 test.py` prints `/usr/local/bin/python3 
> support/apply-reviews.py -n -r 1337` which is what I expected.
> 
> 
> Thanks,
> 
> Armand Grillet
> 
>



Re: Review Request 68951: Updated verify-reviews.py to use current interpreter in subprocesses.

2018-10-08 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68951/#review209324
---


Ship it!




Ship It!

- Vinod Kone


On Oct. 8, 2018, 6:06 p.m., Armand Grillet wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68951/
> ---
> 
> (Updated Oct. 8, 2018, 6:06 p.m.)
> 
> 
> Review request for mesos, Till Toenshoff and Vinod Kone.
> 
> 
> Bugs: MESOS-9253
> https://issues.apache.org/jira/browse/MESOS-9253
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This changes the command used in `support/verify-reviews.py` when
> running `support/apply-reviews.py` as a subprocess. It was previously
> `"python"`, which is generally Python 2, and is now `sys.executable`.
> 
> That way, if verify-reviews.py is run with Python 3 (as it should),
> apply-reviews.py will be run with the same Python 3 interpreter. This
> should fix the `ImportError` issues we have recently seen in our CI.
> 
> 
> Diffs
> -
> 
>   support/verify-reviews.py 56321ae65b38a1c62f5589b6a8aaa3993fa3dd5f 
> 
> 
> Diff: https://reviews.apache.org/r/68951/diff/1/
> 
> 
> Testing
> ---
> 
> I have created a simple `test.py` file to check that the interpreter was 
> correctly found:
> 
> ```
> import sys
> 
> 
> def apply_review(review_id):
> """Apply a review using the script apply-reviews.py."""
> print("Applying review %s" % review_id)
> print("%s support/apply-reviews.py -n -r %s" % (sys.executable, 
> review_id))
> 
> apply_review(1337)
> apply_review("1337")
> ```
> 
> In both cases, `python3 test.py` prints `/usr/local/bin/python3 
> support/apply-reviews.py -n -r 1337` which is what I expected.
> 
> 
> Thanks,
> 
> Armand Grillet
> 
>



Re: Review Request 68912: Added a log line to `MesosContainerizer::kill()`.

2018-10-03 Thread Vinod Kone


> On Oct. 3, 2018, 4:05 p.m., Mesos Reviewbot Windows wrote:
> > FAIL: Some of the unit tests failed. Please check the relevant logs.
> > 
> > Reviews applied: `['68912']`
> > 
> > Failed command: `Start-MesosCITesting`
> > 
> > All the build artifacts available at: 
> > http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2413/mesos-review-68912
> > 
> > Relevant logs:
> > 
> > - 
> > [mesos-tests.log](http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2413/mesos-review-68912/logs/mesos-tests.log):
> > 
> > ```
> > W1003 16:05:33.435480 12912 slave.cpp:3917] Ignoring shutdown framework 
> > d7d1a6b8-7f3b-439c-a27a-847a8acd066c- because it is terminating
> > I1003 16:05:33.436484 15652 master.cpp:1251] Agent 
> > d7d1a6b8-7f3b-439c-a27a-847a8acd066c-S0 at slave(461)@192.10.1.5:54865 
> > (windows-02.aa0q4n2kgcyefckmv0xukjvy4f.xx.internal.cloudapp.net) 
> > disconnected
> > I1003 16:05:33.437479 15652 master.cpp:3267] Disconnecting agent 
> > d7d1a6b8-7f3b-439c-a27a-847a8acd066c-S0 at slave(461)@192.10.1.5:54865 
> > (windows-02.aa0q4n2kgcyefckmv0xukjvy4f.xx.internal.cloudapp.net)
> > I1003 16:05:33.437479 15652 master.cpp:3286] Deactivating agent 
> > d7d1a6b8-7f3b-439c-a27a-847a8acd066c-S0 at slave(461)@192.10.1.5:54865 
> > (windows-02.aa0q4n2kgcyefckmv0xukjvy4f.xx.internal.cloudapp.net)
> > I1003 16:05:33.437479  7924 hierarchical.cpp:359] Removed framework 
> > d7d1a6b8-7f3b-439c-a27a-847a8acd066c-
> > I1003 16:05:33.437479  7924 hierarchical.cpp:803] Agent 
> > d7d1a6b8-7f3b-439c-a27a-847a8acd066c-S0 deactivated
> > I1003 16:05:33.439498  8116 containerizer.cpp:2455] Destroying container 
> > 0da637c0-98a8-4ad4-ac65-6cf607cbba39 in RUNNING state
> > I1003 16:05:33.439498  8116 containerizer.cpp:3122] Transitioning the state 
> > of container 0da637c0-98a8-4ad4-ac65-6cf607cbba39 from RUNNING to DESTROYING
> > I1003 16:05:33.439498  8116 launcher.cpp:166] Asked to destroy containe[
> >OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (584 ms)
> > [--] 1 test from IsolationFlag/MemoryIsolatorTest (603 ms total)
> > 
> > [--] Global test environment tear-down
> > d:\dcos\mesos\mesos\src\tests\environment.cpp(1126): error: Failed
> > Tests completed with child processes remaining:
> > -+- 1244 0024FFD8E6DC
> >  --- 15936 0024FFD8E6DC
> > [==] 1051 tests from 103 test cases ran. (489504 ms total)
> > [  PASSED  ] 1051 tests.
> > [  FAILED  ] 0 tests, listed below:
> > 
> >  0 FAILED TESTS
> > 
> >   YOU HAVE 232 DISABLED TESTS
> > 
> > r 0da637c0-98a8-4ad4-ac65-6cf607cbba39
> > I1003 16:05:33.471307  7924 containerizer.cpp:2961] Container 
> > 0da637c0-98a8-4ad4-ac65-6cf607cbba39 has exited
> > I1003 16:05:33.500296 14928 master.cpp:1093] Master terminating
> > I1003 16:05:33.502277 15652 hierarchical.cpp:645] Removed agent 
> > d7d1a6b8-7f3b-439c-a27a-847a8acd066c-S0
> > I1003 16:05:33.824251 17340 process.cpp:926] Stopped the socket accept loop
> > ```

@andy Is this a known issue with windows reviewbot?


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68912/#review209187
---


On Oct. 3, 2018, 3:05 p.m., Vinod Kone wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68912/
> ---
> 
> (Updated Oct. 3, 2018, 3:05 p.m.)
> 
> 
> Review request for mesos and Gilbert Song.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added a log line to `MesosContainerizer::kill()`.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/containerizer.cpp 
> 82d64e9928abd5e22493fa697d7b0910211533f3 
> 
> 
> Diff: https://reviews.apache.org/r/68912/diff/1/
> 
> 
> Testing
> ---
> 
> make -j3 check
> 
> Ran it through internal CI.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>



Re: Review Request 68866: Waited for TEARDOWN response in v1 Java scheduler shim.

2018-10-03 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68866/#review209186
---


Ship it!




Ship It!

- Vinod Kone


On Sept. 27, 2018, 5:41 p.m., Alexander Rukletsov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68866/
> ---
> 
> (Updated Sept. 27, 2018, 5:41 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Nicholas Parker.
> 
> 
> Bugs: MESOS-9274
> https://issues.apache.org/jira/browse/MESOS-9274
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> See summary.
> 
> 
> Diffs
> -
> 
>   src/java/jni/org_apache_mesos_v1_scheduler_V1Mesos.cpp 
> 2a5fbd79ac7bad933067cd96e38186849af8edc4 
> 
> 
> Diff: https://reviews.apache.org/r/68866/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`on various Linux distro
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>



Re: Review Request 68866: Waited for TEARDOWN response in v1 Java scheduler shim.

2018-10-03 Thread Vinod Kone


> On Sept. 27, 2018, 6:23 p.m., Vinod Kone wrote:
> > src/java/jni/org_apache_mesos_v1_scheduler_V1Mesos.cpp
> > Lines 294 (patched)
> > <https://reviews.apache.org/r/68866/diff/1/?file=2092453#file2092453line294>
> >
> > Looks like you are doing `call` here instead of `send` so that you have 
> > a future to wait on? Skipping `send` which does validation and 
> > authentication seems wrong. Let's not do that.
> > 
> > I would recommend putting a sleep in the client code instead of here 
> > for now.
> 
> Alexander Rukletsov wrote:
> But `call` does 
> [validation](https://github.com/apache/mesos/blob/6e21e94ddca5b776d44636fe3eba8500bf88dc25/src/scheduler/scheduler.cpp#L269-L270)
>  and 
> [authentication](https://github.com/apache/mesos/blob/6e21e94ddca5b776d44636fe3eba8500bf88dc25/src/scheduler/scheduler.cpp#L301)
>  too. Not sure I follow, Vinod.
> 
> Greg Mann wrote:
> Yea I think we have the necessary validation/authentication in the 
> `call()` path as well. This solution is nice because it will wait for as 
> little time as possible, up to 5 seconds, rather than always waiting for 5 
> seconds.
> 
> In fact, since this code will return early if a response is actually 
> received, I would feel comfortable with a slightly longer await - maybe 10 or 
> 20 seconds?

I didn't realize `call` was a newish interface method added to the library. At 
a quick glance I thought it was the continuation of `send` and an internal 
method. My bad.

So I'm ok with doing a `call` here.

Suggestion: The comment above doesn't have enough context for a future to 
understand why the library could be destructured. I would recommend expanding 
the comment a bit more explaining. For example "The JVM could potentially call 
JNI `finalize` if the Java scheduler nullified its reference to `V1Mesos` 
library immediately after sending TEARDOWN." Also, maybe link to MESOS-9274 for 
posterity.


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68866/#review209074
---


On Sept. 27, 2018, 5:41 p.m., Alexander Rukletsov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68866/
> ---
> 
> (Updated Sept. 27, 2018, 5:41 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Nicholas Parker.
> 
> 
> Bugs: MESOS-9274
> https://issues.apache.org/jira/browse/MESOS-9274
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> See summary.
> 
> 
> Diffs
> -
> 
>   src/java/jni/org_apache_mesos_v1_scheduler_V1Mesos.cpp 
> 2a5fbd79ac7bad933067cd96e38186849af8edc4 
> 
> 
> Diff: https://reviews.apache.org/r/68866/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`on various Linux distro
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>



Review Request 68912: Added a log line to `MesosContainerizer::kill()`.

2018-10-03 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68912/
---

Review request for mesos and Gilbert Song.


Repository: mesos


Description
---

Added a log line to `MesosContainerizer::kill()`.


Diffs
-

  src/slave/containerizer/mesos/containerizer.cpp 
82d64e9928abd5e22493fa697d7b0910211533f3 


Diff: https://reviews.apache.org/r/68912/diff/1/


Testing
---

make -j3 check

Ran it through internal CI.


Thanks,

Vinod Kone



Re: Review Request 68899: Updated the MesosCon 2018 location.

2018-10-02 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68899/#review209156
---


Ship it!




Ship It!

- Vinod Kone


On Oct. 2, 2018, 6:59 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68899/
> ---
> 
> (Updated Oct. 2, 2018, 6:59 p.m.)
> 
> 
> Review request for mesos, Andrew Schwartzmeyer and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Updated the MesosCon 2018 location.
> 
> 
> Diffs
> -
> 
>   site/source/blog/2018-07-30-mesoscon-2018-cfp-is-now-open.md 
> 09cf1a85f5d497c421d96c97bb855b9a486ccc0d 
> 
> 
> Diff: https://reviews.apache.org/r/68899/diff/1/
> 
> 
> Testing
> ---
> 
> Built the website using `site/mesos-website-dev.sh`, see the attached 
> screenshot.
> 
> 
> File Attachments
> 
> 
> Blog post screenshot
>   
> https://reviews.apache.org/media/uploaded/files/2018/10/02/ddbc84c9-ff2f-44f0-b68e-c6bc9070b11a__screenshot.png
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 68865: Put `TerminateEvent` at the end of the queue in the Mesos library.

2018-09-27 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68865/#review209075
---




src/scheduler/scheduler.cpp
Line 1041 (original), 1041 (patched)
<https://reviews.apache.org/r/68865/#comment293336>

Can you add a comment on why you are doing this for posterity?


- Vinod Kone


On Sept. 27, 2018, 5:41 p.m., Alexander Rukletsov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68865/
> ---
> 
> (Updated Sept. 27, 2018, 5:41 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Nicholas Parker.
> 
> 
> Bugs: MESOS-9274
> https://issues.apache.org/jira/browse/MESOS-9274
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> See summary.
> 
> 
> Diffs
> -
> 
>   src/scheduler/scheduler.cpp 471152945d6af660c8983324b38702d872657f89 
> 
> 
> Diff: https://reviews.apache.org/r/68865/diff/1/
> 
> 
> Testing
> ---
> 
> See https://reviews.apache.org/r/68866/
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>



Re: Review Request 68866: Waited for TEARDOWN response in v1 Java scheduler shim.

2018-09-27 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68866/#review209074
---




src/java/jni/org_apache_mesos_v1_scheduler_V1Mesos.cpp
Lines 294 (patched)
<https://reviews.apache.org/r/68866/#comment293335>

Looks like you are doing `call` here instead of `send` so that you have a 
future to wait on? Skipping `send` which does validation and authentication 
seems wrong. Let's not do that.

I would recommend putting a sleep in the client code instead of here for 
now.


- Vinod Kone


On Sept. 27, 2018, 5:41 p.m., Alexander Rukletsov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68866/
> ---
> 
> (Updated Sept. 27, 2018, 5:41 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Nicholas Parker.
> 
> 
> Bugs: MESOS-9274
> https://issues.apache.org/jira/browse/MESOS-9274
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> See summary.
> 
> 
> Diffs
> -
> 
>   src/java/jni/org_apache_mesos_v1_scheduler_V1Mesos.cpp 
> 2a5fbd79ac7bad933067cd96e38186849af8edc4 
> 
> 
> Diff: https://reviews.apache.org/r/68866/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`on various Linux distro
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>



Re: Review Request 68826: Fixed bug in `verify-reviews` due to mismatched types.

2018-09-24 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68826/#review208959
---


Ship it!




LGTM from my limited understanding of python

- Vinod Kone


On Sept. 24, 2018, 6:27 p.m., Andrew Schwartzmeyer wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68826/
> ---
> 
> (Updated Sept. 24, 2018, 6:27 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier and Vinod Kone.
> 
> 
> Bugs: MESOS-9253
> https://issues.apache.org/jira/browse/MESOS-9253
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Because Python is not type-safe, we encountered a bug in the code
> executed on non-Windows platforms that was expecting `output` to be a
> normal Python string instead of a Python byte string (with encoded
> content). To fix this, we now always decode the bytes into a string,
> so that the logic afterwards only has one type to deal with.
> 
> 
> Diffs
> -
> 
>   support/verify-reviews.py 72b7eb5b9baaf8eaa352b55dad55e62881d87323 
> 
> 
> Diff: https://reviews.apache.org/r/68826/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrew Schwartzmeyer
> 
>



Re: Review Request 68257: Fixed incorrect `mnt` namespace detection of command executor's task.

2018-08-13 Thread Vinod Kone


> On Aug. 7, 2018, 4:03 p.m., Vinod Kone wrote:
> > src/slave/containerizer/mesos/utils.cpp
> > Line 102 (original), 105 (patched)
> > <https://reviews.apache.org/r/68257/diff/1/?file=2069850#file2069850line108>
> >
> > Are we guaranteed that there are no short-lived processes, other than 
> > the task process, at the 2nd level? If not, we will have the same issue 
> > right?
> > 
> > Modulo the above question, the change LGTM.
> 
> Andrei Budnik wrote:
> There are two types of 2nd-level processes:
> 1) the command executor's task
> 2) the nested container's task
> 
> E.g. the process tree can be like the following:
> 0. mesos-containerizer (`init`)
>1. mesos-executor (command executor)
>   2. sleep 1000 (command executor's task)
>1. mesos-containerizer (`init` of a nested container) <- enters `mnt` 
> namespace of command executor's task before forking a task
>   2. echo "echo" (nested container's task)
> 
> Since we skip 1st-level processes whose `mnt` namespace is not the same 
> as `init` process (PID 1), the algorithm doesn't iterate over their 2nd-level 
> processes. That gives a gurarantee that we only detect command executor's 
> task, but not the short-lived processes.
> 
> Alexander Rukletsov wrote:
> Let's make it explicit to the reader. I suggest to:
> 1. Collect _candidates_ (2nd level children) first.
> 2. Assert there are >= 1 candidates.
> 3. Filter all candidates except those whose (`mnt` differs from root's 
> `mnt` && parent's `mnt` equals to root's `mnt`).
> 4. Assert there are now 0 or 1 candidates left.
> 5. Return pid of that single candidate or root's pid if there are none.

@andrei Thanks for the info. IIUC, any processes launched by the mesos-executor 
or the task using `LAUNCH_NESTED_CONTAINER` API will end up with the process 
tree you showed above. I just wanted to make sure it is not possible for us to 
see a child process of command executor that is a sibling of the task process 
(e.g., task launched a process without using LNC API but it ended up as a 
sibling, command executor runs a script/binary without using LNC API etc).


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68257/#review206947
---


On Aug. 7, 2018, 1:46 p.m., Andrei Budnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68257/
> ---
> 
> (Updated Aug. 7, 2018, 1:46 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gilbert Song, Jie Yu, and 
> Kevin Klues.
> 
> 
> Bugs: MESOS-9116
> https://issues.apache.org/jira/browse/MESOS-9116
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Previously, we were walking the process tree from the container's
> `init` process to find the first process along the way whose `mnt`
> namespace differs from the `init` process. We expected this algorithm
> to always return the PID of the command executor's task. However, if
> someone launches multiple nested containers within the process tree,
> the algorithm might detect the PID of the nested container instead of
> the command executor's task. The detected PID might belong to a
> short-lived container, so the container's process might terminate at
> the moment the containerizer launcher (aka `nanny`) process tries to
> enter its `mnt` namespace. This patch fixes the detection algorithm
> so that it always returns PID of the command executor's task.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/utils.cpp 
> 30e76d1d91651975033078f5450e45f5f2fd8ba0 
> 
> 
> Diff: https://reviews.apache.org/r/68257/diff/1/
> 
> 
> Testing
> ---
> 
> 1) Internal CI with disabled 
> `ROOT_CGROUPS_LaunchNestedContainerSessionsInParallel` test (see previous 
> patch).
> 2) Fedora 25: `./src/mesos-tests 
> --gtest_filter=*AgentAPITest.LaunchNestedContainerSessionInParallel* 
> --gtest_break_on_failure --gtest_repeat=100 --verbose`
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>



Re: Review Request 68257: Fixed incorrect `mnt` namespace detection of command executor's task.

2018-08-07 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68257/#review206947
---




src/slave/containerizer/mesos/utils.cpp
Line 102 (original), 105 (patched)
<https://reviews.apache.org/r/68257/#comment290125>

Are we guaranteed that there are no short-lived processes, other than the 
task process, at the 2nd level? If not, we will have the same issue right?

Modulo the above question, the change LGTM.


- Vinod Kone


On Aug. 7, 2018, 1:46 p.m., Andrei Budnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68257/
> ---
> 
> (Updated Aug. 7, 2018, 1:46 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gilbert Song, Jie Yu, and 
> Kevin Klues.
> 
> 
> Bugs: MESOS-9116
> https://issues.apache.org/jira/browse/MESOS-9116
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Previously, we were walking the process tree from the container's
> `init` process to find the first process along the way whose `mnt`
> namespace differs from the `init` process. We expected this algorithm
> to always return the PID of the command executor's task. However, if
> someone launches multiple nested containers within the process tree,
> the algorithm might detect the PID of the nested container instead of
> the command executor's task. The detected PID might belong to a
> short-lived container, so the container's process might terminate at
> the moment the containerizer launcher (aka `nanny`) process tries to
> enter its `mnt` namespace. This patch fixes the detection algorithm
> so that it always returns PID of the command executor's task.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/utils.cpp 
> 30e76d1d91651975033078f5450e45f5f2fd8ba0 
> 
> 
> Diff: https://reviews.apache.org/r/68257/diff/1/
> 
> 
> Testing
> ---
> 
> 1) Internal CI with disabled 
> `ROOT_CGROUPS_LaunchNestedContainerSessionsInParallel` test (see previous 
> patch).
> 2) Fedora 25: `./src/mesos-tests 
> --gtest_filter=*AgentAPITest.LaunchNestedContainerSessionInParallel* 
> --gtest_break_on_failure --gtest_repeat=100 --verbose`
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>



Re: Review Request 68111: Added 'MesosCon 2018 CFP is now open!' blog post.

2018-07-30 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68111/#review206610
---


Fix it, then Ship it!





site/source/blog/2018-07-30-mesoscon-2018-cfp-is-now-open.md
Lines 23 (patched)
<https://reviews.apache.org/r/68111/#comment289640>

s/2018/2017/


- Vinod Kone


On July 30, 2018, 8:12 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68111/
> ---
> 
> (Updated July 30, 2018, 8:12 p.m.)
> 
> 
> Review request for mesos, Greg Mann and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added 'MesosCon 2018 CFP is now open!' blog post.
> 
> 
> Diffs
> -
> 
>   site/source/blog/2018-07-30-mesoscon-2018-cfp-is-now-open.md PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68111/diff/1/
> 
> 
> Testing
> ---
> 
> Used `./mesos-website-dev.sh` to verify that the website is properly rendered.
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 67965: Optimized ranges subtraction operation.

2018-07-26 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67965/#review206530
---


Ship it!




Ship It!

- Vinod Kone


On July 26, 2018, 7:14 p.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67965/
> ---
> 
> (Updated July 26, 2018, 7:14 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Gastón Kleiman, and Vinod Kone.
> 
> 
> Bugs: MESOS-9086
> https://issues.apache.org/jira/browse/MESOS-9086
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The current ranges subtraction uses boost IntervalSet.
> Based on the profiling result of MESOS-8989, the ranges
> subtraction operation is about 2~3 times more expensive
> than that of addition. The conversion cost from Ranges
> to IntervalSet may be the culprit.
> 
> This patch optimizes the ranges subtraction operation by
> converting Ranges to a vector of internal sub-ranges, sorting
> the vector based on sub-range start and then applying a
> one-pass algorithm similar to that of addition.
> 
> 
> Diffs
> -
> 
>   src/common/values.cpp afe4137f82115dd4ec9967b5eba16a1dd15401c8 
>   src/tests/values_tests.cpp 2f5abedfd461c114d03f5e941d81ebe414188033 
>   src/v1/values.cpp d2c31f6c91998382dec1d8834b40613013716cdd 
> 
> 
> Diff: https://reviews.apache.org/r/67965/diff/4/
> 
> 
> Testing
> ---
> 
> make check
> 
> Benchmarking:
> 
> tl:dr: more than 80% performance improvment across board. Performance of 
> subtraction is now on par or better than the addition, especially when there 
> are a large number of sub-ranges.
> 
> Build with -O2 optimization, ran on a multicore machine with peak frequency 
> at 2.2GHz:
> 
> Took 1.617706ms to perform 1000 'a += b' operations on ports:[1-6, 11-16, 
> 21-26...91-96] and ports:[3-8, 13-18, 23-28..., 93-98] with 10 sub-ranges
> Took 1.607999ms to perform 1000 'a -= b' operations on ports:[1-6, 11-16, 
> 21-26...91-96] and ports:[3-8, 13-18, 23-28..., 93-98] with 10 sub-ranges
> Took 3.094113ms to perform 1000 'a + b' operations on ports:[1-6, 11-16, 
> 21-26...91-96] and ports:[3-8, 13-18, 23-28..., 93-98] with 10 sub-ranges
> Took 3.152291ms to perform 1000 'a - b' operations on ports:[1-6, 11-16, 
> 21-26...91-96] and ports:[3-8, 13-18, 23-28..., 93-98] with 10 sub-ranges
> 
> Took 14.908924ms to perform 1000 'a += b' operations on ports:[1-6, 11-16, 
> 21-26...991-996] and ports:[3-8, 13-18, 23-28..., 993-998] with 100 sub-ranges
> Took 13.564767ms to perform 1000 'a -= b' operations on ports:[1-6, 11-16, 
> 21-26...991-996] and ports:[3-8, 13-18, 23-28..., 993-998] with 100 sub-ranges
> Took 25.523443ms to perform 1000 'a + b' operations on ports:[1-6, 11-16, 
> 21-26...991-996] and ports:[3-8, 13-18, 23-28..., 993-998] with 100 sub-ranges
> Took 24.871216ms to perform 1000 'a - b' operations on ports:[1-6, 11-16, 
> 21-26...991-996] and ports:[3-8, 13-18, 23-28..., 993-998] with 100 sub-ranges
> 
> Took 234.22483ms to perform 1000 'a += b' operations on ports:[1-6, 11-16, 
> 21-26...9991-9996] and ports:[3-8, 13-18, 23-28..., 9993-9998] with 1000 
> sub-ranges
> Took 144.417381ms to perform 1000 'a -= b' operations on ports:[1-6, 11-16, 
> 21-26...9991-9996] and ports:[3-8, 13-18, 23-28..., 9993-9998] with 1000 
> sub-ranges
> Took 322.548021ms to perform 1000 'a + b' operations on ports:[1-6, 11-16, 
> 21-26...9991-9996] and ports:[3-8, 13-18, 23-28..., 9993-9998] with 1000 
> sub-ranges
> Took 227.835441ms to perform 1000 'a - b' operations on ports:[1-6, 11-16, 
> 21-26...9991-9996] and ports:[3-8, 13-18, 23-28..., 9993-9998] with 1000 
> sub-ranges
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 67965: Optimized ranges subtraction operation.

2018-07-24 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67965/#review206414
---




src/common/values.cpp
Line 231 (original), 231 (patched)
<https://reviews.apache.org/r/67965/#comment289353>

extra blank line.



src/common/values.cpp
Lines 239 (patched)
<https://reviews.apache.org/r/67965/#comment289354>

`ranges`



src/common/values.cpp
Lines 263 (patched)
<https://reviews.apache.org/r/67965/#comment289355>

either use `iteratorLeft` and `iteratorRight` or `itLeft` and `itRight`. 
i've looked around and I don't think we use `itr*` anywhere.



src/common/values.cpp
Lines 265 (patched)
<https://reviews.apache.org/r/67965/#comment289356>

don't think you need the comment here. it's pretty clear?



src/common/values.cpp
Lines 290-311 (patched)
<https://reviews.apache.org/r/67965/#comment289368>

This part is a bit hard to follow. Can you break it down into 4 cases

1) left is subsumed in right
2) right is subsumed in left
3) left overlaps right but left's start is smaller
4) left overlaps right but right's start is smaller

Even if some of these cases ends up with same resulting code, I think it's 
worth duplicating for readability.



src/common/values.cpp
Lines 313 (patched)
<https://reviews.apache.org/r/67965/#comment289369>

I like that you moved this outside the loop!



src/common/values.cpp
Line 371 (original), 464 (patched)
<https://reviews.apache.org/r/67965/#comment289370>

Can you add a TODO here to make this consistent with how we do subtraction?



src/v1/values.cpp
Lines 233 (patched)
<https://reviews.apache.org/r/67965/#comment289371>

See below.


- Vinod Kone


On July 24, 2018, 5:29 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67965/
> ---
> 
> (Updated July 24, 2018, 5:29 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Gastón Kleiman, and Vinod Kone.
> 
> 
> Bugs: MESOS-9086
> https://issues.apache.org/jira/browse/MESOS-9086
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The current ranges subtraction uses boost IntervalSet.
> Based on the profiling result of MESOS-8989, the ranges
> subtraction operation is about 2~3 times more expensive
> than that of addition. The conversion cost from Ranges
> to IntervalSet may be the culprit.
> 
> This patch optimizes the ranges subtraction operation by
> converting Ranges to a vector of internal sub-ranges, sorting
> the vector based on sub-range start and then applying a
> one-pass algorithm similar to that of addition.
> 
> 
> Diffs
> -
> 
>   src/common/values.cpp afe4137f82115dd4ec9967b5eba16a1dd15401c8 
>   src/tests/values_tests.cpp 2f5abedfd461c114d03f5e941d81ebe414188033 
>   src/v1/values.cpp d2c31f6c91998382dec1d8834b40613013716cdd 
> 
> 
> Diff: https://reviews.apache.org/r/67965/diff/2/
> 
> 
> Testing
> ---
> 
> make check
> 
> Benchmarking:
> 
> tl:dr: more than 80% performance improvment across board. Performance of 
> subtraction is now on par or better than the addition, especially when there 
> are a large number of sub-ranges.
> 
> Build with -O2 optimization, ran on a multicore machine with peak frequency 
> at 2.2GHz:
> 
> Took 1.617706ms to perform 1000 'a += b' operations on ports:[1-6, 11-16, 
> 21-26...91-96] and ports:[3-8, 13-18, 23-28..., 93-98] with 10 sub-ranges
> Took 1.607999ms to perform 1000 'a -= b' operations on ports:[1-6, 11-16, 
> 21-26...91-96] and ports:[3-8, 13-18, 23-28..., 93-98] with 10 sub-ranges
> Took 3.094113ms to perform 1000 'a + b' operations on ports:[1-6, 11-16, 
> 21-26...91-96] and ports:[3-8, 13-18, 23-28..., 93-98] with 10 sub-ranges
> Took 3.152291ms to perform 1000 'a - b' operations on ports:[1-6, 11-16, 
> 21-26...91-96] and ports:[3-8, 13-18, 23-28..., 93-98] with 10 sub-ranges
> 
> Took 14.908924ms to perform 1000 'a += b' operations on ports:[1-6, 11-16, 
> 21-26...991-996] and ports:[3-8, 13-18, 23-28..., 993-998] with 100 sub-ranges
> Took 13.564767ms to perform 1000 'a -= b' operations on ports:[1-6, 11-16, 
> 21-26...991-996] and ports:[3-8, 13-18, 23-28..., 993-998] with 100 sub-ranges
> Took 25.523443ms to perform 1000 'a + b' operations on ports:[1-6, 11-16, 
> 21-26...991-996] and ports:[3-8, 13-18, 23-28..., 993-998] with 100 sub-ranges
> Took 24.871216ms to perform 1000 'a - b' operations on ports:[1-6, 11-16, 
> 21-26...991-996] and ports:[3-8, 13-18, 23-28..., 993-998] 

Re: Review Request 67891: Added multi-framework scalability guidelines to the documentation.

2018-07-20 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67891/#review206283
---




docs/app-framework-development-guide.md
Lines 55 (patched)
<https://reviews.apache.org/r/67891/#comment289182>

Do we want to give any guidance on what values we considere large? 
Otherwise, it might add more confusion than necessary?


- Vinod Kone


On July 13, 2018, 9:33 p.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67891/
> ---
> 
> (Updated July 13, 2018, 9:33 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, Meng Zhu, and Vinod Kone.
> 
> 
> Bugs: MESOS-8238
> https://issues.apache.org/jira/browse/MESOS-8238
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Running multiple frameworks requires co-operation from framework
> developers; this outlines some initial guidelines that should be
> followed to ensure running multiple frameworks together performs
> well.
> 
> 
> Diffs
> -
> 
>   docs/app-framework-development-guide.md 
> 035ac1f80e063c75786845475d573c1ee03570c0 
> 
> 
> Diff: https://reviews.apache.org/r/67891/diff/2/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 67857: Improved release policy documentation visibility.

2018-07-09 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67857/#review205871
---


Ship it!




Ship It!

- Vinod Kone


On July 9, 2018, 8:29 p.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67857/
> ---
> 
> (Updated July 9, 2018, 8:29 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Since the release policy can be a very important document for Mesos
> operators, add a top-level link to it. Update the HTTP API version
> link to point to the section that discusses that policy.
> 
> 
> Diffs
> -
> 
>   docs/home.md 68b1c74c60b21c573ffcc1e36c744d48dc769193 
>   docs/versioning.md a5e55ce87772bd12fead36460fd42cb2ad1f3428 
> 
> 
> Diff: https://reviews.apache.org/r/67857/diff/1/
> 
> 
> Testing
> ---
> 
> Manually viewed docs and verified that the links work.
> 
> 
> Thanks,
> 
> James Peach
> 
>



Re: Review Request 67791: Prevented master from asking agents to shutdown on auth failures.

2018-07-05 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67791/#review205758
---


Ship it!




Ship It!

- Vinod Kone


On July 3, 2018, 11:57 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67791/
> ---
> 
> (Updated July 3, 2018, 11:57 p.m.)
> 
> 
> Review request for mesos, Alexander Rojas, Greg Mann, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-8987
> https://issues.apache.org/jira/browse/MESOS-8987
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The Mesos master sends a `ShutdownMessage` to an agent if there is an
> authentication or an authorization error during agent (re)registration.
> 
> Upon receipt of this message, the agent kills alls its tasks and commits
> suicide. This means that transient auth errors can lead to whole agents
> being killed along with it's tasks.
> 
> This patch prevents the master from sending a `ShutdownMessage` in these
> cases.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ddc8df0ea82241be6c733237feef1553c7669eb2 
>   src/tests/authentication_tests.cpp bd46cbc6d565ea8f2f6956c0424a76ad58607017 
>   src/tests/master_authorization_tests.cpp 
> 80b9d49ba334b915461ff5d6df6c9f922d7593e3 
> 
> 
> Diff: https://reviews.apache.org/r/67791/diff/3/
> 
> 
> Testing
> ---
> 
> `bin/mesos-tests.sh` on GNU/Linux
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 67817: Improved logging for offers and inverse offers.

2018-07-03 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67817/#review205683
---


Fix it, then Ship it!





src/master/master.cpp
Lines 9573 (patched)
<https://reviews.apache.org/r/67817/#comment288583>

Can we log `*slave` here instead of just slave id? im assuming that pointer 
is still valid here.


- Vinod Kone


On July 3, 2018, 7:16 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67817/
> ---
> 
> (Updated July 3, 2018, 7:16 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Greg Mann, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Log offer IDs and inverse offer IDs when sending out offers and
> inverse offers so it is easier to match them to their ACCEPT or DECLINE
> calls and removals.
> 
> Also log at `VLOG(2)` level which resources are offered.
> 
> NOTE: It is possible to enable `VLOG(2)` logs just for `master.cpp` by
> setting the following env variable when starting the master:
> `GLOG_vmodule=master=2`.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ddc8df0ea82241be6c733237feef1553c7669eb2 
> 
> 
> Diff: https://reviews.apache.org/r/67817/diff/1/
> 
> 
> Testing
> ---
> 
> Manual testing =).
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 67403: Handled race condition when removing maintenance windows.

2018-05-31 Thread Vinod Kone


> On May 31, 2018, 4:41 p.m., Vinod Kone wrote:
> > Can you add a unit test for this?
> 
> Benno Evers wrote:
> It's tricky because we need very precise control over the scheduling, and 
> I'm not sure our testing infrastructure provides it. But I'll look into it.

I see.  The bug is in the allocator, so you cannot use a mock allocator 
unfortunately for control. Consider pausing the clock to have better control in 
the test.


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67403/#review204121
---


On May 31, 2018, 4:32 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67403/
> ---
> 
> (Updated May 31, 2018, 4:32 p.m.)
> 
> 
> Review request for mesos, Joseph Wu and Vinod Kone.
> 
> 
> Bugs: MESOS-7966
> https://issues.apache.org/jira/browse/MESOS-7966
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When executing the `Master::inverseOffers()` callback, it
> could happen that the maintenance window the reverse offer
> referred to was already removed by a concurrent call to
> to the maintenance endpoint of Mesos.
> 
> In this case, we must not send out a reverse offer, because
> having outstanding inverse offers for an agent without
> any scheduled maintenance window will lead to a crash in
> the allocator when attempting to remove this offer.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ba3f8746ea393c8655fcd5ceaace099f68df0b19 
> 
> 
> Diff: https://reviews.apache.org/r/67403/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Set up the reproduction environment locally and ran `while :; python call.py; 
> done` for about a minute. (see linked ticket)
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 67403: Handled race condition when removing maintenance windows.

2018-05-31 Thread Vinod Kone


> On May 31, 2018, 4:41 p.m., Vinod Kone wrote:
> > src/master/master.cpp
> > Line 9459 (original), 9459 (patched)
> > <https://reviews.apache.org/r/67403/diff/1/?file=2033322#file2033322line9459>
> >
> > not yours, but can you log framework id here too?

s/framework id/*framework/


> On May 31, 2018, 4:41 p.m., Vinod Kone wrote:
> > src/master/master.cpp
> > Lines 9470 (patched)
> > <https://reviews.apache.org/r/67403/diff/1/?file=2033322#file2033322line9470>
> >
> > log framework id?

s/framework id/*framework/


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67403/#review204121
---


On May 31, 2018, 4:32 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67403/
> ---
> 
> (Updated May 31, 2018, 4:32 p.m.)
> 
> 
> Review request for mesos, Joseph Wu and Vinod Kone.
> 
> 
> Bugs: MESOS-7966
> https://issues.apache.org/jira/browse/MESOS-7966
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When executing the `Master::inverseOffers()` callback, it
> could happen that the maintenance window the reverse offer
> referred to was already removed by a concurrent call to
> to the maintenance endpoint of Mesos.
> 
> In this case, we must not send out a reverse offer, because
> having outstanding inverse offers for an agent without
> any scheduled maintenance window will lead to a crash in
> the allocator when attempting to remove this offer.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ba3f8746ea393c8655fcd5ceaace099f68df0b19 
> 
> 
> Diff: https://reviews.apache.org/r/67403/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Set up the reproduction environment locally and ran `while :; python call.py; 
> done` for about a minute. (see linked ticket)
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 67403: Handled race condition when removing maintenance windows.

2018-05-31 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67403/#review204121
---



Can you add a unit test for this?


src/master/master.cpp
Line 9459 (original), 9459 (patched)
<https://reviews.apache.org/r/67403/#comment286512>

not yours, but can you log framework id here too?



src/master/master.cpp
Lines 9466 (patched)
<https://reviews.apache.org/r/67403/#comment286510>

s/allocator/master/ ? we care about master invariant here right?



src/master/master.cpp
Lines 9470 (patched)
<https://reviews.apache.org/r/67403/#comment286511>

log framework id?


- Vinod Kone


On May 31, 2018, 4:32 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67403/
> ---
> 
> (Updated May 31, 2018, 4:32 p.m.)
> 
> 
> Review request for mesos, Joseph Wu and Vinod Kone.
> 
> 
> Bugs: MESOS-7966
> https://issues.apache.org/jira/browse/MESOS-7966
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When executing the `Master::inverseOffers()` callback, it
> could happen that the maintenance window the reverse offer
> referred to was already removed by a concurrent call to
> to the maintenance endpoint of Mesos.
> 
> In this case, we must not send out a reverse offer, because
> having outstanding inverse offers for an agent without
> any scheduled maintenance window will lead to a crash in
> the allocator when attempting to remove this offer.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ba3f8746ea393c8655fcd5ceaace099f68df0b19 
> 
> 
> Diff: https://reviews.apache.org/r/67403/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Set up the reproduction environment locally and ran `while :; python call.py; 
> done` for about a minute. (see linked ticket)
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 66982: Added a `decline` flag in `recoverResources` in the allocator.

2018-05-07 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66982/#review202598
---



Do we need this optimization?

- Vinod Kone


On May 7, 2018, 6:09 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66982/
> ---
> 
> (Updated May 7, 2018, 6:09 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Kapil Arya, Till Toenshoff, and 
> Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The `decline` flag acts as an hint to the allocator that this resource
> recover is triggered by the framework "declining" the offered resources.
> Currently this includes: framework `DECLINE` calls, framework `ACCEPT`
> calls with filters explicitly set and offer timeouts. Note, framework
> `ACCEPT` calls with unset filters (i.e. default filters) are not
> considered as "declining" offered resources.
> 
> 
> Diffs
> -
> 
>   include/mesos/allocator/allocator.hpp 
> e5a5355646c834280008867e89eb258eaefc2f1d 
>   src/master/allocator/mesos/allocator.hpp 
> c453c015b234deff7efd00269da25dcec8cbf1ae 
>   src/master/allocator/mesos/hierarchical.hpp 
> 955ae3e6a9e3c790fb260311ec4b4ef725a826d3 
>   src/master/allocator/mesos/hierarchical.cpp 
> 1000968be6a2935a4cac571414d7f06d7df7acf0 
>   src/master/master.cpp 3b5d2eba3f602f68a6bb1e00444b01fb58a1bfc2 
>   src/tests/allocator.hpp 341efa665ad0ce897e087fb8d73ec50fd041d559 
>   src/tests/master_allocator_tests.cpp 
> e1aef8a9625a805e7ad2dfad37bfeedee82f160d 
>   src/tests/slave_recovery_tests.cpp afe8b8a6cc21421a37164016d7fe01bf96a559da 
> 
> 
> Diff: https://reviews.apache.org/r/66982/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 66947: Increased the timeout for waiting for `reaped` to be invoked.

2018-05-07 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66947/#review202560
---


Ship it!




LGTM.

- Vinod Kone


On May 4, 2018, 9:32 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66947/
> ---
> 
> (Updated May 4, 2018, 9:32 a.m.)
> 
> 
> Review request for mesos, Gaston Kleiman, Gilbert Song, Greg Mann, and Vinod 
> Kone.
> 
> 
> Bugs: MESOS-8876
> https://issues.apache.org/jira/browse/MESOS-8876
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Previously after the container process is reaped by the Docker
> executor, we will wait 3 seconds for `reaped` to be invoked.
> However in some cases (e.g., launch a Docker container to use
> an external rexray volume), there will be more than 3 seconds
> after the container process exits and before the `docker run`
> command returns (i.e., `reaped` invoked). So in this patch,
> the timeout is increased to 60 seconds.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp 16509dae3acebc59635e925073372adfb3786edc 
> 
> 
> Diff: https://reviews.apache.org/r/66947/diff/1/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 66963: Updated the 1.6 CHANGELOG.

2018-05-04 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66963/#review202456
---


Ship it!




Ship It!

- Vinod Kone


On May 4, 2018, 5:18 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66963/
> ---
> 
> (Updated May 4, 2018, 5:18 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Chun-Hung Hsiao, Gaston 
> Kleiman, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Updated the 1.6 CHANGELOG.
> 
> 
> Diffs
> -
> 
>   CHANGELOG befaab123e0a8fc2a8bb659a0a0bb00ab30c74f5 
> 
> 
> Diff: https://reviews.apache.org/r/66963/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 66460: Added a `call()` method to the v1 scheduler library.

2018-04-06 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66460/#review200670
---




include/mesos/v1/scheduler.hpp
Line 50 (original), 54-55 (patched)
<https://reviews.apache.org/r/66460/#comment281510>

Since we don't guarantee backwards compat for this library, can we just 
update the signature of `send`?



include/mesos/v1/scheduler.hpp
Lines 112-113 (patched)
<https://reviews.apache.org/r/66460/#comment281509>

What do you mean by `SUBSCRIBED` calls?


- Vinod Kone


On April 6, 2018, 9 p.m., Gaston Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66460/
> ---
> 
> (Updated April 6, 2018, 9 p.m.)
> 
> 
> Review request for mesos, Greg Mann and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds a `call()` method to the scheduler library that allows
> clients to send a `v1::scheduler::Call` to the master and receive a
> `v1::scheduler::Response`.
> 
> It is going to be used to test operation state reconciliation.
> 
> 
> Diffs
> -
> 
>   include/mesos/v1/scheduler.hpp d56e08861d5190ef66992d383dc5710d8f6ce661 
>   src/java/jni/org_apache_mesos_v1_scheduler_V0Mesos.cpp 
> 60b17b9be74132c81532d22eba681feb899b22a3 
>   src/scheduler/scheduler.cpp ecef916ffd0797a13552525ff2cda0f99ca57e74 
> 
> 
> Diff: https://reviews.apache.org/r/66460/diff/2/
> 
> 
> Testing
> ---
> 
> `sudo bin/mesos-tests` on GNU/Linux
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>



Re: Review Request 65791: Removed stale generated HTTP endpoint documentation.

2018-02-23 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65791/#review198234
---


Ship it!




Ship It!

- Vinod Kone


On Feb. 24, 2018, 12:29 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65791/
> ---
> 
> (Updated Feb. 24, 2018, 12:29 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier and Vinod Kone.
> 
> 
> Bugs: MESOS-8442
> https://issues.apache.org/jira/browse/MESOS-8442
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This is now generated by the website CI bot, rather than having
> to be checked in and kept up to date within the docs/ folder.
> 
> 
> Diffs
> -
> 
>   docs/endpoints/files/browse.json.md 
> 6b21e50bb0c0098babedcd612a5acfd0e6ae0aab 
>   docs/endpoints/files/browse.md 0fa204a85fc62aa4b43dacc07aec6951b618a015 
>   docs/endpoints/files/debug.json.md 2806a5844f49a3c051e58db0723879c9627e18e1 
>   docs/endpoints/files/debug.md 33380663eab59c704c610bd392676aee99528f77 
>   docs/endpoints/files/download.json.md 
> 31d442a7e1e3d4c6e32986ec85e42f3364f37de9 
>   docs/endpoints/files/download.md 202d015484b22e63c9d24610808a43d091786844 
>   docs/endpoints/files/read.json.md 64f348e76d61d3d0ad364f4712d3c1d90818d6d1 
>   docs/endpoints/files/read.md ef95c80739eeb09f52658470429c6ffb9b41fb8c 
>   docs/endpoints/index.md 871153774c4b21cdca4afec9bc2ce2bd955b17c4 
>   docs/endpoints/logging/toggle.md 014a1969a7efb709e06c0c90ca023f2fa747414e 
>   docs/endpoints/master/api/v1.md 2d778f33c830caccb0254cf15adc46a86b5db48a 
>   docs/endpoints/master/api/v1/scheduler.md 
> f029d8e6e90bc7d974d60ad4a71d63807f0e901a 
>   docs/endpoints/master/create-volumes.md 
> c9ce39ead8041f249ee6f85d1afdd8f90b38ce9b 
>   docs/endpoints/master/destroy-volumes.md 
> ff82e96a6ee1e9f6e1bbc565ab40c98b758c1227 
>   docs/endpoints/master/flags.md 6b3de41d45278cc34a4f4f0e52b76407ab96b790 
>   docs/endpoints/master/frameworks.md 
> 7aa22130dfa8702d8b93690cb75f1323ea3c2223 
>   docs/endpoints/master/health.md 3752a1bb4a9a7c2cc5cbf25b021592280c174a80 
>   docs/endpoints/master/machine/down.md 
> 05abd155e693163eb50f675d2724534a6f87dfba 
>   docs/endpoints/master/machine/up.md 
> 8a3488bb3f96bc7470aaea5114559cb593cbb3a8 
>   docs/endpoints/master/maintenance/schedule.md 
> 86ae30b06118f13593e91b70dbec97945b1d9bd8 
>   docs/endpoints/master/maintenance/status.md 
> f8e1c7eed44e14a8324b582a2db91e523b2fbc15 
>   docs/endpoints/master/quota.md 6ce7811136ac1c4ae00a3011dca17cbde948f14e 
>   docs/endpoints/master/redirect.md 3c568e3eef658f9a489008f4f84745827f8002f0 
>   docs/endpoints/master/reserve.md 8174027f0e66dc4e9ae7bca951ed9ca38033008d 
>   docs/endpoints/master/roles.json.md 
> 9acba2cfed120918b2571d7336651d125cbe 
>   docs/endpoints/master/roles.md de58120172c390ef1f084031077fd9b865b817db 
>   docs/endpoints/master/slaves.md 9365cdb9e5b17cdf26ed38a0ab1fff8afef38f00 
>   docs/endpoints/master/state-summary.md 
> 21b8703bb011daf5cc560550e6c2ce36ba615bba 
>   docs/endpoints/master/state.json.md 
> 62a723051270ab24591b956805aee5455848201b 
>   docs/endpoints/master/state.md 09a478263d4cd015040f47619b0aa94341c3bba2 
>   docs/endpoints/master/tasks.json.md 
> f9553d7bde7de33b02795ba9088a88e68cb2a0cc 
>   docs/endpoints/master/tasks.md 019a14bb6c8d8acd3aab4a1f7ba6893f4bad835b 
>   docs/endpoints/master/teardown.md b017084e3d195363615a59011671fdbd76182a1a 
>   docs/endpoints/master/unreserve.md 324dcded2013aeb6c167a695bbad1da69340cd57 
>   docs/endpoints/master/weights.md 3c9019e74867cbd931dc5c68a0d8de3ea269e5b2 
>   docs/endpoints/metrics/snapshot.md 0912fa939b38fd1d8335e165c7aeddf1f076 
>   docs/endpoints/profiler/start.md a25f0d0468f2ad9f8d1ff41f6c50ea0bbe87d316 
>   docs/endpoints/profiler/stop.md 5028d13f8a3f9aa3d41b60959a9cc3d6b03fb7d9 
>   docs/endpoints/registrar/registry.md 
> e347db269839d60fac6de0777b245e830279661e 
>   docs/endpoints/slave/api/v1.md b394f7fac2c4600b7e11f5c7f4def44edc22e7db 
>   docs/endpoints/slave/api/v1/executor.md 
> 5ef40e3ca5ad072b5bbfd480d4e0b56d9530a17f 
>   docs/endpoints/slave/api/v1/resource_provider.md 
> c721c6d005224807bdb4fbfcf4fd7c30edb82e6c 
>   docs/endpoints/slave/containers.md f8bcb3847a6e29dc98d424cdfeacfce26ad98d67 
>   docs/endpoints/slave/flags.md 4a8a0361124078342887c291acaa73c9f6f4bc4d 
>   docs/endpoints/slave/health.md 988567cdbe885dbfe1270732f84e529b13983546 
>   docs/endpoints/slave/monitor/statistics.json.md 
> 51d7c39ff994aeabf0bb

Re: Review Request 65570: Attached/detached volume directory for task which has volume specified.

2018-02-12 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65570/#review197359
---



Can you add tests for this in a subsequent review please?


src/slave/slave.hpp
Line 434 (original), 442 (patched)
<https://reviews.apache.org/r/65570/#comment277482>

s/at it to make it can be/to it so that it can be/



src/slave/slave.hpp
Line 435 (original), 443 (patched)
<https://reviews.apache.org/r/65570/#comment277483>

s/, so/. So/



src/slave/slave.cpp
Line 1043 (original), 1044 (patched)
<https://reviews.apache.org/r/65570/#comment277493>

can you fix the naming to be consistent with below comments?



src/slave/slave.cpp
Lines 1075 (patched)
<https://reviews.apache.org/r/65570/#comment277484>

`containerInfo`



src/slave/slave.cpp
Lines 1078-1087 (patched)
<https://reviews.apache.org/r/65570/#comment277495>

kill this. see below.



src/slave/slave.cpp
Lines 1089-1091 (patched)
<https://reviews.apache.org/r/65570/#comment277497>

kill this.



src/slave/slave.cpp
Lines 1109- (patched)
<https://reviews.apache.org/r/65570/#comment277488>

I don't think we need this check. We want to show non PV directories too in 
the UI.



src/slave/slave.cpp
Lines 1113 (patched)
<https://reviews.apache.org/r/65570/#comment277491>

executorRunPath



src/slave/slave.cpp
Lines 1120 (patched)
<https://reviews.apache.org/r/65570/#comment277490>

s/executorVolumePath/executorDirectoryPath/



src/slave/slave.cpp
Lines 1131 (patched)
<https://reviews.apache.org/r/65570/#comment277492>

taskDirectoryPath



src/slave/slave.cpp
Lines 1157-1167 (patched)
<https://reviews.apache.org/r/65570/#comment277494>

kill this.



src/slave/slave.cpp
Lines 1196-1198 (patched)
<https://reviews.apache.org/r/65570/#comment277496>

kill this.



src/slave/slave.cpp
Lines 1219-1221 (patched)
<https://reviews.apache.org/r/65570/#comment277498>

kill this.



src/slave/slave.cpp
Lines 1231 (patched)
<https://reviews.apache.org/r/65570/#comment277499>

taskDirectoryPath


- Vinod Kone


On Feb. 10, 2018, 2:59 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65570/
> ---
> 
> (Updated Feb. 10, 2018, 2:59 p.m.)
> 
> 
> Review request for mesos and Gilbert Song.
> 
> 
> Bugs: MESOS-8565
> https://issues.apache.org/jira/browse/MESOS-8565
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Attached/detached volume directory for task which has volume specified.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 30151c4886e12e9183a971b86b854e28a8ca1b39 
>   src/slave/slave.cpp f98f37321872d090176b7cc50873fc3c627773f5 
> 
> 
> Diff: https://reviews.apache.org/r/65570/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 65551: Stopped shutting down the whole default executor on task launch failure.

2018-02-12 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65551/#review197314
---




src/launcher/default_executor.cpp
Line 786 (original), 788 (patched)
<https://reviews.apache.org/r/65551/#comment277455>

If a launch for nested container fails, don't we get `NotFound`?


- Vinod Kone


On Feb. 10, 2018, 6:35 a.m., Gaston Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65551/
> ---
> 
> (Updated Feb. 10, 2018, 6:35 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Greg Mann, Qian Zhang, and Vinod 
> Kone.
> 
> 
> Bugs: MESOS-8468
> https://issues.apache.org/jira/browse/MESOS-8468
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The default executor would be completely shutdown on a
> `LAUNCH_NESTED_CONTAINER` failure.
> 
> This patch makes it kill the affected task group instead of shutting
> down and killing all task groups.
> 
> 
> Diffs
> -
> 
>   src/launcher/default_executor.cpp 4a619859095cc2d30f4806813f64a2e48c83b3ea 
> 
> 
> Diff: https://reviews.apache.org/r/65551/diff/4/
> 
> 
> Testing
> ---
> 
> `sudo make check` on GNU/Linux
> 
> Regression test on https://reviews.apache.org/r/65552/
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>



Re: Review Request 65551: Stopped shutting down the whole default executor on task launch failure.

2018-02-12 Thread Vinod Kone


> On Feb. 12, 2018, 7:57 p.m., Joseph Wu wrote:
> > src/launcher/default_executor.cpp
> > Line 1479 (original), 1492 (patched)
> > <https://reviews.apache.org/r/65551/diff/4/?file=1955928#file1955928line1509>
> >
> > This booleans seems like a remnant of the time when the default 
> > executor would only launch a single task group.  The value is initialized 
> > to false, and is flipped to true upon launching the first TaskGroup.
> > 
> > It is `CHECK`'d in `__launchGroup(...)` (seems redundant, as that is a 
> > continuation of `launchGroup` where `launched = true;` is set); in `wait()` 
> > (similarly redundant, but has more entrypoints); and in `shutdown()`.
> > 
> > You can consider removing the bool in a separate patch.

I made the same comment in the previous review :)


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65551/#review197298
---


On Feb. 10, 2018, 6:35 a.m., Gaston Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65551/
> ---
> 
> (Updated Feb. 10, 2018, 6:35 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Greg Mann, Qian Zhang, and Vinod 
> Kone.
> 
> 
> Bugs: MESOS-8468
> https://issues.apache.org/jira/browse/MESOS-8468
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The default executor would be completely shutdown on a
> `LAUNCH_NESTED_CONTAINER` failure.
> 
> This patch makes it kill the affected task group instead of shutting
> down and killing all task groups.
> 
> 
> Diffs
> -
> 
>   src/launcher/default_executor.cpp 4a619859095cc2d30f4806813f64a2e48c83b3ea 
> 
> 
> Diff: https://reviews.apache.org/r/65551/diff/4/
> 
> 
> Testing
> ---
> 
> `sudo make check` on GNU/Linux
> 
> Regression test on https://reviews.apache.org/r/65552/
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>



Re: Review Request 65550: Made default executor not shutdown if unsubscribed during task launch.

2018-02-12 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65550/#review197304
---


Fix it, then Ship it!





src/launcher/default_executor.cpp
Line 349 (original), 349 (patched)
<https://reviews.apache.org/r/65550/#comment277450>

Not yours, but I think we can kill this. I think this was added pre 
multiple task groups support. We should be able to look into the `containers` 
map AFAICT.



src/launcher/default_executor.cpp
Line 647 (original), 642 (patched)
<https://reviews.apache.org/r/65550/#comment277448>

Can you log that we are skipping wait in an `else` statement.



src/launcher/default_executor.cpp
Line 651 (original), 647 (patched)
<https://reviews.apache.org/r/65550/#comment277449>

Can you log here because this gets called from couple different sites?


- Vinod Kone


On Feb. 10, 2018, 6:35 a.m., Gaston Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65550/
> ---
> 
> (Updated Feb. 10, 2018, 6:35 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-8468
> https://issues.apache.org/jira/browse/MESOS-8468
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The default executor would unnecessarily shutdown if, while launching a
> task group, it gets unsubscribed after having successfully launched the
> task group's containers.
> 
> 
> Diffs
> -
> 
>   src/launcher/default_executor.cpp 4a619859095cc2d30f4806813f64a2e48c83b3ea 
> 
> 
> Diff: https://reviews.apache.org/r/65550/diff/2/
> 
> 
> Testing
> ---
> 
> `make check` on GNU/Linux
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>



Re: Review Request 65593: Added a test to ensure master removes fail-launched deafult executor.

2018-02-12 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65593/#review197300
---


Fix it, then Ship it!




Description: s/fail-launched deafult/default executor that failed to launch/


src/tests/slave_tests.cpp
Lines 4733 (patched)
<https://reviews.apache.org/r/65593/#comment277431>

s/unscheule/unschedule/
s/.So that/and that/


- Vinod Kone


On Feb. 10, 2018, 12:36 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65593/
> ---
> 
> (Updated Feb. 10, 2018, 12:36 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-1720
> https://issues.apache.org/jira/browse/MESOS-1720
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This test ensures that agent sends ExitedExecutorMessage when the
> task group fails to launch due to unscheule GC failure. So that
> master's executor bookkeeping entry is removed.
> 
> 
> Diffs
> -
> 
>   src/tests/slave_tests.cpp 628b0d0fc862264c9553e1660c7df548df9cd4a1 
> 
> 
> Diff: https://reviews.apache.org/r/65593/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 65549: Improved some default executor log messages.

2018-02-12 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65549/#review197302
---


Ship it!




Ship It!

- Vinod Kone


On Feb. 10, 2018, 6:34 a.m., Gaston Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65549/
> ---
> 
> (Updated Feb. 10, 2018, 6:34 a.m.)
> 
> 
> Review request for mesos, Greg Mann, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-8468
> https://issues.apache.org/jira/browse/MESOS-8468
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Improved some default executor log messages.
> 
> 
> Diffs
> -
> 
>   src/launcher/default_executor.cpp 4a619859095cc2d30f4806813f64a2e48c83b3ea 
> 
> 
> Diff: https://reviews.apache.org/r/65549/diff/2/
> 
> 
> Testing
> ---
> 
> None, this patch doesn't contain functional changes.
> 
> 
> Thanks,
> 
> Gaston Kleiman
> 
>



Re: Review Request 65449: Fixed an bug where executor info lingers on master if failed to launch.

2018-02-12 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65449/#review197293
---


Fix it, then Ship it!





src/slave/slave.cpp
Line 1909 (original), 1916 (patched)
<https://reviews.apache.org/r/65449/#comment277417>

Can you add a comment here on why you are not sending 
`ExitedExecutorMessage`? Is it because the expectation is that agent will 
(eventually) re-register and reconcile the state in this scenario?



src/slave/slave.cpp
Lines 2678 (patched)
<https://reviews.apache.org/r/65449/#comment277419>

s/task,/task;/



src/slave/slave.cpp
Lines 2692 (patched)
<https://reviews.apache.org/r/65449/#comment277420>

s/received/it was received/



src/slave/slave.cpp
Lines 2738 (patched)
<https://reviews.apache.org/r/65449/#comment277422>

There is another step for us to trigger this right?

```
(4) Master now sends another `runTaskMessage` for the same executor id with 
`launch_executor = true`.
```



src/slave/slave.cpp
Lines 2759 (patched)
<https://reviews.apache.org/r/65449/#comment277424>

s/launch the executor/launch the executor but the master doesn't know about 
it yet because the `ExitedExecutorMessage` is still in flight./



src/slave/slave.cpp
Lines 2785 (patched)
<https://reviews.apache.org/r/65449/#comment277426>

Should we be sending an `ExitedExecutorMessage` in this case to be safe? I 
guess not strictly require since the expectation is that one is already in 
flight and if it gets dropped we will hopefully reconcile. Worth adding a 
comment though for posterity.



src/slave/slave.cpp
Lines 2789 (patched)
<https://reviews.apache.org/r/65449/#comment277427>

Ideally we should have tests for each of these scenarios here. Do we?



src/slave/slave.cpp
Lines 3402 (patched)
<https://reviews.apache.org/r/65449/#comment277428>

"Consider doing a `CHECK` here since this shouldn't be possible."


- Vinod Kone


On Feb. 5, 2018, 3 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65449/
> ---
> 
> (Updated Feb. 5, 2018, 3 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Chun-Hung Hsiao, and Vinod Kone.
> 
> 
> Bugs: MESOS-1720
> https://issues.apache.org/jira/browse/MESOS-1720
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master relies on `ExitedExecutorMessage` from the agent to recycle
> executor entry. However, this message won't be sent if the executor
> never actually launched (due to transient error), leaving executor
> info on the master lingering and resource claimed.
> See MESOS-1720.
> 
> This patch fixes this issue by sending the `ExitedExecutorMessage`
> from the agent if the executor is never launched.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 30151c4886e12e9183a971b86b854e28a8ca1b39 
>   src/slave/slave.cpp f98f37321872d090176b7cc50873fc3c627773f5 
>   src/tests/mock_slave.hpp 942ead57fc67bdd2a268c67575952349838dc280 
>   src/tests/mock_slave.cpp 597d7abef20dd5f89b16e4616233f02760b9d037 
>   src/tests/slave_tests.cpp b54688add9949b9c4ac2ce3a42503a411e6da09f 
> 
> 
> Diff: https://reviews.apache.org/r/65449/diff/3/
> 
> 
> Testing
> ---
> 
> make check
> Dedicated test in #65448
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 65504: Made master set `launch_executor` in the RunTask(Group)Message.

2018-02-12 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65504/#review197286
---


Fix it, then Ship it!





src/master/master.cpp
Lines 4920-4924 (original), 4919-4928 (patched)
<https://reviews.apache.org/r/65504/#comment277412>

I think this should be 

```
bool launchExecutor = true;
if (task.has_executor()) {
 launchExecutor = isLaunchExecutor(task.executor().executor_id(), 
framework, slave);
 if (launchExecutor) {
   addExecutor(task.executor(), framework, slave);
   consumed += task.executor().resources()
 }
 
} 

```

otherwise, the log line below says we are launching a command task on an 
existing executor which is wrong.


- Vinod Kone


On Feb. 5, 2018, 2:49 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65504/
> ---
> 
> (Updated Feb. 5, 2018, 2:49 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Chun-Hung Hsiao, and Vinod Kone.
> 
> 
> Bugs: MESOS-1720
> https://issues.apache.org/jira/browse/MESOS-1720
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> By setting a new field `launch_executor` in the RunTask(Group)Message,
> the master is able to control the executor creation on the agent.
> 
> Also refactored `addTask()` logic. Added two new functions:
> `isTaskLaunchExecutor()` checks if a task needs to launch executor;
> `addExecutor()` adds executor to the framework and slave.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp b434d2398b8815811345b6586ca586d2025cb2a2 
>   src/master/master.cpp d7d22866f7a4eb87bd8949efafc97e828e7d4b94 
> 
> 
> Diff: https://reviews.apache.org/r/65504/diff/2/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 65571: Handle 'None' passed from the MasterDetector in 'Master::detect()'.

2018-02-12 Thread Vinod Kone


> On Feb. 8, 2018, 8:22 p.m., Vinod Kone wrote:
> > src/master/master.cpp
> > Lines 2186 (patched)
> > <https://reviews.apache.org/r/65571/diff/1/?file=1954571#file1954571line2186>
> >
> > s/Leader detector indicated no master elected/No master was elected/
> > 
> > More importantly, this changes the semantics a bit. Previously if this 
> > master was the current leader it committed suicide even in this case. But 
> > we don't anymore. Is that what we want?
> > 
> > 
> > Also, where in the interface does it say that None() is retryable. It 
> > says retryable errors are handled internally by the detector?
> 
> Benno Evers wrote:
> Ok, I guess I misinterpreted the documentation on 
> `MasterDetector::detect()`:
> 
> ```
>* Returns MasterInfo after an election has occurred and the elected
>* master is different than that specified (if any), or NONE if an
>* election occurs and no master is elected (e.g., all masters are
>* lost). A failed future is returned if the detector is unable to
>* detect the leading master due to a non-retryable error.
> ```
> 
> Since electing no master sounds like an error, and the future is not 
> failed, I assumed that this case was implicitly classifid as retryable error.
> 
> Anyways, for the semantics I assume that not aborting is at least what 
> the original author intended, otherwise there would be no point to 
> differentiate between `Error` and `None` in the API.
> 
> Additionally, the code that causes the master to crash in this situation 
> was introduced relatively recently (11 July 2017, a8c7ae44c8), before that 
> the `detected()`-handler would have just set `leader` to `None` and quietly 
> continued. So I would argue that this fix is actually restoring the 
> previously existing behaviour, not changing it.

"...before that the detected()-handler would have just set leader to None and 
quietly continued". Is this true? AFAICT the commit you mentioned only adds 
leader domain related changes, doesn't change the behavior we are talking 
about? See: https://reviews.apache.org/r/59763/


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65571/#review197103
---


On Feb. 9, 2018, 12:55 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65571/
> ---
> 
> (Updated Feb. 9, 2018, 12:55 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Vinod Kone.
> 
> 
> Bugs: MESOS-8550
> https://issues.apache.org/jira/browse/MESOS-8550
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The function `MasterDetector::detect()` returns a value of type
> `Future<Option>`, which, according to its documentation,
> can be `None` if an election occured and no master is elected.
> 
> However, the code in `Master::detected()` was only handling the
> cases of a failed future or a valid `MasterInfo` object.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp cc2685a6bc14103c639ce776cf1c912361e93381 
>   src/tests/cluster.hpp ad2b80e658d2f8afcefe9969d62cd33f0c475ce9 
>   src/tests/master_tests.cpp 7112bb4efcc312dc4f68bdc44fb685c7624002b1 
> 
> 
> Diff: https://reviews.apache.org/r/65571/diff/2/
> 
> 
> Testing
> ---
> 
> `./mesos-tests`
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 65518: Reaped the container process directly in Docker executor.

2018-02-09 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65518/#review197206
---




src/docker/executor.cpp
Lines 277 (patched)
<https://reviews.apache.org/r/65518/#comment277339>

s/never returns though/to never return although/



src/docker/executor.cpp
Lines 280 (patched)
<https://reviews.apache.org/r/65518/#comment277340>

when can this be None()?



src/docker/executor.cpp
Lines 288 (patched)
<https://reviews.apache.org/r/65518/#comment277336>

Add a LOG line here?



src/docker/executor.cpp
Lines 289 (patched)
<https://reviews.apache.org/r/65518/#comment277341>

s/lambda/lambda;/



src/docker/executor.cpp
Lines 530 (patched)
<https://reviews.apache.org/r/65518/#comment277338>

why not call `reaped` directly? sounds like `reaped` does a bunch of 
necessary cleanup that you are skipping by calling `_reaped`?



src/docker/executor.cpp
Line 490 (original), 535 (patched)
<https://reviews.apache.org/r/65518/#comment277343>

shouldn't we return here if `terminated` already is set in case the `run` 
returns after `reapedContainer` is called?


- Vinod Kone


On Feb. 9, 2018, 1:03 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65518/
> ---
> 
> (Updated Feb. 9, 2018, 1:03 a.m.)
> 
> 
> Review request for mesos, Gaston Kleiman, Gilbert Song, Greg Mann, and Vinod 
> Kone.
> 
> 
> Bugs: MESOS-8488
> https://issues.apache.org/jira/browse/MESOS-8488
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Due to a Docker issue (https://github.com/moby/moby/issues/33820),
> Docker daemon will fail to catch a container exit, i.e., the container
> process has already exited but the command `docker ps` shows the
> container still running, this will lead to the "docker run" command
> that we execute in Docker executor never returns, and it will also
> cause the `docker stop` command takes no effect, i.e., it will return
> without error but `docker ps` shows the container still running, so
> the task will stuck in `TASK_KILLING` state.
> 
> To workaround this Docker issue, in this patch we made Docker executor
> reaps the container process directly so Docker executor will be notified
> once the container process exits.
> 
> 
> Diffs
> -
> 
>   src/docker/executor.cpp e4c53d558e414e50b1c429fba8e31e504c63744a 
> 
> 
> Diff: https://reviews.apache.org/r/65518/diff/2/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 65448: Added a test to ensure master removes executors that never launched.

2018-02-08 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65448/#review197139
---


Fix it, then Ship it!





src/tests/slave_tests.cpp
Lines 4643 (patched)
<https://reviews.apache.org/r/65448/#comment277265>

s/toLaunchExecutor/launchExecutor/



src/tests/slave_tests.cpp
Lines 4681 (patched)
<https://reviews.apache.org/r/65448/#comment277266>

s/exited/Exited/



src/tests/slave_tests.cpp
Lines 4725 (patched)
<https://reviews.apache.org/r/65448/#comment277267>

Can you also add a test with default executor? And maybe instead of kill 
task path, try to exercise the authz failure path for that test? Feel free to 
do it in a subsequent review. More tests that exercise the code paths you 
changed the better.


- Vinod Kone


On Feb. 1, 2018, 2:04 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65448/
> ---
> 
> (Updated Feb. 1, 2018, 2:04 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Chun-Hung Hsiao, and Vinod Kone.
> 
> 
> Bugs: MESOS-1720
> https://issues.apache.org/jira/browse/MESOS-1720
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This test ensures that agent send exited executor message when the
> executor is never launched. So that master's executor bookkeeping
> entry is removed. See MESOS-1720.
> 
> 
> Diffs
> -
> 
>   src/tests/slave_tests.cpp b54688add9949b9c4ac2ce3a42503a411e6da09f 
> 
> 
> Diff: https://reviews.apache.org/r/65448/diff/3/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 65572: Fixed two slave recovery tests.

2018-02-08 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65572/#review197140
---


Ship it!




Ship It!

- Vinod Kone


On Feb. 9, 2018, 1:12 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65572/
> ---
> 
> (Updated Feb. 9, 2018, 1:12 a.m.)
> 
> 
> Review request for mesos, Greg Mann and Vinod Kone.
> 
> 
> Bugs: MESOS-8552
> https://issues.apache.org/jira/browse/MESOS-8552
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> These two tests are affected by the lingering executor fix
> (#65109). We should wait to make sure tasks are actually launched
> on the executor before trigger the agent failover. Otherwise, the
> container will be shutdown upon re-registration because the executor
> has never received any task.
> 
> 
> Diffs
> -
> 
>   src/tests/slave_recovery_tests.cpp 77aa60c953bd0769eaba05f001755e4cec9ba028 
> 
> 
> Diff: https://reviews.apache.org/r/65572/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 65449: Fixed an issue where executor info linger on master if failed to launch.

2018-02-08 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65449/#review197107
---



Description: s/linger/lingers/


src/slave/slave.cpp
Lines 1816 (patched)
<https://reviews.apache.org/r/65449/#comment277226>

`None()` on next line please



src/slave/slave.cpp
Line 2104 (original), 2127 (patched)
<https://reviews.apache.org/r/65449/#comment277227>

don't need to send here?



src/slave/slave.cpp
Line 2543 (original), 2626 (patched)
<https://reviews.apache.org/r/65449/#comment277228>

TASK_LOST or ExitedExecutorMessage



src/slave/slave.cpp
Lines 2671 (patched)
<https://reviews.apache.org/r/65449/#comment277229>

for a command task `executor` should be null. Can you do a CHECK instead?



src/slave/slave.cpp
Lines 2682 (patched)
<https://reviews.apache.org/r/65449/#comment277230>

s/We will drop tasks/In this case we will drop tasks/



src/slave/slave.cpp
Lines 2715 (patched)
<https://reviews.apache.org/r/65449/#comment277231>

should we send this only if executor state is TERMINATED? If the executor 
is in any other state, we are shutting it down below which should result in a 
ExitedExecutorMessage?

Also, can you add a TODO to have master do proper reconciliation instead of 
removing the executor entry in this case?



src/slave/slave.cpp
Lines 2725 (patched)
<https://reviews.apache.org/r/65449/#comment277232>

s/And the master will no longer contain/Master then removes/


- Vinod Kone


On Feb. 5, 2018, 3 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65449/
> ---
> 
> (Updated Feb. 5, 2018, 3 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Chun-Hung Hsiao, and Vinod Kone.
> 
> 
> Bugs: MESOS-1720
> https://issues.apache.org/jira/browse/MESOS-1720
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master relies on `ExitedExecutorMessage` from the agent to recycle
> executor entry. However, this message won't be sent if the executor
> never actually launched (due to transient error), leaving executor
> info on the master lingering and resource claimed.
> See MESOS-1720.
> 
> This patch fixes this issue by sending the `ExitedExecutorMessage`
> from the agent if the executor is never launched.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 09c01ebd1b5e8008ba9e7d412042f1db76a2c5a5 
>   src/slave/slave.cpp a6a5c93ab2d541c870cb52587495de20ed5ac1f4 
>   src/tests/mock_slave.hpp 29ce7140501888d95d5f2d6c26b752ad276b484a 
>   src/tests/mock_slave.cpp 8357edc7b3a35624c813eccb9ecca9d3b5dbe07c 
>   src/tests/slave_tests.cpp f393a8433a984267adc4db307ef07fcbafd1e62f 
> 
> 
> Diff: https://reviews.apache.org/r/65449/diff/2/
> 
> 
> Testing
> ---
> 
> make check
> Dedicated test in #65448
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 65446: Added helper function for the agent to send `ExitedExecutorMessage`.

2018-02-08 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65446/#review197105
---


Ship it!




Ship It!

- Vinod Kone


On Feb. 5, 2018, 2:45 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65446/
> ---
> 
> (Updated Feb. 5, 2018, 2:45 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Chun-Hung Hsiao, and Vinod Kone.
> 
> 
> Bugs: MESOS-1720
> https://issues.apache.org/jira/browse/MESOS-1720
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added helper function for the agent to send `ExitedExecutorMessage`.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 09c01ebd1b5e8008ba9e7d412042f1db76a2c5a5 
>   src/slave/slave.cpp a6a5c93ab2d541c870cb52587495de20ed5ac1f4 
> 
> 
> Diff: https://reviews.apache.org/r/65446/diff/2/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 65571: Handle 'None' passed from the MasterDetector in 'Master::detect()'.

2018-02-08 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65571/#review197103
---




src/master/master.cpp
Lines 2186 (patched)
<https://reviews.apache.org/r/65571/#comment277223>

s/Leader detector indicated no master elected/No master was elected/

More importantly, this changes the semantics a bit. Previously if this 
master was the current leader it committed suicide even in this case. But we 
don't anymore. Is that what we want?

Also, where in the interface does it say that None() is retryable. It says 
retryable errors are handled internally by the detector?


- Vinod Kone


On Feb. 8, 2018, 4:50 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65571/
> ---
> 
> (Updated Feb. 8, 2018, 4:50 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Vinod Kone.
> 
> 
> Bugs: MESOS-8550
> https://issues.apache.org/jira/browse/MESOS-8550
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The function `MasterDetector::detect()` returns a value of type
> `Future<Option>`, which, according to its documentation,
> can be `None` if an election occured and no master is elected.
> 
> However, the code in `Master::detected()` was only handling the
> cases of a failed future or a valid `MasterInfo` object.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp cc2685a6bc14103c639ce776cf1c912361e93381 
>   src/tests/cluster.hpp ad2b80e658d2f8afcefe9969d62cd33f0c475ce9 
>   src/tests/master_tests.cpp 7112bb4efcc312dc4f68bdc44fb685c7624002b1 
> 
> 
> Diff: https://reviews.apache.org/r/65571/diff/1/
> 
> 
> Testing
> ---
> 
> `./mesos-tests`
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 65504: Made master set `launch_executor` in the RunTask(Group)Message.

2018-02-08 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65504/#review197099
---




src/master/master.cpp
Line 3832 (original), 3832 (patched)
<https://reviews.apache.org/r/65504/#comment277210>

Can you add a comment here that for command tasks, even though an executor 
is launched by the agent master is oblivious to that fact? Alternatively, this 
should return true for command tasks.

My proposal:

```
bool islaunchExecutor(
   const ExecutorID& executorId,
   Framework* framework,
   Slave* slave) {

   CHECK_NOTNULL(framework);
   CHECK_NOTNULL(slave);
   
   if (!slave->hasExecutor(framework->id(), executorId())) {
  CHECK(!framework->hasExecutor(slave->id, executorId()))
<< "Executor '" << executorId
<< "' known to the framework " << *framework
<< " but unknown to the agent " << *slave;

  return true;
}
  }
  
  return false;
}
```



src/master/master.cpp
Lines 4922 (patched)
<https://reviews.apache.org/r/65504/#comment277211>

```
if (launchExecutor && task.has_executor()) {
  addExecutor(task.executor(), framework, slave);
  consumed += task.executor().resources();
}
```



src/master/master.cpp
Lines 4992 (patched)
<https://reviews.apache.org/r/65504/#comment277212>

s/message.launch_executor()/launchExecutor/



src/master/master.cpp
Lines 5154-5165 (patched)
<https://reviews.apache.org/r/65504/#comment277215>

```
bool launchExecutor = isLaunchExecutor(executor.executor_id(), framework, 
slave);
if (launchExecutor) {
  addExecutor(executor, framework, slave);
  executorResources = executor.resources();
  totalResources += executorResources;
}

```



src/master/master.cpp
Lines 5185-5186 (patched)
<https://reviews.apache.org/r/65504/#comment277218>

this doesn't seem correct?

shouldn't this be

```
CHECK(_offeredResources.contains(taskResources))
 << _offeredResources << " does not contain " << taskResources;
 
_offeredResources -= taskResources;

```

also, you need to do another check for executor resources before the 
`foreach` loop

```
CHECK(_offeredResources.contains(executorResources))
 << _offeredResources << " does not contain " << executorResources;
 
_offeredResources -= executorResources;

```



src/master/master.cpp
Line 5159 (original), 5201 (patched)
<https://reviews.apache.org/r/65504/#comment277222>

if we do the changes above maybe we can just have `taskGroupResources` 
variable instead of `totalResources` and use it here?



src/master/master.cpp
Lines 5203 (patched)
<https://reviews.apache.org/r/65504/#comment277219>

s/message.launch_executor()/executor/


- Vinod Kone


On Feb. 5, 2018, 2:49 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65504/
> ---
> 
> (Updated Feb. 5, 2018, 2:49 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Chun-Hung Hsiao, and Vinod Kone.
> 
> 
> Bugs: MESOS-1720
> https://issues.apache.org/jira/browse/MESOS-1720
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> By setting a new field `launch_executor` in the RunTask(Group)Message,
> the master is able to control the executor creation on the agent.
> 
> Also refactored `addTask()` logic. Added two new functions:
> `isTaskLaunchExecutor()` checks if a task needs to launch executor;
> `addExecutor()` adds executor to the framework and slave.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp b434d2398b8815811345b6586ca586d2025cb2a2 
>   src/master/master.cpp cc2685a6bc14103c639ce776cf1c912361e93381 
> 
> 
> Diff: https://reviews.apache.org/r/65504/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 65573: Fixed an error in the 1.3.2 CHANGELOG.

2018-02-08 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65573/#review197101
---


Ship it!




Doesn't need to depend on the earlier chain?

- Vinod Kone


On Feb. 8, 2018, 7:12 p.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65573/
> ---
> 
> (Updated Feb. 8, 2018, 7:12 p.m.)
> 
> 
> Review request for mesos, Jie Yu, Michael Park, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed an error in the 1.3.2 CHANGELOG.
> 
> 
> Diffs
> -
> 
>   CHANGELOG b7c38f54d9f728d13c319daa4e43c658118ff1b7 
> 
> 
> Diff: https://reviews.apache.org/r/65573/diff/1/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 65445: Added new protobuf field `launch_executor` in RunTask(Group)Message.

2018-02-06 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65445/#review196924
---


Ship it!




Ship It!

- Vinod Kone


On Feb. 5, 2018, 2:44 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65445/
> ---
> 
> (Updated Feb. 5, 2018, 2:44 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Chun-Hung Hsiao, and Vinod Kone.
> 
> 
> Bugs: MESOS-1720
> https://issues.apache.org/jira/browse/MESOS-1720
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This boolean flag is used for the master to specify whether a
> new executor should be launched for the task or taskGroup (with
> the exception of command executor). This will let the master
> to control executor creation on the agent.
> 
> Also updated the relevant message handlers and mock functions.
> 
> 
> Diffs
> -
> 
>   src/messages/messages.proto 0db44a24979cfdc748cd7fa3acd9e0346b14cfd3 
>   src/slave/slave.hpp 09c01ebd1b5e8008ba9e7d412042f1db76a2c5a5 
>   src/slave/slave.cpp a6a5c93ab2d541c870cb52587495de20ed5ac1f4 
>   src/tests/mock_slave.hpp 29ce7140501888d95d5f2d6c26b752ad276b484a 
>   src/tests/mock_slave.cpp 8357edc7b3a35624c813eccb9ecca9d3b5dbe07c 
>   src/tests/slave_tests.cpp f393a8433a984267adc4db307ef07fcbafd1e62f 
> 
> 
> Diff: https://reviews.apache.org/r/65445/diff/2/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 65449: Fixed an issue where executor info linger on master if failed to launch.

2018-02-02 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65449/#review196665
---



partial review.


src/master/master.hpp
Lines 867-871 (patched)
<https://reviews.apache.org/r/65449/#comment276422>

s/toLaunchExecutor/launchExecutor/

More importnatly, we don't typically use this pattern of passing a pointer 
and changing its value for `int` like fields. See below.



src/master/master.cpp
Line 4914 (original), 4918 (patched)
<https://reviews.apache.org/r/65449/#comment276423>

How about we figure whether to launch an executor or not here and then call 
`addTask`?



src/master/master.cpp
Lines 4986 (patched)
<https://reviews.apache.org/r/65449/#comment276424>

This comes out as `0 executor launch` or `1 executor launch` which reads 
weird.

how about:

```
LOG(INFO) << "Launching task " << task.task_id() << " of framework "
  << *framework << " with resources " << task.resources()
  << " on agent " << *slave << " on "
  << (message.launch_executor() ? " new executor" << " existing 
executor");
```



src/master/master.cpp
Line 5143 (original), 5151-5160 (patched)
<https://reviews.apache.org/r/65449/#comment276426>

ditto. see above.

do the check for executor launch outside the for loop.



src/master/master.cpp
Lines 5177-5179 (patched)
<https://reviews.apache.org/r/65449/#comment276427>

see above. should be set outside the loop.



src/master/master.cpp
Lines 5193-5194 (patched)
<https://reviews.apache.org/r/65449/#comment276428>

see above.



src/slave/slave.hpp
Lines 170 (patched)
<https://reviews.apache.org/r/65449/#comment276429>

s/toLaunchExecutor/launchExecutor/



src/slave/slave.hpp
Lines 180 (patched)
<https://reviews.apache.org/r/65449/#comment276430>

ditto. here and below.



src/slave/slave.cpp
Lines 1806 (patched)
<https://reviews.apache.org/r/65449/#comment276431>

I think old masters didn't used to set this. If 1.x masters do it then we 
can do a CHECK.



src/slave/slave.cpp
Lines 1816 (patched)
<https://reviews.apache.org/r/65449/#comment276432>

    you can put fraemworkInfo in the previous line?



src/slave/slave.cpp
Lines 2065 (patched)
<https://reviews.apache.org/r/65449/#comment276433>

do you need the temporary variable?


- Vinod Kone


On Feb. 1, 2018, 2:03 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65449/
> ---
> 
> (Updated Feb. 1, 2018, 2:03 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Chun-Hung Hsiao, and Vinod Kone.
> 
> 
> Bugs: MESOS-1720
> https://issues.apache.org/jira/browse/MESOS-1720
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Master relies on `ExitedExecutorMessage` from the agent to recycle
> executor entry. However, this message won't be sent if the executor
> never actually launched (due to transient error), leaving executor
> info on the master lingering and resource claimed.
> See MESOS-1720.
> 
> This patch fixes this issue by sending the `ExitedExecutorMessage`
> from the agent if the executor is never launched. And by
> setting a new field `launch_executor` in the RunTask(Group)Message,
> the master is able to control the executor creation on the agent.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp b434d2398b8815811345b6586ca586d2025cb2a2 
>   src/master/master.cpp b97ebae6ebfd8ae0f73e617d0c55e140b9c3fce7 
>   src/slave/slave.hpp 09c01ebd1b5e8008ba9e7d412042f1db76a2c5a5 
>   src/slave/slave.cpp a6a5c93ab2d541c870cb52587495de20ed5ac1f4 
>   src/tests/mock_slave.hpp 29ce7140501888d95d5f2d6c26b752ad276b484a 
>   src/tests/mock_slave.cpp 8357edc7b3a35624c813eccb9ecca9d3b5dbe07c 
>   src/tests/slave_tests.cpp f393a8433a984267adc4db307ef07fcbafd1e62f 
> 
> 
> Diff: https://reviews.apache.org/r/65449/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> Dedicated test in #65448
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 65437: Added documentation for fault domains.

2018-02-02 Thread Vinod Kone


> On Feb. 2, 2018, 10:11 p.m., Vinod Kone wrote:
> >

Fixed the issues before committing.


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65437/#review196740
---


On Feb. 2, 2018, 7:30 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65437/
> ---
> 
> (Updated Feb. 2, 2018, 7:30 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-8483
> https://issues.apache.org/jira/browse/MESOS-8483
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fault domains are a new feature in 1.5 which did not yet have
> a corresponding description in the documentation.
> 
> 
> Diffs
> -
> 
>   docs/configuration/master-and-agent.md 
> f247498ead43a16bbef5afb49d453073dd9ab6ef 
>   docs/fault-domains.md PRE-CREATION 
>   docs/home.md f5b65cc7895b10181e1b8483e3ee9da596d00fd6 
> 
> 
> Diff: https://reviews.apache.org/r/65437/diff/3/
> 
> 
> Testing
> ---
> 
> No.
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 65437: Added documentation for fault domains.

2018-02-02 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65437/#review196740
---


Fix it, then Ship it!





docs/fault-domains.md
Line 74 (original), 79 (patched)
<https://reviews.apache.org/r/65437/#comment276578>

s/London/San Francisco/ ?



docs/fault-domains.md
Line 84 (original), 89 (patched)
<https://reviews.apache.org/r/65437/#comment276580>

s/companies/company's/


- Vinod Kone


On Feb. 2, 2018, 7:30 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65437/
> ---
> 
> (Updated Feb. 2, 2018, 7:30 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-8483
> https://issues.apache.org/jira/browse/MESOS-8483
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fault domains are a new feature in 1.5 which did not yet have
> a corresponding description in the documentation.
> 
> 
> Diffs
> -
> 
>   docs/configuration/master-and-agent.md 
> f247498ead43a16bbef5afb49d453073dd9ab6ef 
>   docs/fault-domains.md PRE-CREATION 
>   docs/home.md f5b65cc7895b10181e1b8483e3ee9da596d00fd6 
> 
> 
> Diff: https://reviews.apache.org/r/65437/diff/3/
> 
> 
> Testing
> ---
> 
> No.
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 65437: Added documentation for fault domains.

2018-02-02 Thread Vinod Kone


> On Feb. 1, 2018, 10:18 p.m., Vinod Kone wrote:
> > docs/fault-domains.md
> > Lines 63 (patched)
> > <https://reviews.apache.org/r/65437/diff/2/?file=1950950#file1950950line63>
> >
> > s/The default/By default, the/
> 
> Benno Evers wrote:
> Are you sure about this? It would imply to me as a reader that this 
> behaviour can be changed.

I see. Yea, I'm not very sure then, what you have is probably less confusing.


> On Feb. 1, 2018, 10:18 p.m., Vinod Kone wrote:
> > docs/fault-domains.md
> > Lines 80 (patched)
> > <https://reviews.apache.org/r/65437/diff/2/?file=1950950#file1950950line80>
> >
> > Non-region-aware frameworks will only receive offers from the primary 
> > region (region containing masters). They won't get offers from other 
> > regions.
> 
> Benno Evers wrote:
> Does this actually imply that users should upgrade all their frameworks 
> to be partition-aware before configuring masters and agents with fault 
> domains? In this example, it would be quite devastating if two out of three 
> datacenters suddenly went completely unused.

s/primary/local/ in my first comment.

Do you mean REGION aware and not PARTION aware? So, yes, frameworks need to 
register with REGION_AWARE capability if they want remote region offers. The 
rationale was that most frameworks want their workloads in the local region and 
not magically go to remote regions with potentially higher latencies without an 
explicit opt-in. Note that region-aware frameworks should ideally expose the 
remote launching capability to their users too (e.g., via a configuration 
option in the app definition) before they start registering with REGION_AWARE 
capability.


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65437/#review196653
---


On Feb. 2, 2018, 7:30 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65437/
> ---
> 
> (Updated Feb. 2, 2018, 7:30 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-8483
> https://issues.apache.org/jira/browse/MESOS-8483
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fault domains are a new feature in 1.5 which did not yet have
> a corresponding description in the documentation.
> 
> 
> Diffs
> -
> 
>   docs/configuration/master-and-agent.md 
> f247498ead43a16bbef5afb49d453073dd9ab6ef 
>   docs/fault-domains.md PRE-CREATION 
>   docs/home.md f5b65cc7895b10181e1b8483e3ee9da596d00fd6 
> 
> 
> Diff: https://reviews.apache.org/r/65437/diff/3/
> 
> 
> Testing
> ---
> 
> No.
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 65447: Refactored couple of launch task sanity checks into a single code path.

2018-02-01 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65447/#review196664
---




src/slave/slave.cpp
Lines 2064-2065 (original), 2049-2050 (patched)
<https://reviews.apache.org/r/65447/#comment276406>

This comment doesn't make sense here?



src/slave/slave.cpp
Lines 2071-2079 (original), 2056-2064 (patched)
<https://reviews.apache.org/r/65447/#comment276407>

hmm. we are mutating state here in a function that looks like it is only 
doing checks. that's weird.

also, i don't know if i like this refactor because there are still some pre 
run checks being done in `_run` and `__run` themselves. it's not clear which 
ones this helper does and which ones it doesn't.

typically, we refactor common code into a function if that function makes 
sense by itself. i don't think that's the case here so i would actually propose 
that we do not refactor unless we find a good abstraction.



src/slave/slave.cpp
Lines 2117-2118 (patched)
<https://reviews.apache.org/r/65447/#comment276408>

each arg on a different line.



src/slave/slave.cpp
Lines 2123 (patched)
<https://reviews.apache.org/r/65447/#comment276409>

CHECK_NOTNULL(getFramework(frameworkId))


- Vinod Kone


On Feb. 1, 2018, 2:03 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65447/
> ---
> 
> (Updated Feb. 1, 2018, 2:03 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Chun-Hung Hsiao, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Initial steps common to `_run()` and `__run()` on the task launch
> code path are refactored.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 09c01ebd1b5e8008ba9e7d412042f1db76a2c5a5 
>   src/slave/slave.cpp a6a5c93ab2d541c870cb52587495de20ed5ac1f4 
> 
> 
> Diff: https://reviews.apache.org/r/65447/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 65446: Added helper function for the agent to send `ExitedExecutorMessage`.

2018-02-01 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65446/#review196663
---


Ship it!




Ship It!

- Vinod Kone


On Feb. 1, 2018, 2:01 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65446/
> ---
> 
> (Updated Feb. 1, 2018, 2:01 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Chun-Hung Hsiao, and Vinod Kone.
> 
> 
> Bugs: MESOS-1720
> https://issues.apache.org/jira/browse/MESOS-1720
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added helper function for the agent to send `ExitedExecutorMessage`.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 09c01ebd1b5e8008ba9e7d412042f1db76a2c5a5 
>   src/slave/slave.cpp a6a5c93ab2d541c870cb52587495de20ed5ac1f4 
> 
> 
> Diff: https://reviews.apache.org/r/65446/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 65445: Added new protobuf field `launch_executor` in RunTask(Group)Message.

2018-02-01 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65445/#review196661
---




src/messages/messages.proto
Lines 333 (patched)
<https://reviews.apache.org/r/65445/#comment276398>

s/field/field value/



src/messages/messages.proto
Lines 368 (patched)
<https://reviews.apache.org/r/65445/#comment276399>

s/field/field value/



src/slave/slave.hpp
Lines 149 (patched)
<https://reviews.apache.org/r/65445/#comment276404>

There are other handlers installed that do not use the `handle` prefix, 
eg., `KillTaskMessage`?



src/slave/slave.hpp
Lines 161 (patched)
<https://reviews.apache.org/r/65445/#comment276401>

s/toLaunchExecutor/launchExecutor/



src/slave/slave.hpp
Lines 193 (patched)
<https://reviews.apache.org/r/65445/#comment276402>

ditto.



src/tests/mock_slave.hpp
Lines 111 (patched)
<https://reviews.apache.org/r/65445/#comment276405>

s/toLaunchExecutor/launchExecutor/
    
here and below.


- Vinod Kone


On Feb. 1, 2018, 2 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65445/
> ---
> 
> (Updated Feb. 1, 2018, 2 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Chun-Hung Hsiao, and Vinod Kone.
> 
> 
> Bugs: MESOS-1720
> https://issues.apache.org/jira/browse/MESOS-1720
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This boolean flag is used for the master to specify whether a
> new executor should be launched for the task or taskGroup (with
> the exception of command executor). This will let the master
> to control executor creation on the agent.
> 
> Also updated the relevant message handlers and mock functions.
> 
> 
> Diffs
> -
> 
>   src/messages/messages.proto 0db44a24979cfdc748cd7fa3acd9e0346b14cfd3 
>   src/slave/slave.hpp 09c01ebd1b5e8008ba9e7d412042f1db76a2c5a5 
>   src/slave/slave.cpp a6a5c93ab2d541c870cb52587495de20ed5ac1f4 
>   src/tests/mock_slave.hpp 29ce7140501888d95d5f2d6c26b752ad276b484a 
>   src/tests/mock_slave.cpp 8357edc7b3a35624c813eccb9ecca9d3b5dbe07c 
>   src/tests/slave_tests.cpp f393a8433a984267adc4db307ef07fcbafd1e62f 
> 
> 
> Diff: https://reviews.apache.org/r/65445/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 65437: Added documentation for fault domains.

2018-02-01 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65437/#review196653
---




docs/configuration/master-and-agent.md
Lines 99 (patched)
<https://reviews.apache.org/r/65437/#comment276395>

Can you also call this out in the home page of documentation 
"http://mesos.apache.org/documentation/latest/;, perhaps in the 
"Administration" section?



docs/fault-domains.md
Lines 15-16 (patched)
<https://reviews.apache.org/r/65437/#comment276378>

How about

"A fault domain is a 2 level hierarchy of regions and zones."



docs/fault-domains.md
Lines 51 (patched)
<https://reviews.apache.org/r/65437/#comment276380>

s/slave/agent/



docs/fault-domains.md
Lines 56 (patched)
<https://reviews.apache.org/r/65437/#comment276381>

s/master,/master/



docs/fault-domains.md
Lines 61 (patched)
<https://reviews.apache.org/r/65437/#comment276382>

This should be:

* All masters must belong to the same "region" to avoid cross-region quorum 
writes. It is recommended to put them in different zones within that region for 
high availability.



docs/fault-domains.md
Lines 63 (patched)
<https://reviews.apache.org/r/65437/#comment276383>

s/The default/By default, the/



docs/fault-domains.md
Lines 71 (patched)
<https://reviews.apache.org/r/65437/#comment276384>

s/connected in a/connected through a/



docs/fault-domains.md
Lines 74-78 (patched)
<https://reviews.apache.org/r/65437/#comment276389>

In this example, "europe", "na" and "asia" would be regions. 

Also, I would not use cross continental regions as an example because we 
haven't tested the latency limits. Lets use "california", "atlanta" and "new 
york" as regions.

Zones would be racks within the data center.

One of these regions should contain all the masters, preferrably spread 
across zones. Putting them across racks gives them enough isolation for faults 
without sacrificing latency for quorum writes.



docs/fault-domains.md
Lines 80 (patched)
<https://reviews.apache.org/r/65437/#comment276391>

Non-region-aware frameworks will only receive offers from the primary 
region (region containing masters). They won't get offers from other regions.



docs/fault-domains.md
Lines 88 (patched)
<https://reviews.apache.org/r/65437/#comment276393>

s/to read the `domain`/to register with `REGION_AWARE` capability/



docs/fault-domains.md
Lines 90-99 (patched)
<https://reviews.apache.org/r/65437/#comment276394>

This example is a bit too complicated. I would say lets use an example of 
one on-prem data center which is extended by a cloud provider hosted remote 
region.


- Vinod Kone


On Jan. 31, 2018, 5:50 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65437/
> ---
> 
> (Updated Jan. 31, 2018, 5:50 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-8483
> https://issues.apache.org/jira/browse/MESOS-8483
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fault domains are a new feature in 1.5 which did not yet have
> a corresponding description in the documentation.
> 
> 
> Diffs
> -
> 
>   docs/configuration/master-and-agent.md 
> f247498ead43a16bbef5afb49d453073dd9ab6ef 
>   docs/fault-domains.md PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/65437/diff/2/
> 
> 
> Testing
> ---
> 
> No.
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 65382: Reaped Docker executor only when it can be connected.

2018-01-30 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65382/#review196538
---



The blocking review https://reviews.apache.org/r/65384/  seems non-existent?

- Vinod Kone


On Jan. 30, 2018, 3:42 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65382/
> ---
> 
> (Updated Jan. 30, 2018, 3:42 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-8125
> https://issues.apache.org/jira/browse/MESOS-8125
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Reaped Docker executor only when it can be connected.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.cpp f1d7d3e6afa119a6a24b054dcaa5ee68dbea965d 
> 
> 
> Diff: https://reviews.apache.org/r/65382/diff/2/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 65420: Fixed a coding error in a log message of Docker containerizer.

2018-01-30 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65420/#review196537
---


Ship it!




Ship It!

- Vinod Kone


On Jan. 30, 2018, 3:42 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65420/
> ---
> 
> (Updated Jan. 30, 2018, 3:42 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed a coding error in a log message of Docker containerizer.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.cpp f1d7d3e6afa119a6a24b054dcaa5ee68dbea965d 
> 
> 
> Diff: https://reviews.apache.org/r/65420/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 65382: Reaped Docker executor only when it can be connected.

2018-01-30 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65382/#review196536
---


Fix it, then Ship it!





src/slave/containerizer/docker.cpp
Line 975 (original), 976 (patched)
<https://reviews.apache.org/r/65382/#comment276219>

s/in this case/in which case/



src/slave/containerizer/docker.cpp
Line 976 (original), 977 (patched)
<https://reviews.apache.org/r/65382/#comment276220>

s/not error/safe to skip/



src/slave/containerizer/docker.cpp
Lines 979 (patched)
<https://reviews.apache.org/r/65382/#comment276218>

Do we return a "failed" container termination here? AFAICT, we return 
`None` on `wait` in this case?



src/slave/containerizer/docker.cpp
Lines 1040-1048 (patched)
<https://reviews.apache.org/r/65382/#comment276221>

I moved this comment block to #1027



src/slave/containerizer/docker.cpp
Lines 1041 (patched)
<https://reviews.apache.org/r/65382/#comment276222>

s/, this/. This/



src/slave/containerizer/docker.cpp
Lines 1047 (patched)
<https://reviews.apache.org/r/65382/#comment276223>

s/reboots/reboots or restarts/ ?



src/slave/containerizer/docker.cpp
Lines 1052-1054 (patched)
<https://reviews.apache.org/r/65382/#comment276224>

LOG(WARNING) << "Failed to connect to executor '" << executor.id
 << "' of framework " << framework.id
 
Can we log the SocketError here as well?


- Vinod Kone


On Jan. 30, 2018, 3:42 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65382/
> -------
> 
> (Updated Jan. 30, 2018, 3:42 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-8125
> https://issues.apache.org/jira/browse/MESOS-8125
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Reaped Docker executor only when it can be connected.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.cpp f1d7d3e6afa119a6a24b054dcaa5ee68dbea965d 
> 
> 
> Diff: https://reviews.apache.org/r/65382/diff/2/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 65344: Updated the docs for agent ping timeout flags.

2018-01-30 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65344/#review196534
---


Ship it!




Ship It!

- Vinod Kone


On Jan. 25, 2018, 9:18 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65344/
> ---
> 
> (Updated Jan. 25, 2018, 9:18 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Gaston Kleiman, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Updated docs for agent ping timeout flags.
> 
> 
> Diffs
> -
> 
>   docs/configuration/master.md e3cfb24443bfcff27606efcae46e70bd2614955e 
>   src/master/flags.cpp b49cb63bfebb286ae4f5e665652dc5f8f2964ceb 
> 
> 
> Diff: https://reviews.apache.org/r/65344/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 65423: Fixed duplicate protobuf linking in Python examples.

2018-01-30 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65423/#review196533
---


Ship it!




Ship It!

- Vinod Kone


On Jan. 30, 2018, 9:35 p.m., Till Toenshoff wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65423/
> ---
> 
> (Updated Jan. 30, 2018, 9:35 p.m.)
> 
> 
> Review request for mesos, Armand Grillet, Benno Evers, and Vinod Kone.
> 
> 
> Bugs: MESOS-8483
> https://issues.apache.org/jira/browse/MESOS-8483
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> see summary.
> 
> 
> Diffs
> -
> 
>   src/examples/python/test_executor.py 
> f29da95b449bfb3aac655f02517f66403293ad10 
>   src/examples/python/test_framework.py 
> 836c1a994f03ff96a87b64c489d25e717ae1ce71 
> 
> 
> Diff: https://reviews.apache.org/r/65423/diff/1/
> 
> 
> Testing
> ---
> 
> make check and manual testing by invoking: 
> `src/examples/python/test-framework local` - does not crash anymore.
> 
> 
> Thanks,
> 
> Till Toenshoff
> 
>



Re: Review Request 65044: Added the v1 API 'GET_OPERATIONS' call for master and agent.

2018-01-26 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65044/#review196370
---



API changes look good to me. I haven't reviewed the tests though.


src/slave/http.cpp
Lines 1685 (patched)
<https://reviews.apache.org/r/65044/#comment275992>

Not yours, but can we make the logging consistent here and in master for 
operator api handlers? Looks like master doesn't even log here?


- Vinod Kone


On Jan. 25, 2018, 10:36 a.m., Jan Schlicht wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65044/
> ---
> 
> (Updated Jan. 25, 2018, 10:36 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Gaston Kleiman, Greg Mann, Jie 
> Yu, and Vinod Kone.
> 
> 
> Bugs: MESOS-8424
> https://issues.apache.org/jira/browse/MESOS-8424
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The 'GET_OPERATIONS' call lists all operations known to master or agent.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 315820041990d28e5cdae71c8385d4551d56558c 
>   include/mesos/master/master.proto 3e34634f3b864222d07374936c0d9a18269c2fbd 
>   include/mesos/v1/agent/agent.proto 9e8b49defb83ffd4dd240ebb3a09b2dd3610ab2a 
>   include/mesos/v1/master/master.proto 
> 6759c3004e7e04c0044360b0c54f4438c2f6ec8a 
>   src/master/http.cpp c489b6f525f157811549b2cc84addd9d85e87990 
>   src/master/master.hpp b434d2398b8815811345b6586ca586d2025cb2a2 
>   src/master/validation.cpp f0b86775b7919ba6aa4a73038edb78a0adca68f4 
>   src/slave/http.hpp 1619bb77e3e7d5a6d387e4d4bb071d83ce914967 
>   src/slave/http.cpp 77e711ceeb0e2613d629b5e21fd686f85dfad11a 
>   src/slave/validation.cpp 0c2ccda177734cf3c47c0346ed34d20d58e7d932 
>   src/tests/api_tests.cpp bdacc30be4dc8656a41a0c47d0e350d48e59ad94 
> 
> 
> Diff: https://reviews.apache.org/r/65044/diff/8/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Jan Schlicht
> 
>



Re: Review Request 65245: Renamed `LOG` by `Stream logs` in Web UI.

2018-01-24 Thread Vinod Kone


> On Jan. 22, 2018, 7:33 p.m., Vinod Kone wrote:
> > src/webui/master/static/agent.html
> > Line 45 (original), 45 (patched)
> > <https://reviews.apache.org/r/65245/diff/1/?file=1943087#file1943087line45>
> >
> > Hmm. This is inconsistent with how we did the stream vs download for 
> > sandboxes. The stream is just a hyper link with the filename followed by a 
> > Download button.
> > 
> > Can we do the same with these logs for consistency? A hyperlink "LOG" 
> > that opens up pailer (which is what it is today) and a "Download" button 
> > next to it with some space (like we did for sandboxes).
> 
> Armand Grillet wrote:
> I imagine that your comment concerns /r/65246/ where I have changed the 
> front-end to have a multi-button instead of links.
> 
> Our sandboxes provide a way to see/download any file in a sandbox 
> (everything is clickable) whereas the masters/agents UI provide a list of 
> information with only one actionnable element: getting the logs. The current 
> UI with a single link showing "LOG" in the middle of a list could be more 
> descriptive and that is why I have changed it.
> 
> I will discard /r/65245/ and put its content in /r/65246/ and update the 
> UI according to your comment. I will leave the first screenshot in the 
> "Testing Done" part and add a second one with the UI after the patch update 
> so that you and Benjamin can decide.

SGTM.


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65245/#review195926
---


On Jan. 20, 2018, 4:14 p.m., Armand Grillet wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65245/
> ---
> 
> (Updated Jan. 20, 2018, 4:14 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Vinod Kone.
> 
> 
> Bugs: MESOS-8454
> https://issues.apache.org/jira/browse/MESOS-8454
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The home and agents Web UI are now more descriptive. The name of the
> functions called when clicking these two links have also been updated.
> 
> 
> Diffs
> -
> 
>   src/webui/master/static/agent.html 908d01d1d28f2a4fc595b581fa16515c5be1410a 
>   src/webui/master/static/home.html df3eef61eb9564dbdfc3d138ecfd18e85f747f11 
>   src/webui/master/static/js/controllers.js 
> 59665869dbea77c00740e42f2590473181dfe2fe 
> 
> 
> Diff: https://reviews.apache.org/r/65245/diff/1/
> 
> 
> Testing
> ---
> 
> Checked manually that the feature was still working on the home and agents 
> endpoints.
> 
> 
> Thanks,
> 
> Armand Grillet
> 
>



Re: Review Request 65106: Removed the misleading `isRemovable` helper in the master.

2018-01-22 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65106/#review195962
---


Fix it, then Ship it!





src/master/master.cpp
Lines 10170 (patched)
<https://reviews.apache.org/r/65106/#comment275413>

s/Indicated/Indicates/ ?



src/master/master.cpp
Lines 10301 (patched)
<https://reviews.apache.org/r/65106/#comment275414>

s/simply/simplify/


- Vinod Kone


On Jan. 23, 2018, 12:11 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65106/
> ---
> 
> (Updated Jan. 23, 2018, 12:11 a.m.)
> 
> 
> Review request for mesos, James Peach, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-8389
> https://issues.apache.org/jira/browse/MESOS-8389
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> In the past, the notion of a "removable" task meant: the task is
> terminal and acknowledged. However, the `isRemovable` helper only
> defines removability by the task state (terminal or unreachable)
> but not whether the terminal update is acknowledged.
> 
> As a result, the code that is calling `isRemovable` is unintuitive.
> One example of a confusing piece of code is within updateTask. Here,
> we have logic which says, if the task is removable, recover the
> resources but don't remove it. This seems more intuitive if directly
> described as: "if the task is no longer consuming resources (e.g.
> transitioned to terminal or unreachable) then recover the resources".
> 
> If one looks up the documentation of `isRemovable`, it says "When a
> task becomes removable, it is erased from the master's primary task
> data structures", but that isn't accurate since this function doesn't
> say whether the terminal task has been acknowledged, which is
> required for a task to be removable.
> 
> To make an immediate improvement, this patch removes `isRemovable`
> (no pun intended) in favor of directly checking the states we care
> about. In the future, we may want to introduce some helpers like
> `isAllocatedResources` to describe whether the task's resources are
> considered allocated.
> 
> If we do introduce a notion of `isRemovable` in the future, it seems
> it should take the `Task*` so that it can say whether the task could
> be "removed" from the master, which includes checking that terminal
> tasks have been acknowledged. However, "remove" is somewhat of a
> confusing term, since what we're actually doing is "completing" the
> task.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 05136781b6b1539f37c283e8127e4bafb187a0d1 
>   src/master/master.cpp 3af96b1d2024ab1b951537ebc6bbc225cfa9cc88 
> 
> 
> Diff: https://reviews.apache.org/r/65106/diff/3/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 65253: Fixed dropped events on the master operator API stream.

2018-01-22 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65253/#review195953
---


Ship it!




Ship It!

- Vinod Kone


On Jan. 22, 2018, 11:36 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65253/
> ---
> 
> (Updated Jan. 22, 2018, 11:36 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Benjamin Mahler, and Vinod Kone.
> 
> 
> Bugs: MESOS-8469
> https://issues.apache.org/jira/browse/MESOS-8469
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master's `Subscribers::send()` call path previously accessed
> event-related master state after an asynchronous call. Thus, if
> that state changed after the call to `Subscribers::send()` but
> before the event was actually sent, messages could be dropped.
> 
> This issue was observed when a TASK_KILLED update failed to be
> sent on an operator's stream because the task was removed from
> master state in between the aforementioned async calls.
> 
> This patch updates that call path to capture a shared copy of
> event-related metadata so that the asynchronous calls have a
> consistent view of the master's state at the time of the event.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 05136781b6b1539f37c283e8127e4bafb187a0d1 
>   src/master/master.cpp 3af96b1d2024ab1b951537ebc6bbc225cfa9cc88 
> 
> 
> Diff: https://reviews.apache.org/r/65253/diff/5/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 65263: Updated `SlaveRecoveryTest.RecoverCompletedExecutor` to verify gc.

2018-01-22 Thread Vinod Kone


> On Jan. 22, 2018, 8:24 p.m., Vinod Kone wrote:
> > Ship It!

To confirm, would this new test have failed without the fix for 
https://issues.apache.org/jira/browse/MESOS-8460 ?


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65263/#review195932
---


On Jan. 22, 2018, 1:21 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65263/
> ---
> 
> (Updated Jan. 22, 2018, 1:21 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-8462
> https://issues.apache.org/jira/browse/MESOS-8462
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> In the test `SlaveRecoveryTest.RecoverCompletedExecutor`, when the
> completed executor is recovered, verify its work and meta directories
> gc'ed successfully.
> 
> 
> Diffs
> -
> 
>   src/tests/slave_recovery_tests.cpp 6d8a57138903637d3cf9e5b19a466961d2c120d7 
> 
> 
> Diff: https://reviews.apache.org/r/65263/diff/1/
> 
> 
> Testing
> ---
> 
> Manually ran this test repeatedly (100 times).
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 65263: Updated `SlaveRecoveryTest.RecoverCompletedExecutor` to verify gc.

2018-01-22 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65263/#review195932
---


Ship it!




Ship It!

- Vinod Kone


On Jan. 22, 2018, 1:21 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65263/
> ---
> 
> (Updated Jan. 22, 2018, 1:21 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-8462
> https://issues.apache.org/jira/browse/MESOS-8462
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> In the test `SlaveRecoveryTest.RecoverCompletedExecutor`, when the
> completed executor is recovered, verify its work and meta directories
> gc'ed successfully.
> 
> 
> Diffs
> -
> 
>   src/tests/slave_recovery_tests.cpp 6d8a57138903637d3cf9e5b19a466961d2c120d7 
> 
> 
> Diff: https://reviews.apache.org/r/65263/diff/1/
> 
> 
> Testing
> ---
> 
> Manually ran this test repeatedly (100 times).
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 65108: Added documentation for `protobuf::isTerminalState`.

2018-01-22 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65108/#review195931
---


Ship it!




Ship It!

- Vinod Kone


On Jan. 12, 2018, 1:34 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65108/
> ---
> 
> (Updated Jan. 12, 2018, 1:34 a.m.)
> 
> 
> Review request for mesos, Vinod Kone and Jiang Yan Xu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> See summary.
> 
> 
> Diffs
> -
> 
>   src/common/protobuf_utils.hpp fd728b7038ffcbb430cd3d635046788ee388e4ec 
> 
> 
> Diff: https://reviews.apache.org/r/65108/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 65107: Updated the task state metrics to be more readable.

2018-01-22 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65107/#review195930
---


Ship it!




Ship It!

- Vinod Kone


On Jan. 12, 2018, 1:34 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65107/
> ---
> 
> (Updated Jan. 12, 2018, 1:34 a.m.)
> 
> 
> Review request for mesos, Vinod Kone and Jiang Yan Xu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> See summary.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 7ed15e4ba2a31c5fe4b8571f645cdca69a3e82f4 
> 
> 
> Diff: https://reviews.apache.org/r/65107/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 65106: Removed the misleading `isRemovable` helper in the master.

2018-01-22 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65106/#review195647
---




src/master/master.hpp
Lines 2208 (patched)
<https://reviews.apache.org/r/65106/#comment274911>

can you use `CHECK_NE` here so that `task->state()` is auto printed when 
this fails?



src/master/master.hpp
Lines 2215 (patched)
<https://reviews.apache.org/r/65106/#comment274910>

s/the UI/in the UI/



src/master/master.hpp
Lines 2334 (patched)
<https://reviews.apache.org/r/65106/#comment274912>

ditto. `CHECK_NE`?



src/master/master.cpp
Lines 10111-10127 (original), 10111-10119 (patched)
<https://reviews.apache.org/r/65106/#comment275327>

move this to #1061

Also, consider moving this to `protobuf_utils.hpp`. AFAICT this is used in 
4 different places in this diff.



src/master/master.cpp
Lines 10247-10261 (original), 10245-10261 (patched)
<https://reviews.apache.org/r/65106/#comment275332>

Not yours, but I wish we can remove this if altogether and CHECK that we 
are only removing a terminal or unreachable task period. AFAICT, the only place 
where we violate this invariant is in `Master::finalize()`. Maybe add a TODO to 
fix?



src/master/master.cpp
Line 11423 (original), 11423 (patched)
<https://reviews.apache.org/r/65106/#comment275333>

CHECK_NE



src/master/master.cpp
Lines 11447 (patched)
<https://reviews.apache.org/r/65106/#comment275336>

Can you include taskid and frameworkid in the error message here for easy 
debuggability?


- Vinod Kone


On Jan. 12, 2018, 1:34 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65106/
> ---
> 
> (Updated Jan. 12, 2018, 1:34 a.m.)
> 
> 
> Review request for mesos, James Peach, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-8389
> https://issues.apache.org/jira/browse/MESOS-8389
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> In the past, the notion of a "removable" task meant: the task is
> terminal and acknowledged. However, the `isRemovable` helper only
> defines removability by the task state (terminal or unreachable)
> but not whether the terminal update is acknowledged.
> 
> As a result, the code that is calling `isRemovable` is unintuitive.
> One example of a confusing piece of code is within updateTask. Here,
> we have logic which says, if the task is removable, recover the
> resources but don't remove it. This seems more intuitive if directly
> described as: "if the task is no longer consuming resources (e.g.
> transitioned to terminal or unreachable) then recover the resources".
> 
> If one looks up the documentation of `isRemovable`, it says "When a
> task becomes removable, it is erased from the master's primary task
> data structures", but that isn't accurate since this function doesn't
> say whether the terminal task has been acknowledged, which is
> required for a task to be removable.
> 
> To make an immediate improvement, this patch removes `isRemovable`
> (no pun intended) in favor of directly checking the states we care
> about. In the future, we may want to introduce some helpers like
> `isAllocatedResources` to describe whether the task's resources are
> considered allocated.
> 
> If we do introduce a notion of `isRemovable` in the future, it seems
> it should take the `Task*` so that it can say whether the task could
> be "removed" from the master, which includes checking that terminal
> tasks have been acknowledged. However, "remove" is somewhat of a
> confusing term, since what we're actually doing is "completing" the
> task.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 3d5180b8028b2a15ad111f91863e77747b362593 
>   src/master/master.cpp 465336d33008a848df063d4295416eb91f7bb44f 
> 
> 
> Diff: https://reviews.apache.org/r/65106/diff/2/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 65245: Renamed `LOG` by `Stream logs` in Web UI.

2018-01-22 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65245/#review195926
---




src/webui/master/static/agent.html
Line 45 (original), 45 (patched)
<https://reviews.apache.org/r/65245/#comment275323>

Hmm. This is inconsistent with how we did the stream vs download for 
sandboxes. The stream is just a hyper link with the filename followed by a 
Download button.

Can we do the same with these logs for consistency? A hyperlink "LOG" that 
opens up pailer (which is what it is today) and a "Download" button next to it 
with some space (like we did for sandboxes).


- Vinod Kone


On Jan. 20, 2018, 4:14 p.m., Armand Grillet wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65245/
> ---
> 
> (Updated Jan. 20, 2018, 4:14 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Vinod Kone.
> 
> 
> Bugs: MESOS-8454
> https://issues.apache.org/jira/browse/MESOS-8454
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The home and agents Web UI are now more descriptive. The name of the
> functions called when clicking these two links have also been updated.
> 
> 
> Diffs
> -
> 
>   src/webui/master/static/agent.html 908d01d1d28f2a4fc595b581fa16515c5be1410a 
>   src/webui/master/static/home.html df3eef61eb9564dbdfc3d138ecfd18e85f747f11 
>   src/webui/master/static/js/controllers.js 
> 59665869dbea77c00740e42f2590473181dfe2fe 
> 
> 
> Diff: https://reviews.apache.org/r/65245/diff/1/
> 
> 
> Testing
> ---
> 
> Checked manually that the feature was still working on the home and agents 
> endpoints.
> 
> 
> Thanks,
> 
> Armand Grillet
> 
>



Re: Review Request 65253: Avoided dropping events on the master operator API stream.

2018-01-22 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65253/#review195919
---



Please add a test in the next review.


src/master/master.hpp
Line 1997 (original), 1997 (patched)
<https://reviews.apache.org/r/65253/#comment275307>

put the arg on the next line and align with others.

also `const &`?



src/master/master.hpp
Lines 2002-2003 (patched)
<https://reviews.apache.org/r/65253/#comment275309>

`const &` ?



src/master/master.cpp
Line 11159 (original), 11160 (patched)
<https://reviews.apache.org/r/65253/#comment275314>

can you make this an rvalue ref as suggested by @bmahler offline?



src/master/master.cpp
Line 11205 (original), 11248 (patched)
<https://reviews.apache.org/r/65253/#comment275310>

put the arg on next line.



src/master/master.cpp
Line 11260 (original), 11292 (patched)
<https://reviews.apache.org/r/65253/#comment275319>

So looks like we do make a copy of the event for 
`FRAMEWORK_{ADDED,UPDATED_REMOVED}` and `AGENT_ADDED` after all. By taking 
Event as Shared we are only avoiding copies for `TASK_{ADDED,UPDATED}` and 
`AGENT_REMOVED`. 

A bit inconsistent, but I guess worth it for the perf improvement?


- Vinod Kone


On Jan. 22, 2018, 7:14 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65253/
> ---
> 
> (Updated Jan. 22, 2018, 7:14 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Benjamin Mahler, and Vinod Kone.
> 
> 
> Bugs: MESOS-8469
> https://issues.apache.org/jira/browse/MESOS-8469
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The master's Subscribers::send() call path previously accessed
> event-related master state after an asynchronous call. Thus, if
> that state changed after the call to Subscribers::send() but
> before the event was actually sent, messages could be dropped.
> 
> 
> This patch updates that call path to capture a shared copy of
> event-related metadata so that the asynchronous calls have a
> consistent view of the master's state at the time of the event.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 05136781b6b1539f37c283e8127e4bafb187a0d1 
>   src/master/master.cpp 3af96b1d2024ab1b951537ebc6bbc225cfa9cc88 
> 
> 
> Diff: https://reviews.apache.org/r/65253/diff/2/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 65231: Fixed detaching task volume directories of destroyed frameworks.

2018-01-18 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65231/#review195764
---



Can you or @qian write a unit test that triggered this bug?


src/slave/slave.cpp
Lines 1029 (patched)
<https://reviews.apache.org/r/65231/#comment275066>

Custom executors satisfy this check so it's not fool proof. Lets pass 
`ExecutorInfo` here and check the type instead.



src/slave/slave.cpp
Lines 1072 (patched)
<https://reviews.apache.org/r/65231/#comment275067>

ditto. lets pass `ExecutorInfo` here too.


- Vinod Kone


On Jan. 19, 2018, 1:25 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65231/
> ---
> 
> (Updated Jan. 19, 2018, 1:25 a.m.)
> 
> 
> Review request for mesos, Michael Park, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-8460
> https://issues.apache.org/jira/browse/MESOS-8460
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed detaching task volume directories of destroyed frameworks.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp a07f046c7ed980bed3e3d0d2780295727b87ee44 
>   src/slave/slave.cpp 1672c06894c25e7120bb8a1da15f7d6432eb5954 
> 
> 
> Diff: https://reviews.apache.org/r/65231/diff/3/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 65203: Updated the CHANGELOG for 1.5.0 release.

2018-01-17 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65203/#review195672
---


Fix it, then Ship it!




LGTM


CHANGELOG
Line 21 (original), 21 (patched)
<https://reviews.apache.org/r/65203/#comment274941>

s/Support/support for/


- Vinod Kone


On Jan. 18, 2018, 12:21 a.m., Gilbert Song wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65203/
> ---
> 
> (Updated Jan. 18, 2018, 12:21 a.m.)
> 
> 
> Review request for mesos, Adam B, Anand Mazumdar, Andrew Schwartzmeyer, 
> Benjamin Mahler, Chun-Hung Hsiao, Gaston Kleiman, Greg Mann, Ilya Pronin, Jie 
> Yu, James Peach, Kapil Arya, Joseph Wu, Kevin Klues, Neil Conway, Qian Zhang, 
> Till Toenshoff, Vinod Kone, Jiang Yan Xu, and Zhitao Li.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Updated the CHANGELOG for 1.5.0 release.
> 
> 
> Diffs
> -
> 
>   CHANGELOG 078c5d64100d81ce84dbdff4055cdda465e19011 
> 
> 
> Diff: https://reviews.apache.org/r/65203/diff/1/
> 
> 
> Testing
> ---
> 
> N/A.
> 
> 
> Thanks,
> 
> Gilbert Song
> 
>



Re: Review Request 64857: Updated example frameworks for mesos-local.

2018-01-16 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64857/#review195518
---




src/examples/dynamic_reservation_framework.cpp
Lines 389 (patched)
<https://reviews.apache.org/r/64857/#comment274709>

don't you need to set roles in other example frameworks too for the local 
case?


- Vinod Kone


On Jan. 15, 2018, 12:29 a.m., Till Toenshoff wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64857/
> ---
> 
> (Updated Jan. 15, 2018, 12:29 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Armand Grillet, Greg Mann, 
> Kapil Arya, and Vinod Kone.
> 
> 
> Bugs: MESOS-8357 and MESOS-8361
> https://issues.apache.org/jira/browse/MESOS-8357
> https://issues.apache.org/jira/browse/MESOS-8361
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Unifies handling of 'local' when supplied for the '--master' flag.
> 
> 
> Diffs
> -
> 
>   src/examples/balloon_framework.cpp 410966e64aa3f146a88c00babaa51b499dd24d36 
>   src/examples/disk_full_framework.cpp 
> d75f769538d17229bdbf332ea7513b1aa8d7c334 
>   src/examples/docker_no_executor_framework.cpp 
> 1fa408d3847d414c23eed8334db8de02e84cf764 
>   src/examples/dynamic_reservation_framework.cpp 
> 538fbe8847a1b1dcbfe48ad0e3678797801a12f5 
>   src/examples/load_generator_framework.cpp 
> 42867992eba8699cab5b70c62a24b87446139406 
>   src/examples/long_lived_framework.cpp 
> 943160d5a42e02cf05fe92999233ce2e792849fd 
>   src/examples/no_executor_framework.cpp 
> 972ef777cec07d1030af208daef69ebe606d071e 
>   src/examples/test_framework.cpp acf1faf1523c8c03483dfafbc8f8e245322527e4 
>   src/examples/test_http_framework.cpp 
> 482f65efc15a7bac52c33a57b2b876191249410a 
> 
> 
> Diff: https://reviews.apache.org/r/64857/diff/3/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Till Toenshoff
> 
>



Re: Review Request 64849: Added authentication to some example frameworks.

2018-01-16 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64849/#review195515
---




src/examples/dynamic_reservation_framework.cpp
Line 41 (original)
<https://reviews.apache.org/r/64849/#comment274706>

Any particular reason you deleted these?



src/examples/persistent_volume_framework.cpp
Line 501 (original)
<https://reviews.apache.org/r/64849/#comment274707>

Why did you revert this change introduced in the previous review?



src/examples/test_http_framework.cpp
Line 401 (original), 390 (patched)
<https://reviews.apache.org/r/64849/#comment274708>

aah you fixed in this review!


- Vinod Kone


On Jan. 15, 2018, 12:29 a.m., Till Toenshoff wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64849/
> ---
> 
> (Updated Jan. 15, 2018, 12:29 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Armand Grillet, Greg Mann, 
> Kapil Arya, and Vinod Kone.
> 
> 
> Bugs: MESOS-5362 and MESOS-8357
> https://issues.apache.org/jira/browse/MESOS-5362
> https://issues.apache.org/jira/browse/MESOS-8357
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> All example frameworks now support authenticating when registering
> to the master.
> 
> 
> Diffs
> -
> 
>   src/examples/dynamic_reservation_framework.cpp 
> 538fbe8847a1b1dcbfe48ad0e3678797801a12f5 
>   src/examples/persistent_volume_framework.cpp 
> 71db39d7743d31ea34c2aed579d7a1ef2ed95687 
>   src/examples/test_http_framework.cpp 
> 482f65efc15a7bac52c33a57b2b876191249410a 
>   src/tests/persistent_volume_framework_test.sh 
> 2ab22c03e573d4801c73957f9cad2939b3d3174b 
> 
> 
> Diff: https://reviews.apache.org/r/64849/diff/3/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Till Toenshoff
> 
>



Re: Review Request 65184: Added missing registry-related flags to the master config docs.

2018-01-16 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65184/#review195514
---


Ship it!




Ship It!

- Vinod Kone


On Jan. 16, 2018, 11:16 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65184/
> ---
> 
> (Updated Jan. 16, 2018, 11:16 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added missing registry-related flags to the master config docs.
> 
> 
> Diffs
> -
> 
>   docs/configuration/master.md 3ff97ea5dbae12e17cb137d1367b98130dce9596 
> 
> 
> Diff: https://reviews.apache.org/r/65184/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 64848: Updated example frameworks to make use of added flags.

2018-01-16 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64848/#review195513
---



Love the cleanup! Thanks for doing this.


src/examples/test_http_framework.cpp
Lines 399-401 (original), 401-403 (patched)
<https://reviews.apache.org/r/64848/#comment274705>

hmm. this seems incorrect? how did this compile!?


- Vinod Kone


On Jan. 15, 2018, 12:29 a.m., Till Toenshoff wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64848/
> ---
> 
> (Updated Jan. 15, 2018, 12:29 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Armand Grillet, Greg Mann, 
> Kapil Arya, and Vinod Kone.
> 
> 
> Bugs: MESOS-8357
> https://issues.apache.org/jira/browse/MESOS-8357
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The example frameworks now consistently rely on a set of common flags
> for parameterizing. This unifies parameter names, descriptions and
> validation for those examples.
> 
> 
> Diffs
> -
> 
>   src/examples/balloon_framework.cpp 410966e64aa3f146a88c00babaa51b499dd24d36 
>   src/examples/disk_full_framework.cpp 
> d75f769538d17229bdbf332ea7513b1aa8d7c334 
>   src/examples/docker_no_executor_framework.cpp 
> 1fa408d3847d414c23eed8334db8de02e84cf764 
>   src/examples/dynamic_reservation_framework.cpp 
> 538fbe8847a1b1dcbfe48ad0e3678797801a12f5 
>   src/examples/load_generator_framework.cpp 
> 42867992eba8699cab5b70c62a24b87446139406 
>   src/examples/long_lived_framework.cpp 
> 943160d5a42e02cf05fe92999233ce2e792849fd 
>   src/examples/no_executor_framework.cpp 
> 972ef777cec07d1030af208daef69ebe606d071e 
>   src/examples/persistent_volume_framework.cpp 
> 71db39d7743d31ea34c2aed579d7a1ef2ed95687 
>   src/examples/test_framework.cpp acf1faf1523c8c03483dfafbc8f8e245322527e4 
>   src/examples/test_http_framework.cpp 
> 482f65efc15a7bac52c33a57b2b876191249410a 
> 
> 
> Diff: https://reviews.apache.org/r/64848/diff/3/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Till Toenshoff
> 
>



Re: Review Request 65167: Detached `virtualLatestPath` when recovering the executor.

2018-01-16 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65167/#review195466
---


Ship it!




Ship It!

- Vinod Kone


On Jan. 15, 2018, 1:49 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65167/
> ---
> 
> (Updated Jan. 15, 2018, 1:49 p.m.)
> 
> 
> Review request for mesos, Vinod Kone and Zhitao Li.
> 
> 
> Bugs: MESOS-8446
> https://issues.apache.org/jira/browse/MESOS-8446
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Previously we miss to detach `/frameworks/FID/executors/EID/runs/latest`
> when we find the latest run of the executor was completed in the method
> `Framework::recoverExecutor`, that is a leak.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp aeb0fdaeda78f26de2ec790735bfa88c5be72850 
> 
> 
> Diff: https://reviews.apache.org/r/65167/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 65070: Updated `ROOT_TaskSandboxPersistentVolume` to check `/files` endpoint.

2018-01-16 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65070/#review195465
---


Ship it!




Nice test!

- Vinod Kone


On Jan. 16, 2018, 3:33 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65070/
> ---
> 
> (Updated Jan. 16, 2018, 3:33 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Gilbert Song, Jie Yu, and Vinod 
> Kone.
> 
> 
> Bugs: MESOS-8279
> https://issues.apache.org/jira/browse/MESOS-8279
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Updated `ROOT_TaskSandboxPersistentVolume` to check `/files` endpoint.
> 
> 
> Diffs
> -
> 
>   src/tests/default_executor_tests.cpp 
> d2cf5fdda3c51d42548c102f076d2b4aebe1170b 
> 
> 
> Diff: https://reviews.apache.org/r/65070/diff/2/
> 
> 
> Testing
> ---
> 
> Manually ran this test 
> (`LauncherAndIsolationParam/PersistentVolumeDefaultExecutor.ROOT_TaskSandboxPersistentVolume/*`)
>  100 times.
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 64978: Made task's volume directory visible in the /files endpoints.

2018-01-16 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64978/#review195462
---


Fix it, then Ship it!





src/slave/slave.hpp
Lines 837-838 (patched)
<https://reviews.apache.org/r/64978/#comment274638>

can you just take `const Task&` as argument like you did for `detach`?



src/slave/slave.cpp
Line 5911 (original)
<https://reviews.apache.org/r/64978/#comment274639>

this should have been in the previous review.



src/slave/slave.cpp
Lines 9162 (patched)
<https://reviews.apache.org/r/64978/#comment274635>

I would move this comment to the header where the function is declared.



src/slave/slave.cpp
Lines 9183 (patched)
<https://reviews.apache.org/r/64978/#comment274637>

shouldn't this be a CHECK too like you did for `detach` below?



src/slave/slave.cpp
Lines 9219 (patched)
<https://reviews.apache.org/r/64978/#comment274636>

ditto. move comment to the header.


- Vinod Kone


On Jan. 15, 2018, 6:21 a.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64978/
> ---
> 
> (Updated Jan. 15, 2018, 6:21 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Gilbert Song, Jie Yu, and Vinod 
> Kone.
> 
> 
> Bugs: MESOS-8279
> https://issues.apache.org/jira/browse/MESOS-8279
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> In MESOS-7225, we made a task can access any volumes specified in its
> disk resources from its own sandbox by introducing a workaround to the
> default executor, i.e., add a `SANDBOX_PATH` volume with type `PARENT`
> to the corresponding nested container. It will be translated into a bind
> mount in the nested container's mount namespace, thus not visible in the
> host mount namespace, that means the task's volume directory can not be
> visible in Mesos UI since it operates in the host mount namespace.
> 
> In this patch, to make the task's volume directory visible in Mesos UI,
> we attached the executor's volume directory to it, so when users browse
> task's volume directory in Mesos UI, what they actually browse is the
> executor's volume directory. Note when calling `Files::attach()`, the
> third argument `authorized` is not specified, that is because it is
> already specified when we do the attach for the executor's sandbox and
> it is also applied to the executor's tasks.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp ef0eae21af811cc09f43cd1d4c4ccc0c33cbeb39 
>   src/slave/slave.cpp aeb0fdaeda78f26de2ec790735bfa88c5be72850 
> 
> 
> Diff: https://reviews.apache.org/r/64978/diff/4/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 64978: Made task's volume directory visible in the /files endpoints.

2018-01-14 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64978/#review195404
---




src/slave/slave.cpp
Lines 2813 (patched)
<https://reviews.apache.org/r/64978/#comment274570>

can you move this stuff to `attach` and `detach` helpers as discussed?



src/slave/slave.cpp
Lines 9261 (patched)
<https://reviews.apache.org/r/64978/#comment274569>

Why are you doing the detach here? Detach should only happen when the 
executor run directory is gc'ed or if the task goes out of completed.


- Vinod Kone


On Jan. 14, 2018, 2:30 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64978/
> ---
> 
> (Updated Jan. 14, 2018, 2:30 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Gilbert Song, Jie Yu, and Vinod 
> Kone.
> 
> 
> Bugs: MESOS-8279
> https://issues.apache.org/jira/browse/MESOS-8279
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> In MESOS-7225, we made a task can access any volumes specified in its
> disk resources from its own sandbox by introducing a workaround to the
> default executor, i.e., add a `SANDBOX_PATH` volume with type `PARENT`
> to the corresponding nested container. It will be translated into a bind
> mount in the nested container's mount namespace, thus not visible in the
> host mount namespace, that means the task's volume directory can not be
> visible in Mesos UI since it operates in the host mount namespace.
> 
> In this patch, to make the task's volume directory visible in Mesos UI,
> we attached the executor's volume directory to it, so when users browse
> task's volume directory in Mesos UI, what they actually browse is the
> executor's volume directory. Note when calling `Files::attach()`, the
> third argument `authorized` is not specified, that is because it is
> already specified when we do the attach for the executor's sandbox and
> it is also applied to the executor's tasks.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp aeb0fdaeda78f26de2ec790735bfa88c5be72850 
> 
> 
> Diff: https://reviews.apache.org/r/64978/diff/3/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 65156: Detached the virtual paths regardless of the result of gc.

2018-01-14 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65156/#review195403
---


Ship it!




Can you expand on the description on why this change is necessary?

- Vinod Kone


On Jan. 14, 2018, 2:17 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65156/
> ---
> 
> (Updated Jan. 14, 2018, 2:17 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-8444
> https://issues.apache.org/jira/browse/MESOS-8444
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Detached the virtual paths regardless of the result of gc.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp aeb0fdaeda78f26de2ec790735bfa88c5be72850 
> 
> 
> Diff: https://reviews.apache.org/r/65156/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 65056: Added missing fields to the GET_MASTER operator API call.

2018-01-12 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65056/#review195380
---


Ship it!




Ship It!

- Vinod Kone


On Jan. 13, 2018, 1:56 a.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65056/
> ---
> 
> (Updated Jan. 13, 2018, 1:56 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Gaston Kleiman, and Vinod Kone.
> 
> 
> Bugs: MESOS-8078
> https://issues.apache.org/jira/browse/MESOS-8078
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Several fields were present in the old master '/state' endpoint,
> which are now not accessible via the V1 operator API. This patch
> adds these fields to the response for the GET_MASTER call.
> 
> 
> Diffs
> -
> 
>   include/mesos/master/master.proto f4506fe82ee7c1db77cfb7e94933d5a7f4b024df 
>   include/mesos/v1/master/master.proto 
> 5cb64df12459b1529306cd33691c003172e4b3ec 
>   src/master/http.cpp 0c2cd554124ee400c05f3256af36974a62e2303c 
>   src/tests/api_tests.cpp 28d46436b9c2fe85bde5bc8def7a840b72d29de3 
> 
> 
> Diff: https://reviews.apache.org/r/65056/diff/4/
> 
> 
> Testing
> ---
> 
> tested manually using curl
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 65056: Added missing fields to the GET_MASTER operator API call.

2018-01-12 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65056/#review195370
---



Can you update the `GetMaster` test in api_tests.cpp?


include/mesos/v1/master/master.proto
Lines 468-472 (patched)
<https://reviews.apache.org/r/65056/#comment274546>

I would get rid of these for now. I would like this to be inferred from 
GetAgents instead of having it at 2 different places.



src/master/http.cpp
Lines 2274-2281 (patched)
<https://reviews.apache.org/r/65056/#comment274545>

i would put group these as:

```
  getMaster->mutable_master_info()->CopyFrom(master->info());
  
  getMaster->set_start_time(master->startTime.secs());
  if (master->electedTime.isSome()) {
getMaster->set_elected_time(master->electedTime.get().secs());
  }
  
  getMaster->set_activated_agents(master->_slaves_active());
  getMaster->set_deactivated_agents(master->_slaves_inactive());
  getMaster->set_unreachable_agents(master->_slaves_unreachable());
  

```

if you kill the `slaves_*` per comment above, i'll leave it upto you 
whether you want the space or not.


- Vinod Kone


On Jan. 13, 2018, 12:11 a.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65056/
> ---
> 
> (Updated Jan. 13, 2018, 12:11 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Gaston Kleiman, and Vinod Kone.
> 
> 
> Bugs: MESOS-8078
> https://issues.apache.org/jira/browse/MESOS-8078
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Several fields were present in the old master '/state' endpoint,
> which are now not accessible via the V1 operator API. This patch
> adds these fields to the response for the GET_MASTER call.
> 
> 
> Diffs
> -
> 
>   include/mesos/master/master.proto f4506fe82ee7c1db77cfb7e94933d5a7f4b024df 
>   include/mesos/v1/master/master.proto 
> 5cb64df12459b1529306cd33691c003172e4b3ec 
>   src/master/http.cpp bc29fafb1f4b515aec3b77850f960c88a65c8362 
> 
> 
> Diff: https://reviews.apache.org/r/65056/diff/3/
> 
> 
> Testing
> ---
> 
> tested manually using curl
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 65141: Fixed the default executor flaky testes in tests/cluster.cpp.

2018-01-12 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65141/#review195368
---


Ship it!




Ship It!

- Vinod Kone


On Jan. 13, 2018, 12:15 a.m., Gilbert Song wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65141/
> ---
> 
> (Updated Jan. 13, 2018, 12:15 a.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Alexander Rukletsov, Benjamin 
> Mahler, Jie Yu, and Vinod Kone.
> 
> 
> Bugs: MESOS-7506
> https://issues.apache.org/jira/browse/MESOS-7506
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch fixes some flaky tests listed below:
> 1. DefaultExecutorTest.KillTask/0
> 2. DefaultExecutorTest.TaskWithFileURI/0
> 3. DefaultExecutorTest.ResourceLimitation/0
> 4. DefaultExecutorTest.KillMultipleTasks/0
> 
> The root cause is that either docker containerizer or mesos
> containerizer have wait() and destroy() rely on the same
> future `ContainerTermination` which means these two methods
> become ready simultaneously, but this is not true for the
> composing containerizer because wait() may finish before
> destroy in which case the `containers_` hasshmap is not
> cleaned up yet in destroy()'s `.onAny` callback.
> 
> 
> Diffs
> -
> 
>   src/tests/cluster.cpp 066dd31a7b5b49823ca642dfcc930f00e920feb3 
> 
> 
> Diff: https://reviews.apache.org/r/65141/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Gilbert Song
> 
>



Re: Review Request 65140: Reverted "Updated composing containerizer tests.".

2018-01-12 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65140/#review195367
---


Ship it!




Ship It!

- Vinod Kone


On Jan. 13, 2018, 12:15 a.m., Gilbert Song wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65140/
> ---
> 
> (Updated Jan. 13, 2018, 12:15 a.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Alexander Rukletsov, Benjamin 
> Mahler, Jie Yu, and Vinod Kone.
> 
> 
> Bugs: MESOS-8391
> https://issues.apache.org/jira/browse/MESOS-8391
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This reverts commit 84365a140c3730e2d6579ad500118d6749d2f87f.
> 
> 
> Diffs
> -
> 
>   src/tests/containerizer/composing_containerizer_tests.cpp 
> 7c22f162b128c3fdf8d4b20cac73fdf442449d79 
> 
> 
> Diff: https://reviews.apache.org/r/65140/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Gilbert Song
> 
>



Re: Review Request 65139: Reverted "Fixed `wait()` and `destroy()` in composing containerizer.".

2018-01-12 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65139/#review195366
---


Ship it!




Ship It!

- Vinod Kone


On Jan. 13, 2018, 12:15 a.m., Gilbert Song wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65139/
> ---
> 
> (Updated Jan. 13, 2018, 12:15 a.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Alexander Rukletsov, Benjamin 
> Mahler, Jie Yu, and Vinod Kone.
> 
> 
> Bugs: MESOS-8391
> https://issues.apache.org/jira/browse/MESOS-8391
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This reverts commit 95decd404438abd422794524e01d72a889821566.
> 
> There are two reasons to revert this commit:
>   1. After the agent recovers, if the nested containers that are
>  launched beforehand are killed, they will no longer be updated
>  with new status, because the `WAIT_NESTED_CONTAINER` call from
>  the default executor will end up with a future forever. Please
>  see MESOS-8391 for details.
>   2. The original commit makes the composing containerizer wait()
>  and destroy() rely on the same future of a ContainerTermination
>  promise. This would get into the bug that composing containerizer
>  destroy() may fail due to the wait() future got discarded.
>  Need to protect it by using `undiscardable()`. Please see
>  MESOS-7926 for details.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/composing.cpp 
> 9ace70d9fbd78182715c5ef13fcaf7ad45f76f97 
> 
> 
> Diff: https://reviews.apache.org/r/65139/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Gilbert Song
> 
>



Re: Review Request 65056: Added missing fields to the GET_MASTER operator API call.

2018-01-12 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65056/#review195358
---




include/mesos/v1/mesos.proto
Lines 901-909 (patched)
<https://reviews.apache.org/r/65056/#comment274523>

lets move this to `GetMaster` response instead of adding it to MasterInfo.


- Vinod Kone


On Jan. 10, 2018, 11:11 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65056/
> ---
> 
> (Updated Jan. 10, 2018, 11:11 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Gaston Kleiman, and Vinod Kone.
> 
> 
> Bugs: MESOS-8078
> https://issues.apache.org/jira/browse/MESOS-8078
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Several fields were present in the old master '/state' endpoint,
> which are now not accessible via the V1 operator API. This patch
> adds these fields to the response for the GET_MASTER call.
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto 2c5ae4cd8d9cfaabd42a9e94ce94cd73bfacc6eb 
>   include/mesos/v1/mesos.proto b8e016e60322879e53ffa3bef23481015b0a6d2d 
>   src/internal/evolve.cpp 7758c9b93ad41b45b941c0de8c2b1008fbc4e50d 
>   src/master/http.cpp bc29fafb1f4b515aec3b77850f960c88a65c8362 
>   src/master/master.hpp 5e6ba53c075174a1e514a395ceb17c26201ec470 
> 
> 
> Diff: https://reviews.apache.org/r/65056/diff/2/
> 
> 
> Testing
> ---
> 
> tested manually using curl
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 65074: Fixed comparison logic for additive reconfiguration policy.

2018-01-12 Thread Vinod Kone


> On Jan. 12, 2018, 6:19 p.m., James Peach wrote:
> > src/common/resources.cpp
> > Line 263 (original), 263 (patched)
> > <https://reviews.apache.org/r/65074/diff/2/?file=1939706#file1939706line263>
> >
> > Nit: `static bool compareResourceMetadata(...)`

I'll fix this before committing.


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65074/#review195334
---


On Jan. 12, 2018, 3:40 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65074/
> ---
> 
> (Updated Jan. 12, 2018, 3:40 p.m.)
> 
> 
> Review request for mesos, James Peach and Vinod Kone.
> 
> 
> Bugs: MESOS-8410
> https://issues.apache.org/jira/browse/MESOS-8410
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The case where multiple resources have the same name
> was not handled correctly, and could result in false
> negatives.
> 
> 
> Diffs
> -
> 
>   include/mesos/resources.hpp 69bf823f800ac8bfc2bc211895b0967f573009b5 
>   src/common/resources.cpp 1291208b9f2a7c373ff7ec941d3e4310fec92207 
>   src/slave/compatibility.hpp 78b421a01abe5d2178c93832577577a7ba282b38 
>   src/slave/compatibility.cpp 4ead4a5b655f6f3d7812aa52d656830d7cff4598 
>   src/tests/slave_compatibility_tests.cpp 
> ab5ed29f43c593dbd7d90c235aa304b581d932a2 
> 
> 
> Diff: https://reviews.apache.org/r/65074/diff/2/
> 
> 
> Testing
> ---
> 
> * Several new unit tests added in the review
> * `make check`
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 64940: Prevented a crash when an agent with terminal tasks is lost.

2018-01-12 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64940/#review195339
---


Ship it!




LGTM.

I still see open issues though, can you please resolve them?

- Vinod Kone


On Jan. 12, 2018, 7:24 p.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64940/
> ---
> 
> (Updated Jan. 12, 2018, 7:24 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Gaston Kleiman, Jie Yu, Vinod 
> Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-8337
> https://issues.apache.org/jira/browse/MESOS-8337
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> If an agent is lost, we try to remove all the tasks that might have
> been lost. If a task is already terminal but has unacknowleged status
> updates, it is expected that we track it in the unreachable tasks list
> so we should remove the CHECK that prevents this. This also backs out
> changes to how unreachable tasks are presented in the HTTP endpoints to
> restore compatibility with previous Mesos releases.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp bc29fafb1f4b515aec3b77850f960c88a65c8362 
>   src/master/master.hpp f915c6f611eaffd5c2cb6cba6c43def679e8f92d 
>   src/master/master.cpp c96cd7090875fbf1b11b1708390e88182c77655b 
>   src/tests/mesos.hpp 93913f2e01898c73e09de58a975aa467e714d882 
>   src/tests/partition_tests.cpp 3813139f576ea01db0197f0fe8a73597db1bb69a 
> 
> 
> Diff: https://reviews.apache.org/r/64940/diff/7/
> 
> 
> Testing
> ---
> 
> make check (Fedora 27)
> 
> 
> Thanks,
> 
> James Peach
> 
>



Re: Review Request 65074: Fixed comparison logic for additive reconfiguration policy.

2018-01-12 Thread Vinod Kone


> On Jan. 12, 2018, 6:19 p.m., James Peach wrote:
> > The bug we hit was where the resources didn't change but containered mount 
> > disks. Are we confident that the test here (where a mount disk is added) 
> > also covers the case where the resources don't change?

I think that's what the last line in the test is verifying? note that it's 
comparing `info2` to `info2`.


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65074/#review195334
---


On Jan. 12, 2018, 3:40 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65074/
> ---
> 
> (Updated Jan. 12, 2018, 3:40 p.m.)
> 
> 
> Review request for mesos, James Peach and Vinod Kone.
> 
> 
> Bugs: MESOS-8410
> https://issues.apache.org/jira/browse/MESOS-8410
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The case where multiple resources have the same name
> was not handled correctly, and could result in false
> negatives.
> 
> 
> Diffs
> -
> 
>   include/mesos/resources.hpp 69bf823f800ac8bfc2bc211895b0967f573009b5 
>   src/common/resources.cpp 1291208b9f2a7c373ff7ec941d3e4310fec92207 
>   src/slave/compatibility.hpp 78b421a01abe5d2178c93832577577a7ba282b38 
>   src/slave/compatibility.cpp 4ead4a5b655f6f3d7812aa52d656830d7cff4598 
>   src/tests/slave_compatibility_tests.cpp 
> ab5ed29f43c593dbd7d90c235aa304b581d932a2 
> 
> 
> Diff: https://reviews.apache.org/r/65074/diff/2/
> 
> 
> Testing
> ---
> 
> * Several new unit tests added in the review
> * `make check`
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 64940: Prevented a crash when an agent with terminal tasks is lost.

2018-01-12 Thread Vinod Kone


> On Jan. 12, 2018, 2:24 a.m., Vinod Kone wrote:
> > AFAICT, in 1.4.x we show unreachable terminal tasks on a removed agent as 
> > completed tasks. But now, we show them as unreachable tasks. If that's true 
> > it's an API change that needs to be called out. Is that really backwards 
> > compatible?
> 
> Jiang Yan Xu wrote:
> Yeah it's true. Despite it being a bug that if the unreachable terminal 
> tasks are considered completed and added to the completed list, it cannot be 
> later removed when the agent reregisters and duplicates are shown in the 
> webUI and APIs, it is indeed what 1.4.x gives you.
> 
> 1.5 fixes the duplication problem but we did the extra work (the 
> additional `if (task->state() != TASK_UNREACHABLE)` checks we added and this 
> revision removes) to make it look like the 1.4.x version, I guess it's fine 
> to keep it that way until we have a plan for an overhaul...
> 
> So, probably let's not revert the code that involves the http endpoints 
> (sorry for suggesting it earlier, there are small changes needed which I'll 
> comment on).

"unreachable terminal tasks are considered completed and added to the completed 
list, it cannot be later removed when the agent reregisters and duplicates are 
shown in the webUI and APIs" .

Duplicated tasks in 2 lists sounds like a bug and not intentional behavior in 
1.4.x. So, if possible we should fix it in 1.5.x to move the task from 
completed to active upon re-registration. Is that easy to do or do we have to 
live with the duplicates until the overhaul?


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64940/#review195282
---


On Jan. 5, 2018, 7:37 p.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64940/
> ---
> 
> (Updated Jan. 5, 2018, 7:37 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Gaston Kleiman, Jie Yu, Vinod 
> Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-8337
> https://issues.apache.org/jira/browse/MESOS-8337
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> If an agent is lost, we try to remove all the tasks that might have
> been lost. If a task is already terminal but has unacknowleged status
> updates, it is expected that we track it in the unreachable tasks list
> so we should remove the CHECK that prevents this. This also backs out
> changes to how unreachable tasks are presented in the HTTP endpoints to
> restore compatibility with previous Mesos releases.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp bc29fafb1f4b515aec3b77850f960c88a65c8362 
>   src/master/master.hpp 5e6ba53c075174a1e514a395ceb17c26201ec470 
>   src/master/master.cpp 6fc5de89e54ba0b9ae2c4fb475be9878910820d3 
>   src/tests/mesos.hpp 93913f2e01898c73e09de58a975aa467e714d882 
>   src/tests/partition_tests.cpp 3813139f576ea01db0197f0fe8a73597db1bb69a 
> 
> 
> Diff: https://reviews.apache.org/r/64940/diff/6/
> 
> 
> Testing
> ---
> 
> make check (Fedora 27)
> 
> 
> Thanks,
> 
> James Peach
> 
>



Re: Review Request 65074: Fixed comparison logic for additive reconfiguration policy.

2018-01-11 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65074/#review195286
---


Fix it, then Ship it!





src/slave/compatibility.cpp
Line 156 (original), 135 (patched)
<https://reviews.apache.org/r/65074/#comment274464>

s/by Mesos//. Seems redundant?



src/slave/compatibility.cpp
Line 203 (original), 196 (patched)
<https://reviews.apache.org/r/65074/#comment274465>

ditto.



src/tests/slave_compatibility_tests.cpp
Lines 203 (patched)
<https://reviews.apache.org/r/65074/#comment274466>

Move this to #200 and say

// This is a regression test for MESOS-8410.

Or

Moe this to #345 add another case at the end of the test to add one more 
disk and make sure that doesn't trip up additive policy either.



src/tests/slave_compatibility_tests.cpp
Lines 215-230 (patched)
<https://reviews.apache.org/r/65074/#comment274467>

do you really need these many disks for the regression test. are 2 not 
enough?


- Vinod Kone


On Jan. 11, 2018, 8:08 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65074/
> ---
> 
> (Updated Jan. 11, 2018, 8:08 p.m.)
> 
> 
> Review request for mesos, James Peach and Vinod Kone.
> 
> 
> Bugs: MESOS-8410
> https://issues.apache.org/jira/browse/MESOS-8410
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The case where multiple resources have the same name
> was not handled correctly, and could result in false
> negatives.
> 
> To fix this, we now consider all resource metadata in addition to the name 
> when searching for a matching resource in the previous configuration.
> 
> 
> Diffs
> -
> 
>   include/mesos/resources.hpp 69bf823f800ac8bfc2bc211895b0967f573009b5 
>   src/common/resources.cpp 1291208b9f2a7c373ff7ec941d3e4310fec92207 
>   src/slave/compatibility.hpp 78b421a01abe5d2178c93832577577a7ba282b38 
>   src/slave/compatibility.cpp 4ead4a5b655f6f3d7812aa52d656830d7cff4598 
>   src/tests/slave_compatibility_tests.cpp 
> ab5ed29f43c593dbd7d90c235aa304b581d932a2 
> 
> 
> Diff: https://reviews.apache.org/r/65074/diff/1/
> 
> 
> Testing
> ---
> 
> * Several new unit tests added in the review
> * `make check`
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 64940: Prevented a crash when an agent with terminal tasks is lost.

2018-01-11 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64940/#review195282
---



AFAICT, in 1.4.x we show unreachable terminal tasks on a removed agent as 
completed tasks. But now, we show them as unreachable tasks. If that's true 
it's an API change that needs to be called out. Is that really backwards 
compatible?


src/tests/partition_tests.cpp
Lines 2481 (patched)
<https://reviews.apache.org/r/64940/#comment274462>

s/trscking/tracking/



src/tests/partition_tests.cpp
Lines 2513-2514 (patched)
<https://reviews.apache.org/r/64940/#comment274463>

s/we have one finished task and lost task/the task state is still finished/


- Vinod Kone


On Jan. 5, 2018, 7:37 p.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64940/
> ---
> 
> (Updated Jan. 5, 2018, 7:37 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Gaston Kleiman, Jie Yu, Vinod 
> Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-8337
> https://issues.apache.org/jira/browse/MESOS-8337
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> If an agent is lost, we try to remove all the tasks that might have
> been lost. If a task is already terminal but has unacknowleged status
> updates, it is expected that we track it in the unreachable tasks list
> so we should remove the CHECK that prevents this. This also backs out
> changes to how unreachable tasks are presented in the HTTP endpoints to
> restore compatibility with previous Mesos releases.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp bc29fafb1f4b515aec3b77850f960c88a65c8362 
>   src/master/master.hpp 5e6ba53c075174a1e514a395ceb17c26201ec470 
>   src/master/master.cpp 6fc5de89e54ba0b9ae2c4fb475be9878910820d3 
>   src/tests/mesos.hpp 93913f2e01898c73e09de58a975aa467e714d882 
>   src/tests/partition_tests.cpp 3813139f576ea01db0197f0fe8a73597db1bb69a 
> 
> 
> Diff: https://reviews.apache.org/r/64940/diff/6/
> 
> 
> Testing
> ---
> 
> make check (Fedora 27)
> 
> 
> Thanks,
> 
> James Peach
> 
>



Re: Review Request 64978: Made task's volume directory visible in the /files endpoints.

2018-01-10 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64978/#review195178
---




src/slave/slave.cpp
Line 2795 (original), 2795 (patched)
<https://reviews.apache.org/r/64978/#comment274325>

s/the task's/default executor task's/



src/slave/slave.cpp
Line 2796 (original), 2796 (patched)
<https://reviews.apache.org/r/64978/#comment274320>

s/we made/we made sure/



src/slave/slave.cpp
Lines 2798 (patched)
<https://reviews.apache.org/r/64978/#comment274321>

s/add/adding/



src/slave/slave.cpp
Lines 2800-2803 (patched)
<https://reviews.apache.org/r/64978/#comment274323>

Change this sentence to:

```
// This volume get translated into a bindmount in the nested container's
// mount namespace, which is is not visible in Mesos UI because it
// operates in the host namespace. See Mesos-8279 for details.
```



src/slave/slave.cpp
Lines 2807 (patched)
<https://reviews.apache.org/r/64978/#comment274324>

s/attached/attach/



src/slave/slave.cpp
Lines 2810 (patched)
<https://reviews.apache.org/r/64978/#comment274327>

s/specified, that is/ specified/


- Vinod Kone


On Jan. 9, 2018, 2:11 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64978/
> ---
> 
> (Updated Jan. 9, 2018, 2:11 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Gilbert Song, Jie Yu, and Vinod 
> Kone.
> 
> 
> Bugs: MESOS-8279
> https://issues.apache.org/jira/browse/MESOS-8279
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> In MESOS-7225, we made a task can access any volumes specified in its
> disk resources from its own sandbox by introducing a workaround to the
> default executor, i.e., add a `SANDBOX_PATH` volume with type `PARENT`
> to the corresponding nested container. It will be translated into a bind
> mount in the nested container's mount namespace, thus not visible in the
> host mount namespace, that means the task's volume directory can not be
> visible in Mesos UI since it operates in the host mount namespace.
> 
> In this patch, to make the task's volume directory visible in Mesos UI,
> we attached the executor's volume directory to it, so when users browse
> task's volume directory in Mesos UI, what they actually browse is the
> executor's volume directory. Note when calling `Files::attach()`, the
> third argument `authorized` is not specified, that is because it is
> already specified when we do the attach for the executor's sandbox and
> it is also applied to the executor's tasks.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp aeb0fdaeda78f26de2ec790735bfa88c5be72850 
> 
> 
> Diff: https://reviews.apache.org/r/64978/diff/3/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



  1   2   3   4   5   6   7   8   9   10   >