Re: Review Request 71283: Made sure we are tracking ephemeral quota before getting the usage.

2019-08-13 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71283/#review217194
---


Ship it!




Ship It!

- Gilbert Song


On Aug. 13, 2019, 3:59 p.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71283/
> ---
> 
> (Updated Aug. 13, 2019, 3:59 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Gilbert Song.
> 
> 
> Bugs: MESOS-9935
> https://issues.apache.org/jira/browse/MESOS-9935
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> If there is no ephemeral disk resource, the `disk/du` isolator
> will not track ephemeral disk usage. In that case, we can't index
> the paths hash to find the last usage since it doesn't contain the
> ephemeral paths. The fix is to check, and omit the ephemeral usage
> if it is not being tracked.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/posix/disk.cpp 
> 29bdbe6ccffd755df4dc48983c5f99fc2f0ae5d2 
> 
> 
> Diff: https://reviews.apache.org/r/71283/diff/1/
> 
> 
> Testing
> ---
> 
> make check with empty disk resources
> 
> 
> Thanks,
> 
> James Peach
> 
>



Re: Review Request 71283: Made sure we are tracking ephemeral quota before getting the usage.

2019-08-13 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71283/#review217193
---



Patch looks great!

Reviews applied: [71283]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers 
--disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On Aug. 13, 2019, 10:59 p.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71283/
> ---
> 
> (Updated Aug. 13, 2019, 10:59 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Gilbert Song.
> 
> 
> Bugs: MESOS-9935
> https://issues.apache.org/jira/browse/MESOS-9935
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> If there is no ephemeral disk resource, the `disk/du` isolator
> will not track ephemeral disk usage. In that case, we can't index
> the paths hash to find the last usage since it doesn't contain the
> ephemeral paths. The fix is to check, and omit the ephemeral usage
> if it is not being tracked.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/posix/disk.cpp 
> 29bdbe6ccffd755df4dc48983c5f99fc2f0ae5d2 
> 
> 
> Diff: https://reviews.apache.org/r/71283/diff/1/
> 
> 
> Testing
> ---
> 
> make check with empty disk resources
> 
> 
> Thanks,
> 
> James Peach
> 
>



Re: Review Request 71269: Added a role tree class in the allocator.

2019-08-13 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71269/#review217192
---



Nice to see this, and looking forward to it tracking quota consumption too! :P

I did a pass on the interface first, suggestions below:


src/master/allocator/mesos/hierarchical.hpp
Lines 113-117 (patched)


a bulleted list would be more clear than the long setence with "and/or" 
cases?



src/master/allocator/mesos/hierarchical.hpp
Lines 119 (patched)


Rather than '"Empty" roles are removed immediately. See 
`Role::isEmpty()`.', I think just referring to not meeting any of the cases in 
the list would be more clear, e.g.

We track roles when:

- The role has non-default weight, or
- The role has non-default quota, or
- ...
- or it has descendent roles meeting any of the above conditions

Any roles that do not meet these conditions are not tracked in the role 
tree.



src/master/allocator/mesos/hierarchical.hpp
Lines 124 (patched)


Passing `name` here seems prone to confusion vs passing `path`. (I had to 
go look at the implementation to figure out if `name` was the last part of the 
role or the entire role).

Why not just pass `path` and avoid the confusion? Is there any utility in 
passing the last component of the "/" split of the role?

Personally, I think s/path/role is just as clear, and avoids having to 
think about the "/" splitting:

```
  Role(const std::string& role, Role* parent);
```

Alternatively, `basename` seems like the clear expression of the last 
component (borrowed from unix terminology).



src/master/allocator/mesos/hierarchical.hpp
Line 112 (original), 128 (patched)


Hm.. why can't `path` be const to avoid the getter?

If we absolutely need the getter, `path()` seems more idiomatic to our code 
than `getPath()`.



src/master/allocator/mesos/hierarchical.hpp
Lines 130 (patched)


s/getR/r/



src/master/allocator/mesos/hierarchical.hpp
Line 119 (original), 144 (patched)


s/getC/c/



src/master/allocator/mesos/hierarchical.hpp
Lines 152-155 (patched)


Per above, `basename` seems less ambiguous. It also seems clearer and 
warrants less of a long comment about it, if next to each other:

```
  const std::string role; // E.g. "a/b/c"
  const std::string basename; // E.g. "c"
```



src/master/allocator/mesos/hierarchical.hpp
Lines 190 (patched)


Seems like we can live without this line



src/master/allocator/mesos/hierarchical.hpp
Lines 198-199 (patched)


The comment isn't adding value here IMO

s/getR/r/

Actually, couldn't this be just touching the member variable? It seems 
strange that you can only touch the root in a const way but can touch the Roles 
from `get()` in a non-const way?



src/master/allocator/mesos/hierarchical.hpp
Lines 201 (patched)


Typo in exists, but I don't think this comment is needed. Unless there's 
some subtlety to know about, it seems very clear from the signature.



src/master/allocator/mesos/hierarchical.hpp
Lines 202 (patched)


Alternatively, s/path/role/ would be perfectly clear for all these 
functions in the RoleTree. It doesn't seem like `path` is offering an important 
distinction in this class' interface?



src/master/allocator/mesos/hierarchical.hpp
Lines 207-210 (patched)


There seems to be a bit of subtlety here that is missed in the comment?

If I add "a/b", the ancestor role "a" along the path will be created. Then, 
if I add "a", the comment suggests that it already exists which is a problem.



src/master/allocator/mesos/hierarchical.hpp
Lines 212-215 (patched)


Ditto here, it seems at the overall explanation of the RoleTree abstraction 
we need to explain the "." subtlety, along with an example role tree to make it 
clear.



src/master/allocator/mesos/hierarchical.hpp
Lines 218-224 (patched)


It's quite strange that these take maps with the role (seems like an 
artifact of the way the allocator managed reservations previously?), the 
following interface seems more expected:

```
  void 

Review Request 71285: Fixed recovery of agent resources and operations after crash.

2019-08-13 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71285/
---

Review request for mesos, Gastón Kleiman, James Peach, and Joseph Wu.


Bugs: MESOS-9875
https://issues.apache.org/jira/browse/MESOS-9875


Repository: mesos


Description
---

Fixes an issue where the agent may incorrectly send an
OPERATION_FINISHED update for a failed offer operation
following agent failover and recovery.

The agent previously relied on the difference between the
set of checkpointed operations and the set of operation
IDs recovered from the operation status update manager to
apply any operations which had not been applied due to an
ill-timed agent failover.

However, this approach did not work in the case where a
persistent volume failed to be successfully created by
`syncCheckpointedResources()`. In order to handle this
case, this patch changes the agent code to continue with
the old approach of a two-phase-commit of persistent
volumes to disk, where the agent will fail to complete
recovery if `syncCheckpointedResources()` cannot be
executed successfully after failover.


Diffs
-

  src/slave/paths.hpp e077587fd02bd8e35fee7ce12ae436e3dca25e47 
  src/slave/paths.cpp 28a7cf9f9c70fb31eeefe2e823cd7e19ffcf126a 
  src/slave/slave.cpp 74eb45744d6603b91676e812ed008a7b1ab39a49 
  src/slave/state.cpp cd3fac72dd57da21ed5ac46b17066531af26d42a 


Diff: https://reviews.apache.org/r/71285/diff/1/


Testing
---


Thanks,

Greg Mann



Re: Review Request 71284: Fixed a typo in the master.

2019-08-13 Thread Meng Zhu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71284/#review217191
---


Ship it!




Ship It!

- Meng Zhu


On Aug. 13, 2019, 5:43 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71284/
> ---
> 
> (Updated Aug. 13, 2019, 5:43 p.m.)
> 
> 
> Review request for mesos and Meng Zhu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed a typo in the master.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 783e4a38ad9e36cb38d5e78b6b942033bc806a0a 
>   src/master/master.cpp 31c7c97abecb92591369b417e2ef38a99cc09bac 
>   src/tests/master_tests.cpp 429521d14cec1c496cd8be12ce3fb3737fd8cf99 
> 
> 
> Diff: https://reviews.apache.org/r/71284/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Review Request 71284: Fixed a typo in the master.

2019-08-13 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71284/
---

Review request for mesos and Meng Zhu.


Repository: mesos


Description
---

Fixed a typo in the master.


Diffs
-

  src/master/master.hpp 783e4a38ad9e36cb38d5e78b6b942033bc806a0a 
  src/master/master.cpp 31c7c97abecb92591369b417e2ef38a99cc09bac 
  src/tests/master_tests.cpp 429521d14cec1c496cd8be12ce3fb3737fd8cf99 


Diff: https://reviews.apache.org/r/71284/diff/1/


Testing
---


Thanks,

Greg Mann



Re: Review Request 71269: Added a role tree class in the allocator.

2019-08-13 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71269/#review217190
---



Bad review!

Reviews applied: [71269, 71258, 71257, 71255, 71254]

Error:
2019-08-13 23:41:39 URL:https://reviews.apache.org/r/71257/diff/raw/ 
[2246/2246] -> "71257.patch" [1]
error: patch failed: src/master/allocator/mesos/sorter/drf/sorter.cpp:15
error: src/master/allocator/mesos/sorter/drf/sorter.cpp: patch does not apply
error: patch failed: src/master/allocator/mesos/sorter/random/sorter.cpp:16
error: src/master/allocator/mesos/sorter/random/sorter.cpp: patch does not apply

- Mesos Reviewbot


On Aug. 13, 2019, 3:05 p.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71269/
> ---
> 
> (Updated Aug. 13, 2019, 3:05 p.m.)
> 
> 
> Review request for mesos, Andrei Sekretenko and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9917
> https://issues.apache.org/jira/browse/MESOS-9917
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The role concept in Mesos fits into a tree structure naturally.
> However, the role state in the allocator are currenstored
> in a hashmap. This is less efficient and harder to use and reason.
> 
> This patch introduced a `RoleTree` structure in the allocator
> and organizes all the roles in to a tree. This should simplify
> the code logic and opens further refactor and optimization
> opportunities.
> 
> In addition, the master code also lacks a proper tree structure
> for tracking roles. We should leverage the same role tree code
> here to simplify that as well.
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.hpp 
> dee5bafb7d0e56f382b647c649a702cd64e6500a 
>   src/master/allocator/mesos/hierarchical.cpp 
> 87b03d3a0a2bc9113c6c488ddfc1437651bf58b7 
> 
> 
> Diff: https://reviews.apache.org/r/71269/diff/2/
> 
> 
> Testing
> ---
> 
> make check
> 
> Benchmaring using optimized build shows no noticable performance change.
> 
> Benchmark: `BENCHMARK_HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota`
> 
> ## Before:
> 
> Added 30 agents in 1.252621ms
> Added 30 frameworks in 7.747202ms
> Benchmark setup: 30 agents, 30 roles, 30 frameworks, with drf sorter
> Made 36 allocations in 12.791427ms
> 
> Added 300 agents in 9.765295ms
> Added 300 frameworks in 273.978325ms
> Benchmark setup: 300 agents, 300 roles, 300 frameworks, with drf sorter
> Made 350 allocations in 424.603736ms
> 
> Added 3000 agents in 79.646516ms
> Added 3000 frameworks in 19.17449514secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
> Made 3500 allocations in 31.687207066secs
> 
> ## After:
> 
> Added 30 agents in 1.376619ms
> Added 30 frameworks in 7.647487ms
> Benchmark setup: 30 agents, 30 roles, 30 frameworks, with drf sorter
> Made 36 allocations in 11.638116ms
> 
> Added 300 agents in 9.506101ms
> Added 300 frameworks in 263.008198ms
> Benchmark setup: 300 agents, 300 roles, 300 frameworks, with drf sorter
> Made 350 allocations in 418.962396ms
> 
> Added 3000 agents in 81.553527ms
> Added 3000 frameworks in 19.201708305secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
> Made 3500 allocations in 31.583417136secs
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 71275: Fixed agent draining when returning from unreachable state.

2019-08-13 Thread Joseph Wu


> On Aug. 13, 2019, 4:11 p.m., Greg Mann wrote:
> > src/tests/api_tests.cpp
> > Lines 5953 (patched)
> > 
> >
> > Is this necessary?

Oh, good catch.  This line doesn't add anything, since we don't expect any 
further updates in this sequence.

I shall delete the line.


- Joseph


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71275/#review217187
---


On Aug. 13, 2019, 2:13 p.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71275/
> ---
> 
> (Updated Aug. 13, 2019, 2:13 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benno Evers, and Greg Mann.
> 
> 
> Bugs: MESOS-9934
> https://issues.apache.org/jira/browse/MESOS-9934
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This logic was missing from the initial implementation of agent
> draining.  When an agent became unreachable, and then reregistered
> with the master, the master would not properly deactivate or drain
> the agent.
> 
> This also fixes a potential problem with checking the agent
> drain state too early in the case of pending operations.
> Operations are not reported to the master until after the agent
> reregisters, so agents with the RESOURCE_PROVIDER capability
> cannot be considered DRAINED until after the first 
> UpdateSlaveMessage has arrived.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 31c7c97abecb92591369b417e2ef38a99cc09bac 
>   src/tests/api_tests.cpp c2099674e742eaa08134c5e0a7cdab1734808119 
> 
> 
> Diff: https://reviews.apache.org/r/71275/diff/2/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 71275: Fixed agent draining when returning from unreachable state.

2019-08-13 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71275/#review217187
---


Fix it, then Ship it!





src/tests/api_tests.cpp
Lines 5953 (patched)


Is this necessary?


- Greg Mann


On Aug. 13, 2019, 9:13 p.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71275/
> ---
> 
> (Updated Aug. 13, 2019, 9:13 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benno Evers, and Greg Mann.
> 
> 
> Bugs: MESOS-9934
> https://issues.apache.org/jira/browse/MESOS-9934
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This logic was missing from the initial implementation of agent
> draining.  When an agent became unreachable, and then reregistered
> with the master, the master would not properly deactivate or drain
> the agent.
> 
> This also fixes a potential problem with checking the agent
> drain state too early in the case of pending operations.
> Operations are not reported to the master until after the agent
> reregisters, so agents with the RESOURCE_PROVIDER capability
> cannot be considered DRAINED until after the first 
> UpdateSlaveMessage has arrived.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 31c7c97abecb92591369b417e2ef38a99cc09bac 
>   src/tests/api_tests.cpp c2099674e742eaa08134c5e0a7cdab1734808119 
> 
> 
> Diff: https://reviews.apache.org/r/71275/diff/2/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Review Request 71283: Made sure we are tracking ephemeral quota before getting the usage.

2019-08-13 Thread James Peach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71283/
---

Review request for mesos, Andrei Budnik and Gilbert Song.


Bugs: MESOS-9935
https://issues.apache.org/jira/browse/MESOS-9935


Repository: mesos


Description
---

If there is no ephemeral disk resource, the `disk/du` isolator
will not track ephemeral disk usage. In that case, we can't index
the paths hash to find the last usage since it doesn't contain the
ephemeral paths. The fix is to check, and omit the ephemeral usage
if it is not being tracked.


Diffs
-

  src/slave/containerizer/mesos/isolators/posix/disk.cpp 
29bdbe6ccffd755df4dc48983c5f99fc2f0ae5d2 


Diff: https://reviews.apache.org/r/71283/diff/1/


Testing
---

make check with empty disk resources


Thanks,

James Peach



Re: Review Request 71269: Added a role tree class in the allocator.

2019-08-13 Thread Meng Zhu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71269/
---

(Updated Aug. 13, 2019, 3:05 p.m.)


Review request for mesos, Andrei Sekretenko and Benjamin Mahler.


Changes
---

Encapsulate the tree state better in the role class.


Bugs: MESOS-9917
https://issues.apache.org/jira/browse/MESOS-9917


Repository: mesos


Description
---

The role concept in Mesos fits into a tree structure naturally.
However, the role state in the allocator are currenstored
in a hashmap. This is less efficient and harder to use and reason.

This patch introduced a `RoleTree` structure in the allocator
and organizes all the roles in to a tree. This should simplify
the code logic and opens further refactor and optimization
opportunities.

In addition, the master code also lacks a proper tree structure
for tracking roles. We should leverage the same role tree code
here to simplify that as well.


Diffs (updated)
-

  src/master/allocator/mesos/hierarchical.hpp 
dee5bafb7d0e56f382b647c649a702cd64e6500a 
  src/master/allocator/mesos/hierarchical.cpp 
87b03d3a0a2bc9113c6c488ddfc1437651bf58b7 


Diff: https://reviews.apache.org/r/71269/diff/2/

Changes: https://reviews.apache.org/r/71269/diff/1-2/


Testing
---

make check

Benchmaring using optimized build shows no noticable performance change.

Benchmark: `BENCHMARK_HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota`

## Before:

Added 30 agents in 1.252621ms
Added 30 frameworks in 7.747202ms
Benchmark setup: 30 agents, 30 roles, 30 frameworks, with drf sorter
Made 36 allocations in 12.791427ms

Added 300 agents in 9.765295ms
Added 300 frameworks in 273.978325ms
Benchmark setup: 300 agents, 300 roles, 300 frameworks, with drf sorter
Made 350 allocations in 424.603736ms

Added 3000 agents in 79.646516ms
Added 3000 frameworks in 19.17449514secs
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
Made 3500 allocations in 31.687207066secs

## After:

Added 30 agents in 1.376619ms
Added 30 frameworks in 7.647487ms
Benchmark setup: 30 agents, 30 roles, 30 frameworks, with drf sorter
Made 36 allocations in 11.638116ms

Added 300 agents in 9.506101ms
Added 300 frameworks in 263.008198ms
Benchmark setup: 300 agents, 300 roles, 300 frameworks, with drf sorter
Made 350 allocations in 418.962396ms

Added 3000 agents in 81.553527ms
Added 3000 frameworks in 19.201708305secs
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
Made 3500 allocations in 31.583417136secs


Thanks,

Meng Zhu



Re: Review Request 71275: Fixed agent draining when returning from unreachable state.

2019-08-13 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71275/
---

(Updated Aug. 13, 2019, 2:13 p.m.)


Review request for mesos, Benjamin Bannier, Benno Evers, and Greg Mann.


Changes
---

Fixed the problem with marking stuff drained too early if operations exist.
Fixed race in unit test.


Bugs: MESOS-9934
https://issues.apache.org/jira/browse/MESOS-9934


Repository: mesos


Description (updated)
---

This logic was missing from the initial implementation of agent
draining.  When an agent became unreachable, and then reregistered
with the master, the master would not properly deactivate or drain
the agent.

This also fixes a potential problem with checking the agent
drain state too early in the case of pending operations.
Operations are not reported to the master until after the agent
reregisters, so agents with the RESOURCE_PROVIDER capability
cannot be considered DRAINED until after the first 
UpdateSlaveMessage has arrived.


Diffs (updated)
-

  src/master/master.cpp 31c7c97abecb92591369b417e2ef38a99cc09bac 
  src/tests/api_tests.cpp c2099674e742eaa08134c5e0a7cdab1734808119 


Diff: https://reviews.apache.org/r/71275/diff/2/

Changes: https://reviews.apache.org/r/71275/diff/1-2/


Testing
---

make check


Thanks,

Joseph Wu



Re: Review Request 71258: Tracked weight info in the Role struct in the allocator.

2019-08-13 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71258/#review217186
---


Ship it!




Ship It!

- Benjamin Mahler


On Aug. 8, 2019, 7:35 p.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71258/
> ---
> 
> (Updated Aug. 8, 2019, 7:35 p.m.)
> 
> 
> Review request for mesos, Andrei Sekretenko and Benjamin Mahler.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Tracked weight info in the Role struct in the allocator.
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.hpp 
> 29dfa9f828f452b1aed8162c1e13f96d20907553 
>   src/master/allocator/mesos/hierarchical.cpp 
> f1fa838d654816aba14ac032174bf1522b7ae033 
> 
> 
> Diff: https://reviews.apache.org/r/71258/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 71255: Fixed and improved `Sorter::remove` function.

2019-08-13 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71255/#review217185
---


Ship it!





src/master/allocator/mesos/sorter/drf/sorter.cpp
Line 212 (original), 213 (patched)


The whole logic here seems to be dealing with the single "." child case. 
Maybe simpler to grasp as a single condition and with less nesting?

```
} else if (current->children.size() == 1 &&
   current->children[0]->name == ".") {
  ...
}
```



src/master/allocator/mesos/sorter/random/sorter.cpp
Line 206 (original), 207 (patched)


Ditto here.


- Benjamin Mahler


On Aug. 8, 2019, 7:34 p.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71255/
> ---
> 
> (Updated Aug. 8, 2019, 7:34 p.m.)
> 
> 
> Review request for mesos, Andrei Sekretenko and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9930
> https://issues.apache.org/jira/browse/MESOS-9930
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There is a bug in the remove() function where after collapsing
> and turning an internal node into an leaf node, the new node's
> position needs to updated in the parent's children list.
> See MESOS-9930.
> 
> This patch fixes the bug and also refactored the function logic.
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/sorter/drf/sorter.cpp 
> 40397f7111d92dccecd15c9bc1ad7c45abbd850b 
>   src/master/allocator/mesos/sorter/random/sorter.cpp 
> b480aeee38f87dca852ad3c8f97bb4fe993ce068 
> 
> 
> Diff: https://reviews.apache.org/r/71255/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> dedicated test added in r/71256/
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 71275: Fixed agent draining when returning from unreachable state.

2019-08-13 Thread Greg Mann


> On Aug. 13, 2019, 1:10 p.m., Benno Evers wrote:
> > src/master/master.cpp
> > Lines 7959 (patched)
> > 
> >
> > Can we already decide at this point if the agent is drained? E.g. 
> > `checkAndTransitionDrainingAgent()` seems to look at pending operations, 
> > which only get sent in the first `UpdateSlaveMessage` after the 
> > reregistration is completed.

Good catch Benno - this is also an issue with the existing reregistration code 
in `___reregisterSlave()`. Unfortunately it's even more annoying than waiting 
for the first `UpdateSlaveMessage`, since it's possible that resource providers 
may subscribe with the agent later on. Not yet sure what the best way to handle 
this is.


- Greg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71275/#review217177
---


On Aug. 13, 2019, 2:35 a.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71275/
> ---
> 
> (Updated Aug. 13, 2019, 2:35 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benno Evers, and Greg Mann.
> 
> 
> Bugs: MESOS-9934
> https://issues.apache.org/jira/browse/MESOS-9934
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This logic was missing from the initial implementation of agent
> draining.  When an agent became unreachable, and then reregistered
> with the master, the master would not properly deactivate or drain
> the agent.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 31c7c97abecb92591369b417e2ef38a99cc09bac 
>   src/tests/api_tests.cpp c2099674e742eaa08134c5e0a7cdab1734808119 
> 
> 
> Diff: https://reviews.apache.org/r/71275/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 71264: Cleared filters upon unsuppressing a role.

2019-08-13 Thread Benjamin Mahler


> On Aug. 12, 2019, 6:24 p.m., Meng Zhu wrote:
> > src/master/allocator/mesos/hierarchical.cpp
> > Line 1343 (original), 1347 (patched)
> > 
> >
> > reviveRole might be a better function name?

Yeah.. good point!


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71264/#review217157
---


On Aug. 9, 2019, 9:43 p.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71264/
> ---
> 
> (Updated Aug. 9, 2019, 9:43 p.m.)
> 
> 
> Review request for mesos, Andrei Sekretenko and Meng Zhu.
> 
> 
> Bugs: MESOS-9932
> https://issues.apache.org/jira/browse/MESOS-9932
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Per MESOS-9932, removal of a role from the suppression list does
> not currently clear filters. This means that schedulers have to
> issue a separate explicit REVIVE for the roles they want to remove.
> 
> This ensures that filters are cleared upon removing a role from
> the suppression list.
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.cpp 
> f1fa838d654816aba14ac032174bf1522b7ae033 
> 
> 
> Diff: https://reviews.apache.org/r/71264/diff/1/
> 
> 
> Testing
> ---
> 
> Test added in subsequent patch.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 71263: Re-wrote the quota documentation to reflect quota limits.

2019-08-13 Thread Benjamin Mahler


> On Aug. 12, 2019, 6:51 p.m., Meng Zhu wrote:
> > Thanks for adding this. I like the brevity of the doc!
> > 
> > I think we should at least try to explain consumptions a bit. We mentioned 
> > this a few times in the doc, and it also shows up in the UI and endpoint 
> > response.
> > Maybe a note after viewing qutoa.
> > 
> > - A role's quota consumption includes both allocated an reserved. The 
> > latter is accounted even if it is not allocated. Outstanding offered 
> > resources are not charged against quota.

Ah, thanks for catching this!


> On Aug. 12, 2019, 6:51 p.m., Meng Zhu wrote:
> > docs/quota.md
> > Line 94 (original), 77-78 (patched)
> > 
> >
> > I suggest removing this point, since
> > 
> > - This is a minor low-level detail. 
> > - It also could be confusing if the reader thinks offered resources is 
> > also part of the consumption.
> > - This might change in the future if do admission control when 
> > accepting offers.

Ok, sounds reasonable. What I had in my mind when adding this was that this is 
only an "attempt" and therefore someone might see a quota limit get exceeded 
after they set it, which can be confusing. But if the "attempt" is overly 
aggressive at rescinding, that seems ok to me. Is it the latter?


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71263/#review217159
---


On Aug. 9, 2019, 4:53 p.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71263/
> ---
> 
> (Updated Aug. 9, 2019, 4:53 p.m.)
> 
> 
> Review request for mesos, Andrei Sekretenko and Meng Zhu.
> 
> 
> Bugs: MESOS-9427
> https://issues.apache.org/jira/browse/MESOS-9427
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This takes a simpler approach compared to the previous version,
> which explained too many details that were not important for the
> user to understand.
> 
> This also documents that quota guarantees are deprecated and why.
> 
> 
> Diffs
> -
> 
>   docs/operator-http-api.md dd09845899a304885f12fa2e7682527ec9590a59 
>   docs/quota.md c42d5cd69a4534a71879e29be49b7fd7d9d075f6 
> 
> 
> Diff: https://reviews.apache.org/r/71263/diff/1/
> 
> 
> Testing
> ---
> 
> Review the rendered version here:
> 
> https://gist.github.com/bmahler/f85b67f5c7d0f2d7970d37a926f0d778
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 71277: Guarded access to possibly destructed resource provider driver.

2019-08-13 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71277/#review217179
---



Patch looks great!

Reviews applied: [71272, 71277]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers 
--disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On Aug. 13, 2019, 9:30 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71277/
> ---
> 
> (Updated Aug. 13, 2019, 9:30 a.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9560
> https://issues.apache.org/jira/browse/MESOS-9560
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> We provide a method to tear down a resource provider driver which
> internally resets the resource provider HTTP driver pointer. We need to
> prevent access to the pointed to value once the driver has been torn
> down.
> 
> This patch switches all internal uses of the the driver to `send` data
> to use a public interface method instead. We also make sure that that
> public method does not access an invalidated pointed to value.
> 
> 
> Diffs
> -
> 
>   src/tests/mesos.hpp 73b6e424e8f611cf72f102fa13a6773e2865ddbf 
> 
> 
> Diff: https://reviews.apache.org/r/71277/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Ran `ResourceProviderManagerHttpApiTest.ResourceProviderSubscribeDisconnect` 
> under enormous system `stress-ng`.
> 
> * before this patch: 5/413 failures; failure rate 0.01±0.07 (CL=67%)
> * after this patch: 0/1273 failures; failure rate <0.04 (CL=67%)
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 71272: Dispatched invocations of resource provider mock default actions.

2019-08-13 Thread Andrei Budnik

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71272/#review217178
---


Ship it!




Ship It!

- Andrei Budnik


On Авг. 13, 2019, 9:30 д.п., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71272/
> ---
> 
> (Updated Авг. 13, 2019, 9:30 д.п.)
> 
> 
> Review request for mesos, Andrei Budnik and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9560
> https://issues.apache.org/jira/browse/MESOS-9560
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When invoking member functions of the mock resource provider
> `TestResourceProviderProcess` from default mock actions we need to
> ensure that the mock object remains alive for the duration of the
> invocation.
> 
> This patch replaces direct invocations which might run on any thread
> (e.g., also while the mock object is being destructed on another thread)
> which safe `dispatch`es of the default methods.
> 
> 
> Diffs
> -
> 
>   src/tests/api_tests.cpp c2099674e742eaa08134c5e0a7cdab1734808119 
>   src/tests/mesos.hpp 73b6e424e8f611cf72f102fa13a6773e2865ddbf 
>   src/tests/resource_provider_manager_tests.cpp 
> 2792000d291a7d8cd7eeac556c16ccbeb55ed429 
>   src/tests/slave_tests.cpp 7c6e1d927fefde3f1ecadab1a32c9dd793e70b7f 
> 
> 
> Diff: https://reviews.apache.org/r/71272/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> For a before/after failure rate breakdown see 
> https://reviews.apache.org/r/71277/.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 71275: Fixed agent draining when returning from unreachable state.

2019-08-13 Thread Benno Evers

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71275/#review217177
---




src/master/master.cpp
Lines 7959 (patched)


Can we already decide at this point if the agent is drained? E.g. 
`checkAndTransitionDrainingAgent()` seems to look at pending operations, which 
only get sent in the first `UpdateSlaveMessage` after the reregistration is 
completed.


- Benno Evers


On Aug. 13, 2019, 2:35 a.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71275/
> ---
> 
> (Updated Aug. 13, 2019, 2:35 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benno Evers, and Greg Mann.
> 
> 
> Bugs: MESOS-9934
> https://issues.apache.org/jira/browse/MESOS-9934
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This logic was missing from the initial implementation of agent
> draining.  When an agent became unreachable, and then reregistered
> with the master, the master would not properly deactivate or drain
> the agent.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 31c7c97abecb92591369b417e2ef38a99cc09bac 
>   src/tests/api_tests.cpp c2099674e742eaa08134c5e0a7cdab1734808119 
> 
> 
> Diff: https://reviews.apache.org/r/71275/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 71277: Guarded access to possibly destructed resource provider driver.

2019-08-13 Thread Andrei Budnik

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71277/#review217176
---


Ship it!




Ship It!

- Andrei Budnik


On Авг. 13, 2019, 9:30 д.п., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71277/
> ---
> 
> (Updated Авг. 13, 2019, 9:30 д.п.)
> 
> 
> Review request for mesos, Andrei Budnik and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9560
> https://issues.apache.org/jira/browse/MESOS-9560
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> We provide a method to tear down a resource provider driver which
> internally resets the resource provider HTTP driver pointer. We need to
> prevent access to the pointed to value once the driver has been torn
> down.
> 
> This patch switches all internal uses of the the driver to `send` data
> to use a public interface method instead. We also make sure that that
> public method does not access an invalidated pointed to value.
> 
> 
> Diffs
> -
> 
>   src/tests/mesos.hpp 73b6e424e8f611cf72f102fa13a6773e2865ddbf 
> 
> 
> Diff: https://reviews.apache.org/r/71277/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Ran `ResourceProviderManagerHttpApiTest.ResourceProviderSubscribeDisconnect` 
> under enormous system `stress-ng`.
> 
> * before this patch: 5/413 failures; failure rate 0.01±0.07 (CL=67%)
> * after this patch: 0/1273 failures; failure rate <0.04 (CL=67%)
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 71277: Guarded access to possibly destructed resource provider driver.

2019-08-13 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71277/#review217175
---



Patch looks great!

Reviews applied: [71272, 71277]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers 
--disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On Aug. 13, 2019, 9:30 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71277/
> ---
> 
> (Updated Aug. 13, 2019, 9:30 a.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9560
> https://issues.apache.org/jira/browse/MESOS-9560
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> We provide a method to tear down a resource provider driver which
> internally resets the resource provider HTTP driver pointer. We need to
> prevent access to the pointed to value once the driver has been torn
> down.
> 
> This patch switches all internal uses of the the driver to `send` data
> to use a public interface method instead. We also make sure that that
> public method does not access an invalidated pointed to value.
> 
> 
> Diffs
> -
> 
>   src/tests/mesos.hpp 73b6e424e8f611cf72f102fa13a6773e2865ddbf 
> 
> 
> Diff: https://reviews.apache.org/r/71277/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Ran `ResourceProviderManagerHttpApiTest.ResourceProviderSubscribeDisconnect` 
> under enormous system `stress-ng`.
> 
> * before this patch: 5/413 failures; failure rate 0.01±0.07 (CL=67%)
> * after this patch: 0/1273 failures; failure rate <0.04 (CL=67%)
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 71272: Dispatched invocations of resource provider mock default actions.

2019-08-13 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71272/
---

(Updated Aug. 13, 2019, 11:30 a.m.)


Review request for mesos, Andrei Budnik and Chun-Hung Hsiao.


Bugs: MESOS-9560
https://issues.apache.org/jira/browse/MESOS-9560


Repository: mesos


Description
---

When invoking member functions of the mock resource provider
`TestResourceProviderProcess` from default mock actions we need to
ensure that the mock object remains alive for the duration of the
invocation.

This patch replaces direct invocations which might run on any thread
(e.g., also while the mock object is being destructed on another thread)
which safe `dispatch`es of the default methods.


Diffs
-

  src/tests/api_tests.cpp c2099674e742eaa08134c5e0a7cdab1734808119 
  src/tests/mesos.hpp 73b6e424e8f611cf72f102fa13a6773e2865ddbf 
  src/tests/resource_provider_manager_tests.cpp 
2792000d291a7d8cd7eeac556c16ccbeb55ed429 
  src/tests/slave_tests.cpp 7c6e1d927fefde3f1ecadab1a32c9dd793e70b7f 


Diff: https://reviews.apache.org/r/71272/diff/2/


Testing (updated)
---

`make check`

For a before/after failure rate breakdown see 
https://reviews.apache.org/r/71277/.


Thanks,

Benjamin Bannier



Re: Review Request 71272: Dispatched invocations of resource provider mock default actions.

2019-08-13 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71272/
---

(Updated Aug. 13, 2019, 11:30 a.m.)


Review request for mesos, Andrei Budnik and Chun-Hung Hsiao.


Changes
---

`dispatch` more invocations


Summary (updated)
-

Dispatched invocations of resource provider mock default actions.


Bugs: MESOS-9560
https://issues.apache.org/jira/browse/MESOS-9560


Repository: mesos


Description
---

When invoking member functions of the mock resource provider
`TestResourceProviderProcess` from default mock actions we need to
ensure that the mock object remains alive for the duration of the
invocation.

This patch replaces direct invocations which might run on any thread
(e.g., also while the mock object is being destructed on another thread)
which safe `dispatch`es of the default methods.


Diffs (updated)
-

  src/tests/api_tests.cpp c2099674e742eaa08134c5e0a7cdab1734808119 
  src/tests/mesos.hpp 73b6e424e8f611cf72f102fa13a6773e2865ddbf 
  src/tests/resource_provider_manager_tests.cpp 
2792000d291a7d8cd7eeac556c16ccbeb55ed429 
  src/tests/slave_tests.cpp 7c6e1d927fefde3f1ecadab1a32c9dd793e70b7f 


Diff: https://reviews.apache.org/r/71272/diff/2/

Changes: https://reviews.apache.org/r/71272/diff/1-2/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Review Request 71277: Guarded access to possibly destructed resource provider driver.

2019-08-13 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71277/
---

Review request for mesos, Andrei Budnik and Chun-Hung Hsiao.


Bugs: MESOS-9560
https://issues.apache.org/jira/browse/MESOS-9560


Repository: mesos


Description
---

We provide a method to tear down a resource provider driver which
internally resets the resource provider HTTP driver pointer. We need to
prevent access to the pointed to value once the driver has been torn
down.

This patch switches all internal uses of the the driver to `send` data
to use a public interface method instead. We also make sure that that
public method does not access an invalidated pointed to value.


Diffs
-

  src/tests/mesos.hpp 73b6e424e8f611cf72f102fa13a6773e2865ddbf 


Diff: https://reviews.apache.org/r/71277/diff/1/


Testing
---

`make check`

Ran `ResourceProviderManagerHttpApiTest.ResourceProviderSubscribeDisconnect` 
under enormous system `stress-ng`.

* before this patch: 5/413 failures; failure rate 0.01±0.07 (CL=67%)
* after this patch: 0/1273 failures; failure rate <0.04 (CL=67%)


Thanks,

Benjamin Bannier