Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

2019-05-01 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/
---

(Updated May 1, 2019, 8:35 p.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.


Changes
---

Addressed BenM's comments.


Bugs: MESOS-9616
https://issues.apache.org/jira/browse/MESOS-9616


Repository: mesos


Description (updated)
---

Currently if a framework accepts an offer with a `RESERVE` operation
without a task consuming the reserved resources, the resources will be
implicitly declined. This is counter to what one would expect (that only
the remaining resources in the offer will be declined):

  Offer `cpus:10` -> `ACCEPT` with `RESERVE cpus(role):1`
 *Actual* implicit decline: `cpus:9;cpus(role):1`
 *Expected* implicit decline: `cpus:9`

The same issue is present with other transformational operations (i.e.,
`UNRESERVE`, `CREATE` and `DESTROY`).

This patch fixes this issue by only implicitly declining the "remaining"
untransformed resources, computed as follows:

  Offered = `cpus:10`
  Remaining = `cpus:9;cpus(role):1`
  Implicitly declined = remaining - (remaining - offered)
  = `cpus:9;cpus(role):1` - `cpus:(role):1`
  = `cpus:9`


Diffs (updated)
-

  src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
  src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 


Diff: https://reviews.apache.org/r/70132/diff/8/

Changes: https://reviews.apache.org/r/70132/diff/7-8/


Testing
---

make check

More testing done in r/70537.


Thanks,

Chun-Hung Hsiao



Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

2019-04-25 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review214901
---


Fix it, then Ship it!




Thank you Chun!

I think the description is still a bit tough of a read, here's a
suggestion to work off of that hopefully makes the issue clearer to
the uninitiated:

```
Currently, if a framework accepts an offer with a RESERVE operation,
the RESERVE resoures will be implicitly declined. This is counter to
what one would expect (that only the remaining resources in the offer
will be declined):

  Offer cpus:10 -> ACCEPT with RESERVE cpus(role):1
   *Actual* implicit decline: cpus:9;cpus(role):1
   *Expected* implicit decline: cpus:9

The same issue is present with the other transformational operations
(i.e. UNRESERVE, CREATE, ...).

This patch fixes the issue by computing the transformed "remaining"
resources in the offer and subtracting the converted resources from
the remaining resources. In the example above:

  Offered:   cpus:10
  Remaining: cpus:9;cpus(role):1
  Implicitly Declined = Remaining - (Remaining - Offered)
  = cpus:9;cpus(role):1 - cpus(role):1
  = cpus:9
```


src/master/master.cpp
Lines 5941-5950 (patched)


I think some clear formulas would be more readable over the wall of text?

```
// We now need to compute the amounts of resources to (1) decline with the
// filter and (2) recover without a filter.
//
//   Implicity  (Offered Resources).apply(Operations)
//Declined  = - LAUNCH[_GROUP] Resources
//   Resources- Converted Remaining Resources
//
//  |
//  V
//
//   Implicity  _offeredResources
//Declined  = - Converted Remaining Resources
//   Resources
//
//  |
//  V
//
//   Implicity  _offeredResources
//Declined  = - (_offeredResources - offeredResources)
//   Resources
//
```



src/master/master.cpp
Lines 5944-5945 (patched)


"We should" seems to imply there's no api contract or anything, probably 
this can just phrase it as what we're supposed to do and how we compute it?



src/master/master.cpp
Lines 5943-5944 (original), 5963-5964 (patched)


Let's remove this TODO, I don't think it's accurate. If pausing is a 
workaround of anything, it's that we don't allow recovering both pieces of the 
offer in a single call, which I don't believe there's a ticket for


- Benjamin Mahler


On April 25, 2019, 10:14 p.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> ---
> 
> (Updated April 25, 2019, 10:14 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
> https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resource, without a final consumer, the
> converted resources will be implicitly declined. This is an undesired
> behavior as the framework might want to reserve one resource first but
> launch a task later in the next allocation cycle. This patch fixes this
> behavior.
> 
> But, if the framework accepts an offers with multiple operations that
> cancel out each other, the resources consumed by these operations are
> still considered unused and will be declined.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
>   src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 
> 
> 
> Diff: https://reviews.apache.org/r/70132/diff/7/
> 
> 
> Testing
> ---
> 
> make check
> 
> More testing done in r/70537.
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

2019-04-25 Thread Chun-Hung Hsiao


> On April 23, 2019, 10:47 a.m., Benjamin Bannier wrote:
> > docs/scheduler-http-api.md
> > Line 132 (original), 132 (patched)
> > 
> >
> > What do you think of getting rid of "implicitly declined" behavior for 
> > "cancelling operations"?
> > 
> > It seems that behavior is more driven by the implementation than 
> > intuitive api behavior; it e.g., forces frameworks to reason differently 
> > about operations executed in isolation vs. executed together. It seems 
> > having the identical behavior for both cases would both be easier to 
> > explain and also program against. The behavior that seems to make most 
> > sense for me would be to only ever implictly decline "untouched resources", 
> > e.g., if accepting offered `cpus:4` with `RESERVE(cpus:2, role) && 
> > UNRESERVE(cpus:2, role)` we would implicitly decline only `cpus:2`.
> 
> Chun-Hung Hsiao wrote:
> It seems to me that "cancelling operations" as something that are both 1. 
> very rare and 2. make little sense for frameworks, so I'm more like 
> delivering a fix for common cases without making the alrealy-messy code path 
> more complicated. WDYT? Also @bmahler what's your opinion on @bbannier's 
> suggestion? IIRC you mentioned something like some are designed behaviors 
> before, but I didn't know the context.
> 
> Benjamin Mahler wrote:
> Thanks for bringing this up, it's certainly a bit bizarre of a use case. 
> I think the more common case is UNRESERVE on its own, where it still seems a 
> bit bizarre that the "untouched" resources are declined with the filter and 
> the UNRESERVE resources are not filtered. That seems a bit arbitrary to me, 
> but I'm not sure what to do about it without allowing the framework to be 
> explicit about which part it wants to "decline and filter" when accepting, 
> and this requires an interface change.
> 
> Personally I would consider RESERVE+UNRESERVE to be "touching" those 
> resources, but I don't think we should worry about it in this patch (I assume 
> that wasn't your intent anyway, and you were more wanting to raise this topic 
> for discussion?)
> 
> Benjamin Bannier wrote:
> What I worry most is that this edge case makes explaining suggested 
> framework behavior harder ("should any of the offer operations in a single 
> accept call cancel each other out you will not get offered the resources 
> again until the default offer filter timeout expires (the timeout isn't up to 
> you here)" -> framework defensively revives after each accept call if it has 
> more work to do). Instead we would like frameworks to focus on getting their 
> offer handling and decline behavior correct and only ever revive in 
> exceptional scenarios (e.g., even "_new_ work arrived").
> 
> Since this patch tries to fix incorrect master behavior we should make 
> sure to get the behavior somewhat right or else risk that frameworks 
> implement suboptimal behavior which will be hard to unlearn. That being said, 
> the fact that no framework author complained when this bug was introduced 
> makes me worry that they either do not care about how fast offers arrive or 
> already implement a overly pessimistc approach (e.g., revive whenever there 
> is more work to do in their state machine).

The timeout is still controlled by the scheduler even in the case where 
operations cancel each other out.

We're trying to move to a world where frameworks are more expilict about what 
filters they want to set up. With this in mind, it seems to me that neither the 
current fix nor what you suggested fits into this goal well, because here we're 
*guessing* what the frameworks' intentions. Say if a framework reserves two 
CPUs and unreserves the two reserved CPUs and then reserves two CPUs again, 
there is no way to infer if the framework is trying to use just two CPUs or 
four CPUs. This could become really messy in terms of both semantics and 
implementation.

Since as you said we're not aware of any complain about this, I'd say let's 
keep the logic simple and determine the declined resources based on the end 
result of an `ACCEPT` call. Dropping this for now.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review214812
---


On April 25, 2019, 10:14 p.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> ---
> 
> (Updated April 25, 2019, 10:14 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
> https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description

Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

2019-04-25 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/
---

(Updated April 25, 2019, 10:14 p.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.


Changes
---

Moved documentation to r/70547.


Bugs: MESOS-9616
https://issues.apache.org/jira/browse/MESOS-9616


Repository: mesos


Description
---

Currently if a framework accepts an offer to perform pipelined
operations, e.g., reserving resource, without a final consumer, the
converted resources will be implicitly declined. This is an undesired
behavior as the framework might want to reserve one resource first but
launch a task later in the next allocation cycle. This patch fixes this
behavior.

But, if the framework accepts an offers with multiple operations that
cancel out each other, the resources consumed by these operations are
still considered unused and will be declined.


Diffs (updated)
-

  src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
  src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 


Diff: https://reviews.apache.org/r/70132/diff/7/

Changes: https://reviews.apache.org/r/70132/diff/6-7/


Testing
---

make check

More testing done in r/70537.


Thanks,

Chun-Hung Hsiao



Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

2019-04-23 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/
---

(Updated April 24, 2019, 2:02 a.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.


Changes
---

Addressed some of Benjamin's comments and added examples.


Bugs: MESOS-9616
https://issues.apache.org/jira/browse/MESOS-9616


Repository: mesos


Description
---

Currently if a framework accepts an offer to perform pipelined
operations, e.g., reserving resource, without a final consumer, the
converted resources will be implicitly declined. This is an undesired
behavior as the framework might want to reserve one resource first but
launch a task later in the next allocation cycle. This patch fixes this
behavior.

But, if the framework accepts an offers with multiple operations that
cancel out each other, the resources consumed by these operations are
still considered unused and will be declined.


Diffs (updated)
-

  docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db 
  src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
  src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 


Diff: https://reviews.apache.org/r/70132/diff/6/

Changes: https://reviews.apache.org/r/70132/diff/5-6/


Testing (updated)
---

make check

More testing done in r/70537.


Thanks,

Chun-Hung Hsiao



Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

2019-04-23 Thread Chun-Hung Hsiao


> On April 23, 2019, 10:47 a.m., Benjamin Bannier wrote:
> > src/tests/slave_tests.cpp
> > Lines 6499 (patched)
> > 
> >
> > Since the changes in this patch are strongly related to behavior 
> > framework authors need to reason about I strongly feel that we must add a 
> > test for the expected behavior.
> 
> Chun-Hung Hsiao wrote:
> I could add a unit test in a separated patch. This patch itself will be 
> backported, after discussed with @bmahler.

Done in r/70537.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review214812
---


On April 23, 2019, 1:15 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> ---
> 
> (Updated April 23, 2019, 1:15 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
> https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resource, without a final consumer, the
> converted resources will be implicitly declined. This is an undesired
> behavior as the framework might want to reserve one resource first but
> launch a task later in the next allocation cycle. This patch fixes this
> behavior.
> 
> But, if the framework accepts an offers with multiple operations that
> cancel out each other, the resources consumed by these operations are
> still considered unused and will be declined.
> 
> 
> Diffs
> -
> 
>   docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db 
>   src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
>   src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 
> 
> 
> Diff: https://reviews.apache.org/r/70132/diff/5/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

2019-04-23 Thread Benjamin Bannier


> On April 23, 2019, 12:47 p.m., Benjamin Bannier wrote:
> > docs/scheduler-http-api.md
> > Line 132 (original), 132 (patched)
> > 
> >
> > What do you think of getting rid of "implicitly declined" behavior for 
> > "cancelling operations"?
> > 
> > It seems that behavior is more driven by the implementation than 
> > intuitive api behavior; it e.g., forces frameworks to reason differently 
> > about operations executed in isolation vs. executed together. It seems 
> > having the identical behavior for both cases would both be easier to 
> > explain and also program against. The behavior that seems to make most 
> > sense for me would be to only ever implictly decline "untouched resources", 
> > e.g., if accepting offered `cpus:4` with `RESERVE(cpus:2, role) && 
> > UNRESERVE(cpus:2, role)` we would implicitly decline only `cpus:2`.
> 
> Chun-Hung Hsiao wrote:
> It seems to me that "cancelling operations" as something that are both 1. 
> very rare and 2. make little sense for frameworks, so I'm more like 
> delivering a fix for common cases without making the alrealy-messy code path 
> more complicated. WDYT? Also @bmahler what's your opinion on @bbannier's 
> suggestion? IIRC you mentioned something like some are designed behaviors 
> before, but I didn't know the context.
> 
> Benjamin Mahler wrote:
> Thanks for bringing this up, it's certainly a bit bizarre of a use case. 
> I think the more common case is UNRESERVE on its own, where it still seems a 
> bit bizarre that the "untouched" resources are declined with the filter and 
> the UNRESERVE resources are not filtered. That seems a bit arbitrary to me, 
> but I'm not sure what to do about it without allowing the framework to be 
> explicit about which part it wants to "decline and filter" when accepting, 
> and this requires an interface change.
> 
> Personally I would consider RESERVE+UNRESERVE to be "touching" those 
> resources, but I don't think we should worry about it in this patch (I assume 
> that wasn't your intent anyway, and you were more wanting to raise this topic 
> for discussion?)

What I worry most is that this edge case makes explaining suggested framework 
behavior harder ("should any of the offer operations in a single accept call 
cancel each other out you will not get offered the resources again until the 
default offer filter timeout expires (the timeout isn't up to you here)" -> 
framework defensively revives after each accept call if it has more work to 
do). Instead we would like frameworks to focus on getting their offer handling 
and decline behavior correct and only ever revive in exceptional scenarios 
(e.g., even "_new_ work arrived").

Since this patch tries to fix incorrect master behavior we should make sure to 
get the behavior somewhat right or else risk that frameworks implement 
suboptimal behavior which will be hard to unlearn. That being said, the fact 
that no framework author complained when this bug was introduced makes me worry 
that they either do not care about how fast offers arrive or already implement 
a overly pessimistc approach (e.g., revive whenever there is more work to do in 
their state machine).


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review214812
---


On April 23, 2019, 3:15 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> ---
> 
> (Updated April 23, 2019, 3:15 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
> https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resource, without a final consumer, the
> converted resources will be implicitly declined. This is an undesired
> behavior as the framework might want to reserve one resource first but
> launch a task later in the next allocation cycle. This patch fixes this
> behavior.
> 
> But, if the framework accepts an offers with multiple operations that
> cancel out each other, the resources consumed by these operations are
> still considered unused and will be declined.
> 
> 
> Diffs
> -
> 
>   docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db 
>   src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
>   src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 
> 
> 
> Diff: https://reviews.apache.org/r/70132/diff/5/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>

Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

2019-04-23 Thread Benjamin Mahler


> On April 23, 2019, 10:47 a.m., Benjamin Bannier wrote:
> > docs/scheduler-http-api.md
> > Line 132 (original), 132 (patched)
> > 
> >
> > What do you think of getting rid of "implicitly declined" behavior for 
> > "cancelling operations"?
> > 
> > It seems that behavior is more driven by the implementation than 
> > intuitive api behavior; it e.g., forces frameworks to reason differently 
> > about operations executed in isolation vs. executed together. It seems 
> > having the identical behavior for both cases would both be easier to 
> > explain and also program against. The behavior that seems to make most 
> > sense for me would be to only ever implictly decline "untouched resources", 
> > e.g., if accepting offered `cpus:4` with `RESERVE(cpus:2, role) && 
> > UNRESERVE(cpus:2, role)` we would implicitly decline only `cpus:2`.
> 
> Chun-Hung Hsiao wrote:
> It seems to me that "cancelling operations" as something that are both 1. 
> very rare and 2. make little sense for frameworks, so I'm more like 
> delivering a fix for common cases without making the alrealy-messy code path 
> more complicated. WDYT? Also @bmahler what's your opinion on @bbannier's 
> suggestion? IIRC you mentioned something like some are designed behaviors 
> before, but I didn't know the context.

Thanks for bringing this up, it's certainly a bit bizarre of a use case. I 
think the more common case is UNRESERVE on its own, where it still seems a bit 
bizarre that the "untouched" resources are declined with the filter and the 
UNRESERVE resources are not filtered. That seems a bit arbitrary to me, but I'm 
not sure what to do about it without allowing the framework to be explicit 
about which part it wants to "decline and filter" when accepting, and this 
requires an interface change.

Personally I would consider RESERVE+UNRESERVE to be "touching" those resources, 
but I don't think we should worry about it in this patch (I assume that wasn't 
your intent anyway, and you were more wanting to raise this topic for 
discussion?)


> On April 23, 2019, 10:47 a.m., Benjamin Bannier wrote:
> > src/master/master.cpp
> > Lines 5963-5964 (original), 5983-5984 (patched)
> > 
> >
> > Is this a workaround we need until MESOS-4553 gets resolved? If it is, 
> > let's add a `TODO`.
> 
> Chun-Hung Hsiao wrote:
> I don't know actually lol. I just copied it from 
> https://github.com/apache/mesos/blob/45c9788618e7123f408a1dffcf6772a1285cd2e5/src/master/master.cpp#L10969-L10972,
>  as @mzhu suggested that if there's an allocation in between there might be 
> offer fragmentation. Is this a workaround for MESOS-4553?

Well, one could say all the interactions with the allocator around offers are a 
"workaround" until MESOS-4553 is done :)

I would say that the pause/resume here is more a workaround of the limited 
recoverResources interface (i.e. it doesn't let you specify a collection of 
resources and filters, so we need to perform two calls). The only issue with 
pause/resume is: https://issues.apache.org/jira/browse/MESOS-9734


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review214812
---


On April 23, 2019, 1:15 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> ---
> 
> (Updated April 23, 2019, 1:15 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
> https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resource, without a final consumer, the
> converted resources will be implicitly declined. This is an undesired
> behavior as the framework might want to reserve one resource first but
> launch a task later in the next allocation cycle. This patch fixes this
> behavior.
> 
> But, if the framework accepts an offers with multiple operations that
> cancel out each other, the resources consumed by these operations are
> still considered unused and will be declined.
> 
> 
> Diffs
> -
> 
>   docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db 
>   src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
>   src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 
> 
> 
> Diff: https://reviews.apache.org/r/70132/diff/5/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

2019-04-23 Thread Chun-Hung Hsiao


> On April 23, 2019, 10:47 a.m., Benjamin Bannier wrote:
> > docs/scheduler-http-api.md
> > Line 132 (original), 132 (patched)
> > 
> >
> > What do you think of getting rid of "implicitly declined" behavior for 
> > "cancelling operations"?
> > 
> > It seems that behavior is more driven by the implementation than 
> > intuitive api behavior; it e.g., forces frameworks to reason differently 
> > about operations executed in isolation vs. executed together. It seems 
> > having the identical behavior for both cases would both be easier to 
> > explain and also program against. The behavior that seems to make most 
> > sense for me would be to only ever implictly decline "untouched resources", 
> > e.g., if accepting offered `cpus:4` with `RESERVE(cpus:2, role) && 
> > UNRESERVE(cpus:2, role)` we would implicitly decline only `cpus:2`.

It seems to me that "cancelling operations" as something that are both 1. very 
rare and 2. make little sense for frameworks, so I'm more like delivering a fix 
for common cases without making the alrealy-messy code path more complicated. 
WDYT? Also @bmahler what's your opinion on @bbannier's suggestion? IIRC you 
mentioned something like some are designed behaviors before, but I didn't know 
the context.


> On April 23, 2019, 10:47 a.m., Benjamin Bannier wrote:
> > src/master/master.cpp
> > Lines 5963-5964 (original), 5983-5984 (patched)
> > 
> >
> > Is this a workaround we need until MESOS-4553 gets resolved? If it is, 
> > let's add a `TODO`.

I don't know actually lol. I just copied it from 
https://github.com/apache/mesos/blob/45c9788618e7123f408a1dffcf6772a1285cd2e5/src/master/master.cpp#L10969-L10972,
 as @mzhu suggested that if there's an allocation in between there might be 
offer fragmentation. Is this a workaround for MESOS-4553?


> On April 23, 2019, 10:47 a.m., Benjamin Bannier wrote:
> > src/tests/slave_tests.cpp
> > Lines 6499 (patched)
> > 
> >
> > Since the changes in this patch are strongly related to behavior 
> > framework authors need to reason about I strongly feel that we must add a 
> > test for the expected behavior.

I could add a unit test in a separated patch. This patch itself will be 
backported, after discussed with @bmahler.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review214812
---


On April 23, 2019, 1:15 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> ---
> 
> (Updated April 23, 2019, 1:15 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
> https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resource, without a final consumer, the
> converted resources will be implicitly declined. This is an undesired
> behavior as the framework might want to reserve one resource first but
> launch a task later in the next allocation cycle. This patch fixes this
> behavior.
> 
> But, if the framework accepts an offers with multiple operations that
> cancel out each other, the resources consumed by these operations are
> still considered unused and will be declined.
> 
> 
> Diffs
> -
> 
>   docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db 
>   src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
>   src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 
> 
> 
> Diff: https://reviews.apache.org/r/70132/diff/5/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

2019-04-23 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review214812
---




docs/scheduler-http-api.md
Line 132 (original), 132 (patched)


What do you think of getting rid of "implicitly declined" behavior for 
"cancelling operations"?

It seems that behavior is more driven by the implementation than intuitive 
api behavior; it e.g., forces frameworks to reason differently about operations 
executed in isolation vs. executed together. It seems having the identical 
behavior for both cases would both be easier to explain and also program 
against. The behavior that seems to make most sense for me would be to only 
ever implictly decline "untouched resources", e.g., if accepting offered 
`cpus:4` with `RESERVE(cpus:2, role) && UNRESERVE(cpus:2, role)` we would 
implicitly decline only `cpus:2`.



src/master/master.cpp
Lines 5963-5964 (original), 5983-5984 (patched)


Is this a workaround we need until MESOS-4553 gets resolved? If it is, 
let's add a `TODO`.



src/tests/slave_tests.cpp
Lines 6499 (patched)


Since the changes in this patch are strongly related to behavior framework 
authors need to reason about I strongly feel that we must add a test for the 
expected behavior.


- Benjamin Bannier


On April 23, 2019, 3:15 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> ---
> 
> (Updated April 23, 2019, 3:15 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
> https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resource, without a final consumer, the
> converted resources will be implicitly declined. This is an undesired
> behavior as the framework might want to reserve one resource first but
> launch a task later in the next allocation cycle. This patch fixes this
> behavior.
> 
> But, if the framework accepts an offers with multiple operations that
> cancel out each other, the resources consumed by these operations are
> still considered unused and will be declined.
> 
> 
> Diffs
> -
> 
>   docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db 
>   src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
>   src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 
> 
> 
> Diff: https://reviews.apache.org/r/70132/diff/5/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

2019-04-22 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/
---

(Updated April 23, 2019, 1:15 a.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.


Changes
---

Updated variable names and comments to make it easier to understand.


Summary (updated)
-

Do not implicitly decline speculatively converted resources.


Bugs: MESOS-9616
https://issues.apache.org/jira/browse/MESOS-9616


Repository: mesos


Description (updated)
---

Currently if a framework accepts an offer to perform pipelined
operations, e.g., reserving resource, without a final consumer, the
converted resources will be implicitly declined. This is an undesired
behavior as the framework might want to reserve one resource first but
launch a task later in the next allocation cycle. This patch fixes this
behavior.

But, if the framework accepts an offers with multiple operations that
cancel out each other, the resources consumed by these operations are
still considered unused and will be declined.


Diffs (updated)
-

  docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db 
  src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
  src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 


Diff: https://reviews.apache.org/r/70132/diff/5/

Changes: https://reviews.apache.org/r/70132/diff/4-5/


Testing
---

make check


Thanks,

Chun-Hung Hsiao