> On April 23, 2019, 12:47 p.m., Benjamin Bannier wrote:
> > docs/scheduler-http-api.md
> > Line 132 (original), 132 (patched)
> > <https://reviews.apache.org/r/70132/diff/5/?file=2140649#file2140649line132>
> >
> > What do you think of getting rid of "implicitly declined" behavior for
> > "cancelling operations"?
> >
> > It seems that behavior is driven more by the implementation than by
> > intuitive API behavior; e.g., it forces frameworks to reason differently
> > about operations executed in isolation vs. executed together. Having
> > identical behavior for both cases would be easier both to explain and to
> > program against. The behavior that makes the most sense to me would be to
> > only ever implicitly decline "untouched resources", e.g., if accepting
> > offered `cpus:4` with `RESERVE(cpus:2, role) && UNRESERVE(cpus:2, role)` we
> > would implicitly decline only `cpus:2`.
>
> Chun-Hung Hsiao wrote:
> I see "cancelling operations" as something that is both (1) very rare and
> (2) of little use to frameworks, so I'd rather deliver a fix for the common
> cases without making the already-messy code path more complicated. WDYT?
> Also @bmahler, what's your opinion on @bbannier's suggestion? IIRC you
> mentioned before that some of this is designed behavior, but I didn't know
> the context.
>
> Benjamin Mahler wrote:
> Thanks for bringing this up; it's certainly a bit of a bizarre use case.
> I think the more common case is UNRESERVE on its own, where it still seems a
> bit bizarre that the "untouched" resources are declined with the filter and
> the UNRESERVE resources are not filtered. That seems a bit arbitrary to me,
> but I'm not sure what to do about it without allowing the framework to be
> explicit about which part it wants to "decline and filter" when accepting,
> and this requires an interface change.
>
> Personally I would consider RESERVE+UNRESERVE to be "touching" those
> resources, but I don't think we should worry about it in this patch (I assume
> that wasn't your intent anyway, and you mainly wanted to raise this topic
> for discussion?)
What I worry about most is that this edge case makes explaining the suggested
framework behavior harder ("should any of the offer operations in a single
accept call cancel each other out, you will not be offered the resources again
until the default offer filter timeout expires (the timeout isn't up to you
here)" -> frameworks defensively revive after each accept call if they have
more work to do). Instead we would like frameworks to focus on getting their
offer handling and decline behavior correct and only ever revive in
exceptional scenarios (e.g., on a "_new_ work arrived" event).
Since this patch tries to fix incorrect master behavior, we should make sure to
get the behavior somewhat right, or else we risk frameworks implementing
suboptimal behavior that will be hard to unlearn. That being said, the fact
that no framework author complained when this bug was introduced makes me worry
that they either do not care about how fast offers arrive or already implement
an overly pessimistic approach (e.g., reviving whenever there is more work to
do in their state machine).
- Benjamin
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review214812
-----------------------------------------------------------
On April 23, 2019, 3:15 a.m., Chun-Hung Hsiao wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> -----------------------------------------------------------
>
> (Updated April 23, 2019, 3:15 a.m.)
>
>
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.
>
>
> Bugs: MESOS-9616
> https://issues.apache.org/jira/browse/MESOS-9616
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Currently, if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resources, without a final consumer, the
> converted resources will be implicitly declined. This is undesired
> behavior, as the framework might want to reserve a resource first but
> launch a task later in the next allocation cycle. This patch fixes this
> behavior.
>
> However, if the framework accepts an offer with multiple operations that
> cancel each other out, the resources consumed by these operations are
> still considered unused and will be declined.
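A minimal sketch of the change described above, under a hypothetical resource model (resources keyed by `(name, role)`, and no task launched from the offer). Before the fix the freshly reserved resources are declined along with the leftovers; after it, only the resources the operations left unchanged are. `reserve` and `implicitly_declined` are illustrative names, not Mesos functions.

```python
from collections import Counter

# Hypothetical model for illustration only (not src/master/master.cpp):
# resources are keyed by (name, role), with role None meaning unreserved.


def reserve(resources, name, amount, role):
    """Return a copy of `resources` with `amount` units of `name` reserved."""
    src, dst = (name, None), (name, role)
    if resources[src] < amount:
        raise ValueError(f"cannot reserve {amount} of {name}")
    out = Counter(resources)
    out[src] -= amount
    out[dst] += amount
    return +out  # unary plus drops zero counts


def implicitly_declined(offered, converted, patched):
    """Resources declined with the filter when no task consumes them.

    Before the fix, everything left after the operations is declined,
    including the freshly reserved resources. After the fix, only the
    resources the operations left unchanged are declined.
    """
    return (offered & converted) if patched else converted


offer = Counter({("cpus", None): 4})
after = reserve(offer, "cpus", 2, "role")
print(implicitly_declined(offer, after, patched=False))  # reserved and unreserved cpus:2
print(implicitly_declined(offer, after, patched=True))   # only unreserved cpus:2
```

In this model the reserved `cpus:2` escape the filter after the fix, so they can be offered again in the next allocation cycle for the later task launch.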
>
>
> Diffs
> -----
>
> docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db
> src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084
> src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559
>
>
> Diff: https://reviews.apache.org/r/70132/diff/5/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Chun-Hung Hsiao
>
>