Re: New scheduler API proposal: unsuppress and clear_filter

2018-12-10 Thread Meng Zhu
Thanks Ben. Some thoughts below:

> From a scheduler's perspective the difference between the two models is:
>
> (1) expressing "how much more" you need
> (2) expressing an offer "matcher"
>
> So:
>
> (1) covers the middle part of the demand quantity spectrum we currently
> have: unsuppressed -> infinite additional demand, suppressed -> 0
> additional demand, and now also unsuppressed w/ request of X -> X
> additional demand
>

I am not quite sure whether the middle ground (expressing "how much more")
is needed. Even with matchers, the framework may still find itself cycling
through several offers before finding the right resources. Setting an
"effective limit" will surely prolong this process. I guess the motivation
here is to avoid e.g. sending too many resources to a just-unsuppressed
framework that only wants to launch a small task. I would say the
inefficiency of flooding the framework with offers would be tolerable as
long as the framework rejects most offers in time, since we are still making
progress. Even in cases where such limiting is desired (e.g. when the number
of frameworks is very large), I think it is more appropriate to rely on
operators to configure cluster priority by e.g. setting limits than to
expect individual frameworks to perform such an altruistic action and limit
their own offers (while still having pending work).


> (2) is a global filtering mechanism to avoid getting offers in an unusable
> shape
>

Yeah, as you mentioned, I think we all agree that adding global matchers to
filter out undesired resources is a good direction--which I think is what
matters most here. The small difference lies in how the framework should
communicate that information: through a more declarative approach, or by
exposing the global matchers to frameworks directly.


> They both solve inefficiencies we have, and they're complementary: a
> "request" could actually consist of (1) and (2), e.g. "I need an additional
> 10 cpus, 100GB mem, and I want offers to contain [1cpu, 10GB mem]".
>
> I'll schedule a meeting to discuss further. We should also make sure we
> come back to the original problem in this thread around REVIVE retries.
>
> On Mon, Dec 10, 2018 at 11:58 AM Benjamin Bannier <
> benjamin.bann...@mesosphere.io> wrote:
>
> > Hi Ben et al.,
> >
> > I'd expect frameworks to *always* know how to accept or decline offers in
> > general. More involved frameworks might know how to suppress offers. I
> > don't expect that any framework models filters and their associated
> > durations in detail (that's why I called them a Mesos implementation
> > detail) since there is not much benefit to a framework's primary goal of
> > running tasks as quickly as possible.
> >
> > > I couldn't quite tell how you were imagining this would work, but let
> me
> > spell out the two models that I've been considering, and you can tell me
> if
> > one of these matches what you had in mind or if you had a different model
> > in mind:
> >
> > > (1) "Effective limit" or "give me this much more" ...
> >
> > This sounds more like an operator-type than a framework-type API to me.
> > I'd assume that frameworks would not worry about their total limit the
> way
> > an operator would, but instead care about getting resources to run a
> > certain task at a point in time. I could also imagine this being easy to
> > use incorrectly as frameworks would likely need to understand their total
> > limit when issuing the call which could require state or coordination
> among
> > internal framework components (think: multi-purpose frameworks like
> > Marathon or Aurora).
> >
> > > (2) "Matchers" or "give me things that look like this": when a
> scheduler
> > expresses its "request" for a role, it would act as a "matcher" (opposite
> > of filter). When mesos is allocating resources, it only proceeds if
> > (requests.matches(resources) && !filters.filtered(resources)). The open
> > ended aspect here is what a matcher would consist of. Consider a case
> where
> > a matcher is a resource quantity and multiple are allowed; if any matcher
> > matches, the result is a match. This would be equivalent to letting
> > frameworks specify their own --min_allocatable_resources for a role
> (which
> > is something that has been considered). The "matchers" could be more
> > sophisticated: full resource objects just like filters (but global), full
> > resource objects but with quantities for non-scalar resources like ports,
> > etc.
> >
> > I was thinking in this direction, but what you described is more involved
> > than what I had in mind as a possible first attempt. I'd expect that
> > frameworks currently use `REVIVE` as a proxy for `REQUEST_RESOURCES`, not
> > as a way to manage their filter state tracked in the allocator. Assuming
> we
> > have some way to express resource quantities (i.e., MESOS-9314), we
> should
> > be able to improve on `REVIVE` by providing a `REQUEST_RESOURCES` which
> > clears all filters for resources containing the requested resources (or all
> > filters if no explicit resource request). [...]

Re: New scheduler API proposal: unsuppress and clear_filter

2018-12-10 Thread Benjamin Mahler
I think we're agreed:

- There are no schedulers modeling the existing per-agent time-based
filters that mesos is tracking, and we shouldn't go in a direction that
encourages frameworks to try to model and manage these. So, we should be
very careful in considering something like CLEAR_FILTERS. We're probably
also agreed that the current filters aren't so great. :)
- Letting a scheduler have more explicit control over the offers it gets
(both in shape of the offers and overall quantity of resources) is a good
direction to go in to reduce the inefficiency in the pessimistic offer
model.
- Combining matchers of model (2) with REVIVE may eliminate the need for
CLEAR_FILTERS. I think once you have global matchers in play, it eliminates
the need for the existing decline filters to involve resource subsets and
we may be able to move new schedulers forward with a better model without
breaking old schedulers.

I don’t think model (1) was understood as intended. Schedulers would not be
expressing limits, they would be expressing a "request" equivalent to “how
much more they want”. The internal effective limit (equal to
allocation+request) is just an implementation detail here that demonstrates
how it fits cleanly into the allocation algorithm. So, if a scheduler needs
to run 10 tasks with [1 cpu, 10GB mem], they would express a request of
[10 cpus, 100GB mem] regardless of how much else is already allocated at
that role/scheduler node.
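
To make that concrete, here is a minimal, self-contained sketch (not Mesos
code; quantities are simplified to name -> scalar pairs, and all names are
made up) of how "allocation + request" becomes an effective limit that the
allocation loop checks:

  #include <iostream>
  #include <map>
  #include <string>

  // Simplified stand-in for resource quantities.
  using Quantities = std::map<std::string, double>;

  Quantities add(const Quantities& a, const Quantities& b) {
    Quantities result = a;
    for (const auto& [name, value] : b) result[name] += value;
    return result;
  }

  // True once `allocation` covers `limit` for every resource.
  bool satisfied(const Quantities& allocation, const Quantities& limit) {
    for (const auto& [name, value] : limit) {
      auto it = allocation.find(name);
      if (it == allocation.end() || it->second < value) return false;
    }
    return true;
  }

  int main() {
    // The scheduler already holds 4 cpus, 40GB and asks for 10 cpus, 100GB *more*.
    Quantities allocation = {{"cpus", 4}, {"mem", 40 * 1024}};
    Quantities request    = {{"cpus", 10}, {"mem", 100 * 1024}};

    // Internal implementation detail: effective limit = allocation + request.
    Quantities limit = add(allocation, request);

    // The allocator keeps offering until the allocation reaches the limit.
    std::cout << std::boolalpha << satisfied(allocation, limit) << "\n";  // false
    allocation = add(allocation, request);  // pretend the offers were accepted
    std::cout << satisfied(allocation, limit) << "\n";                    // true
  }

The scheduler itself never sees the limit; it only ever states "how much
more" it needs at that point in time.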

From a scheduler's perspective the difference between the two models is:

(1) expressing "how much more" you need
(2) expressing an offer "matcher"

So:

(1) covers the middle part of the demand quantity spectrum we currently
have: unsuppressed -> infinite additional demand, suppressed -> 0
additional demand, and now also unsuppressed w/ request of X -> X
additional demand

(2) is a global filtering mechanism to avoid getting offers in an unusable
shape
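
As a rough illustration of the allocation-time check, assuming a matcher is
simply a minimum resource quantity (all types and names below are made up
for the sketch):

  #include <vector>

  struct Resources { double cpus; double mem; };

  // A matcher in its simplest form: a minimum quantity the offer must contain.
  struct Matcher {
    double minCpus;
    double minMem;
    bool matches(const Resources& r) const {
      return r.cpus >= minCpus && r.mem >= minMem;
    }
  };

  // Stand-in for the existing per-agent decline filters.
  struct Filter {
    bool filtered(const Resources&) const { return false; }
  };

  bool shouldOffer(const Resources& candidate,
                   const std::vector<Matcher>& matchers,
                   const std::vector<Filter>& filters) {
    // "If any matcher matches, the result is a match"; with no matchers
    // registered, fall back to today's behavior and match everything.
    bool matched = matchers.empty();
    for (const Matcher& m : matchers) matched = matched || m.matches(candidate);

    bool rejected = false;
    for (const Filter& f : filters) rejected = rejected || f.filtered(candidate);

    // requests.matches(resources) && !filters.filtered(resources)
    return matched && !rejected;
  }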

They both solve inefficiencies we have, and they're complementary: a
"request" could actually consist of (1) and (2), e.g. "I need an additional
10 cpus, 100GB mem, and I want offers to contain [1cpu, 10GB mem]".
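
Purely as an illustration, a combined "request" could carry both pieces of
information, roughly like this (using a map as a stand-in for the
ResourceQuantity type discussed in MESOS-9314):

  #include <map>
  #include <string>

  // Stand-in for the proposed ResourceQuantity type (MESOS-9314).
  using ResourceQuantities = std::map<std::string, double>;

  struct Request {
    // (1) "I need an additional 10 cpus, 100GB mem" -> drives the effective limit.
    ResourceQuantities additionalDemand;  // {{"cpus", 10}, {"mem", 102400}}

    // (2) "I want offers to contain [1cpu, 10GB mem]" -> acts as a global matcher.
    ResourceQuantities minOfferShape;     // {{"cpus", 1}, {"mem", 10240}}
  };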

I'll schedule a meeting to discuss further. We should also make sure we
come back to the original problem in this thread around REVIVE retries.

On Mon, Dec 10, 2018 at 11:58 AM Benjamin Bannier <
benjamin.bann...@mesosphere.io> wrote:

> Hi Ben et al.,
>
> I'd expect frameworks to *always* know how to accept or decline offers in
> general. More involved frameworks might know how to suppress offers. I
> don't expect that any framework models filters and their associated
> durations in detail (that's why I called them a Mesos implementation
> detail) since there is not much benefit to a framework's primary goal of
> running tasks as quickly as possible.
>
> > I couldn't quite tell how you were imagining this would work, but let me
> spell out the two models that I've been considering, and you can tell me if
> one of these matches what you had in mind or if you had a different model
> in mind:
>
> > (1) "Effective limit" or "give me this much more" ...
>
> This sounds more like an operator-type than a framework-type API to me.
> I'd assume that frameworks would not worry about their total limit the way
> an operator would, but instead care about getting resources to run a
> certain task at a point in time. I could also imagine this being easy to
> use incorrectly as frameworks would likely need to understand their total
> limit when issuing the call which could require state or coordination among
> internal framework components (think: multi-purpose frameworks like
> Marathon or Aurora).
>
> > (2) "Matchers" or "give me things that look like this": when a scheduler
> expresses its "request" for a role, it would act as a "matcher" (opposite
> of filter). When mesos is allocating resources, it only proceeds if
> (requests.matches(resources) && !filters.filtered(resources)). The open
> ended aspect here is what a matcher would consist of. Consider a case where
> a matcher is a resource quantity and multiple are allowed; if any matcher
> matches, the result is a match. This would be equivalent to letting
> frameworks specify their own --min_allocatable_resources for a role (which
> is something that has been considered). The "matchers" could be more
> sophisticated: full resource objects just like filters (but global), full
> resource objects but with quantities for non-scalar resources like ports,
> etc.
>
> I was thinking in this direction, but what you described is more involved
> than what I had in mind as a possible first attempt. I'd expect that
> frameworks currently use `REVIVE` as a proxy for `REQUEST_RESOURCES`, not
> as a way to manage their filter state tracked in the allocator. Assuming we
> have some way to express resource quantities (i.e., MESOS-9314), we should
> be able to improve on `REVIVE` by providing a `REQUEST_RESOURCES` which
> clears all filters for resources containing the requested resources (or all
> filters if no explicit resource request). [...]

[API WG] Meeting tomorrow

2018-12-10 Thread Greg Mann
Hi all,
The API working group will meet tomorrow, Dec. 11 at 11am PST. On the
agenda we have:

   - Proposed calls for the scheduler API:
      - UNSUPPRESS
      - CLEAR_FILTER
      - REQUEST_RESOURCE
      - Adding a new 'ResourceQuantity' type
   - Improving the scheduler operation reconciliation API


We will meet at this Zoom link: https://zoom.us/j/567559753
You can check out the agenda doc here!

Cheers,
Greg


Re: full Zookeeper authentication

2018-12-10 Thread Joseph Wu
There are two options for contributing:
1) You can make a pull request against the GitHub mirror:
https://github.com/apache/mesos .  We generally only use PRs for minor
changes, like typos, documentation, or uploading binaries.  See
http://mesos.apache.org/documentation/latest/beginner-contribution/
2) For larger changes, or more involved/impactful changes, we prefer
https://reviews.apache.org/ instead.  See
http://mesos.apache.org/documentation/latest/advanced-contribution/

I suspect this ZK Auth feature will be a fairly significant change, so I
recommend option (2).

On Mon, Dec 10, 2018 at 11:47 AM Kishchukov, Dmitrii (NIH/NLM/NCBI) [C] <
dmitrii.kishchu...@nih.gov> wrote:

> I have a working version. How should I make the patch? A branch in the git
> repository? Do I need to get permissions?
>
> --
>
> Dmitrii Kishchukov.
> Leading software developer
> Submission Portal Team
>
>
> On 12/6/18, 12:56 PM, "Vinod Kone"  wrote:
>
> Dmitrii.
>
> That approach sounds reasonable. Would you like to work on this? Are
> you
> looking for a reviewer/shepherd?
>
> On Thu, Dec 6, 2018 at 11:28 AM Kishchukov, Dmitrii (NIH/NLM/NCBI) [C]
> <
> dmitrii.kishchu...@nih.gov> wrote:
>
> > Mesos allows using only the digest authentication scheme for Zookeeper,
> > which is bad because Zookeeper has quite a flexible security model.
> > It is easy to make your own authenticator with its own scheme name.
> >
> > To fully support Zookeeper authentication, Mesos has to pass two items
> > into Zookeeper: scheme and credentials.
> > Credentials can have a different format depending on the authentication
> > scheme. For the digest scheme it is ‘login:password’.
> >
> > All Mesos should do is pass the scheme and credentials to Zookeeper.
> >
> > Another improvement might be to configure credentials via a file instead
> > of a URI.
> >
> > For example, it could be two command line options:
> > --zk_auth_scheme and --zk_auth_credentials
> >
> > They could be used like this:
> > --zk_auth_scheme=some_custom_scheme --zk_auth_credentials=filename
> >
> > --zk_auth_credentials can just take the whole contents of the file as the
> > credentials string.
> >
> > The Authentication class in Mesos already contains all that we need. The
> > problem is what Mesos passes to the constructor.
> >
> >
> > --
> >
> > Dmitrii Kishchukov.
> >
> >
>
>
>


Re: full Zookeeper authentication

2018-12-10 Thread Kishchukov, Dmitrii (NIH/NLM/NCBI) [C]
I have a working version. How should I make the patch? A branch in the git 
repository? Do I need to get permissions?

-- 
 
Dmitrii Kishchukov. 
Leading software developer
Submission Portal Team
 

On 12/6/18, 12:56 PM, "Vinod Kone"  wrote:

Dmitrii.

That approach sounds reasonable. Would you like to work on this? Are you
looking for a reviewer/shepherd?

On Thu, Dec 6, 2018 at 11:28 AM Kishchukov, Dmitrii (NIH/NLM/NCBI) [C] <
dmitrii.kishchu...@nih.gov> wrote:

> Mesos allows using only the digest authentication scheme for Zookeeper,
> which is bad because Zookeeper has quite a flexible security model.
> It is easy to make your own authenticator with its own scheme name.
>
> To fully support Zookeeper authentication, Mesos has to pass two items
> into Zookeeper: scheme and credentials.
> Credentials can have a different format depending on the authentication
> scheme. For the digest scheme it is ‘login:password’.
>
> All Mesos should do is pass the scheme and credentials to Zookeeper.
>
> Another improvement might be to configure credentials via a file instead
> of a URI.
>
> For example, it could be two command line options:
> --zk_auth_scheme and --zk_auth_credentials
>
> They could be used like this:
> --zk_auth_scheme=some_custom_scheme --zk_auth_credentials=filename
>
> --zk_auth_credentials can just take the whole contents of the file as the
> credentials string.
>
> The Authentication class in Mesos already contains all that we need. The
> problem is what Mesos passes to the constructor.
>
>
> --
>
> Dmitrii Kishchukov.
>
>
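
For reference, a minimal sketch of how the two proposed flags could be wired
through to the ZooKeeper C client's zoo_add_auth (the flag names come from
the proposal above; the glue code is hypothetical and not the working
version mentioned here):

  #include <fstream>
  #include <sstream>
  #include <string>

  #include <zookeeper/zookeeper.h>

  // Pass an arbitrary scheme plus credentials read from a file to ZooKeeper,
  // instead of hard-coding the "digest" scheme.
  // Usage: --zk_auth_scheme=some_custom_scheme --zk_auth_credentials=filename
  void authenticate(zhandle_t* zh,
                    const std::string& scheme,           // --zk_auth_scheme
                    const std::string& credentialsFile)  // --zk_auth_credentials
  {
    // The whole file content is used verbatim as the credentials string; its
    // format depends on the scheme (e.g. "login:password" for digest).
    std::ifstream in(credentialsFile);
    std::stringstream buffer;
    buffer << in.rdbuf();
    const std::string credentials = buffer.str();

    // ZooKeeper C client call; the completion callback is omitted for brevity.
    zoo_add_auth(zh,
                 scheme.c_str(),
                 credentials.data(),
                 static_cast<int>(credentials.size()),
                 nullptr,
                 nullptr);
  }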




Re: New scheduler API proposal: unsuppress and clear_filter

2018-12-10 Thread Benjamin Bannier
Hi Ben et al.,

I'd expect frameworks to *always* know how to accept or decline offers in 
general. More involved frameworks might know how to suppress offers. I don't 
expect that any framework models filters and their associated durations in 
detail (that's why I called them a Mesos implementation detail) since there is 
not much benefit to a framework's primary goal of running tasks as quickly as 
possible.

> I couldn't quite tell how you were imagining this would work, but let me 
> spell out the two models that I've been considering, and you can tell me if 
> one of these matches what you had in mind or if you had a different model in 
> mind:

> (1) "Effective limit" or "give me this much more" ...

This sounds more like an operator-type than a framework-type API to me. I'd 
assume that frameworks would not worry about their total limit the way an 
operator would, but instead care about getting resources to run a certain task 
at a point in time. I could also imagine this being easy to use incorrectly as 
frameworks would likely need to understand their total limit when issuing the 
call which could require state or coordination among internal framework 
components (think: multi-purpose frameworks like Marathon or Aurora).

> (2) "Matchers" or "give me things that look like this": when a scheduler 
> expresses its "request" for a role, it would act as a "matcher" (opposite of 
> filter). When mesos is allocating resources, it only proceeds if 
> (requests.matches(resources) && !filters.filtered(resources)). The open ended 
> aspect here is what a matcher would consist of. Consider a case where a 
> matcher is a resource quantity and multiple are allowed; if any matcher 
> matches, the result is a match. This would be equivalent to letting 
> frameworks specify their own --min_allocatable_resources for a role (which is 
> something that has been considered). The "matchers" could be more 
> sophisticated: full resource objects just like filters (but global), full 
> resource objects but with quantities for non-scalar resources like ports, etc.

I was thinking in this direction, but what you described is more involved than 
what I had in mind as a possible first attempt. I'd expect that frameworks 
currently use `REVIVE` as a proxy for `REQUEST_RESOURCES`, not as a way to 
manage their filter state tracked in the allocator. Assuming we have some way 
to express resource quantities (i.e., MESOS-9314), we should be able to improve 
on `REVIVE` by providing a `REQUEST_RESOURCES` which clears all filters for 
resources containing the requested resources (or all filters if no explicit 
resource request). Even if that led to more offers than needed, it would likely 
still perform better than `REVIVE` (or `CLEAR_FILTERS`, which has similar 
semantics). If we keep the scope of these calls narrow and clear, we have the 
freedom to be smarter internally in the future.
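
To make that concrete, a rough sketch of the allocator-side behavior I have
in mind (all names are made up; a map serves as a stand-in for the quantity
type from MESOS-9314):

  #include <algorithm>
  #include <map>
  #include <string>
  #include <vector>

  // Stand-in for the quantity type discussed in MESOS-9314.
  using ResourceQuantities = std::map<std::string, double>;

  struct OfferFilter {
    ResourceQuantities resources;  // quantities the framework declined earlier
  };

  // True if the filtered resources cover at least the requested quantities.
  bool covers(const ResourceQuantities& filter, const ResourceQuantities& request) {
    for (const auto& [name, quantity] : request) {
      auto it = filter.find(name);
      if (it == filter.end() || it->second < quantity) return false;
    }
    return true;
  }

  // REQUEST_RESOURCES: drop the filters standing in the way of the requested
  // resources, or all filters if no explicit request was given.
  void requestResources(std::vector<OfferFilter>& filters,
                        const ResourceQuantities& request) {
    if (request.empty()) {
      filters.clear();  // behaves like REVIVE / CLEAR_FILTERS
      return;
    }

    filters.erase(
        std::remove_if(
            filters.begin(),
            filters.end(),
            [&](const OfferFilter& f) { return covers(f.resources, request); }),
        filters.end());
  }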

This should not only be pretty straightforward to implement in Mesos, but I'd 
imagine it would also map pretty well onto framework use cases (i.e., I assume 
frameworks are interested in controlling the resources they are offered, not 
in managing the filters we maintain for them).

> With regard to incentives, the incentive today for adhering to suppress is 
> that your framework will be doing less processing of offers when it has no 
> work to do and that other instances of your own framework as well as other 
> frameworks would get resources faster. The second aspect is indeed indirect. 
> The incentive structure with "request" / "demand" does indeed seem to be more 
> direct (while still having the indirect benefit on other frameworks / roles): 
> "I'll tell you what to show me so that I get it faster".

Additionally, by potentially introducing filters as an explicit framework API 
concept, we would ask the majority of framework authors to reason about an 
aspect they didn't have to worry about until now (previously: "if work 
arrives, revive, and decline until an offer can be accepted, then suppress"). 
If we provided them something that fits their *current mental model* while 
also giving them more control, we would have a higher chance of it being 
globally useful and adopted than if we added an expert-level knob.
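
For reference, that current mental model expressed with the existing driver
calls looks roughly like the following (pendingWork, fits and tasksFor are
hypothetical application-side helpers; the other Scheduler callbacks are
omitted):

  #include <vector>

  #include <mesos/scheduler.hpp>

  using namespace mesos;

  class MyScheduler : public Scheduler {
  public:
    void resourceOffers(SchedulerDriver* driver,
                        const std::vector<Offer>& offers) override
    {
      for (const Offer& offer : offers) {
        if (pendingWork() && fits(offer)) {
          driver->launchTasks(offer.id(), tasksFor(offer));  // accept
        } else {
          driver->declineOffer(offer.id());  // decline until something fits
        }
      }

      if (!pendingWork()) {
        driver->suppressOffers();  // nothing left to do: stop receiving offers
      }
    }

    void onNewWork(SchedulerDriver* driver)
    {
      driver->reviveOffers();  // work arrived: start receiving offers again
    }

  private:
    bool pendingWork() const;
    bool fits(const Offer& offer) const;
    std::vector<TaskInfo> tasksFor(const Offer& offer) const;
  };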

> However, as far as performance is concerned, we still need suppress adoption 
> and not just request adoption. Suppress is actually the bigger performance 
> win at the current time, unless we think that frameworks with no work would 
> "effectively suppress" via requests (e.g. "no work? set a 0 request so 
> nothing matches"). Note though, that "effectively suppressing" via requests 
> has the same incentive structure as suppress itself, right?

I was also wondering how what I suggested would fit here, as we have two 
concepts controlling whether and which offers a framework gets (a single 
global flag for suppress, and a zoo of many fine-grained filters). Currently 
we only expose
`SUPPRESS`, `DECLINE`, and `REVIVE`. It seems that explicitly adding