Re: Propose to create a Kubernetes framework for Mesos

2018-12-05 Thread Jie Yu
I'd like to get some feedback on what Mesos users want. I can potentially
see two major use cases:

(1) I just want k8s to run on Mesos, alongside other Mesos frameworks,
sharing the same resource pool. I don't really care about nodeless.
Ideally, I'd like to run upstream k8s (including the kubelet). The original
k8s-on-Mesos framework has been retired, and the new Mesosphere MKE is not
open source and only runs on Mesosphere DC/OS. I need an open source
solution here.
(2) I want nodeless because I believe it has a tighter integration with
Mesos, as compared to (1), and can solve the static partitioning issue. (1)
is more like a k8s installer, and you can do that without Mesos.

*Can folks chime in here?*

> However, I'm not sure if re-implementing the k8s scheduler as a Mesos
> framework is the right approach. I imagine the k8s scheduler is a
> significant piece of code which we would need to re-implement, and on top
> of that, as new API objects are added to the k8s API, we would need to
> keep pace with the k8s scheduler for parity. The approach we (in the
> community) took with Spark (and Jenkins to some extent) was to let the
> scheduling innovation happen in the Spark community: we just let Spark
> launch its executors via Mesos and launch its tasks out of band of Mesos.
> We used to have a version of the Spark framework (fine-grained mode?)
> where Spark tasks were launched via Mesos offers, but that was deprecated,
> partly because of maintainability. Will this k8s framework have a similar
> problem? Sounds like one of the problems with the existing k8s framework
> implementations is the pre-launching of kubelets; can we use the k8s
> autoscaler to solve that problem?


This is a good concern. The k8s scheduler is around 17k lines (about 11.5k
lines of code excluding blanks and comments):

Jies-MacBook-Pro:scheduler jie$ pwd
/Users/jie/workspace/kubernetes/pkg/scheduler
Jies-MacBook-Pro:scheduler jie$ loc --exclude .*_test.go

 Language    Files    Lines    Blank    Comment     Code

 Go             83    17429     2165       3798    11466

 Total          83    17429     2165       3798    11466


> Also, I think (I might be wrong) most k8s users are not directly creating
> pods via the API but rather using higher-level abstractions like replica
> sets, stateful sets, daemon sets, etc. How will those fit into this
> architecture? Will the framework need to re-implement those controllers as
> well?


This is not true. You can re-use most of the controllers. Those controllers
will create pods as you said, and the Mesos framework will be responsible
for scheduling the pods they create.
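
To illustrate, a rough sketch of that division of labor, assuming current
client-go method signatures (matchOffer and the wiring around it are
hypothetical): the stock controllers keep creating pods, and the framework
binds any pod that has no node assigned yet.

// A rough sketch of a Mesos framework acting as the pod scheduler.
// Controllers (replica sets, daemon sets, ...) create pods as usual; we
// only schedule pods that no scheduler has placed yet.
package framework

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// matchOffer stands in for the framework's offer-matching logic: it picks
// an agent (node) for the pod from the outstanding Mesos offers.
func matchOffer(pod *corev1.Pod) (node string, ok bool) {
	// ... match pod resource requests against accepted Mesos offers ...
	return "", false
}

func scheduleOnce(ctx context.Context, client kubernetes.Interface) error {
	// Unscheduled pods show up with an empty spec.nodeName.
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(ctx,
		metav1.ListOptions{FieldSelector: "spec.nodeName="})
	if err != nil {
		return err
	}
	for i := range pods.Items {
		pod := &pods.Items[i]
		node, ok := matchOffer(pod)
		if !ok {
			continue // hold the pod until a suitable offer arrives
		}
		// Binding is the scheduler's only job; the kubelet on the
		// target node takes over from here.
		binding := &corev1.Binding{
			ObjectMeta: metav1.ObjectMeta{Name: pod.Name, Namespace: pod.Namespace},
			Target:     corev1.ObjectReference{Kind: "Node", Name: node},
		}
		if err := client.CoreV1().Pods(pod.Namespace).Bind(ctx, binding, metav1.CreateOptions{}); err != nil {
			return err
		}
	}
	return nil
}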

- Jie

On Mon, Dec 3, 2018 at 9:56 AM Cecile, Adam  wrote:

> On 12/3/18 5:40 PM, Michał Łowicki wrote:
>
>
>
> On Thu, Nov 29, 2018 at 1:22 AM Vinod Kone  wrote:
>
>> Cameron and Michal: I would love to understand your motivations and use
>> cases for a k8s Mesos framework in a bit more detail. Looks like you are
>> willing to rewrite your existing app definitions into k8s API spec. At
>> this
>> point, why are you still interested in Mesos as a CAAS backend? Is it
>> because of scalability / reliability? Or is it because you still want to
>> run non-k8s workloads/frameworks in this world? What are these workloads?
>>
>
> Mesos, with its scalability and ability to run many frameworks (cron-like
> jobs, Spark, proprietary ones), gives more flexibility in the long run.
> Right now we're at the stage where the public version of the Marathon UI
> isn't maintained, so we're looking for something with better community
> support. Having an entity like a k8s-compliant scheduler could help with
> adopting other community-driven solutions, but I also think that moving in
> that direction should be a well-thought-out and planned process.
>
> We share the exact same feeling. My next project will probably go full
> k8s, because I don't feel confident in Mesos' future as an open source
> project.
>
> The Marathon UI still not supporting GPUs (even in JSON mode, thanks to
> marshaling) is just the tip of the iceberg. I reported the issue ages ago,
> and I can understand that nobody cares, because DC/OS comes with a
> different (closed-source, I bet) UI.
>
>
>
>>
>> In general, I'm in favor of Mesos shipping with a default scheduler.
>> I think it might help with adoption, similar to what happened with the
>> command/default executor. In hindsight, we should've done this a long
>> time ago. But, oh well, we were too optimistic that a single "default"
>> scheduler would rule the ecosystem, which didn't quite pan out.
>>
>> However, I'm not sure if re-implementing the k8s scheduler as a Mesos
>> framework
>> is the right approach. I imagine the k8s scheduler is a significant
>> piece of code  

Re: New scheduler API proposal: unsuppress and clear_filter

2018-12-05 Thread Benjamin Mahler
Thanks for bringing REQUEST_RESOURCES up for discussion, it's one of the
mechanisms that we've been considering for further scaling pessimistic
offers before we make the migration to optimistic offers. It's also been
referred to as "demand" rather than "request", but for the sake of this
discussion consider them the same.

I couldn't quite tell how you were imagining this would work, but let me
spell out the two models I've been considering; tell me if one of them
matches what you had in mind, or if you were thinking of a different model:

(1) "Effective limit" or "give me this much more": when a scheduler
expresses its "request" for a role, it would be equivalent to setting an
"effective limit" on the framework leaf node underneath the role node (i.e.
.../role/). The effective limit would probably be set to
(request + existing .../role/ wrote:

> Hi Meng,
>
> thanks for the proposal, I agree that the way these two aspects are
> currently entangled is an issue (e.g., for master/allocator performance
> reasons). At the same time, the workflow we currently expect frameworks to
> follow is conceptually not hard to grasp,
>
> (1) If framework has work then
> (i) put framework in unsuppressed state,
> (ii) decline non-matching offers with a long filter duration.
> (2) If an offer matches, accept.
> (3) If there is no more work, suppress. GOTO (1).
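
In Go-flavored pseudocode, a minimal sketch of this loop (the Driver
interface here is hypothetical shorthand for the corresponding scheduler
calls, not a real Mesos binding):

// Hypothetical shorthand for the scheduler API; a real framework would
// issue these as SUPPRESS / REVIVE / DECLINE / ACCEPT calls.
type Driver interface {
	Revive()                                  // (1.i) leave suppressed state
	Decline(o OfferID, refuseSeconds float64) // (1.ii) sets a filter
	Accept(o OfferID, ops []Operation)        // (2) launch work
	Suppress()                                // (3) stop receiving offers
}

type (
	OfferID   string
	Offer     struct{ ID OfferID }
	Operation struct{}
	Work      struct{}
)

// match stands in for the framework's own offer-matching logic.
func match(o Offer, pending []Work) ([]Operation, bool) { return nil, false }

// (1) New work arrived: leave suppressed state.
func onNewWork(d Driver) { d.Revive() }

func onOffers(d Driver, offers []Offer, pending []Work) {
	for _, o := range offers {
		if ops, ok := match(o, pending); ok {
			d.Accept(o.ID, ops) // (2) an offer matches: accept
		} else {
			d.Decline(o.ID, 3600) // (1.ii) long filter, e.g. one hour
		}
	}
	if len(pending) == 0 {
		d.Suppress() // (3) no more work; GOTO (1) on the next workload
	}
}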
>
> Here the framework does not need to track its filters across allocation
> cycles (they are an unexposed implementation detail of the hierarchical
> allocator anyway), which, e.g., allows metaschedulers like Marathon or
> Apache Aurora to decouple the scheduling of different workloads. The
> downsides of this interface are that
>
> * there is little incentive for frameworks to use SUPPRESS in addition to
> filters, and
> * unsuppression is all-or-nothing, forcing the master to send potentially
> all unused resources to one framework, even if it is only interested in a
> fraction. This can cause, at least temporarily, non-optimal allocation
> behavior.
>
> It seems to me that even though adding UNSUPPRESS and CLEAR_FILTERS would
> give frameworks more control, it would only be a small improvement. In the
> above framework workflow, there would be a small improvement if the
> framework knows that a new workload matches a previously running workload
> (i.e., it can infer that no filters for the resources it is interested in
> are active), so that it can issue UNSUPPRESS instead of CLEAR_FILTERS.
> Incidentally, there seems to be little local benefit for frameworks to use
> these new calls, as they’d mostly help the master, and I’d imagine we
> wouldn’t want to imply that clearing filters would unsuppress the
> framework. This seems too little to me, and we run the danger that
> frameworks would just always pair UNSUPPRESS and CLEAR_FILTERS (or keep
> using REVIVE) to simplify their workflow. If we modeled the interface more
> along framework needs, there would be a clear benefit, which would help
> adoption.
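
To make that concrete, a small sketch of the decision a framework would
face (the call names are from this proposal; the knownUnfiltered inference
is hypothetical):

// Sketch of the choice a framework has with the two proposed calls.
type ProposedDriver interface {
	Unsuppress()   // proposed UNSUPPRESS call
	ClearFilters() // proposed CLEAR_FILTERS call
}

// knownUnfiltered stands in for the inference described above: the new
// workload matches a previously running one, so no active filter can be
// holding back the resources it needs.
func knownUnfiltered(workload string) bool { return false }

func onNewWorkload(d ProposedDriver, workload string) {
	if knownUnfiltered(workload) {
		d.Unsuppress() // filters can stay; we know none of them apply
	} else {
		// Without that knowledge, the framework has to drop everything,
		// which is exactly the UNSUPPRESS + CLEAR_FILTERS (or REVIVE) pair.
		d.ClearFilters()
		d.Unsuppress()
	}
}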
>
> A more interesting call for me would be REQUEST_RESOURCES. It maps very
> well onto framework needs (e.g., “I want to launch a task requiring these
> resources”), and clearly communicates a requirement to the master so that
> it, e.g., doesn’t need to remove all filters for a framework. It also
> seems to fit the allocator model pretty well, which doesn’t explicitly
> expose filters. I believe implementing it should not be too hard if we
> restricted its semantics to only communicate to the master that a
> framework _is interested in certain resources_ without promising that it
> _will get them in any amount of time_ (i.e., no need to rethink DRF
> fairness semantics in the hierarchical allocator). I also feel that if we
> had REQUEST_RESOURCES, we would have some freedom to perform further
> improvements around filters in the master/allocator (e.g., filter
> compactification, working around increasing the default filter duration, …).
>
>
> A possible zeroth implementation of REQUEST_RESOURCES in the hierarchical
> allocator would be to have it remove any filters containing the requested
> resources and likely also unsuppress the framework. A REQUEST_RESOURCES
> call would hold an optional resource and an optional AgentID; the case
> where both are empty would map onto CLEAR_FILTERS.
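
In Go-flavored pseudocode, that zeroth implementation might look like the
following (all names are made up for illustration; only the call shape
follows the description above):

// Hypothetical shape of the proposed call: both fields are optional.
type RequestResources struct {
	AgentID   *AgentID   // nil: not tied to a particular agent
	Resources []Resource // empty: no particular resource profile
}

type (
	AgentID     string
	Resource    struct{ Name string }
	FrameworkID string
)

type Allocator struct{ /* filters, suppression state, ... */ }

func (a *Allocator) clearFilters(fw FrameworkID) { /* ... */ }
func (a *Allocator) unsuppress(fw FrameworkID)   { /* ... */ }
func (a *Allocator) removeMatchingFilters(fw FrameworkID, agent *AgentID, rs []Resource) { /* ... */ }

// Zeroth implementation: remove any filter containing the requested
// resources and unsuppress the framework; the empty request degenerates
// to CLEAR_FILTERS.
func (a *Allocator) requestResources(fw FrameworkID, req RequestResources) {
	if req.AgentID == nil && len(req.Resources) == 0 {
		a.clearFilters(fw)
	} else {
		a.removeMatchingFilters(fw, req.AgentID, req.Resources)
	}
	a.unsuppress(fw)
}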
>
>
> That being said, it might still be useful in the future to expose a
> low-level knob for frameworks, allowing them to explicitly manage their
> filters.
>
>
> Cheers,
>
> Benjamin
>
>
> On Dec 4, 2018, at 5:44 AM, Meng Zhu  wrote:
> >
> > See my comments inline.
> >
> > On Mon, Dec 3, 2018 at 5:43 PM Vinod Kone  wrote:
> >
> >> Thanks Meng for the explanation.
> >>
> >> I imagine most frameworks do not remember what stuff they filtered,
> >> much less figure out how previously filtered stuff can satisfy new
> >> operations.
> >> That sounds complicated!
> >>
> >
> > Frameworks do not need to remember what filters they currently have.
> > Only knowing the resource profiles of the current