One option is to implement alternative behaviour in an allocator module.

On Tue, Jun 30, 2015 at 3:34 PM, Dharmesh Kakadia <dhkaka...@gmail.com> wrote:
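To make the allocator-module suggestion concrete: such a module could withhold offers from agents whose free resources are below a configured minimum, so fragments coalesce before being offered. A minimal sketch of that policy (illustrative Python only — real Mesos allocator modules are C++, and every name here is hypothetical, not a Mesos API):

```python
# Illustrative policy mock, NOT the Mesos allocator API: hold back offers
# from agents until their free memory reaches a configurable minimum, so
# small fragments are coalesced rather than offered piecemeal.
# All names (offerable_agents, free_mem_mb) are hypothetical.

MIN_OFFER_MEM_MB = 20 * 1024  # don't offer an agent until 20 GB is free

def offerable_agents(agents, min_mem_mb=MIN_OFFER_MEM_MB):
    """Keep back agents whose free memory is still too fragmented."""
    return [a for a in agents if a["free_mem_mb"] >= min_mem_mb]

agents = [
    {"host": "slave1", "free_mem_mb": 1024},       # held back
    {"host": "slave2", "free_mem_mb": 24 * 1024},  # large enough to offer
]
print([a["host"] for a in offerable_agents(agents)])  # → ['slave2']
```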
> Interesting.
>
> I agree that dynamic reservation and optimistic offers will help mitigate
> the issue, but resource fragmentation (and starvation due to it) is a
> more general problem. Predictive models can certainly aid the Mesos
> scheduler here. I think the filters in Mesos can be extended to add more
> general preferences such as offer size, execution/predictive model, etc.
> The user should be able to configure which filters the Mesos scheduler
> recognizes while making offers, which would also limit the effect on
> scalability, as far as I understand. Thoughts?
>
> Thanks,
> Dharmesh
>
> On Sun, Jun 28, 2015 at 7:29 PM, Alex Rukletsov <a...@mesosphere.com> wrote:
>
>> Sharma,
>>
>> That's exactly what we plan to add to Mesos. Dynamic reservations will
>> land in 0.23; the next step is to optimistically offer reserved but as-yet
>> unused resources (we call them optimistic offers) to other frameworks as
>> revocable. The alternative with one framework will of course work, but it
>> implies having a general-purpose framework that does work better done by
>> Mesos (which has more information and can therefore make better decisions).
>>
>> On Wed, Jun 24, 2015 at 11:54 PM, Sharma Podila <spod...@netflix.com> wrote:
>>
>>> In a previous (more HPC-like) system I worked on, the scheduler did
>>> "advance reservation" of resources, claiming the bits and pieces it got
>>> and holding on until all were available. Say the last bit is expected to
>>> arrive about 1 hour from now (this needs job runtime estimation/knowledge):
>>> any short jobs are "backfilled" onto the advance-reserved resources that
>>> would otherwise sit idle for an hour, to improve utilization. This was
>>> combined with weight- and priority-based job preemptions; sometimes 1GB
>>> jobs were higher priority than the 20GB jobs. Unfortunately, that
>>> technique doesn't lend itself natively to Mesos-based scheduling.
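The advance-reservation-plus-backfill scheme Sharma describes can be sketched as follows. This is an illustrative Python toy, assuming job runtime estimates are available and the time at which the reservation becomes fully available is known; none of these names come from any real scheduler:

```python
# Toy sketch of HPC-style backfill: while a large job waits for its
# advance-reserved resources to fully materialize at time T, any short
# job whose estimated runtime finishes before T may run on the idle
# reserved slots. Assumes runtime estimates exist (a stated prerequisite).

def backfillable(jobs, now, reservation_ready_at):
    """Select short jobs that will finish before the reserved
    resources are needed by the large job."""
    window = reservation_ready_at - now
    return [j for j in jobs if j["est_runtime"] <= window]

jobs = [
    {"name": "short-a", "est_runtime": 10},
    {"name": "long-b", "est_runtime": 120},
    {"name": "short-c", "est_runtime": 45},
]
# Large job's last resource arrives in 60 time units; backfill short jobs.
print([j["name"] for j in backfillable(jobs, now=0, reservation_ready_at=60)])
# → ['short-a', 'short-c']
```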
>>>
>>> One idea that may work in Mesos is (thinking aloud):
>>>
>>> - The large (20GB) framework reserves 20 GB on some number of slaves (I
>>> am referring to dynamic reservations here, which aren't available yet).
>>> - The small framework continues to use up 1GB offers.
>>> - When the large framework needs to run a job, it will have the 20 GB
>>> offers since it has the reservation.
>>> - When the large framework does not have any jobs running, the small
>>> framework may be given those resources, but those jobs will have to be
>>> preempted in order to offer 20 GB to the large framework.
>>>
>>> I understand this idea has some forward-looking expectations of how
>>> dynamic reservations would/could work. Caveat: I haven't involved myself
>>> closely with that feature's definition, so I could be wrong in my
>>> expectations.
>>>
>>> Until something like that lands, the existing static reservations, of
>>> course, should work. But that reduces utilization drastically if the
>>> large framework runs jobs only sporadically.
>>>
>>> Another idea is to have one framework schedule both the 20GB jobs and
>>> the 1GB jobs. Within the framework, it can bin-pack the 1GB jobs onto as
>>> small a number of slaves as possible. This increases the likelihood of
>>> finding 20GB free on a slave. Combining that with preemptions from within
>>> the framework (a simple kill of a certain number of 1GB jobs) should
>>> satisfy the 20GB jobs.
>>>
>>> On Wed, Jun 24, 2015 at 9:26 AM, Tim St Clair <tstcl...@redhat.com> wrote:
>>>
>>>> ----- Original Message -----
>>>> > From: "Brian Candler" <b.cand...@pobox.com>
>>>> > To: user@mesos.apache.org
>>>> > Sent: Wednesday, June 24, 2015 10:50:43 AM
>>>> > Subject: Re: Setting minimum offer size
>>>> >
>>>> > On 24/06/2015 16:31, Alex Gaudio wrote:
>>>> > > Does anyone have other ideas?
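Returning to Sharma's single-framework idea above: bin-packing the 1GB jobs onto as few slaves as possible raises the chance that some slave keeps 20 GB free. A minimal first-fit sketch (illustrative Python; capacities in GB, no Mesos API involved):

```python
# Sketch of the bin-packing idea: place each 1 GB job on the first slave
# with room, packing tightly so other slaves stay empty and can absorb a
# 20 GB job without preemption. Purely illustrative data structures.

def first_fit(job_sizes, slave_capacity_gb, num_slaves):
    """Place each job on the first slave with room, packing tightly."""
    free = [slave_capacity_gb] * num_slaves
    placement = []
    for size in job_sizes:
        for i, f in enumerate(free):
            if f >= size:
                free[i] -= size
                placement.append(i)
                break
    return placement, free

placement, free = first_fit([1] * 25, slave_capacity_gb=24, num_slaves=3)
# All 25 one-GB jobs land on slaves 0 and 1; slave 2 keeps its full 24 GB
# free, leaving room for a 20 GB job without killing anything.
print(free)  # → [0, 23, 24]
```

Spreading the same 25 jobs evenly instead would leave no slave with 20 GB free, which is exactly the fragmentation this thread is about.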
>>>> > HTCondor deals with this by having a "defrag" daemon, which
>>>> > periodically stops hosts accepting small jobs, so that it can
>>>> > coalesce small slots into larger ones.
>>>> >
>>>> > http://research.cs.wisc.edu/htcondor/manual/latest/3_5Policy_Configuration.html#sec:SMP-defrag
>>>> >
>>>>
>>>> Yuppers, and guess who helped work on it ;-)
>>>>
>>>> > You can configure policies based on how many drained machines are
>>>> > already available, and how many can be draining at once.
>>>> >
>>>>
>>>> It had to be done this way, as there was only so much sophistication
>>>> you can put into scheduling before you start to add latency.
>>>>
>>>> > Maybe there would be a benefit if Mesos could work out the largest
>>>> > job any framework has waiting to run, so it knows whether draining
>>>> > is required and how far to drain down. This might take the form of a
>>>> > message to the framework: "suppose I offered you all the resources
>>>> > on the cluster, what is the largest single job you would want to
>>>> > run, and which machine(s) could it run on?" Or something like that.
>>>> >
>>>> > Regards,
>>>> >
>>>> > Brian.
>>>>
>>>> --
>>>> Cheers,
>>>> Timothy St. Clair
>>>> Red Hat Inc.
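The defrag policy Brian and Tim describe — drain fragmented hosts until enough whole machines exist, with a cap on how many may drain at once — can be modeled in a few lines. This is a toy simulation, not HTCondor's actual configuration language or knob names:

```python
# Toy model of a defrag draining policy: mark fragmented hosts as
# "draining" (they stop accepting small jobs) until the count of whole
# machines plus draining machines meets a target, never exceeding a cap
# on concurrent draining. Parameter names are illustrative only.

def defrag_step(hosts, target_whole, max_draining):
    """Mark additional hosts draining if we lack whole machines."""
    whole = sum(1 for h in hosts if h["running_jobs"] == 0)
    draining = sum(1 for h in hosts if h["draining"])
    # Drain the emptiest busy hosts first, so they free up soonest.
    candidates = sorted(
        (h for h in hosts if not h["draining"] and h["running_jobs"] > 0),
        key=lambda h: h["running_jobs"],
    )
    for h in candidates:
        if whole + draining >= target_whole or draining >= max_draining:
            break
        h["draining"] = True
        draining += 1
    return hosts

hosts = [
    {"name": "h1", "running_jobs": 0, "draining": False},   # already whole
    {"name": "h2", "running_jobs": 2, "draining": False},
    {"name": "h3", "running_jobs": 10, "draining": False},
]
defrag_step(hosts, target_whole=2, max_draining=1)
print([h["name"] for h in hosts if h["draining"]])  # → ['h2']
```

With one whole machine already available and a target of two, only the emptiest busy host (h2) starts draining; the concurrency cap keeps h3 accepting work, which mirrors the latency/utilization trade-off Tim mentions.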