Sharma,

that's exactly what we plan to add to Mesos. Dynamic reservations will land
in 0.23; the next step is to optimistically offer reserved but not yet used
resources (we call these optimistic offers) to other frameworks as revocable.
The alternative with a single framework will of course work, but it implies
having a general-purpose framework that does work better done by Mesos
(which has more information and can therefore make better decisions).
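
Purely to illustrate the accounting behind optimistic offers (this is not
the actual allocator code, and all names below are made up), the idea is:
whatever a role has reserved but is not currently using can be offered to
other frameworks, marked revocable so it can be reclaimed later.

    # Illustrative sketch only, not Mesos code. Reserved-but-idle resources
    # are offered to other frameworks as revocable.
    def optimistic_offers(reserved_gb, used_by_reserver_gb):
        """Both arguments are dicts keyed by agent id, values in GB."""
        offers = {}
        for agent, reserved in reserved_gb.items():
            idle = reserved - used_by_reserver_gb.get(agent, 0)
            if idle > 0:
                # Offered to other frameworks, but flagged revocable so the
                # tasks placed on it can be preempted when the reserving
                # role comes back.
                offers[agent] = {"mem_gb": idle, "revocable": True}
        return offers

    # 20 GB reserved on agent "a1" and the reserving framework is idle, so
    # all 20 GB can be optimistically offered to the 1 GB framework:
    print(optimistic_offers({"a1": 20}, {}))
    # -> {'a1': {'mem_gb': 20, 'revocable': True}}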

On Wed, Jun 24, 2015 at 11:54 PM, Sharma Podila <[email protected]> wrote:

> In a previous (more HPC-like) system I worked on, the scheduler did
> "advance reservation" of resources, claiming the bits and pieces it got
> and holding on to them until all were available. Say the last bit is
> expected to become free about 1 hour from now (this needs job runtime
> estimation/knowledge): any short jobs are "backfilled" onto the
> advance-reserved resources that would otherwise sit idle for that hour,
> to improve utilization. This was combined with weight- and priority-based
> job preemptions; sometimes 1GB jobs are higher priority than the 20GB
> jobs. Unfortunately, that technique doesn't lend itself natively to
> Mesos-based scheduling.
>
> One idea that may work in Mesos is (thinking aloud):
>
> - The large (20GB) framework reserves 20 GB on some number of slaves (I am
> referring to dynamic reservations here, which aren't available yet).
> - The small framework continues to use up 1GB offers.
> - When the large framework needs to run a job, it will have the 20 GB
> offers, since it has the reservation.
> - When the large framework has no jobs running, the small framework may be
> given those resources, but those jobs will then have to be preempted in
> order to offer 20 GB back to the large framework (a rough sketch of that
> preemption step follows below).
>
> I understand this idea has some forward-looking expectations on how
> dynamic reservations would/could work. Caveat: I haven't involved myself
> closely with that feature definition, so I could be wrong in my
> expectations.
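>
> To make the preemption step in the last bullet concrete, here is a rough
> sketch (purely illustrative; none of these names is an existing Mesos
> API): given a slave whose 20 GB reservation is temporarily occupied by
> revocable 1GB tasks, pick enough of them to kill so the full 20 GB can be
> offered back.
>
>     # Illustrative sketch: choose which revocable tasks to preempt on a
>     # slave so the reserving (20 GB) framework gets its reservation back.
>     def tasks_to_preempt(revocable_tasks_gb, needed_gb):
>         """revocable_tasks_gb: {task_id: memory_gb} on one slave."""
>         freed, victims = 0, []
>         # One possible policy: kill the largest tasks first so fewer
>         # tasks are disturbed overall.
>         for task_id, mem in sorted(revocable_tasks_gb.items(),
>                                    key=lambda kv: kv[1], reverse=True):
>             if freed >= needed_gb:
>                 break
>             victims.append(task_id)
>             freed += mem
>         return victims
>
>     # Twenty revocable 1GB tasks occupy the reserved slave; all of them
>     # must be killed before a 20 GB offer can go to the large framework:
>     print(len(tasks_to_preempt({f"t{i}": 1 for i in range(20)}, 20)))  # 20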
>
> Until something like that lands, the existing static reservations, of
> course, should work. But, that reduces utilization drastically if the large
> framework runs jobs sporadically.
>
> Another idea is to have one framework schedule both the 20GB jobs and the
> 1GB jobs. Within the framework, it can bin-pack the 1GB jobs onto as few
> slaves as possible, which increases the likelihood of finding 20GB free on
> a single slave. Combining that with preemptions from within the framework
> (a simple kill of a certain number of 1GB jobs) should satisfy the 20 GB
> jobs.
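>
> As a rough illustration of that bin-packing step (again just a sketch,
> not real framework code): place each 1GB job on the most-loaded slave
> that can still hold it, so whole slaves stay free for 20GB jobs.
>
>     # Illustrative best-fit placement: pack 1GB jobs onto as few slaves
>     # as possible so that whole 20 GB slaves remain free for large jobs.
>     def place_small_job(free_gb, job_gb=1):
>         """free_gb: {slave_id: free memory in GB}; returns a slave or None."""
>         candidates = [(free, s) for s, free in free_gb.items() if free >= job_gb]
>         if not candidates:
>             return None
>         # Best fit: the slave with the least remaining room that still
>         # fits, which concentrates small jobs and keeps others empty.
>         free, slave = min(candidates)
>         free_gb[slave] -= job_gb
>         return slave
>
>     free = {"s1": 20, "s2": 20, "s3": 5}
>     for _ in range(4):
>         print(place_small_job(free))  # all four land on s3, keeping s1/s2 whole
>     # free is now {'s1': 20, 's2': 20, 's3': 1}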
>
>
>
> On Wed, Jun 24, 2015 at 9:26 AM, Tim St Clair <[email protected]> wrote:
>
>>
>>
>> ----- Original Message -----
>> > From: "Brian Candler" <[email protected]>
>> > To: [email protected]
>> > Sent: Wednesday, June 24, 2015 10:50:43 AM
>> > Subject: Re: Setting minimum offer size
>> >
>> > On 24/06/2015 16:31, Alex Gaudio wrote:
>> > > Does anyone have other ideas?
>> > HTCondor deals with this by having a "defrag" daemon, which periodically
>> > stops hosts from accepting small jobs, so that it can coalesce small
>> > slots into larger ones.
>> >
>> >
>> http://research.cs.wisc.edu/htcondor/manual/latest/3_5Policy_Configuration.html#sec:SMP-defrag
>> >
>>
>> Yuppers, and guess who helped work on it ;-)
>>
>> > You can configure policies based on how many drained machines are
>> > already available, and how many can be draining at once.
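>> >
>> > A toy sketch of that kind of policy (not HTCondor's actual code or
>> > configuration; the knob names below are invented): each cycle, put a
>> > few more hosts into "draining" until enough whole machines exist,
>> > while capping how many drain at once.
>> >
>> >     # Illustrative defrag-style step (each host is a dict with keys
>> >     # "name", "small_jobs" and "draining"): stop offering selected
>> >     # hosts to small jobs; a host becomes "whole" once they finish.
>> >     def choose_hosts_to_drain(hosts, max_concurrent_draining=2,
>> >                               target_whole_machines=4):
>> >         whole = sum(1 for h in hosts if h["small_jobs"] == 0)
>> >         draining = sum(1 for h in hosts
>> >                        if h["draining"] and h["small_jobs"] > 0)
>> >         picked = []
>> >         for h in hosts:
>> >             if whole + draining + len(picked) >= target_whole_machines:
>> >                 break
>> >             if draining + len(picked) >= max_concurrent_draining:
>> >                 break
>> >             if not h["draining"] and h["small_jobs"] > 0:
>> >                 picked.append(h["name"])  # no new small jobs go here
>> >         return picked
>> >
>> >     hosts = [{"name": f"n{i}", "small_jobs": 3, "draining": False}
>> >              for i in range(5)]
>> >     print(choose_hosts_to_drain(hosts))  # ['n0', 'n1']: 2 drain at a time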
>> >
>>
>> It had to be done this way, as there is only so much sophistication you
>> can put into scheduling before you start adding latency.
>>
>> > Maybe there would be a benefit if Mesos could work out the largest job
>> > any framework has waiting to run, so it knows whether draining is
>> > required and how far to drain down.  This might take the form of a
>> > message to the framework: "suppose I offered you all the resources on
>> > the cluster, what is the largest single job you would want to run, and
>> > which machine(s) could it run on?"  Or something like that.
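>> >
>> > Purely as a thought experiment (no such Mesos API exists today; all
>> > names are invented), the decision could reduce to something like:
>> >
>> >     # Hypothetical: the master polls frameworks for their largest
>> >     # pending single job, then drains only as far as actually needed.
>> >     def drain_target_gb(largest_pending_per_framework, largest_free_gb):
>> >         need = max(largest_pending_per_framework.values(), default=0)
>> >         if need <= largest_free_gb:
>> >             return 0      # a big-enough slot already exists
>> >         return need       # drain until one slave has this much free
>> >
>> >     print(drain_target_gb({"big-fw": 20, "small-fw": 1}, 7))  # -> 20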
>> >
>> > Regards,
>> >
>> > Brian.
>> >
>> >
>>
>> --
>> Cheers,
>> Timothy St. Clair
>> Red Hat Inc.
>>
>
>
