Yes, an alternative allocator module would be great implementation-wise,
but "filters" might need additional capabilities in order to convey more
information to the Mesos scheduler/allocator. Am I correct here, or are
there already ways to convey such info?
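
For context, the only thing today's Filters message lets a framework convey
is refuse_seconds, i.e. "don't re-offer these resources to me for a while".
A minimal sketch of how that looks from the C++ scheduler driver (the 20 GB
threshold and the scheduler class are just illustration):

#include <vector>

#include <mesos/scheduler.hpp>

using namespace mesos;

class MinOfferSizeScheduler : public Scheduler
{
public:
  void resourceOffers(
      SchedulerDriver* driver,
      const std::vector<Offer>& offers) override
  {
    for (const Offer& offer : offers) {
      double memMB = 0;
      for (const Resource& resource : offer.resources()) {
        if (resource.name() == "mem" && resource.type() == Value::SCALAR) {
          memMB += resource.scalar().value();
        }
      }

      if (memMB < 20 * 1024) {
        // All we can say is "don't offer me this again for 5 minutes";
        // there is no way to express "only offer me >= 20 GB".
        Filters filters;
        filters.set_refuse_seconds(300);
        driver->declineOffer(offer.id(), filters);
      }
      // ... otherwise launch tasks ...
    }
  }

  // (remaining Scheduler callbacks omitted for brevity)
};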

Thanks,
Dharmesh

On Tue, Jun 30, 2015 at 7:15 PM, Alex Rukletsov <[email protected]> wrote:

> One option is to implement alternative behaviour in an allocator module.
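>
> Roughly, the module boilerplate looks like this (assuming the module API
> shipping with 0.23; CustomAllocator is a placeholder, not an existing
> class):
>
> #include <mesos/master/allocator.hpp>
> #include <mesos/module/allocator.hpp>
>
> using mesos::master::allocator::Allocator;
>
> // CustomAllocator : public Allocator goes here. It must implement the
> // full allocator interface (initialize, addFramework, addSlave,
> // recoverResources, reviveOffers, ...); elided for brevity.
>
> static Allocator* createCustomAllocator(const mesos::Parameters& params)
> {
>   return new CustomAllocator();
> }
>
> // The master loads this via --modules=... and
> // --allocator=com_example_CustomAllocator.
> mesos::modules::Module<Allocator> com_example_CustomAllocator(
>     MESOS_MODULE_API_VERSION,
>     MESOS_VERSION,
>     "Example Author",
>     "[email protected]",
>     "Allocator that can enforce a minimum offer size",
>     nullptr,  // "is module compatible" callback
>     createCustomAllocator);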
>
> On Tue, Jun 30, 2015 at 3:34 PM, Dharmesh Kakadia <[email protected]>
> wrote:
>
>> Interesting.
>>
>> I agree that dynamic reservations and optimistic offers will help
>> mitigate the issue, but resource fragmentation (and the starvation it
>> causes) is a more general problem. Predictive models can certainly aid the
>> Mesos scheduler here. I think Mesos filters could be extended to express
>> more general preferences, such as a minimum offer size or hints from an
>> execution/predictive model. The user should be able to configure which
>> filters the Mesos scheduler recognizes when making offers, which would also
>> limit the impact on scalability, as far as I understand. Thoughts?
>>
>> Thanks,
>> Dharmesh
>>
>>
>>
>> On Sun, Jun 28, 2015 at 7:29 PM, Alex Rukletsov <[email protected]>
>> wrote:
>>
>>> Sharma,
>>>
>>> that's exactly what we plan to add to Mesos. Dynamic reservations will
>>> land in 0.23; the next step is to optimistically offer reserved but as yet
>>> unused resources (we call them optimistic offers) to other frameworks as
>>> revocable. The alternative with one framework will of course work, but it
>>> implies a general-purpose framework doing work that is better done by
>>> Mesos (which has more information and can therefore make better
>>> decisions).
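>>>
>>> For completeness, consuming revocable resources is opt-in on the
>>> framework side. A sketch against the oversubscription API landing
>>> alongside in 0.23 (the helper function names are mine):
>>>
>>> #include <mesos/mesos.hpp>
>>>
>>> using namespace mesos;
>>>
>>> FrameworkInfo makeFrameworkInfo()
>>> {
>>>   FrameworkInfo framework;
>>>   framework.set_user("");  // Let Mesos fill in the current user.
>>>   framework.set_name("revocable-aware-framework");
>>>
>>>   // Without this capability the master will not send the framework
>>>   // any revocable resources at all.
>>>   framework.add_capabilities()->set_type(
>>>       FrameworkInfo::Capability::REVOCABLE_RESOURCES);
>>>
>>>   return framework;
>>> }
>>>
>>> // Revocable resources are marked in the offer; tasks launched on
>>> // them may be throttled or evicted at any time.
>>> bool isRevocable(const Resource& resource)
>>> {
>>>   return resource.has_revocable();
>>> }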
>>>
>>> On Wed, Jun 24, 2015 at 11:54 PM, Sharma Podila <[email protected]>
>>> wrote:
>>>
>>>> In a previous (more HPC-like) system I worked on, the scheduler did
>>>> "advance reservation" of resources: it claimed the bits and pieces it got
>>>> and held on to them until all were available. Say the last bit is
>>>> expected to arrive about an hour from now (which requires job runtime
>>>> estimation/knowledge); any short jobs are then "backfilled" onto the
>>>> advance-reserved resources that would otherwise sit idle for that hour,
>>>> improving utilization. This was combined with weights and priority-based
>>>> job preemptions; sometimes the 1GB jobs are higher priority than the
>>>> 20GB jobs. Unfortunately, that technique doesn't lend itself natively to
>>>> Mesos-based scheduling.
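>>>>
>>>> (For the curious, the backfill rule itself is tiny; the hard part is
>>>> the runtime estimation. Schematically, in plain C++ with made-up types:)
>>>>
>>>> #include <chrono>
>>>>
>>>> using Clock = std::chrono::steady_clock;
>>>>
>>>> struct AdvanceReservation
>>>> {
>>>>   Clock::time_point expectedComplete;  // When the last piece arrives.
>>>> };
>>>>
>>>> struct Job
>>>> {
>>>>   Clock::duration estimatedRuntime;  // Requires runtime estimation.
>>>> };
>>>>
>>>> // A short job may borrow reserved-but-idle resources only if it is
>>>> // expected to finish before the reservation becomes whole.
>>>> bool canBackfill(const Job& job, const AdvanceReservation& reservation)
>>>> {
>>>>   return Clock::now() + job.estimatedRuntime
>>>>     <= reservation.expectedComplete;
>>>> }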
>>>>
>>>> One idea that may work in Mesos is (thinking aloud):
>>>>
>>>> - The large (20GB) framework reserves 20 GB on some number of slaves (I
>>>> am referring to dynamic reservations here, which aren't available yet;
>>>> see the sketch after this list)
>>>> - The small framework continues to use up 1GB offers.
>>>> - When the large framework needs to run a job, it will have the 20 GB
>>>> offers since it has the reservation.
>>>> - When the large framework has no jobs running, the small framework may
>>>> be given those resources, but those jobs will have to be preempted in
>>>> order to offer 20 GB to the large framework.
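>>>>
>>>> A sketch of that first step, written against the dynamic-reservation
>>>> API as proposed for 0.23 (so the exact shape may change; the role and
>>>> principal names are made up):
>>>>
>>>> #include <vector>
>>>>
>>>> #include <mesos/mesos.hpp>
>>>> #include <mesos/scheduler.hpp>
>>>>
>>>> using namespace mesos;
>>>>
>>>> // Build a RESERVE operation for 20 GB of memory for our role.
>>>> Offer::Operation reserve20GB()
>>>> {
>>>>   Resource mem;
>>>>   mem.set_name("mem");
>>>>   mem.set_type(Value::SCALAR);
>>>>   mem.mutable_scalar()->set_value(20 * 1024);  // MB
>>>>   mem.set_role("large-jobs");
>>>>   mem.mutable_reservation()->set_principal("large-framework");
>>>>
>>>>   Offer::Operation operation;
>>>>   operation.set_type(Offer::Operation::RESERVE);
>>>>   operation.mutable_reserve()->add_resources()->CopyFrom(mem);
>>>>   return operation;
>>>> }
>>>>
>>>> // Accept an offer with the reservation instead of launching tasks.
>>>> void reserveOn(SchedulerDriver* driver, const Offer& offer)
>>>> {
>>>>   driver->acceptOffers({offer.id()}, {reserve20GB()}, Filters());
>>>> }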
>>>>
>>>> I understand this idea makes some forward-looking assumptions about how
>>>> dynamic reservations would/could work. Caveat: I haven't involved myself
>>>> closely with that feature's definition, so I could be wrong in my
>>>> expectations.
>>>>
>>>> Until something like that lands, the existing static reservations
>>>> should, of course, work. But that reduces utilization drastically if the
>>>> large framework runs jobs only sporadically.
>>>>
>>>> Another idea is to have one framework schedule both the 20GB and the
>>>> 1GB jobs. Within the framework, it can bin-pack the 1GB jobs onto as few
>>>> slaves as possible, which increases the likelihood of finding 20GB free
>>>> on a single slave. Combining that with preemption from within the
>>>> framework (a simple kill of a certain number of 1GB jobs) should satisfy
>>>> the 20GB jobs.
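>>>>
>>>> The packing half of that, reduced to its core (plain C++, no Mesos
>>>> API): place each 1GB job on the slave with the least free memory that
>>>> still fits it, so free space concentrates on as few slaves as possible.
>>>>
>>>> #include <limits>
>>>> #include <map>
>>>> #include <string>
>>>>
>>>> // Free memory (in GB) per slave.
>>>> using FreeMem = std::map<std::string, int>;
>>>>
>>>> // Best-fit: the slave with the smallest remaining free memory that
>>>> // can still hold the job, preserving large holes for 20GB jobs.
>>>> // Returns an empty string if nothing fits (candidate for preemption).
>>>> std::string placeSmallJob(FreeMem& freeMem, int jobGB)
>>>> {
>>>>   std::string best;
>>>>   int bestFree = std::numeric_limits<int>::max();
>>>>
>>>>   for (const auto& entry : freeMem) {
>>>>     if (entry.second >= jobGB && entry.second < bestFree) {
>>>>       best = entry.first;
>>>>       bestFree = entry.second;
>>>>     }
>>>>   }
>>>>
>>>>   if (!best.empty()) {
>>>>     freeMem[best] -= jobGB;
>>>>   }
>>>>   return best;
>>>> }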
>>>>
>>>>
>>>>
>>>> On Wed, Jun 24, 2015 at 9:26 AM, Tim St Clair <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>> > From: "Brian Candler" <[email protected]>
>>>>> > To: [email protected]
>>>>> > Sent: Wednesday, June 24, 2015 10:50:43 AM
>>>>> > Subject: Re: Setting minimum offer size
>>>>> >
>>>>> > On 24/06/2015 16:31, Alex Gaudio wrote:
>>>>> > > Does anyone have other ideas?
>>>>> > HTCondor deals with this by having a "defrag" daemon, which
>>>>> > periodically stops hosts accepting small jobs, so that it can
>>>>> > coalesce small slots into larger ones.
>>>>> >
>>>>> > http://research.cs.wisc.edu/htcondor/manual/latest/3_5Policy_Configuration.html#sec:SMP-defrag
>>>>> >
>>>>>
>>>>> Yuppers, and guess who helped work on it ;-)
>>>>>
>>>>> > You can configure policies based on how many drained machines are
>>>>> > already available, and how many can be draining at once.
>>>>> >
>>>>>
>>>>> It had to be done this way, as there is only so much sophistication
>>>>> you can put into scheduling before you start to add latency.
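>>>>>
>>>>> The knobs look roughly like this in the condor config (values here
>>>>> are illustrative; the manual page above has the full list):
>>>>>
>>>>> DAEMON_LIST = $(DAEMON_LIST) DEFRAG
>>>>>
>>>>> # How aggressively to drain: cap on simultaneously draining machines,
>>>>> # and the rate at which new drains are started.
>>>>> DEFRAG_MAX_CONCURRENT_DRAINING = 2
>>>>> DEFRAG_DRAINING_MACHINES_PER_HOUR = 1.0
>>>>>
>>>>> # Stop draining once this many whole (defragmented) machines exist.
>>>>> DEFRAG_MAX_WHOLE_MACHINES = 4
>>>>>
>>>>> # Which machines are candidates for draining.
>>>>> DEFRAG_REQUIREMENTS = PartitionableSlot && TotalCpus >= 8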
>>>>>
>>>>> > Maybe there would be a benefit if Mesos could work out the largest
>>>>> > job any framework has waiting to run, so it knows whether draining
>>>>> > is required and how far to drain down.  This might take the form of
>>>>> > a message to the framework: "suppose I offered you all the resources
>>>>> > on the cluster, what is the largest single job you would want to
>>>>> > run, and which machine(s) could it run on?"  Or something like that.
>>>>> >
>>>>> > Regards,
>>>>> >
>>>>> > Brian.
>>>>> >
>>>>> >
>>>>>
>>>>> --
>>>>> Cheers,
>>>>> Timothy St. Clair
>>>>> Red Hat Inc.
>>>>>
>>>>
>>>>
>>>
>>
>
