Re: [gridengine users] Increasing global utilisation, the marketing talk and the truth?

William Hay Fri, 17 Aug 2012 01:45:44 -0700

On 16 August 2012 22:28, Reuti <[email protected]> wrote:
> Hi,
>
> On 16.08.2012 at 22:51, Jake Carroll wrote:
>
>> I'm currently assessing different job scheduling technologies for a sizeable 
>> compute/HPC project I'm working on.
>>
>> One of the things various vendors seem to always throw out there as a "value 
>> add" in their respective scheduler is their ability to "drive up 
>> utilisation" of the HPC cluster environment with some kind of advanced 
>> scheduling mechanisms. Pretty much all the big guys seem to bang on about 
>> this kind of thing. Moab talk it up, Platform LSF talk about it and say it's 
>> something quite special. I don't hear Altair/PBS Pro say much about it, nor 
>> do I hear it really made reference to in the OGE/SGE circles however.
>>
>> So – I guess what I'm after is some reality. Are there some kind of highly 
>> engineered/premium bits of proprietary code in what companies/schedulers 
>> like Moab and Platform LSF (IBM) offer that can't be achieved in the SGE/OGE 
>> "free" products?
>>
>> The general intention is that you are always running your HPC environment at 
>> full tilt, so that compute nodes are never left underutilised. If the HPC 
>> environment is idle or under low load, it gives the users who do need it 
>> maximum ability to maximise their compute performance, but if it's busy, it 
>> will scale back appropriately (almost dynamically) so that SLAs are adhered 
>> to.
>>
>> I heard the words "Goal driven SLA sensitive workload scheduling". I thought 
>> that sounded like some lovely marketing speak, but I will try not to be 
>> cynical about it.
>
> I have no insight into LSF's capabilities, but to me it looks like all of 
> them schedule jobs according to some "who is next" policy and, even with 
> backfilling, will always leave cores idle at some point: either because 
> all memory is already used up (no small jobs are left in the waiting 
> queue) or because you have to reserve cores/memory for a later parallel/big 
> job.
>
> I'm not aware that any of them have some kind of linear optimization to 
> handle a cut-off problem: I have a bundle of jobs whose runtimes and resource 
> requirements I know. Task: rearrange them so that all finish in the least 
> overall amount of time. Such a scheduler would already have some real-time 
> behavior, as it could forecast when your job will end at the latest and 
> guarantee this.
>
That sounds rather like a couple of well-known problems that are
NP-hard/NP-complete.  So we can be reasonably sure that
there are no schedulers that do more than approximate
a solution to said problem (unless they specify a more powerful
computer/cluster than the one they are scheduling for as a
pre-requisite).


http://en.wikipedia.org/wiki/Multiprocessor_scheduling
http://en.wikipedia.org/wiki/Job_shop_scheduling
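
To give a feel for what "approximate" means here, below is a minimal
sketch of one classic heuristic for the multiprocessor scheduling
problem: longest-processing-time-first (LPT). It is only an illustration
and not what SGE, Moab or LSF actually do. The function name and the
simplifications (identical machines, runtimes known in advance, no
memory or other resource limits) are assumptions made for the example.

```python
import heapq


def lpt_schedule(runtimes, n_machines):
    """Greedy LPT heuristic (illustrative only, hypothetical helper).

    Sorts jobs by decreasing runtime and always places the next job on
    the machine that currently has the least total load. Returns the
    makespan and, per machine, the list of job indices assigned to it.
    """
    # Min-heap of (current load, machine index).
    loads = [(0, m) for m in range(n_machines)]
    heapq.heapify(loads)
    assignment = [[] for _ in range(n_machines)]

    # Longest jobs first; sorted() is stable, so ties keep input order.
    for job, runtime in sorted(enumerate(runtimes), key=lambda jr: -jr[1]):
        load, m = heapq.heappop(loads)
        assignment[m].append(job)
        heapq.heappush(loads, (load + runtime, m))

    makespan = max(load for load, _ in loads)
    return makespan, assignment


if __name__ == "__main__":
    makespan, assignment = lpt_schedule([3, 3, 2, 2, 2], 2)
    print(makespan, assignment)
```

For the five jobs above on two machines, the heuristic ends up with a
makespan of 7, whereas putting the two 3-unit jobs on one machine and
the three 2-unit jobs on the other gives 6. The heuristic is cheap to
compute and close to optimal, but it is not optimal, which is the point.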

William

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
