On 16 August 2012 22:28, Reuti <[email protected]> wrote:
> Hi,
>
> On 16.08.2012 at 22:51, Jake Carroll wrote:
>
>> I'm currently assessing different job scheduling technologies for a sizeable
>> compute/HPC project I'm working on.
>>
>> One of the things various vendors always seem to throw out there as a
>> "value add" in their respective schedulers is the ability to "drive up
>> utilisation" of the HPC cluster environment with some kind of advanced
>> scheduling mechanism. Pretty much all the big players bang on about this
>> kind of thing. Moab talk it up, and Platform LSF talk about it and say it's
>> something quite special. I don't hear Altair/PBS Pro say much about it,
>> however, nor do I hear it mentioned much in the OGE/SGE circles.
>>
>> So, I guess what I'm after is some reality. Is there some kind of highly
>> engineered/premium proprietary code in what companies/schedulers like Moab
>> and Platform LSF (IBM) offer that can't be matched by the "free" SGE/OGE
>> products?
>>
>> The general intention is that you are always running your HPC environment
>> at full tilt, so that you aren't left with underutilised compute nodes: if
>> the HPC environment is idle or under low load, the users who need it can
>> get as much compute performance as possible, but if it's busy, it scales
>> back appropriately (almost dynamically) so that SLAs are adhered to.
>>
>> I heard the words "Goal driven SLA sensitive workload scheduling". I
>> thought that sounded like some lovely marketing speak, but I will try not
>> to be cynical about it.
>
> I have no insight into LSF capabilities, but to me it looks like all of them
> schedule jobs according to some policy of "who is the next one", and,
> backfilling aside, they will always leave idle cores at some point: either
> because all memory is already used up (or no small jobs are left in the
> waiting queue) or because you have to reserve cores/memory for a later
> parallel/big job.
>
> I'm not aware that any of them has some kind of linear optimization to
> handle a cut-off problem: I have a bundle of jobs whose runtimes and
> resource requirements I know. Task: rearrange them in such a way that all
> finish in the least overall amount of time. Such a scheduler would already
> have some real-time behavior, as it could forecast the latest time at which
> your job will end and guarantee this.
>
That sounds rather like a couple of well-known problems that are known to be
NP-hard/NP-complete. So we can be reasonably sure that no scheduler does more
than approximate a solution to such a problem (unless it requires, as a
prerequisite, a more powerful computer/cluster than the one it is scheduling
for).
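To make the approximation point concrete, here is a minimal sketch under simplifying assumptions that are mine, not anything stated in the thread: independent jobs, each needing exactly one of `n_nodes` identical nodes, with runtimes known in advance (no memory limits, parallel jobs, or arrival times). It compares the textbook longest-processing-time-first (LPT) greedy heuristic for multiprocessor scheduling with an exhaustive search that finds the true minimum makespan but grows exponentially with the number of jobs. The function names `lpt_schedule` and `optimal_makespan` are hypothetical, and nothing here claims that Moab, LSF, or Grid Engine use LPT internally.

```python
import heapq
from itertools import product


def lpt_schedule(runtimes, n_nodes):
    """Greedy LPT heuristic (assumed model: independent jobs, identical nodes).

    Sort jobs longest-first and always hand the next job to the node with the
    smallest current load. Returns (makespan, {job_index: node_index}).
    """
    # Min-heap of (current_load, node_index) so the least-loaded node pops first.
    loads = [(0, node) for node in range(n_nodes)]
    heapq.heapify(loads)
    assignment = {}
    for job, runtime in sorted(enumerate(runtimes), key=lambda jr: jr[1], reverse=True):
        load, node = heapq.heappop(loads)
        assignment[job] = node
        heapq.heappush(loads, (load + runtime, node))
    makespan = max(load for load, _ in loads)
    return makespan, assignment


def optimal_makespan(runtimes, n_nodes):
    """Exact minimum makespan by brute force over all n_nodes**len(runtimes) assignments.

    Only feasible for a handful of jobs; this exponential blow-up is why real
    schedulers settle for heuristics.
    """
    best = float("inf")
    for assignment in product(range(n_nodes), repeat=len(runtimes)):
        loads = [0] * n_nodes
        for runtime, node in zip(runtimes, assignment):
            loads[node] += runtime
        best = min(best, max(loads))
    return best


if __name__ == "__main__":
    jobs = [3, 3, 2, 2, 2]  # hypothetical runtimes, e.g. in hours
    print("LPT makespan:", lpt_schedule(jobs, 2)[0])
    print("optimal makespan:", optimal_makespan(jobs, 2))
```

With runtimes `[3, 3, 2, 2, 2]` on 2 nodes, LPT gives a makespan of 7, while the optimum is 6 ({3, 3} on one node, {2, 2, 2} on the other). Graham's classic bound says LPT is never worse than (4/3 - 1/(3m)) times optimal on m identical nodes, which is exactly 7/6 here, so this small instance sits right at the worst case.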
http://en.wikipedia.org/wiki/Multiprocessor_scheduling
http://en.wikipedia.org/wiki/Job_shop_scheduling

William
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
