Jake Carroll <[email protected]> writes:

> That could be interesting. We've noticed that whenever we involve our
> MPI's and PE's in this mix, things do become far more complicated.
> Examples include watching Grid engine bounce around trying to secure
> enough nodes that are "clear"

Yes, it's more difficult to schedule parallel jobs of varying sizes
along with serial ones, but does "bounce around" mean observing some
instability in the scheduler?

> or have the space (slots) and memory
> allocation available for a PE to "sit" nicely across several of them,
> knowing full well there is enough resources to do it, with the user being
> very courteous about specifying sane complex values, but ultimately
> watching the PE "Waiting for resources".

If it doesn't schedule a job when resources are really available, it
would be useful to have a bug report, but it's often difficult to
figure out the scheduling, particularly as you don't get diagnostics
about resource reservation.

> Indeed. Not that I'm ungrateful for a bit of software that is free and has
> done so much good for us, for so long - but it is frustrating at times. I
> guess this is just one of those complex problems to solve, that may not
> have a sane solving rule (in the most simplistic computer science
> computational complexity sense).

Well, the scheduling problem is NP-complete but, more importantly,
running a typical HPC system with conflicting requirements is a
non-trivial management problem, and I don't think it's realistic to
expect just to throw a scheduling algorithm at it.

-- 
Community Grid Engine: http://arc.liv.ac.uk/SGE/
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
