Jake Carroll <[email protected]> writes:

> That could be interesting. We've noticed that whenever we involve our
> MPIs and PEs in this mix, things do become far more complicated.
> Examples include watching Grid Engine bounce around trying to secure
> enough nodes that are "clear"

Yes, it's more difficult to schedule parallel jobs of varying sizes
along with serial ones, but does "bounce around" mean observing some
instability in the scheduler?
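
As a rough illustration of why the mix is harder, here is a toy model in
Python. It is not how Grid Engine's scheduler works: the slot counts, the
runtimes and the `simulate` function are all made up for the example. It only
shows that if every freed slot is handed straight to a waiting serial job, a
wide parallel job may never see enough free slots at once, whereas holding
freed slots back for it (a reservation) lets it start.

```python
def simulate(reserve, total_slots=4, parallel_need=4, ticks=20):
    """Return the tick at which the parallel job can start, or None.

    Hypothetical toy model: the cluster starts full of serial jobs with
    staggered remaining runtimes, and there is an endless backlog of
    4-tick serial jobs waiting behind the parallel job.
    """
    running = [1, 2, 3, 4]  # remaining ticks of the serial jobs already running
    for tick in range(1, ticks + 1):
        # Advance time by one tick and drop the jobs that have finished.
        running = [r - 1 for r in running if r - 1 > 0]
        free = total_slots - len(running)
        if free >= parallel_need:
            return tick
        if not reserve:
            # No reservation: backlog serial jobs take every free slot.
            running += [4] * free
        # With reservation, freed slots are left idle for the parallel job.
    return None


print("with reservation:   ", simulate(reserve=True))
print("without reservation:", simulate(reserve=False))
```

The price of the reservation in this model is idle slots while the cluster
drains, which is one of the conflicting requirements that has to be traded off.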

> or have the space (slots) and memory
> allocation available for a PE to "sit" nicely across several of them,
> knowing full well there are enough resources to do it, with the user being
> very courteous about specifying sane complex values, but ultimately
> watching the PE "Waiting for resources".

If it doesn't schedule a job when resources are really available, it
would be useful to have a bug report, but it's often difficult to figure
out the scheduling, particularly as you don't get diagnostics about
resource reservation.
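
Before filing a report it may be worth checking that the resources are
available per host and not just in total. The Python sketch below is only an
illustration under stated assumptions: the host names and figures are invented,
`Host`, `usable_slots` and `can_place` are hypothetical helpers rather than
anything from Grid Engine, and the memory request is assumed to be charged per
slot on each host.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_slots: int
    free_mem_gb: float


def usable_slots(host, mem_per_slot_gb):
    """Slots on one host that fit, given a per-slot memory request."""
    if mem_per_slot_gb <= 0:
        return host.free_slots
    return min(host.free_slots, int(host.free_mem_gb // mem_per_slot_gb))


def can_place(hosts, slots_needed, mem_per_slot_gb):
    """True if the usable slots summed over all hosts cover the job."""
    usable = sum(usable_slots(h, mem_per_slot_gb) for h in hosts)
    return usable >= slots_needed


# Invented example cluster: plenty of free slots, memory unevenly spread.
hosts = [
    Host("node1", free_slots=8, free_mem_gb=8.0),
    Host("node2", free_slots=8, free_mem_gb=8.0),
    Host("node3", free_slots=8, free_mem_gb=64.0),
]

total_free_slots = sum(h.free_slots for h in hosts)
total_free_mem_gb = sum(h.free_mem_gb for h in hosts)

# A 16-slot job asking for 4 GB per slot needs 64 GB overall.
print("free slots in total:", total_free_slots)
print("free memory in total (GB):", total_free_mem_gb)
print("job fits:", can_place(hosts, slots_needed=16, mem_per_slot_gb=4.0))
```

In this made-up case the totals look sufficient, but node1 and node2 can each
take only two such slots, so 12 slots are usable and the job waits.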

> Indeed. Not that I'm ungrateful for a bit of software that is free and has
> done so much good for us, for so long - but it is frustrating at times. I
> guess this is just one of those complex problems to solve, that may not
> have a sane solving rule (in the most simplistic computer science
> computational complexity sense).

Well, the scheduling problem is NP-complete but, more importantly,
running a typical HPC system with conflicting requirements is a
non-trivial management problem, and I don't think it's realistic to
expect just to throw a scheduling algorithm at it.

-- 
Community Grid Engine:  http://arc.liv.ac.uk/SGE/
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
