Brian Smith <[email protected]> writes:

> Hi, Dave,
>
> I'm mostly trying to verify the behavior of max_reservations as the
> clarity of the man page is a little lacking.

No great surprise... I can try to clarify it for things I understand --
maybe after Reuti explains. I don't know whether the design document
<http://arc.liv.ac.uk/repos/darcs/sge/doc/devel/rfe/resource_reservation.txt>
is any more use. Anyway, experimentally, the number of individual
resources reserved can be much bigger than max_reservations (which counts
jobs, as far as I know without checking the code).

> I'm on 8.1.1 SoG, and if
> I set the number too low, I get starving, short >128 slot parallel
> jobs. If I set it too high, everything seems to get stalled.

Do you have a high throughput, frequent scheduling, or lots of waiting
jobs? Anything else that might be unusual? We don't usually have more
than a few tens of jobs waiting (although often there are very large
arrays), and I've not seen particular problems with qmaster stalling. Is
there anything useful in its messages file, especially after changing the
log level to "info"? Note that DURATION_OFFSET might be relevant, but it
doesn't sound so in this case.

> I've been looking at the schedule file (and now that you've pointed
> out qsched, I'll probably be checking that out as well), and I'm just
> having some trouble finding the best middle-ground for my environment.

I'm not sure what to suggest and hope someone else has advice. By the
way, I doubt there will be anything very different in this area between
that version and older ones. I have seen problems with reservations, but
mainly with them apparently being lost and maybe returning.

-- 
Community Grid Engine: http://arc.liv.ac.uk/SGE/
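
To check the jobs-versus-resources point above against a real schedule
file, a rough Python sketch along the following lines may help. It assumes
the schedule file is being written at all (MONITOR=1 in the scheduler
params, as far as I know) and that each record has the nine
colon-separated fields described in sched_conf(5): job ID, task ID, state,
start time, duration, level, object name, resource name, utilization, with
a line of colons separating scheduling runs. The function name
summarize_reservations and the sample records are made up for
illustration, and counting by job ID alone is an assumption about what
max_reservations counts.

```python
def summarize_reservations(lines):
    """Return one (reserving_jobs, reservation_records) pair per scheduling run.

    Assumes nine colon-separated fields per record and a line consisting
    only of colons between runs (format taken from sched_conf(5)).
    """
    runs = []
    jobs, records, seen_any = set(), 0, False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("::::"):
            # Separator line: close the run collected so far, if any.
            if seen_any:
                runs.append((len(jobs), records))
            jobs, records, seen_any = set(), 0, False
            continue
        fields = line.split(":")
        if len(fields) < 9:
            continue  # not a record in the assumed format; skip it
        seen_any = True
        job_id, state = fields[0], fields[2]
        if state == "RESERVING":
            jobs.add(job_id)   # distinct jobs holding a reservation
            records += 1       # one line per reserved resource
    if seen_any:
        runs.append((len(jobs), records))
    return runs


# Hypothetical sample data, not taken from a real cluster.
sample = """\
::::::::
101:1:RUNNING:1350000000:3600:Q:all.q@node01:slots:8.000000
202:1:RESERVING:1350003600:7200:P:mpi:slots:256.000000
202:1:RESERVING:1350003600:7200:G:global:lic:1.000000
202:1:RESERVING:1350003600:7200:Q:all.q@node01:slots:8.000000
202:1:RESERVING:1350003600:7200:Q:all.q@node02:slots:8.000000
::::::::
303:1:RESERVING:1350007200:600:Q:all.q@node03:slots:4.000000
"""

if __name__ == "__main__":
    for n, (njobs, nrecords) in enumerate(summarize_reservations(sample.splitlines()), 1):
        print(f"run {n}: {njobs} reserving job(s), {nrecords} reservation record(s)")
```

For real data, pass the lines of the schedule file (normally
$SGE_ROOT/$SGE_CELL/common/schedule) instead of the sample. If the record
count per run is far above max_reservations while the job count stays at
or below it, that would be consistent with the setting counting jobs
rather than individual resources.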
