Brian Smith <[email protected]> writes:

> Hi, Dave,
>
> I'm mostly trying to verify the behavior of max_reservations as the
> clarity of the man page is a little lacking.

No great surprise...  I can try to clarify it for things I understand --
maybe after Reuti explains.  I don't know whether the design document
<http://arc.liv.ac.uk/repos/darcs/sge/doc/devel/rfe/resource_reservation.txt>
is any more use.  Anyway, experimentally the number of individual
resources reserved can be much bigger than max_reservations (which
counts jobs as far as I know without checking the code).
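
One way to check this experimentally is to count reservation records in
the schedule file. What follows is only a rough sketch, not anything
shipped with SGE. It assumes monitoring is enabled (MONITOR=1 in the
scheduler "params"), that the file is $SGE_ROOT/$SGE_CELL/common/schedule,
and that records follow the colon-separated layout described in
sched_conf(5), i.e.
jobid:taskid:state:start_time:duration:level:object:resource:utilization,
with a line consisting only of colons separating scheduling runs. The
function name reservations_per_run is made up for illustration. Also
note that, as far as I recall, the man page spells the parameter
max_reservation.

```python
def reservations_per_run(lines):
    """Summarise RESERVING records for each scheduling run.

    Returns a list of (distinct_job_ids, reserving_records) tuples,
    one per run that contained at least one record.
    Assumes the record layout described in sched_conf(5).
    """
    runs = []
    jobs, records, seen = set(), 0, False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if set(line) == {":"}:
            # A line made only of colons separates scheduling runs.
            if seen:
                runs.append((len(jobs), records))
            jobs, records, seen = set(), 0, False
            continue
        fields = line.split(":", 8)
        if len(fields) < 9:
            continue  # not a record in the assumed layout; skip it
        seen = True
        if fields[2] == "RESERVING":
            jobs.add(fields[0])  # count by job id, ignoring task id
            records += 1
    if seen:
        runs.append((len(jobs), records))
    return runs


# Made-up sample: one job reserving slots at global and queue level.
sample = [
    "::::::::",
    "101:1:RESERVING:1350000000:3600:G:global:slots:128.000000",
    "101:1:RESERVING:1350000000:3600:Q:all.q@node01:slots:64.000000",
    "101:1:RESERVING:1350000000:3600:Q:all.q@node02:slots:64.000000",
    "102:1:RUNNING:1349990000:3600:Q:all.q@node03:slots:8.000000",
    "::::::::",
]
print(reservations_per_run(sample))
```

To run it against a live system, pass the function the lines of the
schedule file, e.g. reservations_per_run(open(path)). If the second
number in a tuple is much larger than the first, that matches what I
described: one reserving job can account for many reserved resources.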

> I'm on 8.1.1 SoG, and if
> I set the number too low, I get starving, short >128 slot parallel
> jobs. If I set it too high, everything seems to get stalled.

Do you have a high throughput, frequent scheduling, or lots of waiting
jobs?  Anything else that might be unusual?  We don't usually have more
than a few tens of jobs waiting (although often there are very large
arrays), and I've not seen particular problems with qmaster stalling.
Is there anything useful in its messages file, especially after changing
the log level to "info"?

Note that DURATION_OFFSET might be relevant, but it doesn't sound so in
this case.

> I've been looking at the schedule file (and now that you've pointed
> out qsched, I'll probably be checking that out as well), and I'm just
> having some trouble finding the best middle-ground for my environment.

I'm not sure what to suggest and hope someone else has advice.  By the
way, I doubt there will be anything very different in this area
between that version and older ones.  I have seen problems with
reservations, but mainly with them apparently being lost and maybe
returning.

-- 
Community Grid Engine:  http://arc.liv.ac.uk/SGE/
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
