On Tue, 22 Feb 2011, Chris Jewell wrote:
> Dave Love <d.love at liverpool.ac.uk> writes:
>> Yes <https://arc.liv.ac.uk/trac/SGE/ticket/1280>. I have a suspicion
>> that restarting the qmaster has helped sometimes, but I'm not sure about
>> that. I'd be interested if anyone has any more information/suggestions.
> For the record, I've just found that restarting qmaster sorts the issue.
> Chris
Really?
I've found that restarting the qmaster makes things worse (with 6.2u5):
after a qmaster restart, jobs that were already submitted no longer reserve
any resources... unless they are qalter'd to something very slightly
different (at least with our config), e.g. changing h_vmem=1G to
h_vmem=1024M.
Things get even more bizarre: if you later qalter the job back to how it
was, it stops reserving resources again. I really should have reported
this as a bug by now.
For the record, we have observed resource reservation working well for
jobs of any size... as long as you avoid certain situations. Advance
reservations seem to screw things up, for example.
It's probably not your problem, but looking at your "qconf -ssconf" output,
I would suggest adding "DURATION_OFFSET=300" to your "params" section. If
you're using tight integration for your parallel jobs, you've probably
noticed that jobs linger in the queue for around 5 minutes after they've
finished; this setting lets the scheduler allow for that delay when it
plans reservations.
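To illustrate why the offset matters, here is a deliberately simplified model in Python, not Grid Engine's actual scheduler code. It assumes the scheduler plans a running job's resources to come free at start time + h_rt + DURATION_OFFSET, with the 60-second default described in sched_conf(5); the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Assumed default used when DURATION_OFFSET is not set in "params" (see sched_conf(5)).
DEFAULT_DURATION_OFFSET = 60  # seconds


@dataclass
class RunningJob:
    # Hypothetical record of a running job, for illustration only.
    start_time: int  # epoch seconds when the job started
    h_rt: int        # requested hard runtime limit, in seconds


def planned_release(job: RunningJob, duration_offset: int = DEFAULT_DURATION_OFFSET) -> int:
    """Simplified model: when the scheduler assumes the job's resources free up."""
    return job.start_time + job.h_rt + duration_offset


def underestimate(job: RunningJob, actual_linger: int, duration_offset: int) -> int:
    """Seconds by which the plan is too optimistic if the job really lingers
    `actual_linger` seconds past its runtime (0 if the plan is not optimistic)."""
    actual_release = job.start_time + job.h_rt + actual_linger
    return max(0, actual_release - planned_release(job, duration_offset))
```

With a one-hour h_rt and a five-minute linger, the default offset leaves the plan four minutes optimistic (`underestimate(RunningJob(0, 3600), 300, 60)` returns 240), while DURATION_OFFSET=300 closes the gap. An overly optimistic plan is the kind of thing that can keep pushing a reservation back.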
Mark
PS In case you're not already doing it:
To check what the scheduler is thinking, run "qconf -msconf" and add
MONITOR=1 to the comma-separated list of values for "params". Then look for
"RESERVING" lines in the file $SGE_ROOT/$SGE_CELL/common/schedule.
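If you want to pull those entries out programmatically, a small Python sketch follows. It assumes each record in the schedule file is colon-separated and starts with job id, task id and state (check this against your own file), and that $SGE_CELL falls back to "default" when unset. The helper names are hypothetical, and the remaining fields are passed through uninterpreted.

```python
import os
from typing import Iterable, Iterator, List, NamedTuple


class Reservation(NamedTuple):
    job_id: str
    task_id: str
    rest: List[str]  # remaining colon-separated fields, left uninterpreted


def schedule_file_path() -> str:
    """Location of the scheduler monitoring file (only written with MONITOR=1)."""
    root = os.environ["SGE_ROOT"]
    cell = os.environ.get("SGE_CELL", "default")  # assumed fallback cell name
    return os.path.join(root, cell, "common", "schedule")


def reserving(lines: Iterable[str]) -> Iterator[Reservation]:
    """Yield RESERVING records, assuming each starts job_id:task_id:state."""
    for line in lines:
        parts = line.rstrip("\n").split(":")
        # Separator lines consisting only of colons have an empty state field.
        if len(parts) >= 3 and parts[2] == "RESERVING":
            yield Reservation(parts[0], parts[1], parts[3:])


def reserved_job_ids(path: str) -> List[str]:
    """Distinct job ids reported as RESERVING, in the order they first appear."""
    seen: List[str] = []
    with open(path) as fh:
        for record in reserving(fh):
            if record.job_id not in seen:
                seen.append(record.job_id)
    return seen
```

Usage, on the qmaster host: `print(reserved_job_ids(schedule_file_path()))`. A job you submitted with reservation that never shows up there is a good hint the scheduler isn't reserving for it.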
--
-----------------------------------------------------------------
Mark Dixon Email : [email protected]
HPC/Grid Systems Support Tel (int): 35429
Information Systems Services Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users