On Tue, 22 Feb 2011, Chris Jewell wrote:

Dave Love <d.love at liverpool.ac.uk> writes:

Yes <https://arc.liv.ac.uk/trac/SGE/ticket/1280>.  I have a suspicion
that restarting the qmaster has helped sometimes, but I'm not sure about
that.  I'd be interested if anyone has any more information/suggestions.

For the record, I've just found that restarting qmaster sorts the issue.

Chris

Really?

I've found that restarting the qmaster makes things worse (with 6.2u5): after a qmaster restart, anything already submitted stops reserving resources... unless the jobs are qalter'd to something very slightly different (at least with our config).

e.g. changing h_vmem=1G to h_vmem=1024M
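A sketch of that workaround as commands (the job ID 12345 here is hypothetical, standing in for a job submitted with resource reservation before the qmaster restart):

```shell
# Nudge the request to an equivalent but textually different value,
# which seems to make the scheduler re-evaluate the reservation:
qalter -l h_vmem=1024M 12345

# Check whether the job is reserving again (see the MONITOR=1 tip below):
qstat -j 12345
```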

Things get even more bizarre: if you later qalter the job back to exactly how it was, it stops reserving resources again. I really should have submitted this as a bug by now.

For the record, we have observed resource reservation working well for jobs of any size... as long as you avoid certain situations. Advance reservations seem to screw things up, for example.

It's probably not your problem, but looking at your "qconf -ssconf" output, I'd suggest adding "DURATION_OFFSET=300" to your "params" section: if you're using tight integration for your parallel jobs, you've probably noticed that they linger in the queue for around five minutes after they've finished. This setting helps the scheduler anticipate that.
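For concreteness, a sketch of what that config change looks like (values other than DURATION_OFFSET on the params line are illustrative):

```shell
# "qconf -msconf" opens the scheduler configuration in $EDITOR.
# The params line should end up looking something like:
#
#   params    DURATION_OFFSET=300
#
# (params takes a comma-separated list, so an existing entry such as
# MONITOR=1 would become "MONITOR=1,DURATION_OFFSET=300".)
qconf -msconf
```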

Mark

PS In case you're not already doing it:

To check what the scheduler is thinking, run "qconf -msconf" and add MONITOR=1 to the comma-separated list of values for "params". Then look for "RESERVING" lines in the file $SGE_ROOT/$SGE_CELL/common/schedule.
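Putting that together, a minimal sketch of the monitoring step (assuming $SGE_ROOT and $SGE_CELL are set in your environment, as a sourced SGE settings file normally does):

```shell
# After adding MONITOR=1 to "params" via "qconf -msconf", the scheduler
# logs each dispatch/reservation decision to the schedule file.
# Pull out just the reservation lines:
grep RESERVING "$SGE_ROOT/$SGE_CELL/common/schedule"

# Or watch decisions arrive as scheduling runs happen:
tail -f "$SGE_ROOT/$SGE_CELL/common/schedule" | grep RESERVING
```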

--
-----------------------------------------------------------------
Mark Dixon                       Email    : [email protected]
HPC/Grid Systems Support         Tel (int): 35429
Information Systems Services     Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
