Hi,

please don't cross-post. I think we have the gridengine.org list as the place
to discuss common setups.

On 05.04.2011 at 11:56, William Hay wrote:

> We're planning an outage on our cluster for the 12th of this month.
> I've added reservations for each of the subclusters  to ensure that
> nothing is running at that time.   The command I use is something like
> qrsub -l mem=4G,job=true -a 04120800 -d 24:0:0 -pe '*-j' 256 where mem
> is a consumable resource used to control memory usage and job is an
> exclusive resource associated with each host and the pe varies
> depending on which subcluster I'm reserving.

Can't the mem/job requests be left out here? I mean: just request a
reservation for all slots and you are done.
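
A minimal sketch of what I mean, assuming the start time, duration, PE pattern and slot count from your own command stay as they are. The helper name reservation_cmd is made up for illustration, and it only prints the qrsub call so you can check it before running it:

```shell
# Hypothetical helper: print a qrsub call that reserves slots only,
# i.e. the original command without the "-l mem=4G,job=true" request.
reservation_cmd() {
    pe="$1"      # parallel environment (pattern) of the subcluster
    slots="$2"   # number of slots to reserve, ideally all of them
    printf "qrsub -a 04120800 -d 24:0:0 -pe '%s' %s\n" "$pe" "$slots"
}

# Example for the subcluster from your mail:
reservation_cmd '*-j' 256
```

Whether 256 really covers all slots of that subcluster is something only you can check.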


> The reservations appear to be fine themselves but checking  the
> schedule file it appears that queued jobs now make  reservations after
> the outage even though they have plenty of time to run before it (I'm
> making the reservations this early because we have a few people
> submitting 7 day jobs).

Are they also requesting 7 days, or is this the default duration set in the
scheduler configuration?
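
To check the latter, something along these lines should work. It is only a sketch: it assumes the scheduler parameter is called default_duration and that "qconf -ssconf" prints it as a "name value" pair. The helper name and the sample input are invented for illustration:

```shell
# Hypothetical helper: pick the default_duration value out of scheduler
# configuration text read from stdin.
default_duration() {
    awk '$1 == "default_duration" { print $2 }'
}

# On the cluster this would be: qconf -ssconf | default_duration
# Made-up sample input, just to show the parsing:
printf 'schedule_interval 0:0:15\ndefault_duration INFINITY\n' | default_duration
```

A job that requests no run time (e.g. via -l h_rt=...) is planned with this default. If the default is longer than the time left before the outage, the job cannot be scheduled in front of the reservation.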


> If I restart the scheduler then the jobs start reserving slots prior
> to the outage but the queues acquire a qtype of N according to qstat
> -f and jobs don't actually start in them.  I can change the qtype in
> qstat -f to B by using qconf to change the qtype attribute of each
> queue to batch (which it already is according to qconf -sq).

Can you tell us more about your setup? Do you have different queues, i.e. some
batch-only and some only for parallel jobs?

--Reuti


> I can change the qtype to BP in qstat -f by modifying pe_list on each
> queue but it won't let me do this with a reservation in place (even
> though I'm just repeating what is already there).  If I delete the
> reservation, modify the pe_list and recreate the reservation then I'm
> back to my original problem.
> 
> The upshot of this is that the cluster is now dominated by low
> priority small jobs while the high priority parallel jobs are making
> reservations after the outage.
> 
> Also after a scheduler restart it takes a while for existing jobs to
> start making reservations.  For a few hours thereafter only jobs
> submitted after the restart make reservations.
> 
> Running SGE 6.2u3 at the moment.  Is an upgrade likely to fix this?
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

