Hi, please don't cross-post. I think we have gridengine.org as the place to discuss common setups.

On 05.04.2011 at 11:56, William Hay wrote:

> We're planning an outage on our cluster for the 12th of this month.
> I've added reservations for each of the subclusters to ensure that
> nothing is running at that time. The command I use is something like
> qrsub -l mem=4G,job=true -a 04120800 -d 24:0:0 -pe '*-j' 256 where mem
> is a consumable resource used to control memory usage and job is an
> exclusive resource associated with each host and the pe varies
> depending on which subcluster I'm reserving.

Can't the mem/job requests be left out here? I mean: just request a reservation for all slots and you are done.

> The reservations appear to be fine themselves but checking the
> schedule file it appears that queued jobs now make reservations after
> the outage even though they have plenty of time to run before it (I'm
> making the reservations this early because we have a few people
> submitting 7 day jobs).

Are they also requesting 7 days, or is this the estimated default duration set in the scheduler configuration?

> If I restart the scheduler then the jobs start reserving slots prior
> to the outage but the queues acquire a qtype of N according to qstat
> -f and jobs don't actually start in them. I can change the qtype in
> qstat -f to B by using qconf to change the qtype attribute of each
> queue to batch (which it already is according to qconf -sq).

Can you tell us more about your setup? Do you have different queues, i.e. some only for batch jobs and some only for parallel jobs?

-- Reuti

> I can change the qtype to BP in qstat -f by modifying pe_list on each
> queue but it won't let me do this with a reservation in place (even
> though I'm just repeating what is already there). If I delete the
> reservation,modify the pe_list and recreate the reservation then I'm
> back to my original problem
>
> The upshot of this is that the cluster is now dominated by low
> priority small jobs while the high priority parallel jobs are making
> reservations after the outage.
>
> Also after a scheduler restart it takes a while for existing jobs to
> start making reservations. For a few hours thereafter only jobs
> submitted after the restart make reservations.
>
> Running SGE 6.2u3 at the moment. Is an upgrade likely to fix this?
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
