Re: [gridengine users] Wrong scheduling behaviour with parallel jobs

Reuti Tue, 22 Feb 2011 06:36:51 -0800

Hi Andreas,

On 22.02.2011 at 15:07, Andreas Haupt wrote:


> we're observing a wrong scheduling behaviour in our parallel Gridengine
> farm. First, let me explain, how this system is used:
> 
> This batch system consists of 8 * 128 slots. Users submit parallel mpi
> jobs only. Those jobs always use a slot amount of 2^x (where: 3 <= x <=
> 7). So the "job width" is an amount between 8 and 128. All users use
> reservation (-R y) for their jobs. The job priority is only defined by
> the fair share.
> 
> Fair share works in the sense that the jobs with the highest priority
> are always on top in the waiting queue. Nevertheless, if enough "low
> width" jobs exist (width <= 64), the 128-slots jobs starve in the queue.
> Every time a 64-slot job finishes, the next one will be started,
> although the 128-slots job has higher priority.
> 
> Do you see a similar behaviour? Is it a misconfiguration? Anything I
> could do (apart from watching the queue regularly and schedule "by
> hand" ...)?
> 
> This system is running Gridengine 6.2u5.

Do you request h_rt for the jobs? In 6.2u5 the default_duration in the 
scheduler configuration is infinity, and internally SGE judges infinity to be 
smaller than infinity, hence the narrower jobs slip in. Requesting h_rt and/or 
setting default_duration to an arbitrarily high value (so that it's used for 
the waiting 128-slot job) might help.
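
To illustrate the reasoning, here is a toy model in Python. It is only a
sketch and not SGE's actual scheduler code: the function names are made up,
times are in hours, and the `<=` comparison is my stand-in for "infinity is
judged smaller than infinity".

```python
import math

INF = math.inf

def reservation_start(now, remaining_runtimes):
    """Earliest start of a job needing every slot: when the last running job ends."""
    return now + max(remaining_runtimes)

def may_backfill(now, duration, reserved_start):
    """Toy rule: a job may start now if it is assumed to end by the reserved start.
    The `<=` is an assumption modelling 'infinity counts as smaller than infinity'."""
    return now + duration <= reserved_start

# Case 1: nobody requests h_rt and default_duration is infinity.
# The reservation for the wide job sits at infinity, and an "infinitely long"
# narrow job still passes the check.
start_unbounded = reservation_start(0, [INF, INF])
print(may_backfill(0, INF, start_unbounded))  # True

# Case 2: jobs request h_rt; the running jobs have 2 h and 5 h left.
start_bounded = reservation_start(0, [2, 5])
print(may_backfill(0, 12, start_bounded))  # False: a 12 h job would delay the wide job
print(may_backfill(0, 3, start_bounded))   # True: a 3 h job ends before the reservation
```

With finite runtimes the reservation has a real start time, so only jobs that
finish before it can be backfilled.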

I assume that max_reservation is set to a nonzero value on your system.

-- Reuti


> Cheers & thanks,
> Andreas
> -- 
> | Andreas Haupt             | E-Mail: [email protected]
> |  DESY Zeuthen             | WWW:    http://www-zeuthen.desy.de/~ahaupt
> |  Platanenallee 6          | Phone:  +49/33762/7-7359
> |  D-15738 Zeuthen          | Fax:    +49/33762/7-7216
> 
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users


