Hi Andreas,

On 22.02.2011 at 15:07, Andreas Haupt wrote:
> we're observing a wrong scheduling behaviour in our parallel Gridengine
> farm. First, let me explain, how this system is used:
>
> This batch system consists of 8 * 128 slots. Users submit parallel mpi
> jobs only. Those jobs always use a slot amount of 2^x (where: 3 <= x <=
> 7). So the "job width" is an amount between 8 and 128. All users use
> reservation (-R y) for their jobs. The job priority is only defined by
> the fair share.
>
> Fair share works in the sense that the jobs with the highest priority
> are always on top in the waiting queue. Nevertheless, if enough "low
> width" jobs exist (width <= 64), the 128-slots jobs starve in the queue.
> Every time a 64-slot job finishes, the next one will be started,
> although the 128-slots job has higher priority.
>
> Do you see a similar behaviour? Is it a misconfiguration? Anything I
> could do (apart from watching the queue regularly and schedule "by
> hand" ...)?
>
> This system is running Gridengine 6.2u5.

Do you request h_rt for the jobs? The default_duration in the scheduler configuration of 6.2u5 is infinity, and internally SGE judges infinity to be smaller than infinity, hence the narrower jobs slip in. Requesting h_rt and/or setting default_duration to an arbitrarily high value (so that it's used for the waiting 128-slot job) might help.

I assume that max_reservation is set to a non-zero value in your system.

-- Reuti

> Cheers & thanks,
> Andreas
> --
> | Andreas Haupt | E-Mail: [email protected]
> | DESY Zeuthen | WWW: http://www-zeuthen.desy.de/~ahaupt
> | Platanenallee 6 | Phone: +49/33762/7-7359
> | D-15738 Zeuthen | Fax: +49/33762/7-7216
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
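
For reference, the two suggestions in the reply (request h_rt, and give default_duration a finite value) could be tried roughly as sketched below. This is only an illustration under assumptions: the parallel environment name "mpi", the script name "job.sh", and the runtime values are placeholders that do not come from the thread, so adapt them to the local setup.

```shell
# Submit with reservation (-R y) and an explicit hard runtime limit.
# "mpi", "job.sh" and 12:00:00 are hypothetical placeholders.
qsub -R y -l h_rt=12:00:00 -pe mpi 128 job.sh

# Show the current scheduler configuration and check the two
# parameters mentioned in the reply.
qconf -ssconf | grep -E 'default_duration|max_reservation'

# Open the scheduler configuration in an editor, e.g. to change
# default_duration from infinity to a large finite value such as
# 8760:00:00 (one year; an arbitrary example value).
qconf -msconf
```

Whether a finite default_duration alone is enough probably depends on how many jobs still run without an h_rt request, so requesting h_rt explicitly is likely the more predictable of the two options.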
