Hi,

On 09.05.2014 at 12:04, Roberto Nunnari wrote:

> I now have a job in that situation (large parallel job starving because
> smaller jobs keep getting slots).
>
> At present no more smaller jobs are being submitted and there's only one slot
> missing and it should get free in a few hours.
>
> So I tried to add reservation to the job using 'qalter -R y jobid'. Fine.

And you adjusted "max_reservation" and the "default_duration" too - what are they set to now?

> qstat correctly shows 'reserve: y', but I do not see anywhere that there are
> reserved slots. And how can I see reserved slots from command line? I only
> see it from 'qmon - cluster queues', and it shows RES 0.

This displays "advance reservations", not "resource reservations". To see resource reservations, it's necessary to set "params MONITOR=1" in SGE's scheduler configuration and have a look at the output this produces (`man sched_conf`).

> The job asks 128 slots with h_rt that can only run on a queue that offers a
> total of 128 slots, so it asks for all available slots on that queue. But I
> have momentarily disabled one host (I need to reboot one host as soon as the
> remaining running job will end this afternoon).
>
> May that be the cause why I cannot see any reservation? I mean because of the
> disabled queue on 1 host there are anyways not enough slots available?

Yes. You can check this by:

$ qalter -w v <job_id>

It might display "verification: no suitable queues."

-- Reuti

> Best regards,
> Robi
>
>
>
> On 08.05.2014 15:15, Reuti wrote:
>> Hi,
>>
>> On 08.05.2014 at 14:32, Roberto Nunnari wrote:
>>
>>> On 06.05.2014 21:58, Reuti wrote:
>>>> Hi,
>>>>
>>>> On 06.05.2014 at 18:45, Roberto Nunnari wrote:
>>>>
>>>>> I'm running a small cluster using Oracle Grid Engine 6.2u7
>>>>>
>>>>> At times it happens that one user submits a job that requires several
>>>>> resources (-pe, -l mem_free, etc).
>>>>>
>>>>> For instance, user A submits a job X requiring 32 slots out of 100
>>>>> available.
>>>>> The other users keep submitting serial jobs filling up all the slots
>>>>> and always having more jobs waiting on the queue.
>>>>>
>>>>> The serial jobs will get ahead of job X, and be scheduled as soon as one
>>>>> slot is available and job X will be waiting in the queue forever and
>>>>> never get to run until no more serial jobs will be submitted and 32 slots
>>>>> will be available.
>>>>>
>>>>> I would like the scheduler to also consider how much the job has been
>>>>> waiting in the queue, and possibly also the values regarding the historic
>>>>> users resources usage, as returned by qacct -o username
>>>>>
>>>>> What are the possible solutions to solve this problem?
>>>>
>>>> You can also look into Resource Reservation, so that the parallel job
>>>> collects the necessary resources while waiting:
>>>
>>> That looks more promising.. I guess the user has to know that he has to use
>>> the -R y flag, right?
>>
>> Correct. Therefore the idea to use a JSV to attach it to parallel jobs only
>> in your setup. Besides this, "-R y" can also be used to collect the
>> required memory for a serial job as it could face the same issue while
>> waiting for a huge bunch.
>>
>>
>>> What happens if I set max_reservation to 32 and the user submits a job
>>> (using -R y) requiring 64 slots?
>>
>> The max_reservation is not a limit for any requested resource, but the
>> number of jobs which are considered to have reservations. You can also have
>> a look at `man sched_conf` about this entry for more details. If you face
>> this issue only intermittently you could start by setting it to a smaller
>> value like 4 or so.
>>
>> -- Reuti
>>
> _______________________________________________
> users mailing list
> https://gridengine.org/mailman/listinfo/users
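
As a follow-up to the question of how to see resource reservations from the command line: a minimal sketch, assuming "params MONITOR=1" is already set and that the scheduler then writes its records to $SGE_ROOT/$SGE_CELL/common/schedule in the colon-separated layout described in `man sched_conf` (jobid:taskid:state:start_time:duration:level_char:object_name:resource_name:utilization). Please check both the path and the layout against your version. The helper name show_reservations is made up for this example and is not an SGE command.

```shell
#!/bin/sh
# Sketch: list the RESERVING records from the scheduler's monitoring file.
# Assumed record layout (see sched_conf(5), verify for your release):
#   jobid:taskid:state:start_time:duration:level_char:object_name:resource_name:utilization

# show_reservations FILE [JOB_ID]   (hypothetical helper, not part of SGE)
# Prints one line per reservation record, optionally limited to one job.
show_reservations() {
    file=$1
    job=${2:-}
    awk -F: -v job="$job" '
        # Field 3 is the record state; keep only reservations.
        $3 == "RESERVING" && (job == "" || $1 == job) {
            printf "job=%s start=%s duration=%s %s=%s on %s\n", $1, $4, $5, $8, $9, $7
        }' "$file"
}
```

It could be called as `show_reservations "$SGE_ROOT/$SGE_CELL/common/schedule" <job_id>`. If nothing is printed for a job submitted with "-R y", that would be consistent with the scheduler finding no suitable queues, which `qalter -w v <job_id>` should confirm.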
