Hi.
I now have a job in that situation (a large parallel job starving because
smaller jobs keep getting slots).
At present no more smaller jobs are being submitted and only one slot is
missing, which should become free in a few hours.
So I tried to add a reservation to the job using 'qalter -R y jobid'.
qstat correctly shows 'reserve: y', but I do not see anywhere that there
are reserved slots. And how can I see reserved slots from the command line?
I only see it from 'qmon - cluster queues', and it shows RES 0.
The job asks for 128 slots with h_rt and can only run on a queue that
offers a total of 128 slots, so it asks for all available slots on that
queue. But I have temporarily disabled one host (I need to reboot it
as soon as the remaining running job ends this afternoon).
Could that be the reason why I cannot see any reservation? I mean, because
of the disabled queue on one host, are there not enough slots available
anyway?
Best regards,
Robi
On 08.05.2014 15:15, Reuti wrote:
Hi,
On 08.05.2014 at 14:32, Roberto Nunnari wrote:
On 06.05.2014 21:58, Reuti wrote:
Hi,
On 06.05.2014 at 18:45, Roberto Nunnari wrote:
I'm running a small cluster using Oracle Grid Engine 6.2u7.
At times it happens that one user submits a job that requires several resources
(-pe, -l mem_free, etc).
For instance, user A submits a job X requiring 32 slots out of 100 available.
The other users keep submitting serial jobs, filling up all the slots and
always having more jobs waiting in the queue.
The serial jobs get ahead of job X and are scheduled as soon as one slot
is available, so job X waits in the queue forever and never gets to
run until no more serial jobs are submitted and 32 slots are available.
I would like the scheduler to also consider how long the job has been waiting
in the queue, and possibly also the users' historical resource usage, as
returned by qacct -o username.
What are the possible solutions to this problem?
You can also look into Resource Reservation, so that the parallel job collects
the necessary resources while waiting:
That looks more promising. I guess the user has to know that he has to use the
-R y flag, right?
Correct. Hence the idea to use a JSV to attach it to parallel jobs only in your
setup. Besides this, "-R y" can also be used to collect the required memory for a
serial job, as it could face the same issue while waiting for a large amount of it.
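A minimal sketch of such a JSV, for illustration only. It assumes the stock shell helpers shipped under $SGE_ROOT/util/resources/jsv (jsv_include.sh with jsv_get_param, jsv_set_param, jsv_correct, jsv_accept and jsv_main) and that "R" and "pe_name" are the JSV parameter names matching the -R and -pe options; please check both against the jsv man page of your installation. The message texts are made up.

```shell
#!/bin/sh
# Hypothetical JSV sketch: request a reservation (-R y) for parallel jobs only.
# Assumption: the shell JSV helper library lives at the path below.
JSV_LIB="${SGE_ROOT:-}/util/resources/jsv/jsv_include.sh"
if [ -r "$JSV_LIB" ]; then
    . "$JSV_LIB"
fi

jsv_on_start()
{
    return
}

jsv_on_verify()
{
    # pe_name is expected to be non-empty only for jobs submitted with -pe
    pe=$(jsv_get_param pe_name)
    if [ -n "$pe" ]; then
        jsv_set_param R y
        jsv_correct "Reservation enabled for parallel job"
    else
        jsv_accept "Job accepted unchanged"
    fi
    return
}

# Enter the JSV protocol loop only when the helper library was loaded
if [ -r "$JSV_LIB" ]; then
    jsv_main
fi
```

Such a script could be tried per job with the -jsv option of qsub before making it a cluster-wide default through the jsv_url entry of the global configuration (qconf -mconf).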
What happens if I set max_reservation to 32 and the user submits a job (using
-R y) requiring 64 slots?
The max_reservation is not a limit for any requested resource, but the number
of jobs which are considered to have reservations. You can also have a look at
`man sched_conf` about this entry for more details. If you face this issue only
intermittently you could start by setting it to a smaller value like 4 or so.
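To check from the command line whether a reservation is actually being made, the same man page describes a MONITOR setting for the `params` entry (edited with `qconf -msconf`). As far as I recall, with MONITOR=1 the scheduler records its decisions in $SGE_ROOT/$SGE_CELL/common/schedule as colon-separated lines whose third field is a state such as RUNNING, STARTING or RESERVING. A small sketch under that assumption; the function name reserved_lines is made up for this example:

```shell
# Print the RESERVING records of one job from a scheduler "schedule" file.
# Usage: reserved_lines FILE JOBID
reserved_lines()
{
    # Assumed field layout (see `man sched_conf`):
    # jobid:taskid:state:start_time:duration:level_char:object_name:resource_name:utilization
    awk -F: -v job="$2" '$1 == job && $3 == "RESERVING"' "$1"
}
```

It could be called as `reserved_lines "$SGE_ROOT/$SGE_CELL/common/schedule" 12345`, where 12345 stands for the job id. No output would suggest that the scheduler has not reserved anything for that job in the recorded scheduling runs.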
-- Reuti