Hi.

I now have a job in that situation (large parallel job starving because smaller jobs keep getting slots).

At present no more small jobs are being submitted; only one slot is missing, and it should become free in a few hours.

So I tried to add a reservation to the job using 'qalter -R y jobid'.
qstat correctly shows 'reserve: y', but I do not see anywhere that slots are reserved. How can I see reserved slots from the command line? I only see them in 'qmon - cluster queues', and it shows RES 0.

The job asks for 128 slots with h_rt and can only run on a queue that offers a total of 128 slots, so it asks for all the slots in that queue. But I have temporarily disabled one host (I need to reboot it as soon as the remaining running job ends this afternoon).

Could that be the reason why I cannot see any reservation? I mean, because of the disabled queue on one host there are not enough slots available anyway?

Best regards,
Robi



On 08.05.2014 15:15, Reuti wrote:
Hi,

On 08.05.2014 at 14:32, Roberto Nunnari wrote:

On 06.05.2014 21:58, Reuti wrote:
Hi,

On 06.05.2014 at 18:45, Roberto Nunnari wrote:

I'm running a small cluster using Oracle Grid Engine 6.2u7.

At times it happens that one user submits a job that requires several resources 
(-pe, -l mem_free, etc).

For instance, user A submits a job X requiring 32 slots out of 100 available.
The other users keep submitting serial jobs, filling up all the slots and 
always having more jobs waiting in the queue.

The serial jobs get ahead of job X and are scheduled as soon as one slot 
is available, so job X waits in the queue indefinitely and never gets to 
run until no more serial jobs are submitted and 32 slots are free at once.

I would like the scheduler to also consider how long the job has been waiting 
in the queue, and possibly also each user's historic resource usage, 
as returned by qacct -o username.

What are the possible solutions to solve this problem?

You can also look into Resource Reservation, so that the parallel job collects 
the necessary resources while waiting:

That looks more promising. I guess the user has to know that he has to use the 
-R y flag, right?

Correct. Hence the idea of using a JSV to attach it to parallel jobs only in your 
setup. Besides this, "-R y" can also be used to collect the required memory for a 
serial job, as it could face the same issue while waiting behind a huge bunch of jobs.
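
A minimal sketch of such a JSV in shell, assuming the helper library that ships with Grid Engine under $SGE_ROOT/util/resources/jsv/jsv_include.sh. The messages are illustrative, and the script would still have to be registered, for example through the jsv_url entry of the global configuration or with qsub -jsv:

```shell
#!/bin/sh
# Sketch of a JSV that switches on reservation ("-R y") for parallel jobs only.
# Assumption: the JSV helper library lives at the usual location below.
JSV_LIB="${SGE_ROOT:-}/util/resources/jsv/jsv_include.sh"
if [ -r "$JSV_LIB" ]; then
    . "$JSV_LIB"
fi

jsv_on_start() {
    # Nothing to request up front; the job environment is not needed here.
    return
}

jsv_on_verify() {
    # pe_name is only non-empty when the job was submitted with -pe.
    if [ -n "$(jsv_get_param pe_name)" ] && [ "$(jsv_get_param R)" != "y" ]; then
        jsv_set_param R y
        jsv_correct "reservation enabled for parallel job"
    else
        jsv_accept "job left unchanged"
    fi
    return
}

# Start the JSV protocol loop only when the helper library was loaded.
if [ -r "$JSV_LIB" ]; then
    jsv_main
fi
```

Serial jobs pass through untouched, so only jobs requesting a parallel environment count against max_reservation.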


What happens if I set max_reservation to 32 and the user submits a job (using 
-R y) requiring 64 slots?

The max_reservation is not a limit for any requested resource, but the number 
of jobs which are considered to have reservations. You can also have a look at 
`man sched_conf` about this entry for more details. If you face this issue only 
intermittently you could start by setting it to a smaller value like 4 or so.
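
To check from the command line whether a reservation is really being made, one option is to add MONITOR=1 to the "params" line of the scheduler configuration (qconf -msconf) and read the schedule file the scheduler then writes. The helper below is only a sketch: the function name is made up, and both the record layout and the file location are assumptions that should be checked against `man sched_conf` for your version.

```shell
# Print "job start_time level object amount" for every slot reservation
# found in a schedule file.
# Assumed record layout (colon separated):
#   job:task:state:start_time:duration:level:object:resource:utilization
show_reservations() {
    awk -F: '$3 == "RESERVING" && $8 == "slots" {
        printf "%s %s %s %s %d\n", $1, $4, $6, $7, $9
    }' "$1"
}

# Typical use on the qmaster host (path is the usual default, adjust as needed):
# show_reservations "$SGE_ROOT/$SGE_CELL/common/schedule"
```

The same job can show up at several levels (queue instance, host, parallel environment), so the amounts should not simply be summed. If nothing is printed for a job that has 'reserve: y', it is worth checking that max_reservation is greater than 0.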

-- Reuti


