Hi,

On 09.05.2014 at 12:04, Roberto Nunnari wrote:

> I now have a job in that situation (large parallel job starving because 
> smaller jobs keep getting slots).
> 
> At present no more smaller jobs are being submitted, only one slot is 
> missing, and it should become free in a few hours.
> 
> So I tried to add a reservation to the job using 'qalter -R y jobid'.

Fine. And you adjusted "max_reservation" and "default_duration" too? What 
are they set to now?


> qstat correctly shows 'reserve: y', but I do not see anywhere that slots are 
> reserved. How can I see reserved slots from the command line? I only 
> see them in 'qmon - cluster queues', and it shows RES 0.

This displays "advance reservations", not "resource reservations".

To see resource reservations, set "params MONITOR=1" in SGE's scheduler 
configuration (`man sched_conf`) and have a look at the resulting output.
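As a minimal sketch of reading that output: with MONITOR=1 the scheduler records each scheduling run in $SGE_ROOT/$SGE_CELL/common/schedule, and reservations should be the records whose state is RESERVING. The colon-separated layout in the comment below is an assumption about the file format, and show_reservations is a hypothetical helper name, so check both against your installation:

```shell
# Print reservation records from a scheduler monitoring file.
# Assumed record layout (colon-separated, one record per line):
#   jobid:taskid:state:start_time:duration:level:object:resource:utilization
show_reservations() {
    schedule_file="$1"
    job_id="$2"    # optional: limit the output to one job
    awk -F: -v job="$job_id" '
        # Keep only RESERVING records, optionally for a single job.
        $3 == "RESERVING" && (job == "" || $1 == job) {
            printf "job %s reserves %s %s on %s from %s for %ss\n", $1, $9, $8, $7, $4, $5
        }' "$schedule_file"
}

# Typical call on a live cluster (the job id is a placeholder):
#   show_reservations "$SGE_ROOT/$SGE_CELL/common/schedule" 3127
```

If the function prints nothing for a job submitted with -R y, the scheduler has not placed a reservation for it in the recorded runs.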


> The job asks for 128 slots with h_rt and can only run in a queue that offers a 
> total of 128 slots, so it asks for all available slots in that queue. But I 
> have temporarily disabled one host (I need to reboot it as soon as the 
> remaining running job ends this afternoon).
> 
> Might that be why I cannot see any reservation? I mean, because of the 
> disabled queue on one host there are not enough slots available anyway?

Yes. You can check this by:

$ qalter -w v <job_id>

It might display "verification: no suitable queues."
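A small wrapper can make that check scriptable. This is a sketch only: verify_result is a hypothetical helper, and the success wording it matches is an assumption about what qalter prints, so adjust the patterns to the messages you actually see:

```shell
# Classify the text printed by `qalter -w v <job_id>`.
verify_result() {
    case "$1" in
        *"no suitable queues"*) echo "unschedulable" ;;   # message quoted above
        *"found suitable queue"*) echo "schedulable" ;;   # assumed success wording
        *) echo "unknown" ;;
    esac
}

# Typical call on a live cluster (the job id is a placeholder):
#   verify_result "$(qalter -w v 12345 2>&1)"
```

An "unschedulable" result for the 128-slot job while one host is disabled would match the explanation above: the job cannot fit, so no reservation is made.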

-- Reuti


> Best regards,
> Robi
> 
> 
> 
> On 08.05.2014 15:15, Reuti wrote:
>> Hi,
>> 
>> On 08.05.2014 at 14:32, Roberto Nunnari wrote:
>> 
>>> On 06.05.2014 21:58, Reuti wrote:
>>>> Hi,
>>>> 
>>>> On 06.05.2014 at 18:45, Roberto Nunnari wrote:
>>>> 
>>>>> I'm running a small cluster using Oracle Grid Engine 6.2u7
>>>>> 
>>>>> At times it happens that one user submits a job that requires several 
>>>>> resources (-pe, -l mem_free, etc).
>>>>> 
>>>>> For instance, user A submits a job X requiring 32 slots out of 100 
>>>>> available.
>>>>> The other users keep submitting serial jobs, filling up all the slots 
>>>>> and always having more jobs waiting in the queue.
>>>>> 
>>>>> The serial jobs get ahead of job X and are scheduled as soon as one 
>>>>> slot is available, so job X waits in the queue indefinitely and 
>>>>> never runs until no more serial jobs are submitted and 32 slots 
>>>>> are available.
>>>>> 
>>>>> I would like the scheduler to also consider how long the job has been 
>>>>> waiting in the queue, and possibly also the historical per-user 
>>>>> resource usage, as returned by qacct -o username
>>>>> 
>>>>> What are the possible solutions to this problem?
>>>> 
>>>> You can also look into Resource Reservation, so that the parallel job 
>>>> collects the necessary resources while waiting:
>>> 
>>> That looks more promising. I guess the user has to know that he has to use 
>>> the -R y flag, right?
>> 
>> Correct. Hence the idea to use a JSV to attach it to parallel jobs only 
>> in your setup. Besides this, "-R y" can also be used to collect the 
>> required memory for a serial job as it could face the same issue while 
>> waiting for a huge bunch.
>> 
>> 
>>> What happens if I set max_reservation to 32 and the user submits a job 
>>> (using -R y) requiring 64 slots?
>> 
>> The max_reservation is not a limit for any requested resource, but the 
>> number of jobs which are considered to have reservations. You can also have 
>> a look at `man sched_conf` about this entry for more details. If you face 
>> this issue only intermittently you could start by setting it to a smaller 
>> value like 4 or so.
>> 
>> -- Reuti
>> 
> 
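The JSV idea from the quoted reply above could look roughly like the sketch below. It assumes the shell helper library that ships with Grid Engine under $SGE_ROOT/util/resources/jsv/jsv_include.sh and treats any job that requests a parallel environment (-pe) as "parallel"; the messages passed to jsv_correct and jsv_accept are illustrative only:

```shell
#!/bin/sh
# Sketch of a JSV that switches on resource reservation for parallel jobs only.

jsv_on_start() {
    # Nothing extra is requested from the submit client.
    return
}

jsv_on_verify() {
    # pe_name is only present when the job was submitted with -pe.
    if [ "$(jsv_is_param pe_name)" = "true" ]; then
        jsv_set_param R y
        jsv_correct "reservation enabled for parallel job"
    else
        jsv_accept "job accepted unchanged"
    fi
    return
}

# Load the helper library and enter the JSV main loop; skipped when the
# library is not installed, so the functions above can be tried in isolation.
JSV_LIB="${SGE_ROOT:-/nonexistent}/util/resources/jsv/jsv_include.sh"
if [ -r "$JSV_LIB" ]; then
    . "$JSV_LIB"
    jsv_main
fi
```

Such a script would typically be registered as a server-side JSV through the jsv_url entry of the global configuration (`man sge_conf`), so that users do not have to remember "-R y" themselves.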


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
