Hi,
On 18.04.2013 at 17:37, Riccardo Murri wrote:
>>> However, large advance reservations seem to behave erratically. Our
>>> cluster comprises 576 nodes, each with 8 cores, distributed among
>>> queues as follows:
>>>
>>> # qstat -g c
>>> CLUSTER QUEUE   CQLOAD   USED    RES  AVAIL  TOTAL aoACDPS  cdsuE
>>> -----------------------------------------------------------------
>>> long.q            1.88      8      0      0    496     112      0
>>> med.q             0.96      8      0     16    960       0      0
>>> short.q           0.81      8      0    264   1536      88     16
>>> test.q            0.36      0      0     80     80       0      0
>>> very-short.q      0.36      8      0     40     80       0      0
>>> wide.q            1.23      8      0    616   1536     288      0
>>>
>>> No two queues overlap, except for queue `test.q`, which is reserved for
>>> sysadmins and thus of no concern.
>>>
>>> A reservation for 4608 slots (576 nodes with 8 cores each) for 12
>>> hours (from 08:00 to 20:00) fails:
>>>
>>> # qrsub -a 201305190800 -e 201305192000 -pe parastation 4608
>>> queue "short.q@r06c01b12n01" is temporarily disabled
>>> queue "short.q@r06c01b12n02" is temporarily disabled
>>> advance_reservation: no suitable queues
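
As a quick sanity check on the requested size, the TOTAL column of the `qstat -g c` output quoted above can be added up by hand. The sketch below hard-codes those figures (test.q is left out because it overlaps the other queues). The `disabled=16` value is taken from the cdsuE column of short.q; that it corresponds to the two 8-core hosts named in the error message is an inference from the numbers, not something the output states, and the script says nothing about how the scheduler actually treats such slots.

```shell
#!/bin/sh
# TOTAL column as quoted above: long.q med.q short.q very-short.q wide.q
totals="496 960 1536 80 1536"

sum=0
for t in $totals; do
    sum=$((sum + t))
done

# Cluster size as stated in the mail: 576 nodes with 8 cores each.
nodes=576
cores_per_node=8
expected=$((nodes * cores_per_node))

# cdsuE value shown for short.q (assumed to be the two disabled hosts).
disabled=16
usable=$((sum - disabled))

echo "sum of queue totals:      $sum"
echo "nodes x cores:            $expected"
echo "total minus cdsuE slots:  $usable"
```

If the two disabled hosts really cannot be reserved, a request for every slot in the cluster could not be satisfied in any case, whereas a request for 4528 slots would still be below that reduced figure.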
>>
>> The PE is also attached to very-short.q?
>
> Yes:
>
> murri@login2:~> qconf -sq very-short.q
> qname very-short.q
> ...
> pe_list intel-mpi2_mpd mpich1_rsh mpich2_rsh openmpi openmpi2 \
>         parastation smp
>
>>> However, trying to reserve the same number of slots for the next day
>>> fails again:
>>>
>>> # qrsub -a 201305200800 -e 201305202000 -pe parastation 4528
>>> queue "short.q@r06c01b12n01" is temporarily disabled
>>> queue "short.q@r06c01b12n02" is temporarily disabled
>>> advance_reservation: no suitable queues
>>
>> Maybe the problem is not the very-short.q, but the actual running jobs which
>> have a longer h_rt defined.
>
> The maximum runtime for any job is 3 days (in long.q; other queues have
> shorter limits), so that shouldn't affect reservations that start
> a month from now, should it?
>
> However, some jobs only define {s,h}_cpu and no {s,h}_rt. Could that
> be the cause?
In principle yes: the default_duration is then taken as the estimated
runtime of the job. But I must admit it's strange that the reservation is
blocked at a later point in time. There is no other AR in the way, I assume?
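
A minimal sketch for checking both points on the cluster, assuming the Grid Engine client tools are on the PATH of a submit or admin host. The function name `check_ar_blockers` is made up for illustration; `qconf -ssconf` prints the scheduler configuration (which contains `default_duration`), and `qrstat -u '*'` lists the advance reservations of all users.

```shell
#!/bin/sh
# Sketch only: prints the settings discussed above, changes nothing.
check_ar_blockers() {
    if ! command -v qconf >/dev/null 2>&1; then
        echo "qconf not found; run this on a submit or admin host" >&2
        return 1
    fi

    # Runtime the scheduler assumes for jobs that request no h_rt/s_rt.
    qconf -ssconf | grep default_duration

    # Advance reservations of all users that might already hold slots.
    qrstat -u '*'
}
```

Calling `check_ar_blockers` shows the configured `default_duration` followed by any existing ARs; a very large or unlimited duration would be worth a closer look for jobs that only set {s,h}_cpu.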
-- Reuti
>> Is there in addition any calendar defined?
>
> No:
>
> murri@login2:~> qconf -scall
> no calendar defined
>
> Thanks,
> Riccardo