Hi Reuti,

>> However, large advance reservations seem to behave erratically. Our
>> cluster is comprised of 576 nodes, each with 8 cores, so distributed
>> among queues:
>>
>>    # qstat -g c
>>    CLUSTER QUEUE   CQLOAD   USED    RES  AVAIL  TOTAL aoACDPS  cdsuE
>>    -----------------------------------------------------------------
>>    long.q            1.88      8      0      0    496     112      0
>>    med.q             0.96      8      0     16    960       0      0
>>    short.q           0.81      8      0    264   1536      88     16
>>    test.q            0.36      0      0     80     80       0      0
>>    very-short.q      0.36      8      0     40     80       0      0
>>    wide.q            1.23      8      0    616   1536     288      0
>>
>> No two queues overlap, except for queue `test.q` which is reserved to
>> sysadmins and thus of no concern.
>>
>> A reservation for 4608 slots (576 nodes with 8 cores each) for 12
>> hours (from 08:00 to 20:00)
>> fails:
>>
>>    # qrsub -a 201305190800 -e 201305192000 -pe parastation 4608
>>    queue "short.q@r06c01b12n01" is temporarily disabled
>>    queue "short.q@r06c01b12n02" is temporarily disabled
>>    advance_reservation: no suitable queues
>
> The PE is also attached to very-short.q?

Yes:

    murri@login2:~> qconf -sq very-short.q
    qname                 very-short.q
    ...
    pe_list               intel-mpi2_mpd mpich1_rsh mpich2_rsh openmpi openmpi2 \
                          parastation smp
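
As a sanity check on the numbers, the TOTAL column of the `qstat -g c`
output quoted above can be summed in the shell. This is only a sketch
with the values copied by hand; the variable names are made up for
illustration.

```shell
#!/bin/sh
# TOTAL slots per cluster queue, copied by hand from the `qstat -g c` output.
# test.q overlaps the other queues, so it is deliberately left out.
long=496
med=960
short=1536
very_short=80
wide=1536

# Sum over the non-overlapping queues.
total=$((long + med + short + very_short + wide))

# Independent count: 576 nodes with 8 cores each.
by_nodes=$((576 * 8))

echo "sum of queue totals (without test.q): $total"
echo "576 nodes x 8 cores:                  $by_nodes"
echo "same sum without very-short.q:        $((total - very_short))"
```

Both counts come to 4608, so the reservation asks for every slot in
the cluster; leaving out very-short.q gives 4528.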

>> However, trying to reserve the same number of slots for the next day
>> fails again:
>>
>>    # qrsub -a 201305200800 -e 201305202000 -pe parastation 4528
>>    queue "short.q@r06c01b12n01" is temporarily disabled
>>    queue "short.q@r06c01b12n02" is temporarily disabled
>>    advance_reservation: no suitable queues
>
> Maybe the problem is not the very-short.q, but the actual running jobs which 
> have a longer h_rt defined.

The maximum runtime for any job is 3 days (in long.q; the other queues
have shorter limits), so that shouldn't affect reservations that start
a month from now, should it?

However, some jobs only define {s,h}_cpu and no {s,h}_rt.  Could that
be the cause?
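
To find the running jobs that request no runtime limit at all,
something like the following could be used. It is a rough sketch:
`jobs_without_rt` is a hypothetical helper, and it assumes that
`qstat -r` starts each job with a line whose first field is the
numeric job ID and lists the requested resources as `name=value` on
the lines that follow. That layout may differ between Grid Engine
versions, so the output should be checked by hand first.

```shell
#!/bin/sh
# Hypothetical helper: reads `qstat -r`-style text on stdin and prints the
# IDs of jobs whose block mentions neither h_rt= nor s_rt=.
jobs_without_rt() {
    awk '
        # Assumption: a line whose first field is all digits starts a new job.
        $1 ~ /^[0-9]+$/ {
            if (id != "" && !has_rt) print id   # flush the previous job
            id = $1
            has_rt = 0
            next
        }
        # Any h_rt= or s_rt= request inside the current block counts.
        /[hs]_rt=/ { has_rt = 1 }
        # Flush the last job in the input.
        END { if (id != "" && !has_rt) print id }
    ' | sort -un    # array tasks repeat the job ID, so de-duplicate
}
```

Usage would be along the lines of `qstat -u '*' -s r -r | jobs_without_rt`.
This only looks at what each job requested at submission time; limits
that come from the queue configuration are not covered by it.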


> Is there in addition any calendar defined?

No:

    murri@login2:~> qconf -scall
    no calendar defined

Thanks,
Riccardo
