On 29 October 2012 14:25, Steve Schmerler <[email protected]> wrote:
> Hello
>
> We're using slot reservation (qsub -R y) and I'd like to know if there is
> another way to see how many slots are currently reserved for a particular
> job besides $SGE_ROOT/$SGE_CELL/common/schedule. There I find
>
>     32309:1:RESERVING:1352406813:35996460:P:openmpi:slots:32.000000
>     32309:1:RESERVING:1352406813:35996460:Q:[email protected]:slots:8.000000
>     32309:1:RESERVING:1352406813:35996460:Q:[email protected]:slots:8.000000
>     32309:1:RESERVING:1352406813:35996460:Q:[email protected]:slots:8.000000
>     32309:1:RESERVING:1352406813:35996460:Q:[email protected]:slots:8.000000
>
> Does that indicate that the scheduler already reserved 4x8 slots on 4
> nodes? If so, then this information is not correct since we have only 16
> slots free in that queue right now.

It's reserving all the slots the job will need: not just the ones that
are currently free, but also the ones it anticipates becoming free. If
it didn't do this, it might in some circumstances overcommit slots that
are still occupied.
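
If you want to total those lines up per job, something like the sketch
below may help. It assumes the colon-separated field order that
sched_conf(5) describes for the schedule file (job, task, state, start
time, duration, level character, object name, resource, amount). The
function name reserved_slots and the node names in SAMPLE are made up
for illustration; they are not part of SGE.

```python
from collections import defaultdict

# Hypothetical sample in the schedule-file layout; node names are invented.
SAMPLE = [
    "32309:1:RESERVING:1352406813:35996460:P:openmpi:slots:32.000000",
    "32309:1:RESERVING:1352406813:35996460:Q:all.q@node01:slots:8.000000",
    "32309:1:RESERVING:1352406813:35996460:Q:all.q@node02:slots:8.000000",
    "32309:1:RESERVING:1352406813:35996460:Q:all.q@node03:slots:8.000000",
    "32309:1:RESERVING:1352406813:35996460:Q:all.q@node04:slots:8.000000",
    "40000:1:RUNNING:1352400000:3600:Q:all.q@node01:slots:4.000000",
]


def reserved_slots(lines, job_id):
    """Sum the RESERVING 'slots' records of one job, keyed by (level, object)."""
    totals = defaultdict(float)
    for line in lines:
        fields = line.strip().split(":")
        if len(fields) != 9:
            continue  # not a record in the assumed nine-field layout
        job, _task, state, _start, _duration, level, obj, resource, amount = fields
        if job == str(job_id) and state == "RESERVING" and resource == "slots":
            totals[(level, obj)] += float(amount)
    return dict(totals)


if __name__ == "__main__":
    for (level, obj), slots in sorted(reserved_slots(SAMPLE, 32309).items()):
        print(level, obj, slots)
```

The P line is the total for the parallel environment and the Q lines
are the per-queue-instance share of that same total, so they should not
be added together.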

>
> Where does SGE store reservation-related information -- only in memory
> or also in a file (which I could not locate anywhere)?
>
> If it stores it only in memory, then the reservation state may get reset
> if sge_qmaster crashes (we have the famous qmaster-random-crash problem
> [1] and currently do cron'ed restarts of sge_qmaster)

There isn't really a reservation state to save.  The reservations are
made anew every scheduling cycle, but they tend to be stable because the
same algorithm runs against a cluster state in which the only new
resource usage since the last scheduling cycle is usage that was
compatible with the reservations made in that cycle.  That said, there
can be some odd heuristic-based reservation wobble at times.
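
One way to watch for that wobble is to compare a job's RESERVING lines
between the last two scheduling cycles recorded in the schedule file.
The sketch below assumes that a line consisting of eight colons
separates cycles, as I understand sched_conf(5) to describe, and that
records have the same nine fields as the lines quoted above. The helper
names are hypothetical.

```python
def split_cycles(lines):
    """Group schedule-file lines into cycles, splitting on '::::::::' lines."""
    cycles, current = [], []
    for raw in lines:
        line = raw.strip()
        if line == "::::::::":  # assumed cycle separator
            if current:
                cycles.append(current)
            current = []
        elif line:
            current.append(line)
    if current:
        cycles.append(current)
    return cycles


def reservations(cycle, job_id):
    """Return the job's RESERVING records as (start, level, object, resource, amount)."""
    found = set()
    for line in cycle:
        f = line.split(":")
        if len(f) == 9 and f[0] == str(job_id) and f[2] == "RESERVING":
            found.add((f[3], f[5], f[6], f[7], f[8]))
    return found


def wobble(lines, job_id):
    """Return (dropped, added) reservation records between the last two cycles."""
    cycles = split_cycles(lines)
    if len(cycles) < 2:
        return set(), set()
    previous = reservations(cycles[-2], job_id)
    latest = reservations(cycles[-1], job_id)
    return previous - latest, latest - previous
```

Two empty sets mean the reservation was identical in both cycles; a
changed start time or host shows up as one dropped and one added record.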


William
>
>
> Thanks.
>
> best,
> Steve
>
> [1] http://markmail.org/search/?q=gridengine%20list%3Anet.sunsource.gridengine.users%20sge_master%20segfault#query:gridengine%20list%3Anet.sunsource.gridengine.users%20sge_master%20segfault+page:1+mid:njkqj4byiqvye67i+state:results
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
>
>