Good morning,
we have a little cluster with the basic queue setup of 3 queues:
long
regular
express
We have 25 nodes, 23 of them have the queues long and regular, 2 of them
have the queues regular and express. "Long" is subordinate to "regular",
"regular" is subordinate to "express".
There is a boolean resource called "express" attached to the express
queue on the 2 "express nodes" and to the regular queue on the 23
"long nodes".
Express jobs are submitted with "qsub -l express".
This setup works fine most of the time, but once in a while a parallel
express job runs in both queue "regular" _and_ queue "express" on the
same node and suspends itself, even though queue "regular" has no
"express" resource (on that node):
------------------------------------------------------------------------
Complex values:
    express prio BOOL == YES NO 0 50000

Queue "long":
    hostlist @long
    complex_values express=0

Queue "express":
    hostlist @express
    complex_values express=1

Queue "regular":
    hostlist @allhosts
    complex_values express=1,[@express=express=0]

Hostgroup "@allhosts":
    @allhosts
        @express
            host01
            host02
        @long
            host03
            host04
            [ ... ]
            host25
------------------------------------------------------------------------
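
To make the expectation explicit, here is a minimal Python sketch of how
the configuration above should resolve per queue instance. It is only a
simplified model and not Grid Engine's actual scheduler logic: the names
parse_complex_values, resolve, eligible_queues and suspended_queues are
hypothetical, the hosts elided by "[ ... ]" are left out, "-l express" is
treated as a request for express=1, and slot thresholds for subordination
are ignored.

```python
import re

# Host groups from the listing above; hosts hidden behind "[ ... ]" are omitted.
HOSTGROUPS = {
    "@express": ["host01", "host02"],
    "@long": ["host03", "host04", "host25"],
}
HOSTGROUPS["@allhosts"] = HOSTGROUPS["@express"] + HOSTGROUPS["@long"]

# Queue definitions copied from the listing above.
QUEUES = {
    "long": {"hostlist": "@long", "complex_values": "express=0"},
    "express": {"hostlist": "@express", "complex_values": "express=1"},
    "regular": {"hostlist": "@allhosts",
                "complex_values": "express=1,[@express=express=0]"},
}

# "long" is subordinate to "regular", "regular" is subordinate to "express".
SUBORDINATES = {"regular": ["long"], "express": ["regular"]}


def parse_complex_values(spec):
    """Split 'name=val,[target=name=val]' into defaults and per-target overrides."""
    defaults, overrides = {}, {}
    for part in re.findall(r"\[[^\]]*\]|[^,\[\]]+", spec):
        if part.startswith("["):
            target, assignment = part[1:-1].split("=", 1)
            name, value = assignment.split("=", 1)
            overrides.setdefault(target, {})[name] = value
        else:
            name, value = part.split("=", 1)
            defaults[name] = value
    return defaults, overrides


def resolve(queue, host, name):
    """Value of resource `name` for the queue instance queue@host (assumed semantics)."""
    defaults, overrides = parse_complex_values(QUEUES[queue]["complex_values"])
    value = defaults.get(name)
    for target, values in overrides.items():
        # An override target is either a host group or a single host name.
        members = HOSTGROUPS.get(target, [target])
        if host in members and name in values:
            value = values[name]
    return value


def eligible_queues(host, name="express"):
    """Queues on `host` that should satisfy a request for name=1."""
    return [q for q, cfg in QUEUES.items()
            if host in HOSTGROUPS[cfg["hostlist"]] and resolve(q, host, name) == "1"]


def suspended_queues(busy_queues):
    """Queues on one host that are suspended while busy_queues are occupied."""
    return {sub for q in busy_queues for sub in SUBORDINATES.get(q, [])}


if __name__ == "__main__":
    print(eligible_queues("host01"))  # expected eligible queues on an express node
    print(eligible_queues("host03"))  # expected eligible queues on a long node
    # A job spread over regular@host01 and express@host01 suspends part of itself:
    print(suspended_queues(["regular", "express"]) & {"regular", "express"})
```

Under these assumptions only "express" is eligible on host01, so a job
holding slots in both "regular" and "express" on that node contradicts the
configuration and, because "regular" is subordinate to "express", ends up
suspending its own tasks.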
- Why do jobs with "-l express" run in "regular@host01" even though it
does not have the express resource attached?
- Any ideas on how to work around this problem?
Many thanks!!
Erik Soyez.