Thanks Reuti for your quick-as-always answer!
I have added an additional express-PE for each existing PE which
should ensure that express jobs really stick to the express queue;
users submit with wildcard-PEs anyway, so they don't need to change
anything.
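
A minimal sketch of how such an express twin of an existing PE could be
derived. The PE name "mpi", the suffix "_express" and the file name are
assumptions for illustration only; the actual names used on the cluster
are not given in this thread.

```shell
#!/bin/sh
# Sketch: derive an "express" twin of an existing PE definition.
# The suffix "_express" is an illustrative assumption, not the poster's naming.

# Read a PE definition (as printed by "qconf -sp <pe>") on stdin and
# write the same definition with "_express" appended to the pe_name value.
make_express_pe() {
    sed -E 's/^(pe_name[[:space:]]+)([^[:space:]]+)/\1\2_express/'
}

# Possible use on a cluster (not executed here; "mpi" is a hypothetical PE):
#   qconf -sp mpi | make_express_pe > mpi_express.pe
#   qconf -Ap mpi_express.pe        # add the new PE from the file
#   qconf -mq express               # add mpi_express to pe_list of queue "express"
#   qsub -pe "mpi*" 8 -l express job.sh
```

With a wildcard request such as "mpi*", both the original PE and its
express twin match, so users would not have to change their submit
lines. Which PE the scheduler actually picks depends on the rest of the
queue configuration, which is not shown here.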
Erik Soyez.
On Wed, 6 Mar 2013, Reuti wrote:
Hi,
On 06.03.2013 at 08:09, Erik Soyez wrote:
Good morning,
we have a little cluster with the basic queue setup of 3 queues:
long
regular
express
We have 25 nodes, 23 of them have the queues long and regular, 2 of them
have the queues regular and express. "Long" is subordinate to "regular",
"regular" is subordinate to "express".
There is a boolean resource called "express" attached to the express
queue on the 2 "express nodes" and to the regular queue on the 23
"long nodes".
Express jobs are submitted with "qsub -l express".
I remember seeing it with parallel jobs going to the wrong queue for
some slots despite the fact that the h_rt wasn't met and they were
aborted as a result. But as suddenly as I observed it, it was gone again.
For node03-06 this can indeed happen, but I thought it was fixed for 6.2u5
already.
This setup works fine most of the time, but it happens once in a while
that a parallel express job runs in queue "regular" _and_ "express" on
the same node and suspends itself, even though queue "regular" has no
"express" resource (on that node):
------------------------------------------------------------------------
Complex values:
#name     shortcut  type  relop  requestable  consumable  default  urgency
express   prio      BOOL  ==     YES          NO          0        50000
What about a PE "express" instead of/in addition to the express complex?
It should stay in the queue to which this PE is attached then. In case
you want to use the urgency, the BOOL complex can stay attached in
addition of course.
Queue "long":
hostlist @long
complex_values express=0
It's not necessary to set it to express=0, unless you want to submit
explicitly with the request "-l express=0". If you don't specify the
express complex, it's not considered as a condition which needs to be
matched.
Queue "express":
hostlist @express
complex_values express=1
Queue "regular":
hostlist @allhosts
complex_values express=1,[@express=express=0]
Well, it's not possible to "unset" the complex again. Maybe it would
help to define it only for node03-node06 as being TRUE.
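
One possible reading of this suggestion, as a sketch only: leave the
complex undefined by default and set it per host group, so that queue
"regular" offers it only on the long nodes. The fragment assumes the
@long host group from the original post and the usual
"default,[hostgroup=value]" form of queue attributes. As long as the
complex is not also set at host or global level, a queue instance
without a value for it should not match a "-l express" request.

```
Queue "regular":
hostlist        @allhosts
complex_values  NONE,[@long=express=1]
```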
-- Reuti
Hostgroup "@allhosts":
@allhosts
    @express
        host01
        host02
    @long
        host03
        host04
        [ ... ]
        host25
------------------------------------------------------------------------
- Why do jobs with "-l express" run in "regular@host01" even though it
does not have the express resource attached?
- Any ideas on how to work around this problem?