Thanks Reuti for your quick-as-always answer!

I have added an additional express-PE for each existing PE which
should ensure that express jobs really stick to the express queue;
users submit with wildcard-PEs anyway, so they don't need to change
anything.
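
The step described above (one express twin per existing PE) could be scripted roughly as follows. This is only a sketch under assumptions: the "_express" suffix, the helper name make_express_pe and the idea of detaching the original PE from the express queue are illustrative and not taken from the actual setup.

```shell
#!/bin/sh
# Rename a PE definition read on stdin to a hypothetical "<name>_express" twin.
# Only the pe_name line is changed; all other attributes pass through untouched.
make_express_pe() {
    sed -E 's/^(pe_name[[:space:]]+)([^[:space:]]+)/\1\2_express/'
}

# The cluster part only runs where the Grid Engine tools are installed.
if command -v qconf >/dev/null 2>&1; then
    tmpdir=$(mktemp -d)
    for pe in $(qconf -spl); do
        # Skip twins created by an earlier run.
        case "$pe" in *_express) continue ;; esac
        qconf -sp "$pe" | make_express_pe > "$tmpdir/${pe}_express.conf"
        qconf -Ap "$tmpdir/${pe}_express.conf"              # add the twin PE from file
        qconf -aattr queue pe_list "${pe}_express" express  # attach the twin to queue "express"
        # Assumption: the original PE is detached from the express queue,
        # so that one job cannot span "regular" and "express".
        qconf -dattr queue pe_list "$pe" express
    done
    rm -rf "$tmpdir"
fi
```

With a wildcard request such as qsub -pe "mpi*" 8 -l express (PE name and slot count are placeholders), both "mpi" and "mpi_express" match, so existing submit scripts would not need to change.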

Erik Soyez.


On Wed, 6 Mar 2013, Reuti wrote:

Hi,

On 06.03.2013 at 08:09, Erik Soyez wrote:

Good morning,

we have a little cluster with the basic queue setup of 3 queues:

        long
        regular
        express

We have 25 nodes, 23 of them have the queues long and regular, 2 of them
have the queues regular and express.  "Long" is subordinate to "regular",
"regular" is subordinate to "express".

There is a boolean resource called "express" attached to the express
queue on the 2 "express nodes" and to the regular queue on the 23
"long nodes".

Express jobs are submitted with "qsub -l express".

I remember seeing it with parallel jobs going to the wrong queue for
some slots despite the fact that the h_rt wasn't met and they were
aborted as a result. But as suddenly as I had observed it, it was gone again.


This setup works fine most of the time but it happens once in a while
that a parallel express job runs in queue "regular" _and_ "express" on

For node03-06 this can indeed happen, but I thought it was fixed for 6.2u5 
already.


the same node and suspends itself, even though queue "regular" has no
"express" resource (on that node):

------------------------------------------------------------------------
Complex values:
express    prio    BOOL    ==    YES    NO    0    50000

What about a PE "express" instead of/in addition to the express complex?
It should stay in the queue to which this PE is attached then. In case
you want to use the urgency, the BOOL complex can stay attached in
addition, of course.
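
A minimal sketch of what such a dedicated PE could look like, assuming it is registered from a file and attached only to the express queue. The attribute values (slots, allocation_rule, start/stop procedures) and the file name express_pe.conf are placeholders, not settings from this cluster.

```shell
#!/bin/sh
# Hypothetical definition of a PE named "express"; adapt the values as needed.
cat > express_pe.conf <<'EOF'
pe_name            express
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE
EOF

# The cluster part only runs where the Grid Engine tools are installed.
if command -v qconf >/dev/null 2>&1; then
    qconf -Ap express_pe.conf                    # register the PE from the file
    qconf -aattr queue pe_list express express   # attach PE "express" to queue "express" only
fi
```

A job would then request it with something like qsub -pe express 8 -l express (the slot count is a placeholder); because the PE exists in one queue only, all slots of the job should be granted there.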


Queue "long":
hostlist          @long
complex_values    express=0

It's not necessary to set it to express=0, unless you want to submit
explicitly with the request "-l express=0". If you don't specify the
express complex, it's not considered as a condition which needs to be
matched.


Queue "express":
hostlist          @express
complex_values    express=1

Queue "regular":
hostlist          @allhosts
complex_values    express=1,[@express=express=0]

Well, it's not possible to "unset" the complex again. Maybe it would
help to define it only for node03-node06 as being TRUE.

-- Reuti


Hostgroup "@allhosts":
@allhosts
  @express
     host01
     host02
  @long
     host03
     host04
     [ ... ]
     host25
------------------------------------------------------------------------


- Why do jobs with "-l express" run in "regular@host01" even though it
 does not have the express resource attached?

- Any ideas on how to work around this problem?




