On 14.02.2011 at 12:31, Erik Soyez wrote:
> Good day,
>
> we have a major problem with the subordinate queue mechanism in 6.2u5:
>
> Setup: o queues "long" & "standard"
> o queue "long" is subordinate to queue "standard"
> ("subordinate_list long=1")
>
> When a "long" job is spread over two hosts (e.g. 24 slots, 12 each)
> it gets suspended by "standard" jobs on one of these hosts, as expected.
> When a second "standard" job starts on the other host and then the
> first "standard" job finishes, the "long" job gets resumed and the
> second host is overloaded with two jobs.
>
> o Is this a known problem?
> o Are there any patches available anywhere?
> o Are there any workarounds?
> o Does anybody know in which older SGE versions this works
> as expected? (The last time we used it was with 6.0u6.)
Confirmed. It appears to happen when a job is running on the machine that
didn't trigger the suspension. As long as new jobs are scheduled to the node
that triggered the suspension, all is fine.
Workarounds I see are: a) suspend the parallel jobs by hand with `qmod -sj ...`,
or b) run jobs in the long queue with a nice value of 19 (the `priority` setting
in the queue configuration) and don't suspend them at all.
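
A minimal sketch of both workarounds as shell helpers, in case it is useful. The
function names, the `DRY_RUN` switch and the job ID used below are made up for
illustration; only `qmod -sj`/`-usj` and `qconf -mattr` are real Grid Engine
commands. I am assuming the queue names "standard" and "long" from the setup
above, and that a changed queue priority only applies to jobs started afterwards.

```shell
#!/bin/sh
# Sketch only. Set DRY_RUN=1 to print the commands instead of running
# them (no Grid Engine installation is needed in that case).

run() {
    # With DRY_RUN set, this expands to "echo <command>".
    ${DRY_RUN:+echo} "$@"
}

# Workaround a): suspend or resume a job by hand.
# usage: sge_job suspend|resume <job_id>
sge_job() {
    case "$1" in
        suspend) run qmod -sj "$2" ;;
        resume)  run qmod -usj "$2" ;;
        *) echo "usage: sge_job suspend|resume <job_id>" >&2; return 2 ;;
    esac
}

# Workaround b): stop suspending "long" and run its jobs at nice 19.
# Queue names are assumed to match the setup described in the question.
deprioritize_long() {
    run qconf -mattr queue subordinate_list NONE standard
    run qconf -mattr queue priority 19 long
}
```

For example, with `DRY_RUN=1` set, `sge_job suspend 4711` (4711 being a
hypothetical job ID) only prints the `qmod` call, so the commands can be
checked before touching a live cluster.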
-- Reuti
> Many thanks, Erik Soyez.
>
> --
> Vorstand/Board of Management:
> Dr. Bernd Finkbeiner, Dr. Roland Niemeier, Dr. Arno Steitz, Dr. Ingrid Zech
> Vorsitzender des Aufsichtsrats/
> Chairman of the Supervisory Board:
> Michel Lepert
> Sitz/Registered Office: Tuebingen
> Registergericht/Registration Court: Stuttgart
> Registernummer/Commercial Register No.: HRB 382196
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users