Hi,
On 05/11/2012 09:03 PM, Alex Chekholko wrote:
On 05/11/2012 05:11 AM, iqtcub wrote:
From what I understood, it's possible that this method is broken, am I
right?
I mostly understand your config; I think the primary things to look at
are:
queue_sort_method load
load_formula slots
We use load_formula=slots, which puts jobs onto nodes that have the
fewest available slots (which is like "$fill_up", but for batch
instead of parallel).
I believe there was a bug that made that particular setting not work
correctly in earlier versions; it works as we expect on a newer build
(Rayson's OGE 2011.11). Our goal is to pack the single-core jobs on
as few nodes as possible, preserving slots on other nodes for
multi-slot jobs.
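
As a rough illustration of that packing rule, here is a toy model in
Python. It is not Grid Engine's scheduler code: the host names and the
pick_host/submit helpers are made up for this example, and it only
assumes that hosts are tried in ascending order of free slots.

```python
def pick_host(free_slots, needed=1):
    """Return the host with the fewest free slots that still fits the job.

    free_slots: dict mapping a (hypothetical) host name to its free slots.
    Returns None when no host has enough free slots.
    """
    candidates = [(free, host) for host, free in free_slots.items()
                  if free >= needed]
    if not candidates:
        return None
    # Smallest free-slot count wins; ties are broken by host name.
    return min(candidates)[1]


def submit(free_slots, jobs):
    """Place jobs (a list of slot counts) in order; return the chosen hosts."""
    placement = []
    for needed in jobs:
        host = pick_host(free_slots, needed)
        if host is not None:
            free_slots[host] -= needed
        placement.append(host)
    return placement


hosts = {"node1": 12, "node2": 12, "node3": 12}
print(submit(hosts, [1, 1, 2, 12]))  # ['node1', 'node1', 'node1', 'node2']
print(hosts)                         # {'node1': 8, 'node2': 0, 'node3': 12}
```

The three small jobs all land on node1, so node2 and node3 stay whole
and the 12-slot job can still start.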
Can you describe the behaviour you want to see?
Regards,
Thanks for your answers.
In our test environment we only have 2-core machines, but in our
production environment we have 12-core ones. Our users mainly use all
12 cores on those machines, but there are also jobs using 1 core, 2
cores, 4 cores, 6 cores and so on.
One of the problems we run into from time to time is that when one
user's job shares a node with other jobs, the node may run out of
memory, which hangs the node and crashes all the jobs on it. To work
around this, we made the users use the 'smp' PE I pasted above, so
that users who request the full 12 cores don't share the node with
another job. Of course, the users could specify the memory their job is
going to use and we would not have to do this, but we get paid to
maintain the system, not to tell them how to submit jobs.
Anyway, back to the subject. What we want to see is that jobs that
don't use all the cores in one machine (that would be 1 core in our
test environment) get "packed" onto one node instead of going to the
least loaded nodes.
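
To make the difference concrete, here is a small Python sketch that
compares the two placement orders on three 12-core hosts. It is only a
toy model under my own assumptions, not how the scheduler is
implemented: the place() helper and the job mix are made up.

```python
def place(jobs, n_hosts, cores, pack):
    """Place jobs (slot counts) on hosts; return the free slots per host.

    pack=True  -> prefer the host with the fewest free slots that still fits
    pack=False -> prefer the host with the most free slots (least loaded)
    Jobs that fit nowhere are skipped (a real scheduler would keep them
    pending).
    """
    free = [cores] * n_hosts
    for needed in jobs:
        fitting = [i for i in range(n_hosts) if free[i] >= needed]
        if not fitting:
            continue
        if pack:
            target = min(fitting, key=lambda i: free[i])
        else:
            target = max(fitting, key=lambda i: free[i])
        free[target] -= needed
    return free


small_jobs = [1, 2, 4, 1, 2]  # made-up mix of small jobs
for pack in (True, False):
    free = place(small_jobs, n_hosts=3, cores=12, pack=pack)
    whole = sum(1 for f in free if f == 12)
    print("pack" if pack else "spread", free,
          "hosts free for a 12-core job:", whole)
# pack [2, 12, 12] hosts free for a 12-core job: 2
# spread [8, 10, 8] hosts free for a 12-core job: 0
```

With packing, two hosts stay completely free for 12-core jobs; with
least-loaded placement, the same small jobs touch every host.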
- Reuti, I just tested submitting the jobs without specifying the PE and
it works, but as you can see from my explanation, we also need this to
work for jobs that use more than one core.
I agree that our test environment isn't very suitable for reproducing
our production environment, as we're talking about machines with 12
cores vs machines with 2 cores. I'll try to get some virtual machines
running with more cores and test this again; if it still doesn't work,
I'll try to upgrade.
Once I have the results, I'll let you know.
Thanks!