Re: [gridengine users] "Packing" jobs on nodes v2

iqtcub Mon, 14 May 2012 02:48:01 -0700

Hi,

On 05/14/2012 08:39 AM, iqtcub wrote:

Hi,
On 05/11/2012 09:03 PM, Alex Chekholko wrote:
On 05/11/2012 05:11 AM, iqtcub wrote:
From what I understood, it's possible that this method is broken, am I
right?
I mostly understand your config; I think the primary thing to look at is the:
queue_sort_method                 load
load_formula                      slots
We use load_formula=slots, which puts jobs onto nodes that have the fewest available slots (which is like "$fill_up", but for batch instead of parallel).
I believe there was a bug that made that particular setting not work correctly in earlier versions; it works as we expect on a newer build (Rayson's OGE 2011.11). Our goal is to pack the single-core jobs on as few nodes as possible, preserving slots on other nodes for multi-slot jobs.
Can you describe the behaviour you want to see?

Regards,
Thanks for your answers.
In our test environment we only have 2-core machines, but in our production environment we have 12-core ones. Our users mainly use all 12 cores on those machines, but there are some jobs using 1 core, 2 cores, 4 cores, 6 and so on.
One of the problems we run into from time to time is that when one user's job shares a node with other jobs, the node may run out of memory, hanging the node and crashing all the jobs on it. To overcome this problem, we forced the users to use the 'smp' PE I pasted above, so that users who request the full 12 cores don't share the node with another job. Of course, the users could specify the memory their job is going to use and avoid having to do this, but we get paid to maintain the system, not to tell them how to submit jobs.
Anyway, back to the subject. What we want to see is that jobs that don't use all the cores in one machine (that would be 1 core in our test environment) get "packed" onto one node instead of going to the least loaded nodes.
- Reuti, I just tested submitting the jobs without specifying the PE and it works, but as you can see from my explanation, we also need this to work for jobs with more than one core.
I agree that our test environment isn't very suitable for reproducing our production environment, as we're talking about machines with 12 cores vs machines with 2 cores. I'll try to get some virtual machines running with more cores and try this again, and if it still doesn't work I'll try to upgrade.
Once I get the results I'll let you know.
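
To make the wanted behaviour concrete, here is a minimal sketch in Python. It is only a toy model and not Grid Engine's scheduler: the host names, the job sizes and the functions place, pack and spread are made up for illustration. Each job needs all of its slots on a single host; "pack" prefers the host with the fewest free slots that still fits the job, while "spread" prefers the host with the most free slots.

```python
def place(jobs, free, pick):
    """Place each job (given as a slot count) on one host chosen by `pick`.

    Returns (placement, remaining_free). A job that fits nowhere gets None.
    This is a toy model, not the real Grid Engine scheduling algorithm.
    """
    free = dict(free)  # work on a copy so the caller's dict is untouched
    placement = []
    for slots in jobs:
        # All slots of a job must come from a single host.
        candidates = [h for h, f in free.items() if f >= slots]
        if not candidates:
            placement.append(None)
            continue
        host = pick(candidates, free)
        free[host] -= slots
        placement.append(host)
    return placement, free


def pack(candidates, free):
    # Fewest free slots first; ties are broken by host name.
    return min(candidates, key=lambda h: (free[h], h))


def spread(candidates, free):
    # Most free slots first; ties are broken by host name.
    return min(candidates, key=lambda h: (-free[h], h))


hosts = {"node1": 12, "node2": 12}   # hypothetical 12-core nodes
jobs = [2, 4, 1, 2]                  # hypothetical small jobs

packed, free_packed = place(jobs, hosts, pack)
spreaded, free_spread = place(jobs, hosts, spread)
print(packed, free_packed)
print(spreaded, free_spread)
```

In this model the packed run leaves node2 completely free, so a later job asking for all 12 cores could still start there, whereas the spread run leaves no node with 12 free slots.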

Thanks!

I just tested with two virtual machines having 12 cores each. Same result: if I don't use a PE, it works fine; however, if I use the smp PE I mentioned above (using $pe_slots as allocation_rule), it puts each job on a different node.


I've also just tried using the binaries of OGS 2011.11, with the same result.

On the other hand, I've seen in the other threads on the mailing list that people seem to use this option only when the jobs use 1 core, which works perfectly.


So, any hints? Does anyone have it working for jobs with more than one core?

Thanks
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
