On 08/14/2012 02:31 AM, Reuti wrote:
On 14.08.2012 at 00:27, Joseph Farran wrote:
Hi Alex.
Thanks for the info, but the issue is more complex.
The issue is that slots cannot be used with Subordinate queues.
Why not? Reason is here:
http://gridengine.org/pipermail/users/2012-August/004372.html
But it seems to work, even if you don't attach the slots complex to
each exechost, for packing (at least) serial jobs via "load_formula slots".
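Attaching it explicitly would be done per exechost, along these lines
-- the host name here is just an example:

# qconf -me compute-2-3
complex_values        slots=64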
No, it is not, at least in my tests. See:
http://gridengine.org/pipermail/users/2012-August/004425.html
Best,
Joseph
On 08/13/2012 03:12 PM, Alex Chekholko wrote:
Hi,
I'm not sure if this helps, but we have a working config with:
queue_sort_method seqno
load_formula slots
That puts single-slot jobs onto a single node if a bunch of nodes are empty,
rather than distributing them evenly across empty nodes.
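If it helps, both lines live in the scheduler configuration, which is
edited with "qconf -msconf"; sketching from memory:

# qconf -msconf
queue_sort_method      seqno
load_formula           slots

With queue_sort_method seqno you can also give queue instances distinct
sequence numbers per host group in "qconf -mq", e.g.
"seq_no 0,[@rack1=10]" -- the host-group name here is made up.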
Regards,
Alex
On 08/13/2012 09:14 AM, Joseph Farran wrote:
Hi Reuti / Rayson.
To make sure we are on the same page: are you saying that for PE jobs
using "$pe_slots" as the "allocation_rule", Grid Engine does indeed
ignore the "load_formula" in the scheduler?
If yes, a couple of questions, please:
1) Was there a point at which GE did *not* ignore the "load_formula"
for "$pe_slots" PE jobs?
2) Will this be brought back in a future GE release?
Joseph
On 08/13/2012 08:22 AM, Reuti wrote:
On 12.08.2012 at 19:55, Joseph Farran wrote:
Hi Rayson.
Here is one particular entry:
http://gridengine.org/pipermail/users/2012-May/003495.html
I am using Grid Engine 2011.11 binary
http://dl.dropbox.com/u/47200624/respin/ge2011.11.tar.gz
First of all, sorry for using the wrong expression. Where you used
"-cores_in_use", it should be the positive "slots". As a lower value
is taken first, a host with a lower remaining number of slots is
taken first. It's working as it should for serial jobs.
But for parallel jobs, even with $pe_slots as the allocation rule, the
load_formula was already being ignored in 6.2u5.
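To illustrate the serial case with made-up numbers: if node A has 60
of its 64 slots still free and node B has all 64 free, A sorts first
(60 < 64), so the next serial job lands on A, and a node keeps filling
up before an empty one is touched.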
-- Reuti
Thanks,
Joseph
On 8/12/2012 10:10 AM, Rayson Ho wrote:
On Sun, Aug 12, 2012 at 5:27 AM, Joseph Farran<[email protected]> wrote:
I saw some old postings that this used to be a bug with GE, that
parallel jobs were not using the scheduler load_formula. Was this bug
corrected in GE2011.11?
Hi Joseph,
Can you point me to the previous discussion? We did not receive a bug
report related to this problem before...
So far, our main focus is to fix issues & bugs reported by our users
first, and maybe we've missed the discussion on this bug.
Rayson
Anyone able to test this in GE2011.11 to see if it was fixed?
Joseph
On 8/11/2012 1:51 PM, Reuti wrote:
On 11.08.2012 at 20:30, Joseph Farran wrote:
Yes, all my queues have the same "0" for "seq_no".
Here is my scheduler load formula:
qconf -ssconf
algorithm default
schedule_interval 0:0:15
maxujobs 0
queue_sort_method load
job_load_adjustments NONE
load_adjustment_decay_time 0
load_formula -cores_in_use
Can you please try it with -slots? It should behave the same as your
own complex. In one of your former posts you mentioned a different
relation (==) for it.
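For comparison, the relevant rows of "qconf -sc" would look roughly
like this -- the slots row is the stock one, the cores_in_use row is
only my guess at your definition:

#name          shortcut  type  relop  requestable  consumable  default  urgency
slots          s         INT   <=     YES          YES         1        1000
cores_in_use   ciu       INT   <=     YES          NO          0        0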
-- Reuti
Here is a sample display of what is going on. My compute nodes have
64 cores each.
I submit 4 1-core jobs to my bio queue. Note: I wait around 30 seconds
before submitting each 1-core job, long enough for my "cores_in_use"
to report back correctly:
job-ID name user state queue slots
-----------------------------------------------------
2324 TEST me r bio@compute-2-3 1
2325 TEST me r bio@compute-2-3 1
2326 TEST me r bio@compute-2-3 1
2327 TEST me r bio@compute-2-3 1
Everything works great with single 1-core jobs. Jobs 2324 through
2327 packed onto one node (compute-2-3) correctly. The "cores_in_use"
for compute-2-3 reports "4".
Now I submit one 16-core "openmp" PE job:
job-ID name user state queue slots
-----------------------------------------------------
2324 TEST me r bio@compute-2-3 1
2325 TEST me r bio@compute-2-3 1
2326 TEST me r bio@compute-2-3 1
2327 TEST me r bio@compute-2-3 1
2328 TEST me r bio@compute-2-6 16
The scheduler should have picked compute-2-3 since it has 4
cores_in_use, but instead it picked compute-2-6, which had 0
cores_in_use. So here the scheduler is behaving differently than with
1-core jobs.
As a further test, I wait until my cores_in_use reports that
compute-2-6 has "16" cores in use. I now submit another 16-core
"openmp" job:
job-ID name user state queue slots
-----------------------------------------------------
2324 TEST me r bio@compute-2-3 1
2325 TEST me r bio@compute-2-3 1
2326 TEST me r bio@compute-2-3 1
2327 TEST me r bio@compute-2-3 1
2328 TEST me r bio@compute-2-6 16
2329 TEST me r bio@compute-2-7 16
The scheduler now picks yet another node, compute-2-7, which had 0
cores_in_use. I have tried this several times with many config changes
to the scheduler, and it sure looks like the scheduler is *not* using
the "load_formula" for PE jobs. From what I can tell, the scheduler
chooses nodes at random for PE jobs.
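For what it's worth, the way I peek at the scheduler's reasoning: with
"schedd_job_info true" set in the scheduler configuration, "qstat -j"
prints a "scheduling info" section while a job is still pending, e.g.:

qstat -j 2329

(2329 is just the job id from this run.)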
Here is my "openmp" PE:
# qconf -sp openmp
pe_name openmp
slots 9999
user_lists NONE
xuser_lists NONE
start_proc_args NONE
stop_proc_args NONE
allocation_rule $pe_slots
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary TRUE
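For completeness, the 16-core jobs above were submitted along these
lines (the script name is made up):

qsub -q bio -pe openmp 16 test_job.sh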
Here is my "bio" Q showing relevant info:
# qconf -sq bio | egrep "qname|slots|pe_list"
qname bio
pe_list make mpi openmp
slots 64
Thanks for taking a look at this!
On 8/11/2012 4:32 AM, Reuti wrote:
On 11.08.2012 at 02:57, Joseph Farran<[email protected]> wrote:
Reuti,
Are you sure this works in GE2011.11?
I have defined my own complex called "cores_in_use" which counts
both single cores and PE cores correctly.
It works great for single core jobs, but not for PE jobs using the
"$pe_slots" allocation rule.
# qconf -sp openmp
pe_name openmp
slots 9999
user_lists NONE
xuser_lists NONE
start_proc_args NONE
stop_proc_args NONE
allocation_rule $pe_slots
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary TRUE
# qconf -ssconf
algorithm default
schedule_interval 0:0:15
maxujobs 0
queue_sort_method seqno
The seq_no is the same for the queue instances in question?
-- Reuti
job_load_adjustments cores_in_use=1
load_adjustment_decay_time 0
load_formula -cores_in_use
schedd_job_info true
flush_submit_sec 5
flush_finish_sec 5
I wait until the node reports the correct "cores_in_use" complex, I
then submit a PE openmp job, and it totally ignores the "load_formula"
on the scheduler.
Joseph
On 08/09/2012 12:50 PM, Reuti wrote:
Correct. It uses the "allocation_rule" specified in the PE instead.
Only with "allocation_rule" set to $pe_slots will it also use the
"load_formula". Unfortunately, there is nothing you can do to change
the behavior.
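For reference, the possible "allocation_rule" values from sge_pe(5):

<int>           a fixed number of slots per host
$pe_slots       all slots of the job on a single host
$fill_up        fill each suitable host before moving to the next
$round_robin    one slot per host in turn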
-- Reuti
On 09.08.2012 at 21:23, Joseph Farran<[email protected]> wrote:
Howdy.
I am using GE2011.11.
I am successfully using the GE "load_formula" to place jobs by core
count, using my own "load_sensor" script.
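In case it is useful, the protocol such a load sensor has to speak
looks roughly like the sketch below; this is not my actual script, and
the computed value here is only a placeholder:

#!/bin/sh
# Minimal load-sensor loop: sge_execd pokes stdin once per load
# interval and sends "quit" on shutdown; each report is wrapped
# in begin/end lines.
HOST=`uname -n`
while read cmd; do
    [ "$cmd" = "quit" ] && exit 0
    USED=0    # placeholder: compute the real cores-in-use value here
    echo "begin"
    echo "$HOST:cores_in_use:$USED"
    echo "end"
done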
All works as expected with single-core jobs; however, for PE jobs, it
seems as if GE does not abide by the "load_formula".
Does the scheduler use a different "load" formula for single-core
jobs versus parallel jobs using the PE environment setup?
Joseph
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users