Hi,
Following up on the thread with the same subject:
http://thread.gmane.org/gmane.comp.clustering.opengridengine.user/894/
We're using SGE 6.2u5; our setup is 2 machines (it's a testing cluster)
with 2 cores per machine. If we submit like this:
-qsub -q v20z.q -pe smp 1 script.sub
-wait until the job runs
-qsub -q v20z.q -pe smp 1 script.sub
each job lands on a different node. However, if we do:
-for i in 1 2; do qsub -q v20z.q -pe smp 1 script.sub; done
then both jobs land on the same node.
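For reference, here is a variant of the loop that should place the two
submissions in separate scheduling runs (a sketch; the sleep value is just
anything longer than the schedule_interval of 0:0:15 configured below),
plus a stock way to check placement:
-------------------------
for i in 1 2; do
    qsub -q v20z.q -pe smp 1 script.sub
    sleep 20   # longer than schedule_interval, so each job gets its own run
done
qstat -g t     # shows the queue instance (host) each job was dispatched to
-------------------------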
Our scheduler conf is as follows:
----------------------------
algorithm default
schedule_interval 0:0:15
maxujobs 0
queue_sort_method load
job_load_adjustments NONE
load_adjustment_decay_time 0:7:30
load_formula slots
schedd_job_info true
flush_submit_sec 0
flush_finish_sec 0
params MONITOR=1
reprioritize_interval 0:0:0
halftime 168
usage_weight_list cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor 2.000000
weight_user 0.250000
weight_project 0.250000
weight_department 0.250000
weight_job 0.250000
weight_tickets_functional 0
weight_tickets_share 1000000
share_override_tickets TRUE
share_functional_shares FALSE
max_functional_jobs_to_schedule 200
report_pjob_tickets TRUE
max_pending_tasks_per_job 50
halflife_decay_list none
policy_hierarchy OS
weight_ticket 0.890000
weight_waiting_time 0.000000
weight_deadline 3600000.000000
weight_urgency 0.100000
weight_priority 0.010000
max_reservation 50
default_duration 9999:00:00
--------------------------------------------
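Incidentally, since params MONITOR=1 is set above, every scheduling run
logs its dispatch decisions, which should show both jobs being assigned to
the same host within a single run (path assumes the default cell name):
-------------------------
# Each scheduling run appends its decisions here:
tail -f $SGE_ROOT/default/common/schedule
-------------------------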
The smp PE config is:
pe_name smp
slots 999
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $pe_slots
control_slaves FALSE
job_is_first_task TRUE
urgency_slots min
accounting_summary TRUE
-------------------------
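A side note on the PE: as far as we understand, allocation_rule $pe_slots
only confines the slots of a single job to one host and says nothing about
two separate jobs. One sanity probe (just a check, not our real workload):
with slots 2 per queue instance, a 2-slot smp job must fill a whole node,
so two such jobs cannot share one:
-------------------------
# Each 2-slot $pe_slots job needs both slots of one host, so these
# two submissions are forced onto different nodes:
qsub -q v20z.q -pe smp 2 script.sub
qsub -q v20z.q -pe smp 2 script.sub
-------------------------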
The config on both nodes is like this:
hostname v20z-03
load_scaling NONE
complex_values mem_free=7891.796875M,slots=2
load_values arch=lx24-amd64,num_proc=2,mem_total=7935.984375M, \
swap_total=4095.992188M,virtual_total=12031.976562M, \
h_fsize=9.7G,load_avg=0.180000,load_short=0.080000, \
load_medium=0.180000,load_long=0.090000, \
mem_free=7830.246094M,swap_free=4095.992188M, \
virtual_free=11926.238281M,mem_used=105.738281M, \
swap_used=0.000000M,virtual_used=105.738281M, \
cpu=0.000000,m_topology=SCSC,m_topology_inuse=SCSC, \
m_socket=2,m_core=2,np_load_avg=0.090000, \
np_load_short=0.040000,np_load_medium=0.090000, \
np_load_long=0.045000
processors 2
user_lists NONE
xuser_lists NONE
projects NONE
xprojects NONE
usage_scaling cpu=12.300000
report_variables NONE
-------------------------------
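For completeness, this is how the host-side picture the scheduler sees can
be sampled right before submitting (both are stock SGE commands):
-------------------------
qhost            # per-host load_avg, memory, etc.
qstat -F slots   # slot availability per queue instance
-------------------------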
The queue config:
qname v20z.q
hostlist @v20z
seq_no 0
load_thresholds np_load_avg=1.75
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:01:00
priority 0
min_cpu_interval 00:01:00
processors UNDEFINED
qtype BATCH INTERACTIVE
ckpt_list BLCR
pe_list make smp
rerun FALSE
slots 2
tmpdir /scratch
shell /bin/csh
prolog NONE
epilog NONE
shell_start_mode posix_compliant
starter_method NONE
suspend_method NONE
resume_method NONE
terminate_method NONE
notify 00:00:60
owner_list NONE
user_lists NONE
xuser_lists NONE
subordinate_list NONE
complex_values split=2
projects NONE
xprojects NONE
calendar NONE
initial_state default
s_rt INFINITY
h_rt INFINITY
s_cpu INFINITY
h_cpu INFINITY
s_fsize INFINITY
h_fsize INFINITY
s_data INFINITY
h_data INFINITY
s_stack INFINITY
h_stack INFINITY
s_core INFINITY
h_core INFINITY
s_rss INFINITY
h_rss INFINITY
s_vmem INFINITY
h_vmem INFINITY
-----------------------------------
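With slots 2 per queue instance, qstat -f also makes the packing visible
right after the loop: when both jobs land together, the used-slots column
should show 2 on one host and 0 on the other:
-------------------------
qstat -f -q v20z.q   # per-queue-instance slot usage
-------------------------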
From what I understood, it's possible that this method is broken. Am I
right?
I've also tried the scheduler configurations from the following links, with
the same result:
http://article.gmane.org/gmane.comp.clustering.opengridengine.user/1037
http://wiki.gridengine.info/wiki/index.php/StephansBlog
Thanks in advance!