We have a parallel environment (PE) named "threaded", and each node has 30 slots and 120 GB of RAM.
Jobs requesting 19 or more PE slots get stuck in the queue in the qw state with the
following error:
parallel environment:  threaded range: 19
scheduling info:       cannot run in PE "threaded" because it only offers 0 slots
This doesn't make any sense: more than 30 nodes are currently idle, with 30 slots each.
I am submitting a simple test job that requests no other complexes:
echo "sleep 10" | qsub -pe threaded 19
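A minimal sketch for checking whether the PE definition itself caps the slot count below the request. It assumes the PE is inspected with `qconf -sp threaded`, whose `slots` line is the total number of slots the PE may hand out across the cluster; the helper name `pe_slots_check` is hypothetical:

```shell
# pe_slots_check (hypothetical helper): reads `qconf -sp <pe>` output on stdin
# and compares the PE's cluster-wide "slots" limit with a requested count ($1).
# Exit status: 0 if the limit covers the request, 1 if it does not,
# 2 if no "slots" line was found in the input.
pe_slots_check() {
  awk -v req="$1" '
    # Remember the value from the "slots <n>" line of the PE definition.
    $1 == "slots" { s = $2 }
    END {
      if (s == "") { print "no slots line found"; exit 2 }
      if (s + 0 < req + 0) {
        printf "PE slots=%s < requested %s\n", s, req
        exit 1
      }
      printf "PE slots=%s >= requested %s\n", s, req
    }'
}
```

Usage: `qconf -sp threaded | pe_slots_check 19`. A non-zero exit points at the PE limit rather than at the hosts.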
We are using GE 2011.11p1.
Here is the configuration of one of the execution hosts in SGE:
hostname compute-2-2.local
load_scaling NONE
complex_values slots=30,h_vmem=120G,io_slots=30
load_values arch=linux-x64,num_proc=30,mem_total=123136.023438M, \
swap_total=3999.992188M,virtual_total=127136.015625M, \
load_avg=11.020000,load_short=11.000000, \
load_medium=11.020000,load_long=10.810000, \
mem_free=75806.339844M,swap_free=3973.246094M, \
virtual_free=79779.585938M,mem_used=47329.683594M, \
swap_used=26.746094M,virtual_used=47356.429688M, \
cpu=36.200000,m_socket=30,m_core=30,np_load_avg=0.367333, \
np_load_short=0.366667,np_load_medium=0.367333, \
np_load_long=0.360333
processors 30
user_lists NONE
xuser_lists NONE
projects NONE
xprojects NONE
usage_scaling NONE
report_variables NONE
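A minimal sketch that pulls the per-host `slots` value out of `complex_values` in an execution host configuration like the one above (for example, the output of `qconf -se compute-2-2.local`). It assumes `complex_values` fits on a single line, and the helper name `host_slots` is hypothetical:

```shell
# host_slots (hypothetical helper): reads an execution host configuration on
# stdin and prints the value of "slots" from its complex_values line.
# Assumes complex_values is on one line (no backslash continuation).
# Exit status is 1 if no slots entry is found.
host_slots() {
  awk '
    $1 == "complex_values" {
      # Split "slots=30,h_vmem=120G,..." into name=value pairs.
      n = split($2, kv, ",")
      for (i = 1; i <= n; i++) {
        split(kv[i], pair, "=")
        # Exact name match, so io_slots is not mistaken for slots.
        if (pair[1] == "slots") { print pair[2]; found = 1 }
      }
    }
    END { if (!found) exit 1 }'
}
```

For the host shown above, `qconf -se compute-2-2.local | host_slots` would print 30, which is above the 19 slots requested.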
Thanks,
_______________________________________________
users mailing list
https://gridengine.org/mailman/listinfo/users