Hi all,
I have an Open MPI parallel environment configured on our cluster, and it was
working fine until now. We currently have lots of simple jobs in the queue, and
the ompi jobs are not being scheduled. They have been in the queue for some time
and are now first in line to be scheduled, but they never find enough free
slots: every time a slot frees up, a lower-priority job starts instead (see the
qstat output below). I've added "-R y" to force resource reservation, but the
jobs are still in the queue.
So I must be missing some configuration step. I've been reading and looking
around, but I haven't found what it is...
65316 0.05000 wath.sh XXXXXX r 02/25/2013 20:37:50 1
############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
63070 0.26805 mpi_astrid YYYYY qw 02/25/2013 12:20:45 20
Some of my configuration:
# qconf -sp ompi
pe_name ompi
slots 128
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $fill_up
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary TRUE
# qconf -sq default|grep omp
pe_list smp ompi
qstat -j 63070:
job_number: 63070
exec_file: job_scripts/63070
submission_time: Mon Feb 25 12:20:45 2013
owner: XXXX
uid: XXX
group: XXXX
gid: 6171
sge_o_home: /users/jXXXX
sge_o_log_name: XXXX
sge_o_path: /usr/lib64/openmpi/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/lib64/openmpi/bin/:/usr/lib64/compat-openmpi/bin/:/users/jjaeger/dcicin/bin
sge_o_shell: /bin/bash
sge_o_workdir: /nfs/users/XXXXX
sge_o_host: ant-XXX
account: sge
cwd: /users/XXXXX
reserve: y
merge: y
hard resource_list: virtual_free=12G
mail_list: XXX@ant-XXXXXes
notify: FALSE
job_name: mpi_astrid.sh
jobshare: 0
shell_list: NONE:/bin/bash
env_list:
script_file: mpi_astrid.sh
parallel environment: ompi range: 20
version: 1
[...]
cannot run in PE "ompi" because it only offers 2 slots
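Going by the qstat details above (cwd, merge: y, reserve: y, virtual_free=12G, PE ompi with 20 slots), the submission amounts to roughly the following. The exact command line is an assumption pieced together from that output, not copied from the shell history, so treat it as a sketch:

```shell
# Assumed submission command, reconstructed from the qstat -j fields:
#   -cwd                  run in the current working directory  (cwd)
#   -j y                  merge stderr into stdout              (merge: y)
#   -R y                  request resource reservation          (reserve: y)
#   -pe ompi 20           20 slots in the "ompi" PE             (parallel environment)
#   -l virtual_free=12G   hard resource request                 (hard resource_list)
submit_cmd="qsub -cwd -j y -R y -pe ompi 20 -l virtual_free=12G mpi_astrid.sh"

# Printed rather than executed, so it can be inspected on a machine without
# Grid Engine; run the printed line by hand on the submit host.
echo "$submit_cmd"
```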
I'm sure I'm missing some configuration, but I don't know which file it is in...
Could anyone give me a hand?
TIA,
Arnau