We have a user with a code that uses threaded solvers inside each MPI
rank. They would like to run two threads per process.
The question is how to launch this. The default -byslot mapping packs
the processes onto the first set of CPUs, leaving no CPUs free for each
process's second thread, so half the CPUs are wasted.
The -bynode option would work in theory if all our nodes had the same
number of cores (they do not).
Right now the user is doing:
#PBS -l nodes=22:ppn=2
export OMP_NUM_THREADS=2
mpirun -np 22 app
That is what made me aware of the problem.
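To illustrate the placement I believe we are seeing, here is a small
sketch. It only mimics by-slot filling over a made-up three-node
nodefile (node01 to node03 are hypothetical names, and byslot_layout is
a helper invented for this example); it is not Open MPI's actual mapper.

```shell
#!/bin/sh
# Illustration only: mimics by-slot filling, not Open MPI's real mapper.

# byslot_layout NODEFILE NRANKS
# NODEFILE lists one hostname per slot (the style of $PBS_NODEFILE), so a
# node granted ppn=2 appears twice in a row. Ranks take slots in file order.
byslot_layout() {
    head -n "$2" "$1" | awk '{ printf "rank %d -> %s\n", NR - 1, $1 }'
}

nodefile=$(mktemp)
# Three made-up nodes with two slots each, i.e. nodes=3:ppn=2.
printf '%s\n' node01 node01 node02 node02 node03 node03 > "$nodefile"

# Place 3 ranks (one per intended node), as the job above places 22.
byslot_layout "$nodefile" 3
rm -f "$nodefile"
```

With these inputs, ranks 0 and 1 land on node01 and rank 2 on node02, so
node01 runs four threads on two cores while node03 sits idle.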
How can I tell OMPI that a 'slot' is two cores on the same machine?
This needs to work inside our Torque-based queueing system.
Sorry if I was not clear about my goal.
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985