Hello Open MPI community,
We have a small Linux GPU cluster here at Boise State University running the following:

  CentOS 6.5
  Bright Cluster Manager 6.1
  PBS Pro 11.2
  Open MPI versions: 1.6.5, 1.8.4, 1.8.5

On our cluster, we allow compute nodes to be shared by multiple jobs when the jobs' resource requests fit together on one node. While observing the behavior of our Open MPI installations, I noticed the following:

1. When a user submits an mpirun job, the processes migrate between processor cores throughout the run on the compute node. I assume this is the operating system's scheduler at work, but is there a way in Open MPI to prevent it by default, for example an option at build time? Or is this an operating system configuration change? And is it a good or bad thing that the operating system moves the processes around like this?

2. Since we allow compute nodes to be shared by multiple jobs, I noticed that when users pass the bind-to-core option, Open MPI starts with CPU core 0 and binds sequentially from there, as described in the man page. Because of this, I have seen two separate jobs that both requested binding end up bound to the same CPU core(s), which overloads those cores and makes the jobs run longer than expected. Is there a way to configure Open MPI to take the bindings of other jobs into account and place a new job's bindings around the cores that are already bound? (A sketch of the launch commands involved is in the P.S. below.)

Thanks in advance for any advice you may provide,

Jason Cook
Boise State University
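
P.S. In case it helps, here is a minimal sketch of the kind of launch commands involved; the process count and executable name are just placeholders:

  # Open MPI 1.6.x syntax
  mpirun -np 8 --bind-to-core --report-bindings ./my_app

  # Open MPI 1.8.x syntax
  mpirun -np 8 --map-by core --bind-to core --report-bindings ./my_app

Adding --report-bindings makes mpirun print which cores each rank is bound to, which is one way the overlap between two jobs sharing a node can be seen.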