Hello Open MPI community,


We have a smaller Linux GPU cluster here at Boise State University which is
running the following:

CentOS 6.5

Bright Cluster Manager 6.1

PBS Pro 11.2

Open MPI versions:

    1.6.5
    1.8.4
    1.8.5


On our cluster, we allow compute nodes to be shared by multiple jobs when
the jobs' resource requests fit together on a single compute node.


So I was observing the behavior of our Open MPI installations and noticed
the following:


   1. When a user submits an mpirun job, the executable floats between
      different processor cores throughout its runtime on the compute node.
      I am sure this is because of the operating system's processor
      scheduler, but is there a way in Open MPI to prevent this by default?
      Is there a certain option in the build process, or is this an
      operating system configuration change? Is it a good or bad thing that
      the operating system moves the executable between cores? (See the
      first example after this list.)
   2. Since we allow sharing of compute nodes among multiple jobs, I noticed
      that if users use the bind-to-core option, Open MPI starts with CPU
      core 0 and works its way up sequentially, as stated in the man page
      for this option. Because the nodes are shared, I have seen two
      separate jobs with binding options overload the same CPU core(s),
      which causes the jobs to run longer than expected. Is there a way to
      configure Open MPI to observe the current bindings of other jobs and
      place a new job's bindings around the cores that are already bound?
      (See the second example after this list.)
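For question 1, my understanding is that something along these lines should
force binding; the mca-params.conf line in particular is an assumption on my
part, so please correct me if there is a better way:

    # 1.8.x: bind each rank to a core and print the resulting bindings
    mpirun --bind-to core --report-bindings -np 8 ./a.out

    # 1.6.5 uses the older option name
    mpirun --bind-to-core --report-bindings -np 8 ./a.out

    # To make core binding the site-wide default for 1.8.x, my assumption
    # is that this line can go in $PREFIX/etc/openmpi-mca-params.conf:
    hwloc_base_binding_policy = core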



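For question 2, the closest workaround I can think of is steering each job
onto an explicit set of cores by hand, which is exactly the coordination I
was hoping Open MPI could do automatically; the core numbers, node name, and
file names below are only placeholders:

    # Job A: restrict its ranks to cores 0-3 (1.8.x)
    mpirun -np 4 --cpu-set 0,1,2,3 --bind-to core --report-bindings ./a.out

    # Job B: restrict its ranks to cores 4-7
    mpirun -np 4 --cpu-set 4,5,6,7 --bind-to core --report-bindings ./a.out

    # Or an explicit rankfile (used as: mpirun -np 4 -rf myrankfile ./a.out)
    rank 0=node001 slot=4
    rank 1=node001 slot=5
    rank 2=node001 slot=6
    rank 3=node001 slot=7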
Thanks in advance for any advice you may provide,

Jason Cook

Boise State University
