Thanks for your reply. There are some updates, but it was too late last
night to post them.

I now have the AMD/Intel heterogeneous cluster up and running. The initial
problem was that when I installed OpenMPI on the AMD nodes, the library
paths were set to a different location than on the Intel nodes. I am not
sure why.

In any case, I then followed the suggestion from the FAQ and instead shared
the same OpenMPI install directory with all the nodes via NFS. The job is
now running, so I can confirm that it is indeed possible to run the same job
on a heterogeneous cluster composed of AMD and Intel nodes.

I am using OpenMPI 1.7.4 now.

There is a related problem, though. I am sharing /opt/openmpi-1.7.4 via NFS,
but there does not seem to be a way to tell the nodes where OpenMPI is
located when using non-interactive SSH (with key-based login). SSH does
not seem to read .bash_profile in that case, so I do not know how to tell
the jobs on the nodes where to find OpenMPI except by starting the job with
/opt/openmpi-1.7.4/bin/mpirun.
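
For reference, here is a minimal sketch of one possible workaround. It
assumes bash is the login shell on every node and that this bash build reads
~/.bashrc (rather than ~/.bash_profile) when started by sshd for a
non-interactive command. That is bash's documented behavior, but some builds
disable it, so this is untested on my cluster. The variable name OMPI_HOME is
only illustrative. As far as I understand the FAQ, mpirun also accepts a
--prefix option, and starting mpirun by its absolute path, as I do now, is
treated the same way.

```shell
# Hypothetical fragment for ~/.bashrc on each node (assumption: bash reads
# this file for non-interactive SSH commands). Put it above any
# "return if not interactive" guard, or it will be skipped.
OMPI_HOME=/opt/openmpi-1.7.4   # NFS-shared install prefix

# Prepend the OpenMPI binaries and libraries, keeping any existing entries.
export PATH="$OMPI_HOME/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$OMPI_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

If this works, running "ssh <node> which mpirun" from the head node should
print the path under /opt/openmpi-1.7.4/bin.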

Regarding Open-MX: yes, I will look into that next to see whether the job is
indeed using it. My MCA flag is --mca mx self
