<[email protected]> writes:

> I think we're running into a chipset-architecture issue (AMD vs Intel)
> in OpenMPI jobs. We're using SGE 6.2u5 and OpenMPI 1.33 with tight
> integration. All MPI jobs are launched by SGE.

At least if you're using recent AMD hardware, you'll want to upgrade
OpenMPI (and SGE, if you don't use exclusive node access) to get
binding right. SGE 8.0.0c+ is useful for partially-full nodes.

> We've got a locally-written program that dynamically links against
> a package that's compiled with optimizations for different chipsets
> (ATLAS[2]). We've built ATLAS with multiple versions, optimized
> for each architecture in our cluster.
>
> This is fine for serial jobs -- the login environment
> sets the path according to the chipset on each server
> (i.e. ATLASDIR=/opt/ATLAS/3.8.3/Intel/Xeon/Westmere or
> ATLASDIR=/opt/ATLAS/3.8.3/AMD/Opteron). We do the same thing for
> other packages that provide chipset optimization (LAPACK[3] and
> BLAS[4]).

[If you don't mind proprietary libraries, ACML probably does better on
AMD. Why use the netlib BLAS at all when you have a tuned one
available?]

> Our executables are dynamically linked, so there's no problem running
> the same program on either the Intel or AMD machines. Users simply
> submit the job to SGE and the executable uses the correct library for
> the server at runtime.
>
> Everything is fine if the job (MPI master process and slaves) all run
> on nodes of the same chip architecture.
>
> However, there seems to be a problem with OpenMPI jobs if the slave
> process runs on a different chipset than the master. I believe that
> the slave jobs are launched without going through a shell, so they
> don't get the environment settings that would be applied in an
> interactive session or SGE job. The slave process seems to run with
> the same paths as the parent. For example, if the master MPI job is
> launched on an Intel node, LD_LIBRARY_PATH may be set to include
> "/opt/ATLAS/3.8.3/Intel/Xeon/Westmere/lib", and this seems to be
> passed to slave MPI processes running on AMD nodes, with the result
> that they pick up the wrong library, and this causes a segmentation
> fault.

Provide the architecture-dependent directories under
architecture-independent names, e.g. have the contents of
/opt/ATLAS/3.8.3/Intel/Xeon/Westmere/lib mounted or symlinked as
/usr/local/lib/atlas in your node image for the Westmeres.
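For instance -- a rough sketch only, with the link targets taken from
your paths and /usr/local/lib/atlas as above; adjust to however your
node images are built -- each image carries a link for its own chipset:

    # in the Westmere (Intel) node image
    ln -snf /opt/ATLAS/3.8.3/Intel/Xeon/Westmere/lib /usr/local/lib/atlas

    # in the Opteron (AMD) node image
    ln -snf /opt/ATLAS/3.8.3/AMD/Opteron/lib /usr/local/lib/atlas

    # every node then uses the same, architecture-independent setting,
    # so the value inherited from the master process is right everywhere
    export LD_LIBRARY_PATH=/usr/local/lib/atlas${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}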
> I could set up separate MPI queues within SGE per-chipset (i.e.,
> submit jobs with "-pe mpi-intel" or "-pe mpi-amd"), but that adds a
> complication for users and reduces the effectiveness of SGE doing the
> scheduling.

That's the right thing to do under most circumstances. I'm told by
experts that MPI programs normally don't work well across different
types of processor, which unbalances the processes (and thus
presumably reduces efficiency). It needn't add any complication for
the users. Our PEs aren't named for architectures, but they do
actually distinguish them -- e.g. by core count -- as well as fabrics.
"-pe openmpi" is re-written to openmpi-* as usual.
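For example -- PE names invented here, and the details of attaching
PEs to per-architecture queues or host groups are site-specific --
with one PE per chipset the submission stays a one-liner:

    # concrete PEs: openmpi-intel, openmpi-amd (hypothetical names),
    # each attached only to queues on the matching hosts
    qsub -pe 'openmpi-*' 64 myjob.sh

The wildcard makes the scheduler settle on a single matching PE, so
all the slots land on one architecture.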
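One way to do that rewriting (however it is actually done at your
site) is a server-side JSV. A rough sketch, assuming the stock shell
JSV library shipped under $SGE_ROOT/util/resources/jsv; PE names as in
the example above, everything else illustrative:

    #!/bin/sh
    # If a user requests the generic "openmpi" PE, turn it into a
    # wildcard so the scheduler chooses a concrete per-architecture PE.

    jsv_on_start()
    {
       return
    }

    jsv_on_verify()
    {
       if [ "`jsv_get_param pe_name`" = "openmpi" ]; then
          jsv_set_param pe_name "openmpi-*"
          jsv_correct "PE request rewritten to openmpi-*"
       else
          jsv_accept "Job is accepted"
       fi
       return
    }

    . ${SGE_ROOT}/util/resources/jsv/jsv_include.sh
    jsv_main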
--
Community Grid Engine: http://arc.liv.ac.uk/SGE/