Re: [gridengine users] MPI jobs on a multi-architecture cluster?

Reuti Thu, 13 Dec 2012 09:41:17 -0800

Hi,

On 13.12.2012 at 00:27, [email protected] wrote:


> I've got a question that's very similar to Joseph Farran's query "How do I
> request the CPU type in qrsh / qsub with SGE 8.1.2?" [1], but which is a
> problem specifically with MPI jobs.
> 
> I think we're running into a chipset-architecture issue (AMD vs Intel)
> in OpenMPI jobs. We're using SGE 6.2u5 and OpenMPI 1.33 with tight
> integration. All MPI jobs are launched by SGE.
> 
> We've got a locally-written program that dynamically links against
> a package that's compiled with optimizations for different chipsets
> (ATLAS[2]). We've built ATLAS with multiple versions, optimized
> for each architecture in our cluster.
> 
> This is fine for serial jobs--the login environment
> sets the path according to the chipset on each server
> (i.e., ATLASDIR=/opt/ATLAS/3.8.3/Intel/Xeon/Westmere or
> ATLASDIR=/opt/ATLAS/3.8.3/AMD/Opteron). We do the same thing for other
> packages that provide chipset optimization (LAPACK[3] and BLAS[4]).
> 
> Our executables are dynamically linked, so there's no problem running
> the same program on either the Intel or AMD machines. Users simply submit
> the job to SGE and the executable uses the correct library for the server
> at runtime.
> 
> Everything is fine if the job (MPI master process and slaves) all run
> on nodes of the same chip architecture.
> 
> However, there seems to be a problem with OpenMPI jobs if the slave
> process runs on a different chipset than the master. I believe
> that the slave jobs are launched without going through a shell, so
> they don't get the environment settings that would be applied in an
> interactive session or SGE job. The slave process seems to run with
> the same paths as the parent. For example, if the master MPI job
> is launched on an Intel node, LD_LIBRARY_PATH may be set to include
> "/opt/ATLAS/3.8.3/Intel/Xeon/Westmere/lib", and this seems to be passed
> to slave MPI processes running on AMD nodes, with the result that they
> pick up the wrong library and this causes a segmentation fault.
> 
> I could set up separate MPI queues within SGE per-chipset (i.e., submit jobs
> with "-pe mpi-intel" or "-pe mpi-amd"), but that adds a complication for users
> and reduces the effectiveness of SGE doing the scheduling.

Are you sure you want a single computation to run on different types of CPUs?
I have already noticed small variations in the output between different Xeon
CPUs (I didn't investigate further whether this comes from the CPU itself or
from the same library taking a different execution path). With a different
combination of machines in each run, it becomes hard to reproduce exactly the
results you got. For one and the same computation I would stay on one type of
CPU. As Dave already pointed out, this is like limiting a job to one fabric:

http://www.gridengine.info/2006/02/14/grouping-jobs-to-nodes-via-wildcard-pes/

(Nowadays you can even stay with a single queue and attach different PEs to
different hostgroups within that one queue definition.)
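
As a rough sketch of the single-queue variant: the hostgroup names @intel and
@amd, the PE names mpi-intel and mpi-amd, the queue name all.q, the slot count
and the script name job.sh are all assumptions for illustration, not taken
from your setup.

```shell
# Assumed names: hostgroups @intel/@amd, PEs mpi-intel/mpi-amd, queue all.q.
# In the queue definition (edit it with: qconf -mq all.q), attach one PE
# per hostgroup:
#
#   hostlist   @intel @amd
#   pe_list    NONE,[@intel=mpi-intel],[@amd=mpi-amd]
#
# Users then request the PE with a wildcard. SGE selects one matching PE for
# the whole job, so all of its slots come from hosts of one architecture:
qsub -pe "mpi-*" 16 job.sh
```

Users keep a single submit command, and the scheduler is still free to place
the job on whichever architecture has enough free slots.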

-- Reuti


> I'm wondering if there's a way to force SGE to select slave nodes from the
> same architecture type as the master MPI process, at run-time. We've already
> got the architecture as an attribute within SGE. In other words, when SGE
> determines which nodes have resources available to make up the "machine list"
> passed to OpenMPI, could that list be restricted to nodes of the same
> architecture as the node that SGE selects for the master process?
> Thanks,
> 
> Mark
> 
>       [1] http://gridengine.org/pipermail/users/2012-December/005329.html
>       [2] http://math-atlas.sourceforge.net/
>       [3] http://www.netlib.org/lapack/
>       [4] http://www.netlib.org/blas/
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

