Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

Noam Bernstein Thu, 27 Feb 2014 08:06:02 -0500 (EST)

On Feb 27, 2014, at 2:36 AM, Patrick Begou <patrick.be...@legi.grenoble-inp.fr> wrote:

> Bernd Dammann wrote:
>> Using the workaround '--bind-to-core' only makes sense for jobs that
>> allocate full nodes, but the majority of our jobs don't do that.
> Why?
> We still use this option in OpenMPI (1.6.x, 1.7.x) with OpenFOAM and other
> applications to pin each process to its own core, because Linux sometimes
> moves processes and two processes can end up running on the same core,
> slowing the application. We do this even when we do not use full nodes.
> '--bind-to-core' is only inapplicable if you mix OpenMP and MPI, since all
> your threads will be bound to the same core, but I do not recall OpenFOAM
> doing this yet.

But if your jobs don't allocate full nodes and there are two jobs on the same
node, they can end up bound to the same subset of cores.  Torque cpusets should
in principle be able to prevent this (the queuing system allocates distinct sets
of cores to distinct jobs), but I've never used them myself.
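
One way to check whether two jobs on a node really have landed on the same
cores is to compare the affinity masks of their processes.  The following is
only a minimal sketch, assuming Linux and Python 3; the helper names
allowed_cores and overlapping_cores are hypothetical, and the PIDs would have
to come from something like ps on the node in question.

```python
import os

def allowed_cores(pid=0):
    """Return the sorted list of cores a process may run on (pid 0 = this process)."""
    return sorted(os.sched_getaffinity(pid))

def overlapping_cores(pids):
    """Return {core: [pids]} for every core that more than one of the pids is allowed on."""
    owners = {}
    for pid in pids:
        for core in os.sched_getaffinity(pid):
            owners.setdefault(core, []).append(pid)
    return {core: procs for core, procs in owners.items() if len(procs) > 1}

if __name__ == "__main__":
    print("this process may run on cores:", allowed_cores())
```

Overlapping masks only show that processes are allowed to share a core.  With
bind-to-core each mask should contain a single core, so any overlap reported
for MPI ranks of two different jobs would mean they are actually competing for
that core.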

Here we've basically given up on jobs that allocate a non-integer number of
nodes.  In principle users can still submit them (and then I turn off
bind-to-core), but hardly anyone does it except for some serial jobs.  Then
again, we have a mix of 8- and 16-core nodes.  If we had only 32- or 64-core
nodes we might be less tolerant of this restriction.

        Noam
