On Tue, Dec 17, 2013 at 11:16:48AM -0500, Noam Bernstein wrote:
> On Dec 17, 2013, at 11:04 AM, Ralph Castain <r...@open-mpi.org> wrote:
> 
> > Are you binding the procs? We don't bind by default (this will change in 
> > 1.7.4), and binding can play a significant role when comparing across 
> > kernels.
> > 
> > add "--bind-to-core" to your cmd line
> 
> I've previously always used mpi_paffinity_alone=1, and the new behavior
> seems to be independent of whether or not I use it.  I'll try bind-to-core.

That would be the problem. mpi_paffinity_alone no longer exists in 1.7.4;
it has been replaced by hwloc_base_binding_policy. The "--bind-to core"
option is an alias for "-mca hwloc_base_binding_policy core".
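
As a minimal sketch, assuming the old setting was kept in a per-user MCA
parameter file (typically $HOME/.openmpi/mca-params.conf; adjust the path
if your site keeps it elsewhere), the replacement would look roughly like
this:

    # old setting, no longer recognized:
    # mpi_paffinity_alone = 1
    # assumed replacement: bind each process to a core
    hwloc_base_binding_policy = core

Adding --report-bindings to the mpirun command line should print where
each rank ends up, which is a quick way to check that the binding is
actually being applied.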

> One more possible clue.  I haven't done a full test, but for one
> particular setup (newer nodes, single node so presumably using
> sm), there are apparently two ways to fix the problem:
> 1. go back to the previous kernel, but stick with openmpi 1.7.3
> 2. stick with the new kernel, but go back to openmpi 1.6.4
> 
> So it appears to be some interaction between the new kernel and 1.7.3 that
> isn't present with 1.6.4.
> 
> We specifically switched to 1.7.3 because of a bug in 1.6.4 (lock up in some 
> collective communication), but now I'm wondering whether I should just test
> 1.6.5.
> 
>                                                                       Noam
> 



