No problem! Glad we could help, and many thanks for tracking down some of our bugs.

On Apr 24, 2007, at 5:28 PM, Mostyn Lewis wrote:

Well, I'm sorry to have caused even a smidgen of grief here.
I moved aside the *paffinity_linux* module and, lo, it still
bound. I was using InfiniPath HCAs and beta software, and eventually found
(sigh) a variable to stop the affinity - IPATH_NO_CPUAFFINITY.

So, a

export IPATH_NO_CPUAFFINITY=1
$OPENMPI_GCC/bin/mpirun -x IPATH_NO_CPUAFFINITY -np 1 -host s0158 ./cpi

showed me what I wanted to see:

18236:cpi *->0 (f=noaffinity,0,1,2,3)

This, in the jargon of my utility, says the mask for taskset is 0xf,
so the process is not affined, and the ->0 says it's on CPU 0.

The reason all this comes about is that I do endless benchmarks for my
employer and get to use Scali, QuickSilver (SilverStorm), QLogic (InfiniPath),
all the Ethernet MPICHes and LAMs (fading fast) - even HP MPI - on
our racks, which have x cores per socket. Sometimes we like to use
our own methodologies to choose where to bind, and in that case we need to
switch off any supplied binding. I really wish the default were no
binding, like Open MPI, with docs that point out the variables, but that's
not always the case.

Sorry again for any trub,
Mostyn


On Tue, 24 Apr 2007, Jeff Squyres wrote:


On Apr 23, 2007, at 9:22 PM, Mostyn Lewis wrote:

I tried this on a humble PC and it works there.
I see in the --mca mpi_show_mca_params 1 output that there is a
[bb17:06646] paffinity=
entry, so I expect that sets the value back to 0?

There should be an mpi_paffinity_alone parameter; that's what drives
the whole process.
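
For reference, it can be set on the mpirun command line via the usual MCA
mechanism; a minimal sketch (the process count and executable name here are
placeholders):

        mpirun --mca mpi_paffinity_alone 1 -np 4 ./a.out

When mpi_paffinity_alone is 1, Open MPI binds each process to a processor;
when it is 0, Open MPI does not set affinity itself.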

I'll get to the SLES10 cluster when I can (other people are doing
benchmarks) and see what I can find. I see there's no stdbool.h there,
so maybe this is an artifact of defining the bool type on an
Opteron. I'll get back to you when I can.

Lack of (bool) shouldn't be a factor.  If it is, we have a bug.

The test of boundness was a perl program invoked via system() in a
C MPI program. The /proc/<pid>/stat result shows the CPU you are
bound to (3rd number from the end) and a taskset call gets back the
mask to show if you are bound or not.
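
The same sort of check can be done straight from the shell; a rough sketch
(the pid is a placeholder, and the field position follows the description
above):

        PID=18236                                  # hypothetical pid of the MPI process
        awk '{ print $(NF-2) }' /proc/$PID/stat    # 3rd field from the end: CPU last run on
        taskset -p $PID                            # prints the current affinity mask (0xf => not bound)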

Hmm.  What kernel version do you have?  I know there were some issues
with this information until recent versions (I confess to not knowing
in which version the information became stable/reliable, unfortunately).

Are you launching under a scheduler, perchance?  N1GE may be setting
affinity before MPI processes are even launched, for example...?
(I'm not too familiar with N1GE -- I'm speculating).
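
One quick way to rule that out (assuming taskset is available on the compute
nodes) is to print the job-script shell's own affinity mask before mpirun runs:

        taskset -p $$        # mask of the job-script shell, inherited from the scheduler

If that mask is already restricted, the binding is happening before Open MPI
is ever involved.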

There's a simple acid test to see if OMPI is setting the affinity or
not: remove the linux paffinity component (assuming you compiled the
components as plugins/dynamic shared objects).  Go to the OMPI
installation directory:

        $prefix/lib/openmpi

There should be 2 files in there named mca_paffinity_linux.*.  This
is the component that knows how to set processor affinity in Open
MPI; if it's not there, Open MPI won't know how to set affinity on
your system (and therefore won't).  Rename or move these files so
that they are not findable, such as:

        cd $prefix/lib/openmpi
        mkdir junk
        mv *paffinity_linux* junk

And run your test again.  If you're still getting affinity set, then
it's not Open MPI that is setting it.
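
To double-check that the component really is out of the picture, ompi_info can
be used to list what Open MPI finds (a simple sanity check, not a substitute
for rerunning your test):

        ompi_info | grep paffinity       # should no longer list the linux paffinity component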

--
Jeff Squyres
Cisco Systems



--
Jeff Squyres
Cisco Systems
