Re: [OMPI users] Bug in openmpi 1.5.4 in paffinity

2011-09-06 Thread Brice Goglin
I think we'll have problems on all machines with Magny-Cours *and* cpuset/cgroups restricting the number of available processors. Not sure how common this is. I just checked the hwloc v1.2 branch changelog. Nothing in it matters for OMPI except the patch I sent below (commit v1.2@3767).
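For readers who want to check whether a node is in this situation, here is a minimal sketch of my own (not part of OMPI, hwloc, or this thread) using the hwloc 1.x API: it loads the topology with the WHOLE_SYSTEM flag and compares the machine's total PU count against the cpuset/cgroup-allowed count. File name and output format are illustrative.

/* check_restriction.c: compare total PUs with the cgroup/cpuset-allowed PUs.
 * Compile with: gcc check_restriction.c -lhwloc */
#include <hwloc.h>
#include <stdio.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    /* Keep disallowed PUs in the topology so we can count the whole machine. */
    hwloc_topology_set_flags(topo, HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM);
    hwloc_topology_load(topo);

    int total   = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU);
    int allowed = hwloc_bitmap_weight(hwloc_topology_get_allowed_cpuset(topo));

    printf("PUs in machine: %d, allowed by cpuset/cgroup: %d\n", total, allowed);

    hwloc_topology_destroy(topo);
    return 0;
}

If the two numbers differ, the node is running under the kind of restriction discussed in this thread.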

Re: [OMPI users] Bug in openmpi 1.5.4 in paffinity

2011-09-06 Thread Jeff Squyres
Brice -- Should I apply that patch to the OMPI 1.5 series, or should we do a hwloc 1.2.2 release? I.e., is this broken on all AMD/Magny-Cours machines? Should I also do an emergency OMPI 1.5.x release with (essentially) just this fix? (OMPI 1.5.x currently contains hwloc 1.2.0) On Sep 6,

Re: [OMPI users] Bug in openmpi 1.5.4 in paffinity

2011-09-06 Thread Brice Goglin
On 05/09/2011 21:29, Brice Goglin wrote: > Dear Ake, > Could you try the attached patch? It's not optimized, but it's probably > going in the right direction. > (and don't forget to remove the above-mentioned comment-out if you tried it). Actually, now that I've seen your entire topology, I found out

Re: [OMPI users] Bug in openmpi 1.5.4 in paffinity

2011-09-05 Thread Brice Goglin
On 04/09/2011 23:30, Brice Goglin wrote: > On 04/09/2011 22:35, Ake Sandgren wrote: >> On Sun, 2011-09-04 at 22:13 +0200, Brice Goglin wrote: >>> Hello, >>> >>> Could you log again on this node (with the same cgroups enabled), run >>> hwloc-gather-topology >>> and send the resulting

Re: [OMPI users] Bug in openmpi 1.5.4 in paffinity

2011-09-04 Thread Brice Goglin
On 04/09/2011 22:35, Ake Sandgren wrote: > On Sun, 2011-09-04 at 22:13 +0200, Brice Goglin wrote: >> Hello, >> >> Could you log again on this node (with the same cgroups enabled), run >> hwloc-gather-topology >> and send the resulting .output and .tar.bz2? >> >> Send them to the hwloc-devel

Re: [OMPI users] Bug in openmpi 1.5.4 in paffinity

2011-09-04 Thread Brice Goglin
Hello, Could you log again on this node (with the same cgroups enabled), run hwloc-gather-topology and send the resulting .output and .tar.bz2? Send them to the hwloc-devel list or open a ticket on https://svn.open-mpi.org/trac/hwloc (or send them to me in private if you don't want to subscribe).

[OMPI users] Bug in openmpi 1.5.4 in paffinity

2011-09-04 Thread Ake Sandgren
Hi! I'm getting a segfault in hwloc_setup_distances_from_os_matrix, in the call to hwloc_bitmap_or, because objs or objs[i]->cpuset has been freed and contains garbage (objs[i]->cpuset has infinite < 0). I only get this when using slurm with cgroups, asking for 2 nodes with 1 cpu each. The cpuset is
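To make the failing pattern concrete, here is a short sketch of my own (not the actual hwloc source): it ORs the cpusets of all NUMA nodes with hwloc_bitmap_or, guarding against objects whose cpuset may be absent when a cpuset/cgroup restriction hides part of the machine. The object type, variable names, and the guard are illustrative assumptions.

/* cpuset_union.c: union of NUMA-node cpusets, tolerating restricted nodes.
 * Compile with: gcc cpuset_union.c -lhwloc */
#include <hwloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_bitmap_t all;
    hwloc_obj_t node;
    int i, nbnodes;
    char *s;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    nbnodes = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_NODE);
    all = hwloc_bitmap_alloc();

    for (i = 0; i < nbnodes; i++) {
        node = hwloc_get_obj_by_type(topo, HWLOC_OBJ_NODE, i);
        if (!node || !node->cpuset)   /* can happen under cgroup restriction */
            continue;
        hwloc_bitmap_or(all, all, node->cpuset);
    }

    hwloc_bitmap_asprintf(&s, all);
    printf("union of NUMA node cpusets: %s\n", s);
    free(s);

    hwloc_bitmap_free(all);
    hwloc_topology_destroy(topo);
    return 0;
}

Note that in the crash reported above the pointers have been freed and contain garbage rather than being NULL, so a guard like this would not by itself prevent the segfault; the fix belongs inside hwloc, as discussed earlier in the thread.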