Re: [hwloc-users] Strange binding issue on 40 core nodes and cgroups

2012-11-06 Thread Brock Palen
Chris, If you assume your cpusets are correct, and you are not doing any hybrid thread+MPI, I found the problem is avoided if you enable -bind-to-core with Open MPI 1.6.x. We just don't enable binding by default on our setup, and thus far no users have been bitten by this. Brock Palen www.umich.ed

Re: [hwloc-users] Strange binding issue on 40 core nodes and cgroups

2012-11-05 Thread Christopher Samuel
On 06/11/12 08:57, Brock Palen wrote: > Ok more information (had to build newer hwloc) My job today only > 2 processes are running at half speed and they indeed are sharing > the same core: We've seen the same occasionally using CentOS5/RHEL5 with j

Re: [hwloc-users] Strange binding issue on 40 core nodes and cgroups

2012-11-05 Thread Samuel Thibault
Brice Goglin, on Mon 05 Nov 2012 23:23:42 +0100, wrote: > top can also sort by the last used CPU. Type f to enter the config menu, > highlight the "last cpu" line, and hit 's' to make it the sort column. With older versions of top, type F, then j, then space. Samuel
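For a non-interactive check of the same "last used CPU" value that top displays, the sketch below reads it from a /proc/<pid>/stat line on Linux, where field 39 is the CPU the task last ran on. The function name last_cpu_from_stat is made up for illustration and is not part of top or hwloc.

```shell
#!/bin/sh
# last_cpu_from_stat (hypothetical helper): read one /proc/<pid>/stat line on
# stdin and print the number of the CPU the task last ran on.
last_cpu_from_stat() {
    # The command name (field 2) is parenthesised and may contain spaces, so
    # drop everything up to the last ") " before splitting into fields.
    # What remains starts at field 3 (state), so field 39 is now $37.
    sed 's/^.*) //' | awk '{ print $37 }'
}

# Example use on a Linux node (PID 1164 is only a placeholder):
#   last_cpu_from_stat < /proc/1164/stat
```

This reports the main thread only; for a multithreaded process each thread has its own line under /proc/<pid>/task/<tid>/stat.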

Re: [hwloc-users] Strange binding issue on 40 core nodes and cgroups

2012-11-05 Thread Brice Goglin
On 05/11/2012 22:57, Brock Palen wrote: > Ok more information (had to build newer hwloc) My job today only 2 processes > are running at half speed and they indeed are sharing the same core: > > [root@nyx7000 ~]# for x in `cat /tmp/pids `; do echo -n "$x "; hwloc-bind > --get-last-cpu-locatio

Re: [hwloc-users] Strange binding issue on 40 core nodes and cgroups

2012-11-05 Thread Brock Palen
Ok more information (had to build newer hwloc) My job today only 2 processes are running at half speed and they indeed are sharing the same core: [root@nyx7000 ~]# for x in `cat /tmp/pids `; do echo -n "$x "; hwloc-bind --get-last-cpu-location --pid $x; done | sort -k 2 1164 0x0001,0x0 11
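With 40 processes, spotting two that report the same location by eye is error-prone. The following is a minimal sketch that takes the "PID MASK" lines produced by a loop like the one above and prints only the masks reported by more than one PID. The function name find_shared_cpus and the masks in the comments are hypothetical, not actual output from the node.

```shell
#!/bin/sh
# find_shared_cpus (hypothetical helper): read "PID CPUMASK" lines on stdin
# and print every mask that more than one PID reported, followed by the PIDs.
find_shared_cpus() {
    # Sort by mask so identical masks end up on adjacent lines.
    sort -k 2 | awk '
        # Same mask as the previous line: add this PID to the current group.
        $2 == prev { pids = pids " " $1; n++; next }
        # New mask: report the previous group if it had more than one PID.
        { if (n > 1) print prev ": " pids; prev = $2; pids = $1; n = 1 }
        END { if (n > 1) print prev ": " pids }
    '
}

# Example use with the loop from the message (illustrative masks only):
#   for x in `cat /tmp/pids`; do echo -n "$x "; \
#       hwloc-bind --get-last-cpu-location --pid $x; done | find_shared_cpus
# If PIDs 1165 and 1166 both report 0x00000002, the output is:
#   0x00000002: 1165 1166
```

The last CPU location is a snapshot, so a single shared mask may be transient; running the check a few times helps separate scheduler noise from two processes persistently stuck on one core.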

Re: [hwloc-users] Strange binding issue on 40 core nodes and cgroups

2012-11-02 Thread Brice Goglin
On 02/11/2012 21:22, Brice Goglin wrote: > hwloc-bind --get-last-cpu-location --pid should give the same > info but it seems broken on my machine right now, going to debug. Actually, that works fine once you try it on a non-multithreaded program that uses all cores :) So you can use top or hw

Re: [hwloc-users] Strange binding issue on 40 core nodes and cgroups

2012-11-02 Thread Brice Goglin
On 02/11/2012 21:03, Brock Palen wrote: > This isn't a hwloc problem exactly, but maybe you can shed some insight. > > We have some 4 socket 10 core = 40 core nodes, HT off: > > depth 0: 1 Machine (type #1) > depth 1: 4 NUMANodes (type #2) > depth 2: 4 Sockets (type #3) > depth