Chris,
If your cpusets are correct and you are not doing any hybrid thread+MPI, I
found the problem is avoided by enabling -bind-to-core with Open MPI 1.6.x.
We just don't enable binding by default on our setup, and so far no users have
been bitten by this.
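For reference, a hypothetical launch line showing binding enabled explicitly (the application name and rank count are placeholders, not from this thread); -report-bindings makes each rank print where it landed so you can verify that no two ranks share a core:

```shell
# Hypothetical example: ./my_app and -np 4 are placeholders.
# -bind-to-core pins each MPI rank to its own core (Open MPI 1.6.x syntax);
# -report-bindings prints each rank's binding at startup.
mpirun -np 4 -bind-to-core -report-bindings ./my_app
```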
Brock Palen
www.umich.ed
On 06/11/12 08:57, Brock Palen wrote:
> Ok more information (had to build newer hwloc) My job today only
> 2 processes are running at half speed and they indeed are sharing
> the same core:
We've seen the same occasionally using CentOS5/RHEL5 with j
Brice Goglin, le Mon 05 Nov 2012 23:23:42 +0100, a écrit :
> top can also sort by the last-used CPU. Type f to enter the config menu,
> highlight the "last cpu" line, and hit 's' to make it the sort column.
With older versions of top, type F, then j, then space.
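If you would rather script it than eyeball top, the same "last CPU" value is exposed in /proc; a minimal sketch (field numbering per proc(5); it assumes the process name contains no spaces, which would shift awk's fields):

```shell
# The CPU a task last ran on is field 39 ("processor") of /proc/<pid>/stat.
# Caveat: awk's field count is off if the comm field (in parens) has spaces.
pid=$$
cpu=$(awk '{print $39}' "/proc/$pid/stat")
echo "pid $pid last ran on CPU $cpu"
```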
Samuel
Le 05/11/2012 22:57, Brock Palen a écrit :
> Ok more information (had to build newer hwloc) My job today only 2 processes
> are running at half speed and they indeed are sharing the same core:
>
> [root@nyx7000 ~]# for x in `cat /tmp/pids `; do echo -n "$x "; hwloc-bind
> --get-last-cpu-location --pid $x; done | sort -k 2
Ok more information (had to build newer hwloc) My job today only 2 processes
are running at half speed and they indeed are sharing the same core:
[root@nyx7000 ~]# for x in `cat /tmp/pids `; do echo -n "$x "; hwloc-bind
--get-last-cpu-location --pid $x; done | sort -k 2
1164 0x0001,0x0
11
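The same duplicate-core check can be sketched without hwloc by reading /proc directly (the /tmp/pids file of job PIDs is from the thread; field numbering per proc(5)):

```shell
# For each PID in /tmp/pids, print "<pid> <last-cpu>" using field 39
# ("processor") of /proc/<pid>/stat, then list any CPU value that occurs
# more than once -- a non-empty result means two processes last ran on the
# same core.
for x in $(cat /tmp/pids); do
    awk -v p="$x" '{print p, $39}' "/proc/$x/stat"
done | sort -n -k2 | awk '{print $2}' | uniq -d
```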
Le 02/11/2012 21:22, Brice Goglin a écrit :
> hwloc-bind --get-last-cpu-location --pid should give the same
> info but it seems broken on my machine right now, going to debug.
Actually, that works fine once you try it on a non-multithreaded program
that uses all cores :)
So you can use top or hw
Le 02/11/2012 21:03, Brock Palen a écrit :
> This isn't a hwloc problem exactly, but maybe you can shed some insight.
>
> We have some 4-socket, 10-core (= 40 cores per node) nodes, HT off:
>
> depth 0: 1 Machine (type #1)
> depth 1: 4 NUMANodes (type #2)
> depth 2: 4 Sockets (type #3)
> depth