Re: [hwloc-users] Strange binding issue on 40 core nodes and cgroups
Chris,

Assuming your cpusets are correct, and you are not doing any hybrid
thread+MPI, I found the problem is avoided if you enable -bind-to-core
with Open MPI 1.6.x. We just don't enable binding by default on our
setup, and thus far no users have been bitten by this.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985

On Nov 5, 2012, at 9:00 PM, Christopher Samuel wrote:

> On 06/11/12 08:57, Brock Palen wrote:
>
>> Ok, more information (had to build a newer hwloc). In my job today
>> only two processes are running at half speed, and they are indeed
>> sharing the same core:
>
> We've seen the same occasionally using CentOS5/RHEL5 with jobs running
> under Torque with cpusets enabled.
>
> Never been able to explain it; the most recent case was someone using
> a home-compiled version of NAMD, and the problem disappeared when they
> started using our provided builds.
>
> I was fixing up the running problem jobs by hand by assigning procs to
> individual cores on the nodes with cpusets. :-/
>
> cheers,
> Chris
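For reference, a minimal sketch of what enabling binding looks like
with Open MPI 1.6.x. The process count and executable name are
illustrative; -report-bindings, which prints the chosen binding at
startup, is optional and assumes your build supports it:

    mpirun -np 12 -bind-to-core -report-bindings ./stream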
Re: [hwloc-users] Strange binding issue on 40 core nodes and cgroups
On 06/11/12 08:57, Brock Palen wrote:

> Ok, more information (had to build a newer hwloc). In my job today
> only two processes are running at half speed, and they are indeed
> sharing the same core:

We've seen the same occasionally using CentOS5/RHEL5 with jobs running
under Torque with cpusets enabled.

Never been able to explain it; the most recent case was someone using a
home-compiled version of NAMD, and the problem disappeared when they
started using our provided builds.

I was fixing up the running problem jobs by hand by assigning procs to
individual cores on the nodes with cpusets. :-/

cheers,
Chris
--
Christopher Samuel        Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au      Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/          http://twitter.com/vlsci
Re: [hwloc-users] Strange binding issue on 40 core nodes and cgroups
Brice Goglin, on Mon 05 Nov 2012 at 23:23:42 +0100, wrote:

> top can also sort by the last used CPU. Type f to enter the config
> menu, highlight the "last cpu" line, and hit 's' to make it the sort
> column.

With older versions of top, type F, then j, then space.

Samuel
Re: [hwloc-users] Strange binding issue on 40 core nodes and cgroups
On 05/11/2012 22:57, Brock Palen wrote:

> Ok, more information (had to build a newer hwloc). In my job today
> only two processes are running at half speed, and they are indeed
> sharing the same core:
>
> [root@nyx7000 ~]# for x in `cat /tmp/pids`; do echo -n "$x "; hwloc-bind --get-last-cpu-location --pid $x; done | sort -k 2
> 1164 0x0001,0x0
> 1158 0x0010,0x0
> 1165 0x0010,0x0
> 1167 0x0020
> 1157 0x0200
> 1159 0x0400
> 1160 0x2000
> 1163 0x4000
> 1166 0x0002
> 1161 0x0004
> 1168 0x0020
> 1162 0x0040
>
>  PID   USER    PR  NI  VIRT   RES  SHR S  %CPU %MEM    TIME+ COMMAND
> 1157   brockp  20   0  1885m  1.8g 456 R  99.6  0.2  9:49.55 stream
> 1159   brockp  20   0  1885m  1.8g 456 R  99.6  0.2  8:10.91 stream
> 1161   brockp  20   0  1885m  1.8g 456 R  99.6  0.2  9:49.55 stream
> 1162   brockp  20   0  1885m  1.8g 456 R  99.6  0.2  9:49.54 stream
> 1163   brockp  20   0  1885m  1.8g 456 R  99.6  0.2  9:49.55 stream
> 1164   brockp  20   0  1885m  1.8g 456 R  99.6  0.2  9:49.53 stream
> 1160   brockp  20   0  1885m  1.8g 456 R  97.7  0.2  9:49.54 stream
> 1166   brockp  20   0  1885m  1.8g 456 R  97.7  0.2  9:49.53 stream
> 1167   brockp  20   0  1885m  1.8g 456 R  97.7  0.2  9:49.46 stream
> 1168   brockp  20   0  1885m  1.8g 456 R  97.7  0.2  8:10.86 stream
> 1158   brockp  20   0  1885m  1.8g 456 R  48.9  0.2  4:54.78 stream
> 1165   brockp  20   0  1885m  1.8g 456 R  48.9  0.2  4:54.76 stream
>
> This is very strange. Is there a way to ask hwloc to show me all
> processes that are using a given CPU?

No, there's no easy way to do that.

You should first check whether this given CPU is idle or not. Running
top and pressing 1 will show one line per CPU (yours should be the
second CPU line).

top can also sort by the last used CPU. Type f to enter the config
menu, highlight the "last cpu" line, and hit 's' to make it the sort
column. Assuming your top version isn't too different from mine, you
should be able to quickly see whether any process used your given CPU
recently.

Brice
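A rough userspace approximation of "which processes last ran on CPU N",
using plain procps rather than hwloc: the psr column of ps reports the
processor a task last ran on, so filtering on it lists the candidates.
The CPU index 9 is illustrative, and note this is a point-in-time
sample of the last-run CPU, not a binding:

    # list processes whose last-used CPU is, e.g., CPU 9
    ps -eo pid,psr,comm --no-headers | awk '$2 == 9'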
Re: [hwloc-users] Strange binding issue on 40 core nodes and cgroups
Ok, more information (had to build a newer hwloc). In my job today only
two processes are running at half speed, and they are indeed sharing
the same core:

[root@nyx7000 ~]# for x in `cat /tmp/pids`; do echo -n "$x "; hwloc-bind --get-last-cpu-location --pid $x; done | sort -k 2
1164 0x0001,0x0
1158 0x0010,0x0
1165 0x0010,0x0
1167 0x0020
1157 0x0200
1159 0x0400
1160 0x2000
1163 0x4000
1166 0x0002
1161 0x0004
1168 0x0020
1162 0x0040

 PID   USER    PR  NI  VIRT   RES  SHR S  %CPU %MEM    TIME+ COMMAND
1157   brockp  20   0  1885m  1.8g 456 R  99.6  0.2  9:49.55 stream
1159   brockp  20   0  1885m  1.8g 456 R  99.6  0.2  8:10.91 stream
1161   brockp  20   0  1885m  1.8g 456 R  99.6  0.2  9:49.55 stream
1162   brockp  20   0  1885m  1.8g 456 R  99.6  0.2  9:49.54 stream
1163   brockp  20   0  1885m  1.8g 456 R  99.6  0.2  9:49.55 stream
1164   brockp  20   0  1885m  1.8g 456 R  99.6  0.2  9:49.53 stream
1160   brockp  20   0  1885m  1.8g 456 R  97.7  0.2  9:49.54 stream
1166   brockp  20   0  1885m  1.8g 456 R  97.7  0.2  9:49.53 stream
1167   brockp  20   0  1885m  1.8g 456 R  97.7  0.2  9:49.46 stream
1168   brockp  20   0  1885m  1.8g 456 R  97.7  0.2  8:10.86 stream
1158   brockp  20   0  1885m  1.8g 456 R  48.9  0.2  4:54.78 stream
1165   brockp  20   0  1885m  1.8g 456 R  48.9  0.2  4:54.76 stream

This is very strange. Is there a way to ask hwloc to show me all
processes that are using a given CPU?

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985

On Nov 2, 2012, at 4:30 PM, Brice Goglin wrote:

> On 02/11/2012 21:22, Brice Goglin wrote:
>
>> hwloc-bind --get-last-cpu-location --pid <pid> should give the same
>> info, but it seems broken on my machine right now; going to debug.
>
> Actually, that works fine once you try it on a non-multithreaded
> program that uses all cores :)
>
> So you can use top or hwloc-bind --get-last-cpu-location --pid <pid>
> to find out where each process runs.
>
> Brice
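The closest built-in answer is hwloc-ps, which lists processes together
with the cpuset they are bound to. A sketch, assuming the -a flag
(include all processes, not just explicitly bound ones) is available in
this hwloc version:

    hwloc-ps -a | grep stream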
Re: [hwloc-users] Strange binding issue on 40 core nodes and cgroups
On 02/11/2012 21:22, Brice Goglin wrote:

> hwloc-bind --get-last-cpu-location --pid <pid> should give the same
> info, but it seems broken on my machine right now; going to debug.

Actually, that works fine once you try it on a non-multithreaded
program that uses all cores :)

So you can use top or hwloc-bind --get-last-cpu-location --pid <pid> to
find out where each process runs.

Brice
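Since --get-last-cpu-location reports where a process most recently ran
rather than where it is bound, sampling it a few times shows whether
the process is sticking to one PU or migrating. A small sketch, with
PID 1157 taken from the job output above purely as an example:

    for i in 1 2 3 4 5; do hwloc-bind --get-last-cpu-location --pid 1157; sleep 1; done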
Re: [hwloc-users] Strange binding issue on 40 core nodes and cgroups
On 02/11/2012 21:03, Brock Palen wrote:

> This isn't exactly an hwloc problem, but maybe you can shed some
> insight.
>
> We have some 4-socket, 10-core = 40-core nodes, HT off:
>
> depth 0:        1 Machine (type #1)
>  depth 1:       4 NUMANodes (type #2)
>   depth 2:      4 Sockets (type #3)
>    depth 3:     4 Caches (type #4)
>     depth 4:    40 Caches (type #4)
>      depth 5:   40 Caches (type #4)
>       depth 6:  40 Cores (type #5)
>        depth 7: 40 PUs (type #6)
>
> We run RHEL 6.3 and use Torque to create cgroups for jobs. I get the
> following cgroup for this job; all 12 cores for the job are on one
> node:
>
> cat /dev/cpuset/torque/8845236.nyx.engin.umich.edu/cpus
> 0-1,4-5,8,12,16,20,24,28,32,36
>
> Not all nicely spaced, but 12 cores.
>
> I then start a code, even a simple serial code, with Open MPI 1.6.0 on
> all 12 cores:
>
> mpirun ./stream
>
>  PID   USER    PR  NI  VIRT   RES  SHR S  %CPU %MEM    TIME+ COMMAND
> 45521  brockp  20   0  1885m  1.8g 456 R 100.0  0.2  4:02.72 stream
> 45522  brockp  20   0  1885m  1.8g 456 R 100.0  0.2  1:46.08 stream
> 45525  brockp  20   0  1885m  1.8g 456 R 100.0  0.2  4:02.72 stream
> 45526  brockp  20   0  1885m  1.8g 456 R 100.0  0.2  1:46.07 stream
> 45527  brockp  20   0  1885m  1.8g 456 R 100.0  0.2  4:02.71 stream
> 45528  brockp  20   0  1885m  1.8g 456 R 100.0  0.2  4:02.71 stream
> 45532  brockp  20   0  1885m  1.8g 456 R 100.0  0.2  1:46.05 stream
> 45529  brockp  20   0  1885m  1.8g 456 R  99.2  0.2  4:02.70 stream
> 45530  brockp  20   0  1885m  1.8g 456 R  99.2  0.2  4:02.70 stream
> 45531  brockp  20   0  1885m  1.8g 456 R  33.6  0.2  1:20.89 stream
> 45523  brockp  20   0  1885m  1.8g 456 R  32.8  0.2  1:20.90 stream
> 45524  brockp  20   0  1885m  1.8g 456 R  32.8  0.2  1:20.89 stream
>
> Note the processes that are not running at 100% CPU.
>
> hwloc-bind --get --pid 45523
> 0x0011,0x1133

Hello Brock,

I don't see anything helpful for an answer here :/

Do you know which core is overloaded and which (two?) cores are idle?
Does that change during one run, or from one run to another? Pressing 1
in top should give that information in the very first lines.

Then, you can try binding another process to one of the idle cores, to
see if the kernel accepts that.

You can also press "f" and "j" (or "f", then use arrows and space to
select "last used cpu") to add a "P" column which tells you the last
CPU used by each process. hwloc-bind --get-last-cpu-location --pid
<pid> should give the same info, but it seems broken on my machine
right now; going to debug.

One thing to check would be to run more than 12 processes and see where
the kernel puts them. If it keeps ignoring two cores, that would be
funny :)

Brice
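A concrete way to try the "bind another process to one of the idle
cores" test above, as a sketch: core:5 is a hypothetical index
(substitute the suspect idle core), and yes(1) just serves as a
throwaway CPU burner.

    # pin a CPU burner to the suspect core, confirm its binding, watch it in top
    hwloc-bind core:5 -- yes > /dev/null &
    hwloc-bind --get --pid $!
    # ...then kill %1 when done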