Re: [hwloc-users] Thread binding problem

2012-09-19 Thread Samuel Thibault
Hello, 2012/9/6 Brice Goglin > > Anyway, having 65000 mempolicies in use is a lot. And that would somehow > correspond to the number of set_area_membind that succeed before one > fails. So the kernel might indeed fail to merge those. > > That said, these objects are small (24by

Re: [hwloc-users] Thread binding problem

2012-09-07 Thread Gabriele Fatigati
>did you gather this info during the sleep(10) after the failure before >the program exits ? Yes. >You likely need numa devel if you're configuring/building hwloc. The >summary at the end of the hwloc configure will tell you if memory >binding is supported or not, it mostly depends on numa devel.

Re: [hwloc-users] Thread binding problem

2012-09-07 Thread Brice Goglin
On 07/09/2012 09:43, Gabriele Fatigati wrote: > Hi, > > Good, you found the kernel limit that is exceeded. > > /proc/meminfo reports MemFree as 47834588 kB > > numactl -H: > > available: 2 nodes (0-1) > node 0 size: 24194 MB > node 0 free: 22702 MB > node 1 size: 24240 MB > node 1 free: 23997 MB

Re: [hwloc-users] Thread binding problem

2012-09-07 Thread Gabriele Fatigati
Hi, Good, you found the kernel limit that is exceeded. /proc/meminfo reports MemFree as 47834588 kB numactl -H: available: 2 nodes (0-1) node 0 size: 24194 MB node 0 free: 22702 MB node 1 size: 24240 MB node 1 free: 23997 MB node distances: node 0 1 0: 10 21 1: 21 10 Are you able

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Brice Goglin
On 06/09/2012 14:51, Gabriele Fatigati wrote: > Hi Brice, > > the initial grep is: > > numa_policy 65671 65952 24 144 1 : tunables 120 60 8 : slabdata 458 458 0 > > When set_membind fails it is: > > numa_policy 482 1152 24 144 1 : tunables 120

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Gabriele Fatigati
Hi Brice, the initial grep is: numa_policy 65671 65952 24 144 1 : tunables 120 60 8 : slabdata 458 458 0 When set_membind fails it is: numa_policy 482 1152 24 144 1 : tunables 120 60 8 : slabdata 8 8 288 What does it mean?
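To read a numa_policy row like the ones quoted above, a small parser can help. The following is only a sketch: it assumes the usual /proc/slabinfo column order (name, active objects, total objects, object size, objects per slab, pages per slab), and the names slab_row, parse_slab_row and slab_bytes are hypothetical helpers, not part of hwloc or the kernel.

```c
#include <assert.h>
#include <stdio.h>

/* Leading columns of one /proc/slabinfo row (assumed column order). */
struct slab_row {
    char name[64];
    unsigned long active_objs;   /* objects currently in use */
    unsigned long num_objs;      /* objects allocated in the cache */
    unsigned long objsize;       /* bytes per object */
    unsigned long objperslab;
    unsigned long pagesperslab;
};

/* Parse the leading columns of a row; returns 0 on success, -1 otherwise. */
int parse_slab_row(const char *line, struct slab_row *out)
{
    assert(line != NULL && out != NULL);
    int n = sscanf(line, "%63s %lu %lu %lu %lu %lu",
                   out->name, &out->active_objs, &out->num_objs,
                   &out->objsize, &out->objperslab, &out->pagesperslab);
    return n == 6 ? 0 : -1;
}

/* Bytes held by all objects allocated in the cache. */
unsigned long slab_bytes(const struct slab_row *row)
{
    return row->num_objs * row->objsize;
}
```

Under that column assumption, the first row gives 65952 objects of 24 bytes each, i.e. about 1.5 MB, so the count of policies matters here far more than the memory they occupy.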

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Brice Goglin
On 06/09/2012 12:19, Gabriele Fatigati wrote: > I didn't find any strange numbers in /proc/meminfo. > > I've noticed that the program fails exactly > every 65479 hwloc_set_area_membind calls. So it sounds like some kernel > limit. You can check that with just one thread as well. > > Maybe nobody has ever noticed this

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Gabriele Fatigati
I didn't find any strange numbers in /proc/meminfo. I've noticed that the program fails exactly every 65479 hwloc_set_area_membind calls. So it sounds like some kernel limit. You can check that with just one thread as well. Maybe nobody has ever noticed this because usually we bind a large amount of contiguous memory
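Since the failure seems tied to the number of separate bind calls rather than to the amount of memory, one possible mitigation is to merge adjacent pieces before binding them. This is not something proposed in the thread; it is a minimal sketch that assumes all ranges in the array target the same NUMA node. The struct range type and coalesce_ranges function are hypothetical names, and the actual bind call is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One piece of the matrix, as a byte offset and length (hypothetical type). */
struct range { size_t offset; size_t len; };

static int cmp_range(const void *a, const void *b)
{
    const struct range *x = a, *y = b;
    return (x->offset > y->offset) - (x->offset < y->offset);
}

/* Sort ranges by offset and merge those that touch or overlap, in place.
 * Returns the number of ranges left. Only valid if every range is meant
 * to be bound to the same node. */
size_t coalesce_ranges(struct range *r, size_t n)
{
    assert(r != NULL || n == 0);
    if (n == 0)
        return 0;
    qsort(r, n, sizeof *r, cmp_range);
    size_t out = 0;
    for (size_t i = 1; i < n; i++) {
        size_t end = r[out].offset + r[out].len;
        if (r[i].offset <= end) {
            /* Adjacent or overlapping: extend the current range if needed. */
            size_t new_end = r[i].offset + r[i].len;
            if (new_end > end)
                r[out].len = new_end - r[out].offset;
        } else {
            r[++out] = r[i];
        }
    }
    return out + 1;
}
```

Each surviving range would then need a single bind call, for example on punt + r[i].offset with length r[i].len. If the pieces bound to one node are never adjacent, this changes nothing.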

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Brice Goglin
On 06/09/2012 10:44, Samuel Thibault wrote: > Gabriele Fatigati, on Thu 06 Sep 2012 10:12:38 +0200, wrote: >> mbind hwloc_linux_set_area_membind() fails: >> >> Error from HWLOC mbind: Cannot allocate memory > Ok. mbind is not really supposed to allocate much memory, but it still > does allo

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Samuel Thibault
Samuel Thibault, on Thu 06 Sep 2012 10:45:45 +0200, wrote: > Gabriele Fatigati, on Thu 06 Sep 2012 10:12:38 +0200, wrote: > > mbind hwloc_linux_set_area_membind() fails: > > > > Error from HWLOC mbind: Cannot allocate memory > > Ok. mbind is not really supposed to allocate much memory, bu

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Samuel Thibault
Gabriele Fatigati, on Thu 06 Sep 2012 10:12:38 +0200, wrote: > mbind hwloc_linux_set_area_membind() fails: > > Error from HWLOC mbind: Cannot allocate memory Ok. mbind is not really supposed to allocate much memory, but it still does allocate some, to record the policy > //hwloc_obj

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Gabriele Fatigati
Oops, I forgot hwloc_topology_destroy() and also hwloc_bitmap_free(cpuset). I've added them; I attach new code using the hwloc_set_area_membind function directly and the new Valgrind output. 2012/9/6 Brice Goglin > On 06/09/2012 10:13, Gabriele Fatigati wrote: > > Downsizing the array, up to 4GB, >

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Brice Goglin
On 06/09/2012 10:13, Gabriele Fatigati wrote: > Downsizing the array, up to 4GB, > > valgrind gives many warnings reported in the attached file. Adding hwloc_topology_destroy() at the end of the file would likely remove most of them. But that won't fix the problem since the leaks are small.

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Gabriele Fatigati
Downsizing the array, up to 4GB, valgrind gives many warnings reported in the attached file. 2012/9/6 Gabriele Fatigati > Sorry, > > I used the wrong hwloc installation. Using the hwloc build with the printf > checks: > > mbind hwloc_linux_set_area_membind() fails: > > Error from HWLOC mbind

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Gabriele Fatigati
Sorry, I used the wrong hwloc installation. Using the hwloc build with the printf checks: mbind hwloc_linux_set_area_membind() fails: Error from HWLOC mbind: Cannot allocate memory so this is the origin of the bad allocation. I attach the right valgrind output valgrind --track-origins=yes --log-file=o

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Brice Goglin
On 06/09/2012 09:56, Gabriele Fatigati wrote: > Hi Brice, hi Jeff, > > >Can you add some printf inside hwloc_linux_set_area_membind() in > src/topology-linux.c to see if ENOMEM comes from the mbind >syscall or > not? > > I added printf inside that function, but ENOMEM does not come from there.

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Gabriele Fatigati
Hi Brice, hi Jeff, >Can you add some printf inside hwloc_linux_set_area_membind() in src/topology-linux.c to see if ENOMEM comes from the mbind >syscall or not? I added printf inside that function, but ENOMEM does not come from there. >Have you run your application through valgrind or another me

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Jeff Squyres
On Sep 5, 2012, at 2:36 PM, Gabriele Fatigati wrote: > I don't think it is simply out of memory, since the NUMA node has 48 GB and I'm > allocating just 8 GB. Mmm. Probably right. Have you run your application through valgrind or another memory-checking debugger? I've seen cases of heap corruptio

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Gabriele Fatigati
Dear Jeff, I don't think it is simply out of memory, since the NUMA node has 48 GB and I'm allocating just 8 GB. 2012/9/5 Jeff Squyres > Perhaps you simply have run out of memory on that NUMA node, and therefore > the malloc failed. Check "numactl --hardware", for example. > > You might want to che

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Jeff Squyres
Perhaps you simply have run out of memory on that NUMA node, and therefore the malloc failed. Check "numactl --hardware", for example. You might want to check the output of numastat to see if one or more of your NUMA nodes have run out of memory. On Sep 5, 2012, at 12:58 PM, Gabriele Fatigat

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Gabriele Fatigati
I've reproduced the problem in a small MPI + OpenMP code. The error is the same: after some memory binds, it gives "Cannot allocate memory". Thanks. 2012/9/5 Gabriele Fatigati > Downscaling the matrix size, binding works well, but the memory available > is enough even with a bigger matrix, so I'

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Gabriele Fatigati
Downscaling the matrix size, binding works well, but the memory available is enough even with a bigger matrix, so I'm a bit confused. Using the same big matrix size without binding, the code works well, so how can I explain this behaviour? Maybe hwloc_set_area_membind_nodeset introduces other ex

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Brice Goglin
An internal malloc failed then. That would explain why your malloc failed too. It looks like you malloc'ed too much memory in your program? Brice On 05/09/2012 15:56, Gabriele Fatigati wrote: > An update: > > placing strerror(errno) after hwloc_set_area_membind_nodeset gives: > "Cannot all

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Gabriele Fatigati
An update: placing strerror(errno) after hwloc_set_area_membind_nodeset gives: "Cannot allocate memory" 2012/9/5 Gabriele Fatigati > Hi, > > I've noticed that hwloc_set_area_membind_nodeset returns -1 but errno is not > equal to EXDEV or ENOSYS. I supposed that these two cases were the only two >

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Brice Goglin
What does errno contain? Aside from ENOSYS and EXDEV, you may also get the "usual" error codes such as ENOMEM, EPERM or EINVAL. We didn't document all of them; it mostly depends on the underlying kernel and mbind implementations. Brice On 05/09/2012 15:44, Gabriele Fatigati wrote: > Hi, > > I'v
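The distinction above, between the two documented codes and the "usual" ones passed through from the kernel, can be handled in one place. This is a sketch using only standard C and errno values: is_documented_membind_errno and describe_membind_error are hypothetical helper names, and the message wording is an assumption. The caller is expected to save errno right after the failing hwloc call.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if err is one of the two codes the hwloc documentation lists
 * for memory binding (ENOSYS, EXDEV), 0 otherwise. */
int is_documented_membind_errno(int err)
{
    return err == ENOSYS || err == EXDEV;
}

/* Writes a one-line description of a failed bind into buf. */
void describe_membind_error(int err, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (err == ENOSYS)
        snprintf(buf, len, "membind: action not supported");
    else if (err == EXDEV)
        snprintf(buf, len, "membind: binding cannot be enforced");
    else
        /* ENOMEM, EPERM, EINVAL, ...: passed through from the kernel or mbind. */
        snprintf(buf, len, "membind: %s (errno %d)", strerror(err), err);
}
```

A caller would typically do `int err = errno;` immediately after a call that returned -1, then pass err to describe_membind_error before any other library call can overwrite errno.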

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Gabriele Fatigati
Hi, I've noticed that hwloc_set_area_membind_nodeset returns -1 but errno is not equal to EXDEV or ENOSYS. I supposed that these two cases were the only two possibilities. From the hwloc documentation: -1 with errno set to ENOSYS if the action is not supported -1 with errno set to EXDEV if the binding

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Brice Goglin
Hello Gabriele, The only limit that I would think of is the available physical memory on each NUMA node (numactl -H will tell you how much of each NUMA node memory is still available). malloc usually only fails (it returns NULL?) when there is no *virtual* memory left, which is different. If you don

[hwloc-users] Thread binding problem

2012-09-05 Thread Gabriele Fatigati
Dear Hwloc users and developers, I'm using hwloc 1.4.1 in a multithreaded program on a Linux platform, where each thread binds many non-contiguous pieces of a big matrix, making very intensive use of the hwloc_set_area_membind_nodeset function: hwloc_set_area_membind_nodeset(topology, punt+offset, len