Re: [hwloc-users] Thread binding problem

2012-09-19 Thread Samuel Thibault
Hello, 2012/9/6 Brice Goglin: > Anyway, having 65000 mempolicies in use is a lot. And that would somehow correspond to the number of set_area_membind calls that succeed before one fails. So the kernel might indeed fail to merge those. > That said, these…
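A minimal sketch of the pattern under discussion: binding many small sub-ranges of one allocation, so that each successful call leaves a kernel mempolicy behind. This is not code from the thread; names and sizes are illustrative, and it assumes hwloc 1.x and at least two NUMA nodes:

    /* Each successful hwloc_set_area_membind_nodeset() on a distinct range
     * makes the kernel record a mempolicy; alternating nodes prevents
     * adjacent ranges from being merged. */
    #include <hwloc.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        hwloc_topology_t topology;
        hwloc_topology_init(&topology);
        hwloc_topology_load(topology);

        size_t chunk = 4096;                 /* one page per range */
        size_t nchunks = 65536;              /* on the order of the 65000 policies seen */
        char *buf = malloc(chunk * nchunks); /* large malloc: mmap'ed, page-aligned */
        if (!buf) { perror("malloc"); return 1; }

        hwloc_bitmap_t nodeset = hwloc_bitmap_alloc();
        for (size_t i = 0; i < nchunks; i++) {
            hwloc_bitmap_only(nodeset, i % 2);  /* alternate NUMA nodes 0 and 1 */
            if (hwloc_set_area_membind_nodeset(topology, buf + i * chunk, chunk,
                                               nodeset, HWLOC_MEMBIND_BIND, 0) < 0) {
                fprintf(stderr, "bind %zu failed: ", i);
                perror("hwloc_set_area_membind_nodeset");
                break;
            }
        }
        hwloc_bitmap_free(nodeset);
        free(buf);
        hwloc_topology_destroy(topology);
        return 0;
    }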

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Brice Goglin
On 06/09/2012 14:51, Gabriele Fatigati wrote:
> Hi Brice, the initial grep is:
>   numa_policy  65671  65952  24  144  1 : tunables 120 60 8 : slabdata 458 458 0
> When set_membind fails it is:
>   numa_policy    482   1152  24  144  1 : tunables 120…

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Samuel Thibault
Gabriele Fatigati, on Thu 06 Sep 2012 10:12:38 +0200, wrote: > mbind in hwloc_linux_set_area_membind() fails: "Error from HWLOC mbind: Cannot allocate memory". Ok. mbind is not really supposed to allocate much memory, but it still does allocate some, to record the policy…
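What mbind does underneath is roughly the following; a sketch of mine, not code from the thread, assuming libnuma's <numaif.h> wrapper and a NUMA node 0:

    /* Direct mbind(2) call, roughly what hwloc_linux_set_area_membind()
     * issues internally. mbind only records a policy for the range, but
     * the kernel still allocates a small mempolicy struct for it, so the
     * call itself can fail with ENOMEM. addr must be page-aligned. */
    #include <numaif.h>   /* mbind, MPOL_BIND; link with -lnuma */
    #include <stdio.h>

    static int bind_to_node0(void *addr, unsigned long len)
    {
        unsigned long nodemask = 1UL;  /* bit 0 set = NUMA node 0 */
        if (mbind(addr, len, MPOL_BIND, &nodemask, sizeof(nodemask) * 8, 0) != 0) {
            perror("mbind");           /* ENOMEM prints "Cannot allocate memory" */
            return -1;
        }
        return 0;
    }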

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Gabriele Fatigati
Oops, I forgot the hwloc_topology_destroy() and also hwloc_bitmap_free(cpuset); I added them. I attach new code using the hwloc_set_area_membind function directly, and the new Valgrind output. 2012/9/6 Brice Goglin: > On 06/09/2012 10:13, Gabriele Fatigati wrote: > Downsizing…
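For readers following along, the missing cleanup amounts to the following (variable names illustrative):

    /* Every hwloc_bitmap_alloc() needs a matching hwloc_bitmap_free(),
     * and the topology must be destroyed, or valgrind reports the
     * allocations as leaked. */
    hwloc_bitmap_free(cpuset);
    hwloc_topology_destroy(topology);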

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Gabriele Fatigati
Downsizing the array to 4 GB, valgrind gives many warnings, reported in the attached file. 2012/9/6 Gabriele Fatigati: > Sorry, I used a wrong hwloc installation. Using the hwloc build with the printf checks: mbind in hwloc_linux_set_area_membind() fails:…

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Gabriele Fatigati
Sorry, I used a wrong hwloc installation. Using the hwloc build with the printf checks: mbind in hwloc_linux_set_area_membind() fails: "Error from HWLOC mbind: Cannot allocate memory", so this is the origin of the failed allocation. I attach the right valgrind output (valgrind --track-origins=yes…)

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Brice Goglin
On 06/09/2012 09:56, Gabriele Fatigati wrote: > Hi Brice, hi Jeff, >> Can you add some printf inside hwloc_linux_set_area_membind() in src/topology-linux.c to see if ENOMEM comes from the mbind syscall or not? > I added printf inside that function, but ENOMEM does not come from there.

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Gabriele Fatigati
Hi Brice, hi Jeff, > Can you add some printf inside hwloc_linux_set_area_membind() in src/topology-linux.c to see if ENOMEM comes from the mbind syscall or not? I added printf inside that function, but ENOMEM does not come from there. > Have you run your application through valgrind or another…
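The instrumentation being discussed would look something like this; the actual layout of src/topology-linux.c is not shown in the thread, so the variable names here are only illustrative:

    /* Sketch of a debug check around the mbind syscall inside
     * hwloc_linux_set_area_membind(); print before errno can be
     * clobbered by later calls. */
    err = mbind((void *) addr, len, linuxpolicy, linuxmask, maxnode, 0);
    if (err < 0)
        fprintf(stderr, "mbind failed: %s\n", strerror(errno));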

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Jeff Squyres
On Sep 5, 2012, at 2:36 PM, Gabriele Fatigati wrote: > I don't think it is simply out of memory, since the NUMA node has 48 GB and I'm allocating just 8 GB. Mmm. Probably right. Have you run your application through valgrind or another memory-checking debugger? I've seen cases of heap…

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Gabriele Fatigati
Dear Jeff, I don't think it is simply out of memory, since the NUMA node has 48 GB and I'm allocating just 8 GB. 2012/9/5 Jeff Squyres: > Perhaps you simply have run out of memory on that NUMA node, and therefore the malloc failed. Check "numactl --hardware", for example.…

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Jeff Squyres
Perhaps you simply have run out of memory on that NUMA node, and therefore the malloc failed. Check "numactl --hardware", for example. You might want to check the output of numastat to see if one or more of your NUMA nodes have run out of memory. On Sep 5, 2012, at 12:58 PM, Gabriele…

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Gabriele Fatigati
I've reproduced the problem in a small MPI + OpenMP code. The error is the same: after some memory binds, it gives "Cannot allocate memory". Thanks. 2012/9/5 Gabriele Fatigati: > Downscaling the matrix size, binding works well, but the memory available is enough even when using…
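The reproducer itself is an attachment and not quoted in the archive; the pattern it describes, each thread binding its slice of a shared matrix, would look roughly like this sketch (matrix, nbytes, nnodes and the node choice are illustrative; it assumes <omp.h> and an initialized hwloc topology):

    /* Each OpenMP thread binds its slice of a shared buffer to "its"
     * NUMA node; after many such binds the call starts failing with
     * "Cannot allocate memory". */
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        size_t slice = nbytes / omp_get_num_threads();
        hwloc_bitmap_t nodeset = hwloc_bitmap_alloc();
        hwloc_bitmap_only(nodeset, tid % nnodes);
        if (hwloc_set_area_membind_nodeset(topology, matrix + tid * slice, slice,
                                           nodeset, HWLOC_MEMBIND_BIND,
                                           HWLOC_MEMBIND_THREAD) < 0)
            perror("hwloc_set_area_membind_nodeset");
        hwloc_bitmap_free(nodeset);
    }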

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Gabriele Fatigati
Downscaling the matrix size, binding works well, but the available memory is enough even with the bigger matrix, so I'm a bit confused. Using the same big matrix size without binding, the code works well, so how can I explain this behaviour? Maybe hwloc_set_area_membind_nodeset introduces other…

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Brice Goglin
An internal malloc failed then. That would explain why your malloc failed too. It looks like you malloc'ed too much memory in your program? Brice. On 05/09/2012 15:56, Gabriele Fatigati wrote: > An update: placing strerror(errno) after hwloc_set_area_membind_nodeset gives: "Cannot…

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Gabriele Fatigati
An update: placing strerror(errno) after hwloc_set_area_membind_nodeset gives: "Cannot allocate memory". 2012/9/5 Gabriele Fatigati: > Hi, I've noted that hwloc_set_area_membind_nodeset returns -1 but errno is not equal to EXDEV or ENOSYS. I supposed that these two…
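The diagnostic described above amounts to something like this sketch (ptr, size, nodeset and the flags are illustrative):

    /* hwloc sets errno on failure, so strerror(errno) names the actual
     * error; here it turned out to be ENOMEM ("Cannot allocate memory"),
     * not the documented ENOSYS or EXDEV. */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    if (hwloc_set_area_membind_nodeset(topology, ptr, size, nodeset,
                                       HWLOC_MEMBIND_BIND,
                                       HWLOC_MEMBIND_THREAD) < 0)
        fprintf(stderr, "set_area_membind failed: %s\n", strerror(errno));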

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Brice Goglin
What does errno contain? Aside from ENOSYS and EXDEV, you may also get the "usual" error codes such as ENOMEM, EPERM or EINVAL. We didn't document all of them; it mostly depends on the underlying kernel and mbind implementations. Brice. On 05/09/2012 15:44, Gabriele Fatigati wrote: > Hi,…

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Gabriele Fatigati
Hi, I've noted that hwloc_set_area_membind_nodeset returns -1 but errno is not equal to EXDEV or ENOSYS. I supposed these two cases were the only possibilities. From the hwloc documentation: -1 with errno set to ENOSYS if the action is not supported; -1 with errno set to EXDEV if the binding…

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Brice Goglin
Hello Gabriele, The only limit I would think of is the available physical memory on each NUMA node (numactl -H will tell you how much of each NUMA node's memory is still available). malloc usually only fails (does it return NULL?) when there is no *virtual* memory left; that's different. If you…
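A short sketch of the distinction being drawn, with nbytes, topology and nodeset assumed to exist: malloc reserves virtual memory and returns NULL only when that is exhausted, while binding the area to one NUMA node is a separate step that can fail on its own:

    /* Two distinct failure points: malloc fails only on virtual-memory
     * exhaustion; the membind can fail later even though plenty of
     * memory remains on other nodes. */
    double *m = malloc(nbytes);
    if (!m) {
        perror("malloc");    /* out of virtual memory */
    } else if (hwloc_set_area_membind_nodeset(topology, m, nbytes, nodeset,
                                              HWLOC_MEMBIND_BIND, 0) < 0) {
        perror("membind");   /* node full, or kernel mempolicy allocation failed */
    }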