Hello,
2012/9/6 Brice Goglin
>
> Anyway, having 65000 mempolicies in use is a lot. And that would somehow
> correspond to the number of set_area_membind calls that succeed before one
> fails. So the kernel might indeed fail to merge those.
>
> That said, these objects are small (24by
>did you gather this info during the sleep(10) after the failure before
>the program exits ?
Yes.
>You likely need numa devel if you're configuring/building hwloc. The
>summary at the end of the hwloc configure will tell you if memory
>binding is supported or not, it mostly depends on numa devel.
On 07/09/2012 09:43, Gabriele Fatigati wrote:
> Hi,
>
> Good, you found the kernel limit that is being exceeded.
>
> /proc/meminfo reports MemFree as 47834588 kB
>
> numactl -H:
>
> available: 2 nodes (0-1)
> node 0 size: 24194 MB
> node 0 free: 22702 MB
> node 1 size: 24240 MB
> node 1 free: 23997 MB
Hi,
Good, you found the kernel limit that is being exceeded.
/proc/meminfo reports MemFree as 47834588 kB
numactl -H:
available: 2 nodes (0-1)
node 0 size: 24194 MB
node 0 free: 22702 MB
node 1 size: 24240 MB
node 1 free: 23997 MB
node distances:
node 0 1
0: 10 21
1: 21 10
Are you able
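To cross-check the MemFree figure against the per-node numbers, here is a minimal sketch in C that pulls one value out of /proc/meminfo-style text. The function name meminfo_kb is made up for this illustration, and it assumes the usual "Key:   value kB" layout of that file.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the value in kB stored under `key`
 * (for example "MemFree") in meminfo-style text, or -1 if the key
 * is absent or has no number after it.
 */
long meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *p = text;

    while (p != NULL && *p != '\0') {
        /* A match must start a line and be followed directly by ':'. */
        if (strncmp(p, key, klen) == 0 && p[klen] == ':') {
            long value;
            if (sscanf(p + klen + 1, "%ld", &value) == 1)
                return value;
            return -1;
        }
        /* Move to the start of the next line, if there is one. */
        p = strchr(p, '\n');
        if (p != NULL)
            p++;
    }
    return -1;
}
```

With the figures above, 47834588 kB is about 46713 MB, which is close to the 22702 + 23997 = 46699 MB that numactl reports as free across both nodes.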
On 06/09/2012 14:51, Gabriele Fatigati wrote:
> Hi Brice,
>
> the initial grep is:
>
> numa_policy 65671 65952 24 144 1 : tunables 120 60 8 : slabdata 458 458 0
>
> When set_membind fails, it is:
>
> numa_policy 482 1152 24 144 1 : tunables 120
Hi Brice,
the initial grep is:
numa_policy 65671 65952 24 144 1 : tunables 120 60 8 : slabdata 458 458 0
When set_membind fails, it is:
numa_policy 482 1152 24 144 1 : tunables 120 60 8 : slabdata 8 8 288
What does it mean?
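As an aid for reading those two rows, here is a minimal sketch in C of how one /proc/slabinfo line could be split into its fields. It assumes the slabinfo 2.1 column layout, and it assumes the fused columns in the paste separate as 144 objects per slab and 1 page per slab (consistent with 65952 / 458 = 144). The names slab_row, parse_slab_row and slab_active_bytes are made up for this illustration; they are not part of hwloc or the kernel.

```c
#include <stdio.h>
#include <string.h>

/* One row of /proc/slabinfo, assuming the version 2.1 column layout. */
struct slab_row {
    char name[64];
    unsigned long active_objs, num_objs, objsize, objperslab, pagesperslab;
    unsigned long limit, batchcount, sharedfactor;        /* "tunables" */
    unsigned long active_slabs, num_slabs, sharedavail;   /* "slabdata" */
};

/* Parse one slabinfo line; returns 0 on success, -1 if any field is missing. */
int parse_slab_row(const char *line, struct slab_row *r)
{
    int n = sscanf(line,
        "%63s %lu %lu %lu %lu %lu : tunables %lu %lu %lu : slabdata %lu %lu %lu",
        r->name, &r->active_objs, &r->num_objs, &r->objsize,
        &r->objperslab, &r->pagesperslab,
        &r->limit, &r->batchcount, &r->sharedfactor,
        &r->active_slabs, &r->num_slabs, &r->sharedavail);
    return n == 12 ? 0 : -1;
}

/* Bytes occupied by the objects currently in use in this cache. */
unsigned long slab_active_bytes(const struct slab_row *r)
{
    return r->active_objs * r->objsize;
}
```

For the first row, slab_active_bytes() gives 65671 * 24 = 1576104 bytes, so the policy objects themselves account for only about 1.5 MB.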
On 06/09/2012 12:19, Gabriele Fatigati wrote:
> I didn't find any strange numbers in /proc/meminfo.
>
> I've noticed that the program fails exactly
> every 65479 hwloc_set_area_membind calls. So it sounds like some kernel
> limit. You can check that with just one thread as well.
>
> Maybe nobody has noticed this
I didn't find any strange numbers in /proc/meminfo.
I've noticed that the program fails exactly
every 65479 hwloc_set_area_membind calls. So it sounds like some kernel limit.
You can check that with just one thread as well.
Maybe nobody has noticed this before because usually we bind a large amount of
contiguous memory
On 06/09/2012 10:44, Samuel Thibault wrote:
> Gabriele Fatigati, on Thu 06 Sep 2012 10:12:38 +0200, wrote:
>> mbind hwloc_linux_set_area_membind() fails:
>>
>> Error from HWLOC mbind: Cannot allocate memory
> Ok. mbind is not really supposed to allocate much memory, but it still
> does allo
Samuel Thibault, on Thu 06 Sep 2012 10:45:45 +0200, wrote:
> Gabriele Fatigati, on Thu 06 Sep 2012 10:12:38 +0200, wrote:
> > mbind hwloc_linux_set_area_membind() fails:
> >
> > Error from HWLOC mbind: Cannot allocate memory
>
> Ok. mbind is not really supposed to allocate much memory, bu
Gabriele Fatigati, on Thu 06 Sep 2012 10:12:38 +0200, wrote:
> mbind hwloc_linux_set_area_membind() fails:
>
> Error from HWLOC mbind: Cannot allocate memory
Ok. mbind is not really supposed to allocate much memory, but it still
does allocate some, to record the policy
> //hwloc_obj
Oops,
I forgot hwloc_topology_destroy() and also hwloc_bitmap_free(cpuset).
I added them; I attach new code that uses the hwloc_set_area_membind function
directly, plus the new Valgrind output.
2012/9/6 Brice Goglin
> On 06/09/2012 10:13, Gabriele Fatigati wrote:
>
> Downsizing the array, up to 4GB,
>
On 06/09/2012 10:13, Gabriele Fatigati wrote:
> Downsizing the array, up to 4GB,
>
> valgrind gives many warnings reported in the attached file.
Adding hwloc_topology_destroy() at the end of the file would likely
remove most of them.
But that won't fix the problem since the leaks are small.
Downsizing the array, up to 4GB,
valgrind gives many warnings reported in the attached file.
2012/9/6 Gabriele Fatigati
> Sorry,
>
> I used the wrong hwloc installation. Using the hwloc with the printf
> checks:
>
> mbind hwloc_linux_set_area_membind() fails:
>
> Error from HWLOC mbind
Sorry,
I used the wrong hwloc installation. Using the hwloc with the printf checks:
mbind hwloc_linux_set_area_membind() fails:
Error from HWLOC mbind: Cannot allocate memory
so this is the origin of the bad allocation.
I attach the right valgrind output
valgrind --track-origins=yes --log-file=o
On 06/09/2012 09:56, Gabriele Fatigati wrote:
> Hi Brice, hi Jeff,
>
> > Can you add some printf inside hwloc_linux_set_area_membind() in
> > src/topology-linux.c to see if ENOMEM comes from the mbind syscall or
> > not?
>
> I added printf inside that function, but ENOMEM does not come from there.
Hi Brice, hi Jeff,
> Can you add some printf inside hwloc_linux_set_area_membind() in
> src/topology-linux.c to see if ENOMEM comes from the mbind syscall or not?
I added printf inside that function, but ENOMEM does not come from there.
>Have you run your application through valgrind or another me
On Sep 5, 2012, at 2:36 PM, Gabriele Fatigati wrote:
> I don't think it is simply out of memory, since the NUMA node has 48 GB and I'm
> allocating just 8 GB.
Mmm. Probably right.
Have you run your application through valgrind or another memory-checking
debugger?
I've seen cases of heap corruptio
Dear Jeff,
I don't think it is simply out of memory, since the NUMA node has 48 GB and I'm
allocating just 8 GB.
2012/9/5 Jeff Squyres
> Perhaps you simply have run out of memory on that NUMA node, and therefore
> the malloc failed. Check "numactl --hardware", for example.
>
> You might want to che
Perhaps you simply have run out of memory on that NUMA node, and therefore the
malloc failed. Check "numactl --hardware", for example.
You might want to check the output of numastat to see if one or more of your
NUMA nodes have run out of memory.
On Sep 5, 2012, at 12:58 PM, Gabriele Fatigat
I've reproduced the problem in a small MPI + OpenMP code.
The error is the same: after some memory binds, it gives "Cannot allocate
memory".
Thanks.
2012/9/5 Gabriele Fatigati
> Downscaling the matrix size, binding works well, but the available memory
> is enough even with a bigger matrix, so I'
Downscaling the matrix size, binding works well, but the available memory
is enough even with a bigger matrix, so I'm a bit confused.
Using the same big matrix size without binding, the code works well, so how
can I explain this behaviour?
Maybe hwloc_set_area_membind_nodeset introduces other ex
An internal malloc failed then. That would explain why your malloc
failed too.
It looks like you malloc'ed too much memory in your program?
Brice
On 05/09/2012 15:56, Gabriele Fatigati wrote:
> An update:
>
> placing strerror(errno) after hwloc_set_area_membind_nodeset gives:
> "Cannot all
An update:
placing strerror(errno) after hwloc_set_area_membind_nodeset gives:
"Cannot allocate memory"
2012/9/5 Gabriele Fatigati
> Hi,
>
> I've noticed that hwloc_set_area_membind_nodeset returns -1 but errno is not
> equal to EXDEV or ENOSYS. I assumed that these two cases were the only
>
What does errno contain?
Aside from ENOSYS and EXDEV, you may also get the "usual" error codes such
as ENOMEM, EPERM or EINVAL.
We didn't document all of them; they mostly depend on the underlying
kernel and mbind implementations.
Brice
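To make that concrete, here is a minimal sketch in C of a helper that turns the errno left by a failed binding call into a short hint. The name membind_errno_hint and the hint strings are made up for this illustration and are not part of hwloc; only ENOSYS and EXDEV are named by the hwloc documentation, the other cases are the "usual" kernel codes mentioned above.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not part of hwloc): map the errno of a failed
 * memory-binding call to a short hint string.
 */
const char *membind_errno_hint(int err)
{
    switch (err) {
    case ENOSYS: return "ENOSYS: action not supported";
    case EXDEV:  return "EXDEV: binding cannot be enforced";
    case ENOMEM: return "ENOMEM: a kernel-side allocation failed";
    case EPERM:  return "EPERM: operation not permitted";
    case EINVAL: return "EINVAL: invalid argument";
    default:     return strerror(err); /* anything else: libc's own text */
    }
}

/*
 * Possible use, copying errno before any other call can overwrite it:
 *
 *   if (hwloc_set_area_membind_nodeset(topology, addr, len,
 *                                      nodeset, policy, flags) < 0) {
 *       int saved = errno;
 *       fprintf(stderr, "membind: %s\n", membind_errno_hint(saved));
 *   }
 */
```

Saving errno into a local first matters because an intervening library call such as fprintf may change it.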
On 05/09/2012 15:44, Gabriele Fatigati wrote:
> Hi,
>
> I'v
Hi,
I've noticed that hwloc_set_area_membind_nodeset returns -1 but errno is not
equal to EXDEV or ENOSYS. I assumed that these two cases were the only
possibilities.
From the hwloc documentation:
-1 with errno set to ENOSYS if the action is not supported
-1 with errno set to EXDEV if the binding
Hello Gabriele,
The only limit that I would think of is the available physical memory on
each NUMA node (numactl -H will tell you how much of each NUMA node
memory is still available).
malloc usually only fails (it returns NULL?) when there is no *virtual*
memory left, that's different. If you don
Dear Hwloc users and developers,
I'm using hwloc 1.4.1 in a multithreaded program on a Linux platform, where
each thread binds many non-contiguous pieces of a big matrix, making very
intensive use of the hwloc_set_area_membind_nodeset function:
hwloc_set_area_membind_nodeset(topology, punt+offset, len