Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Brice Goglin
On 06/09/2012 14:51, Gabriele Fatigati wrote:
> Hi Brice,
>
> the initial grep is:
>
> numa_policy        65671  65952     24  144    1 : tunables  120   60    8 : slabdata    458    458      0
>
> When set_membind fails, it is:
>
> numa_policy          482   1152     24  144    1 : tunables  120   60    8 : slabdata      8      8    288
>
> What does it mean?

The first number is the number of active objects. That means 65000
mempolicy objects were in use on the first line.
(I wonder if you swapped the lines; I expected higher numbers at the end
of the run.)
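For reference, each line in /proc/slabinfo follows the slabinfo 2.x column
layout:

  name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>

so your first grep reads as 65671 active numa_policy objects of 24 bytes
each, and the second as only 482.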

Anyway, having 65000 mempolicies in use is a lot. And that would somehow
correspond to the number of set_area_membind calls that succeed before one
fails. So the kernel might indeed fail to merge those.

That said, these objects are small (24 bytes each, if I am reading things
correctly), so we're only talking about 1.6 MB here. So there's still
something else eating all the memory. /proc/meminfo (MemFree) and
numactl -H should again help.
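As an aside, here is a minimal sketch (not from this thread's code) of the
obvious way to avoid creating one mempolicy per page: pin the thread first,
then bind its whole contiguous chunk with a single set_area_membind() call.
The data/len/chunk/tid names are assumptions modelled on the fragment
quoted below, and error handling is left out:

static int bind_chunk(hwloc_topology_t topology, char *data,
                      size_t len, size_t chunk, unsigned tid)
{
    /* pin the calling thread to one PU... */
    hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, tid);
    hwloc_cpuset_t cpuset = hwloc_bitmap_dup(obj->cpuset);
    hwloc_bitmap_singlify(cpuset);
    hwloc_set_cpubind(topology, cpuset, HWLOC_CPUBIND_THREAD);

    /* ...then bind its whole chunk in one call, so the kernel only has to
     * keep one mempolicy per thread instead of one per page */
    size_t begin = chunk * tid;
    size_t size = (begin + chunk <= len) ? chunk : len - begin;
    int res = hwloc_set_area_membind(topology, &data[begin], size, cpuset,
                                     HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD);
    hwloc_bitmap_free(cpuset);
    return res;
}

This keeps the number of kernel mempolicy objects proportional to the
number of threads rather than to the number of pages.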

Brice


>
>
>
> 2012/9/6 Brice Goglin
>
> On 06/09/2012 12:19, Gabriele Fatigati wrote:
>> I didn't find any strange number in /proc/meminfo.
>>
>> I've noticed that the program fails after exactly 65479
>> hwloc_set_area_membind calls. So it sounds like some kernel
>> limit. You can check that with just one thread, too.
>>
>> Maybe nobody has noticed it before, because usually we bind a large
>> amount of contiguous memory a few times, instead of many small
>> non-contiguous pieces of memory... :(
>
> If you have root access, try (as root)
> watch -n 1 grep numa_policy /proc/slabinfo
> Put a sleep(10) in your program when set_area_membind() fails, and
> don't let your program exit before you can read the content of
> /proc/slabinfo.
>
> Brice
>
>
>
>
>>
>> 2012/9/6 Brice Goglin
>>
>> On 06/09/2012 10:44, Samuel Thibault wrote:
>> > Gabriele Fatigati, on Thu 06 Sep 2012 10:12:38 +0200, wrote:
>> >> mbind inside hwloc_linux_set_area_membind() fails:
>> >>
>> >> Error from HWLOC mbind: Cannot allocate memory
>> > Ok. mbind is not really supposed to allocate much memory, but it
>> > still does allocate some, to record the policy.
>> >
>> >> //hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, tid);
>> >> hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, tid);
>> >> hwloc_cpuset_t cpuset = hwloc_bitmap_dup(obj->cpuset);
>> >> hwloc_bitmap_singlify(cpuset);
>> >> hwloc_set_cpubind(topology, cpuset, HWLOC_CPUBIND_THREAD);
>> >>
>> >> for( i = chunk*tid; i < len; i+=PAGE_SIZE) {
>> >> //   res = hwloc_set_area_membind_nodeset(topology, [i], PAGE_SIZE, obj->nodeset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD);
>> >>      res = hwloc_set_area_membind(topology, [i], PAGE_SIZE, cpuset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD);
>> > and I'm afraid that calling set_area_membind for each page might be
>> > too dense: the kernel is probably allocating a memory policy record
>> > for each page, not being able to merge adjacent equal policies.
>> >
>>
>> It's supposed to merge VMAs with the same policy (from what I
>> understand of the code), but I don't know whether that actually works.
>> Maybe Gabriele found a kernel bug :)
>>
>> Brice
>>
>> ___
>> hwloc-users mailing list
>> hwloc-us...@open-mpi.org 
>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>>
>>
>>
>>
>> -- 
>> Ing. Gabriele Fatigati
>>
>> HPC specialist
>>
>> SuperComputing Applications and Innovation Department
>>
>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>
>> www.cineca.it    Tel: +39 051 6171722
>>
>> g.fatigati [AT] cineca.it   
>>
>>
>> ___
>> hwloc-users mailing list
>> hwloc-us...@open-mpi.org 
>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>
>
> ___
> hwloc-users mailing list
> hwloc-us...@open-mpi.org 
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>
>
>
>
> -- 
> Ing. Gabriele Fatigati
>
> HPC specialist
>
> SuperComputing Applications and Innovation Department
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) 

Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Gabriele Fatigati
Downsizing the array to 4 GB, valgrind gives many warnings, reported in
the attached file.









2012/9/6 Gabriele Fatigati 

> Sorry,
>
> I used the wrong hwloc installation. Using the hwloc build with the
> printf checks:
>
> mbind inside hwloc_linux_set_area_membind() fails:
>
> Error from HWLOC mbind: Cannot allocate memory
>
> so this is the origin of the failed allocation.
>
> I attach the right valgrind output.
>
> valgrind --track-origins=yes --log-file=output_valgrind --leak-check=full
> --tool=memcheck  --show-reachable=yes ./main_hybrid_bind_mem
>
>
>
>
>
> 2012/9/6 Gabriele Fatigati 
>
>> Hi Brice, hi Jeff,
>>
>> > Can you add some printf inside hwloc_linux_set_area_membind() in
>> > src/topology-linux.c to see if ENOMEM comes from the mbind syscall or not?
>>
>> I added printf inside that function, but ENOMEM does not come from there.
>>
>> > Have you run your application through valgrind or another
>> > memory-checking debugger?
>>
>> I tried with valgrind:
>>
>> valgrind --track-origins=yes --log-file=output_valgrind --leak-check=full
>> --tool=memcheck  --show-reachable=yes ./main_hybrid_bind_mem
>>
>> ==25687== Warning: set address range perms: large range [0x39454040,
>> 0x2218d4040) (undefined)
>> ==25687==
>> ==25687== Valgrind's memory management: out of memory:
>> ==25687==newSuperblock's request for 4194304 bytes failed.
>> ==25687==34253180928 bytes have already been allocated.
>> ==25687== Valgrind cannot continue.  Sorry.
>>
>>
>> I attach the full output.
>>
>>
>> The code also dies with pure OpenMP code. Very mysterious.
>>
>>
>>
>> 2012/9/5 Jeff Squyres 
>>
>>> On Sep 5, 2012, at 2:36 PM, Gabriele Fatigati wrote:
>>>
>>> > I don't think it is simply out of memory, since the NUMA node has 48 GB,
>>> and I'm allocating just 8 GB.
>>>
>>> Mmm.  Probably right.
>>>
>>> Have you run your application through valgrind or another
>>> memory-checking debugger?
>>>
>>> I've seen cases of heap corruption lead to malloc incorrectly failing
>>> with ENOMEM.
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>>
>>> ___
>>> hwloc-users mailing list
>>> hwloc-us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>>>
>>
>>
>>
>> --
>> Ing. Gabriele Fatigati
>>
>> HPC specialist
>>
>> SuperComputing Applications and Innovation Department
>>
>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>
>> www.cineca.it    Tel: +39 051 6171722
>>
>> g.fatigati [AT] cineca.it
>>
>
>
>
> --
> Ing. Gabriele Fatigati
>
> HPC specialist
>
> SuperComputing Applications and Innovation Department
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it    Tel: +39 051 6171722
>
> g.fatigati [AT] cineca.it
>



-- 
Ing. Gabriele Fatigati

HPC specialist

SuperComputing Applications and Innovation Department

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it    Tel: +39 051 6171722

g.fatigati [AT] cineca.it


output_valgrind
Description: Binary data


Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Gabriele Fatigati
Sorry,

I used the wrong hwloc installation. Using the hwloc build with the printf
checks:

mbind inside hwloc_linux_set_area_membind() fails:

Error from HWLOC mbind: Cannot allocate memory

so this is the origin of the failed allocation.

I attach the right valgrind output.

valgrind --track-origins=yes --log-file=output_valgrind --leak-check=full
--tool=memcheck  --show-reachable=yes ./main_hybrid_bind_mem





2012/9/6 Gabriele Fatigati 

> Hi Brice, hi Jeff,
>
> > Can you add some printf inside hwloc_linux_set_area_membind() in
> > src/topology-linux.c to see if ENOMEM comes from the mbind syscall or not?
>
> I added printf inside that function, but ENOMEM does not come from there.
>
> > Have you run your application through valgrind or another memory-checking
> > debugger?
>
> I tried with valgrind:
>
> valgrind --track-origins=yes --log-file=output_valgrind --leak-check=full
> --tool=memcheck  --show-reachable=yes ./main_hybrid_bind_mem
>
> ==25687== Warning: set address range perms: large range [0x39454040,
> 0x2218d4040) (undefined)
> ==25687==
> ==25687== Valgrind's memory management: out of memory:
> ==25687==newSuperblock's request for 4194304 bytes failed.
> ==25687==34253180928 bytes have already been allocated.
> ==25687== Valgrind cannot continue.  Sorry.
>
>
> I attach the full output.
>
>
> The code also dies with pure OpenMP code. Very mysterious.
>
>
>
> 2012/9/5 Jeff Squyres 
>
>> On Sep 5, 2012, at 2:36 PM, Gabriele Fatigati wrote:
>>
>> > I don't think it is simply out of memory, since the NUMA node has 48 GB,
>> and I'm allocating just 8 GB.
>>
>> Mmm.  Probably right.
>>
>> Have you run your application through valgrind or another memory-checking
>> debugger?
>>
>> I've seen cases of heap corruption lead to malloc incorrectly failing
>> with ENOMEM.
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>> ___
>> hwloc-users mailing list
>> hwloc-us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>>
>
>
>
> --
> Ing. Gabriele Fatigati
>
> HPC specialist
>
> SuperComputing Applications and Innovation Department
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it    Tel: +39 051 6171722
>
> g.fatigati [AT] cineca.it
>



-- 
Ing. Gabriele Fatigati

HPC specialist

SuperComputing Applications and Innovation Department

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it    Tel: +39 051 6171722

g.fatigati [AT] cineca.it


output_valgrind
Description: Binary data


Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Brice Goglin
On 06/09/2012 09:56, Gabriele Fatigati wrote:
> Hi Brice, hi Jeff,
>
> > Can you add some printf inside hwloc_linux_set_area_membind() in
> > src/topology-linux.c to see if ENOMEM comes from the mbind syscall or
> > not?
>
> I added printf inside that function, but ENOMEM does not come from there.

Not from hwloc_linux_set_area_membind() at all? Or not from mbind?

> > Have you run your application through valgrind or another
> > memory-checking debugger?
>
> I tried with valgrind:
>
> valgrind --track-origins=yes --log-file=output_valgrind
> --leak-check=full --tool=memcheck  --show-reachable=yes
> ./main_hybrid_bind_mem
>
> ==25687== Warning: set address range perms: large range [0x39454040,
> 0x2218d4040) (undefined)
> ==25687== 
> ==25687== Valgrind's memory management: out of memory:
> ==25687==newSuperblock's request for 4194304 bytes failed.
> ==25687==34253180928 bytes have already been allocated.
> ==25687== Valgrind cannot continue.  Sorry.

There's really somebody allocating way too much memory here.

You should reduce your array size so that it doesn't fail, and then run
valgrind again to check whether somebody is allocating a lot of memory
without ever freeing it.

Brice



>
>
> I attach the full output. 
>
>
> The code also dies with pure OpenMP code. Very mysterious.
>
>
> 2012/9/5 Jeff Squyres
>
> On Sep 5, 2012, at 2:36 PM, Gabriele Fatigati wrote:
>
> > I don't think it is simply out of memory, since the NUMA node has
> 48 GB, and I'm allocating just 8 GB.
>
> Mmm.  Probably right.
>
> Have you run your application through valgrind or another
> memory-checking debugger?
>
> I've seen cases of heap corruption lead to malloc incorrectly
> failing with ENOMEM.
>
> --
> Jeff Squyres
> jsquy...@cisco.com 
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> ___
> hwloc-users mailing list
> hwloc-us...@open-mpi.org 
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>
>
>
>
> -- 
> Ing. Gabriele Fatigati
>
> HPC specialist
>
> SuperComputing Applications and Innovation Department
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it    Tel: +39 051 6171722
>
> g.fatigati [AT] cineca.it   
>
>
> ___
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users



Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Gabriele Fatigati
Dear Jeff,

I don't think it is simply out of memory, since the NUMA node has 48 GB and
I'm allocating just 8 GB.

2012/9/5 Jeff Squyres 

> Perhaps you simply have run out of memory on that NUMA node, and therefore
> the malloc failed.  Check "numactl --hardware", for example.
>
> You might want to check the output of numastat to see if one or more of
> your NUMA nodes have run out of memory.
>
>
> On Sep 5, 2012, at 12:58 PM, Gabriele Fatigati wrote:
>
> > I've reproduced the problem in a small MPI + OpenMP code.
> >
> > The error is the same: after some memory binds, it gives "Cannot allocate
> memory".
> >
> > Thanks.
> >
> > 2012/9/5 Gabriele Fatigati 
> > Downscaling the matrix size, binding works well, but the available
> memory is enough even with the bigger matrix, so I'm a bit confused.
> >
> > Using the same big matrix size without binding, the code works well, so
> how can I explain this behaviour?
> >
> > Maybe hwloc_set_area_membind_nodeset introduces extra allocations that
> persist after the call?
> >
> >
> >
> > 2012/9/5 Brice Goglin 
> > An internal malloc failed then. That would explain why your malloc
> failed too.
> > It looks like you malloc'ed too much memory in your program?
> >
> > Brice
> >
> >
> >
> >
> > On 05/09/2012 15:56, Gabriele Fatigati wrote:
> >> An update:
> >>
> >> placing strerror(errno) after hwloc_set_area_membind_nodeset  gives:
> "Cannot allocate memory"
> >>
> >> 2012/9/5 Gabriele Fatigati 
> >> Hi,
> >>
> >> I've noticed that hwloc_set_area_membind_nodeset returns -1 but errno
> is not equal to EXDEV or ENOSYS. I assumed those two were the only
> possible cases.
> >>
> >> From the hwloc documentation:
> >>
> >> -1 with errno set to ENOSYS if the action is not supported
> >> -1 with errno set to EXDEV if the binding cannot be enforced
> >>
> >>
> >> Any other reason for a binding failure? The available memory is enough.
> >>
> >> 2012/9/5 Brice Goglin 
> >> Hello Gabriele,
> >>
> >> The only limit that I would think of is the available physical memory
> on each NUMA node (numactl -H will tell you how much of each NUMA node
> memory is still available).
> >> malloc usually only fails (it returns NULL?) when there is no *virtual*
> memory left; that's different. If you don't allocate tons of terabytes
> of virtual memory, this shouldn't happen easily.
> >>
> >> Brice
> >>
> >>
> >>
> >>
> >> On 05/09/2012 14:27, Gabriele Fatigati wrote:
> >>> Dear Hwloc users and developers,
> >>>
> >>>
> >>> I'm using hwloc 1.4.1 in a multithreaded program on a Linux platform,
> where each thread binds many non-contiguous pieces of a big matrix,
> calling the hwloc_set_area_membind_nodeset function very intensively:
> >>>
> >>> hwloc_set_area_membind_nodeset(topology, punt+offset, len, nodeset,
> HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD | HWLOC_MEMBIND_MIGRATE);
> >>>
> >>> Binding seems to work well, since the function returns 0 for every
> call.
> >>>
> >>> The problem is that after binding, a simple small new malloc fails,
> without any apparent reason.
> >>>
> >>> Disabling memory binding, the allocations work well.  Is there any
> known problem if hwloc_set_area_membind_nodeset is used intensively?
> >>>
> >>> Is there some operating system limit on memory page binding?
> >>>
> >>> Thanks in advance.
> >>>
> >>> --
> >>> Ing. Gabriele Fatigati
> >>>
> >>> HPC specialist
> >>>
> >>> SuperComputing Applications and Innovation Department
> >>>
> >>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
> >>>
> >>> www.cineca.it    Tel: +39 051 6171722
> >>>
> >>> g.fatigati [AT] cineca.it
> >>>
> >>>
> >>> ___
> >>> hwloc-users mailing list
> >>>
> >>> hwloc-us...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
> >>
> >>
> >>
> >>
> >> --
> >> Ing. Gabriele Fatigati
> >>
> >> HPC specialist
> >>
> >> SuperComputing Applications and Innovation Department
> >>
> >> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
> >>
> >> www.cineca.it    Tel: +39 051 6171722
> >>
> >> g.fatigati [AT] cineca.it
> >>
> >>
> >>
> >> --
> >> Ing. Gabriele Fatigati
> >>
> >> HPC specialist
> >>
> >> SuperComputing Applications and Innovation Department
> >>
> >> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
> >>
> >> www.cineca.it    Tel: +39 051 6171722
> >>
> >> g.fatigati [AT] cineca.it
> >
> >
> >
> >
> > --
> > Ing. Gabriele Fatigati
> >
> > HPC specialist
> >
> > SuperComputing Applications and Innovation Department
> >
> > Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
> >
> > www.cineca.it    Tel: +39 051 6171722
> >
> > g.fatigati [AT] cineca.it
> >
> >
> >
> > --
> > Ing. Gabriele Fatigati
> >
> > HPC specialist
> >
> > SuperComputing Applications and Innovation Department
> >
> > Via Magnanelli 6/3, 

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Jeff Squyres
Perhaps you simply have run out of memory on that NUMA node, and therefore the 
malloc failed.  Check "numactl --hardware", for example.

You might want to check the output of numastat to see if one or more of your 
NUMA nodes have run out of memory. 
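Both are quick to check while the program is running (standard Linux NUMA
tools; the exact output layout varies a bit across distros):

  numactl --hardware    # per-node total and free memory
  numastat              # per-node numa_hit / numa_miss / numa_foreign counters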


On Sep 5, 2012, at 12:58 PM, Gabriele Fatigati wrote:

> I've reproduced the problem in a small MPI + OpenMP code.
> 
> The error is the same: after some memory binds, it gives "Cannot allocate memory".
> 
> Thanks.
> 
> 2012/9/5 Gabriele Fatigati 
> Downscaling the matrix size, binding works well, but the available memory
> is enough even with the bigger matrix, so I'm a bit confused.
> 
> Using the same big matrix size without binding, the code works well, so how
> can I explain this behaviour?
> 
> Maybe hwloc_set_area_membind_nodeset introduces extra allocations that
> persist after the call?
> 
> 
> 
> 2012/9/5 Brice Goglin 
> An internal malloc failed then. That would explain why your malloc failed too.
> It looks like you malloc'ed too much memory in your program?
> 
> Brice
> 
> 
> 
> 
> On 05/09/2012 15:56, Gabriele Fatigati wrote:
>> An update:
>> 
>> placing strerror(errno) after hwloc_set_area_membind_nodeset  gives: "Cannot 
>> allocate memory"
>> 
>> 2012/9/5 Gabriele Fatigati 
>> Hi,
>> 
>> I've noticed that hwloc_set_area_membind_nodeset returns -1 but errno is
>> not equal to EXDEV or ENOSYS. I assumed those two were the only possible
>> cases.
>> 
>> From the hwloc documentation:
>> 
>> -1 with errno set to ENOSYS if the action is not supported
>> -1 with errno set to EXDEV if the binding cannot be enforced
>> 
>> 
>> Any other reason for a binding failure? The available memory is enough.
>> 
>> 2012/9/5 Brice Goglin 
>> Hello Gabriele,
>> 
>> The only limit that I would think of is the available physical memory on 
>> each NUMA node (numactl -H will tell you how much of each NUMA node memory 
>> is still available).
>> malloc usually only fails (it returns NULL?) when there is no *virtual*
>> memory left; that's different. If you don't allocate tons of terabytes of
>> virtual memory, this shouldn't happen easily.
>> 
>> Brice
>> 
>> 
>> 
>> 
>>> On 05/09/2012 14:27, Gabriele Fatigati wrote:
>>> Dear Hwloc users and developers,
>>> 
>>> 
>>> I'm using hwloc 1.4.1 in a multithreaded program on a Linux platform, where 
>>> each thread binds many non-contiguous pieces of a big matrix, calling the 
>>> hwloc_set_area_membind_nodeset function very intensively:
>>> 
>>> hwloc_set_area_membind_nodeset(topology, punt+offset, len, nodeset, 
>>> HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD | HWLOC_MEMBIND_MIGRATE);
>>> 
>>> Binding seems to work well, since the function returns 0 for every call.
>>> 
>>> The problem is that after binding, a simple small new malloc fails, 
>>> without any apparent reason.
>>> 
>>> Disabling memory binding, the allocations work well.  Is there any known 
>>> problem if hwloc_set_area_membind_nodeset is used intensively?
>>> 
>>> Is there some operating system limit on memory page binding?
>>> 
>>> Thanks in advance.
>>> 
>>> -- 
>>> Ing. Gabriele Fatigati
>>> 
>>> HPC specialist
>>> 
>>> SuperComputing Applications and Innovation Department
>>> 
>>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>> 
>>> www.cineca.it    Tel: +39 051 6171722
>>> 
>>> g.fatigati [AT] cineca.it   
>>> 
>>> 
>>> ___
>>> hwloc-users mailing list
>>> 
>>> hwloc-us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>> 
>> 
>> 
>> 
>> -- 
>> Ing. Gabriele Fatigati
>> 
>> HPC specialist
>> 
>> SuperComputing Applications and Innovation Department
>> 
>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>> 
>> www.cineca.it    Tel: +39 051 6171722
>> 
>> g.fatigati [AT] cineca.it   
>> 
>> 
>> 
>> -- 
>> Ing. Gabriele Fatigati
>> 
>> HPC specialist
>> 
>> SuperComputing Applications and Innovation Department
>> 
>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>> 
>> www.cineca.it    Tel: +39 051 6171722
>> 
>> g.fatigati [AT] cineca.it   
> 
> 
> 
> 
> -- 
> Ing. Gabriele Fatigati
> 
> HPC specialist
> 
> SuperComputing Applications and Innovation Department
> 
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
> 
> www.cineca.it    Tel: +39 051 6171722
> 
> g.fatigati [AT] cineca.it   
> 
> 
> 
> -- 
> Ing. Gabriele Fatigati
> 
> HPC specialist
> 
> SuperComputing Applications and Innovation Department
> 
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
> 
> www.cineca.it    Tel: +39 051 6171722
> 
> g.fatigati [AT] cineca.it   
> ___
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users


-- 
Jeff 

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Gabriele Fatigati
I've reproduced the problem in a small MPI + OpenMP code.

The error is the same: after some memory binds, it gives "Cannot allocate
memory".

Thanks.

2012/9/5 Gabriele Fatigati 

> Downscaling the matrix size, binding works well, but the available memory
> is enough even with the bigger matrix, so I'm a bit confused.
>
> Using the same big matrix size without binding, the code works well, so
> how can I explain this behaviour?
>
> Maybe hwloc_set_area_membind_nodeset introduces extra allocations that
> persist after the call?
>
>
>
> 2012/9/5 Brice Goglin 
>
>>  An internal malloc failed then. That would explain why your malloc
>> failed too.
>> It looks like you malloc'ed too much memory in your program?
>>
>> Brice
>>
>>
>>
>>
>> On 05/09/2012 15:56, Gabriele Fatigati wrote:
>>
>> An update:
>>
>>  placing strerror(errno) after hwloc_set_area_membind_nodeset  gives:
>> "Cannot allocate memory"
>>
>> 2012/9/5 Gabriele Fatigati 
>>
>>> Hi,
>>>
>>>  I've noticed that hwloc_set_area_membind_nodeset returns -1 but errno
>>> is not equal to EXDEV or ENOSYS. I assumed those two were the only
>>> possible cases.
>>>
>>>  From the hwloc documentation:
>>>
>>>  -1 with errno set to ENOSYS if the action is not supported
>>> -1 with errno set to EXDEV if the binding cannot be enforced
>>>
>>>
>>>  Any other reason for a binding failure? The available memory is enough.
>>>
>>> 2012/9/5 Brice Goglin 
>>>
  Hello Gabriele,

 The only limit that I would think of is the available physical memory
 on each NUMA node (numactl -H will tell you how much of each NUMA node
 memory is still available).
 malloc usually only fails (it returns NULL?) when there is no *virtual*
 memory left; that's different. If you don't allocate tons of terabytes
 of virtual memory, this shouldn't happen easily.

 Brice




 On 05/09/2012 14:27, Gabriele Fatigati wrote:

  Dear Hwloc users and developers,


  I'm using hwloc 1.4.1 in a multithreaded program on a Linux platform,
 where each thread binds many non-contiguous pieces of a big matrix,
 calling the hwloc_set_area_membind_nodeset function very intensively:

  hwloc_set_area_membind_nodeset(topology, punt+offset, len, nodeset,
 HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD | HWLOC_MEMBIND_MIGRATE);

  Binding seems to work well, since the function returns 0 for every
 call.

  The problem is that after binding, a simple small new malloc fails,
 without any apparent reason.

  Disabling memory binding, the allocations work well.  Is there any
 known problem if hwloc_set_area_membind_nodeset is used intensively?

  Is there some operating system limit on memory page binding?

  Thanks in advance.

  --
 Ing. Gabriele Fatigati

 HPC specialist

 SuperComputing Applications and Innovation Department

 Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

 www.cineca.it    Tel: +39 051 6171722

 g.fatigati [AT] cineca.it


  ___
 hwloc-users mailing list
 hwloc-users@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users



>>>
>>>
>>>  --
>>> Ing. Gabriele Fatigati
>>>
>>> HPC specialist
>>>
>>> SuperComputing Applications and Innovation Department
>>>
>>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>>
>>> www.cineca.it    Tel: +39 051 6171722
>>>
>>> g.fatigati [AT] cineca.it
>>>
>>
>>
>>
>>  --
>> Ing. Gabriele Fatigati
>>
>> HPC specialist
>>
>> SuperComputing Applications and Innovation Department
>>
>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>
>> www.cineca.it    Tel: +39 051 6171722
>>
>> g.fatigati [AT] cineca.it
>>
>>
>>
>
>
> --
> Ing. Gabriele Fatigati
>
> HPC specialist
>
> SuperComputing Applications and Innovation Department
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it    Tel: +39 051 6171722
>
> g.fatigati [AT] cineca.it
>



-- 
Ing. Gabriele Fatigati

HPC specialist

SuperComputing Applications and Innovation Department

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it    Tel: +39 051 6171722

g.fatigati [AT] cineca.it
#include <mpi.h>
#include <hwloc.h>
#include <stdio.h>
#include <stdlib.h>


#define PAGE_SIZE 4096

int main(int argc, char *argv[]){


    /* Bind memory example: each thread binds a piece of the allocated
     * memory on its local node */

    MPI_Init(&argc, &argv);
    int rank;
    int result;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    hwloc_topology_t topology;
    hwloc_cpuset_t cpuset;
    hwloc_obj_t obj;
    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);

    size_t i;

    // allocate 8 GB
    size_t 
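The attachment is cut off here in the archive. A hedged reconstruction of
the remainder, pieced together from the fragments quoted earlier in the
thread (the 8 GB size, the singlify/cpubind sequence and the per-page loop
come from those quotes; the OpenMP scaffolding, the `data` name and the
extra omp.h, errno.h and string.h includes are assumptions):

    size_t len = 8UL*1024*1024*1024;
    char *data = (char*) malloc(len);

    // per-thread locals are used below instead of the shared `cpuset`,
    // `obj`, `result` and `i` declared above, to keep the sketch race-free
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        size_t chunk = len / omp_get_num_threads();
        size_t j;

        hwloc_obj_t tobj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, tid);
        hwloc_cpuset_t tset = hwloc_bitmap_dup(tobj->cpuset);
        hwloc_bitmap_singlify(tset);
        hwloc_set_cpubind(topology, tset, HWLOC_CPUBIND_THREAD);

        // bind the thread's pages one by one, as in the quoted fragment
        for (j = chunk*tid; j < len; j += PAGE_SIZE) {
            int res = hwloc_set_area_membind(topology, &data[j], PAGE_SIZE,
                                             tset, HWLOC_MEMBIND_BIND,
                                             HWLOC_MEMBIND_THREAD);
            if (res != 0)
                fprintf(stderr, "Error from HWLOC mbind: %s\n", strerror(errno));
        }
        hwloc_bitmap_free(tset);
    }

    free(data);
    MPI_Finalize();
    return 0;
}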

Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Gabriele Fatigati
Downscaling the matrix size, binding works well, but the available memory
is enough even with the bigger matrix, so I'm a bit confused.

Using the same big matrix size without binding, the code works well, so how
can I explain this behaviour?

Maybe hwloc_set_area_membind_nodeset introduces extra allocations that
persist after the call?



2012/9/5 Brice Goglin 

>  An internal malloc failed then. That would explain why your malloc failed
> too.
> It looks like you malloc'ed too much memory in your program?
>
> Brice
>
>
>
>
> On 05/09/2012 15:56, Gabriele Fatigati wrote:
>
> An update:
>
>  placing strerror(errno) after hwloc_set_area_membind_nodeset  gives:
> "Cannot allocate memory"
>
> 2012/9/5 Gabriele Fatigati 
>
>> Hi,
>>
>>  I've noticed that hwloc_set_area_membind_nodeset returns -1 but errno
>> is not equal to EXDEV or ENOSYS. I assumed those two were the only
>> possible cases.
>>
>>  From the hwloc documentation:
>>
>>  -1 with errno set to ENOSYS if the action is not supported
>> -1 with errno set to EXDEV if the binding cannot be enforced
>>
>>
>>  Any other reason for a binding failure? The available memory is enough.
>>
>> 2012/9/5 Brice Goglin 
>>
>>>  Hello Gabriele,
>>>
>>> The only limit that I would think of is the available physical memory on
>>> each NUMA node (numactl -H will tell you how much of each NUMA node memory
>>> is still available).
>>> malloc usually only fails (it returns NULL?) when there is no *virtual*
>>> memory left; that's different. If you don't allocate tons of terabytes
>>> of virtual memory, this shouldn't happen easily.
>>>
>>> Brice
>>>
>>>
>>>
>>>
>>> On 05/09/2012 14:27, Gabriele Fatigati wrote:
>>>
>>>  Dear Hwloc users and developers,
>>>
>>>
>>>  I'm using hwloc 1.4.1 in a multithreaded program on a Linux platform,
>>> where each thread binds many non-contiguous pieces of a big matrix,
>>> calling the hwloc_set_area_membind_nodeset function very intensively:
>>>
>>>  hwloc_set_area_membind_nodeset(topology, punt+offset, len, nodeset,
>>> HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD | HWLOC_MEMBIND_MIGRATE);
>>>
>>>  Binding seems to work well, since the function returns 0 for every
>>> call.
>>>
>>>  The problem is that after binding, a simple small new malloc fails,
>>> without any apparent reason.
>>>
>>>  Disabling memory binding, the allocations work well.  Is there any
>>> known problem if hwloc_set_area_membind_nodeset is used intensively?
>>>
>>>  Is there some operating system limit on memory page binding?
>>>
>>>  Thanks in advance.
>>>
>>>  --
>>> Ing. Gabriele Fatigati
>>>
>>> HPC specialist
>>>
>>> SuperComputing Applications and Innovation Department
>>>
>>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>>
>>> www.cineca.it    Tel: +39 051 6171722
>>>
>>> g.fatigati [AT] cineca.it
>>>
>>>
>>>  ___
>>> hwloc-users mailing list
>>> hwloc-users@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>>>
>>>
>>>
>>
>>
>>  --
>> Ing. Gabriele Fatigati
>>
>> HPC specialist
>>
>> SuperComputing Applications and Innovation Department
>>
>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>
>> www.cineca.it    Tel: +39 051 6171722
>>
>> g.fatigati [AT] cineca.it
>>
>
>
>
>  --
> Ing. Gabriele Fatigati
>
> HPC specialist
>
> SuperComputing Applications and Innovation Department
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it    Tel: +39 051 6171722
>
> g.fatigati [AT] cineca.it
>
>
>


-- 
Ing. Gabriele Fatigati

HPC specialist

SuperComputing Applications and Innovation Department

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it    Tel: +39 051 6171722

g.fatigati [AT] cineca.it


Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Brice Goglin
An internal malloc failed then. That would explain why your malloc
failed too.
It looks like you malloc'ed too much memory in your program?

Brice




On 05/09/2012 15:56, Gabriele Fatigati wrote:
> An update:
>
> placing strerror(errno) after hwloc_set_area_membind_nodeset  gives:
> "Cannot allocate memory"
>
> 2012/9/5 Gabriele Fatigati
>
> Hi,
>
> I've noticed that hwloc_set_area_membind_nodeset returns -1 but errno
> is not equal to EXDEV or ENOSYS. I assumed those two were the only
> possible cases.
>
> From the hwloc documentation:
>
> -1 with errno set to ENOSYS if the action is not supported
> -1 with errno set to EXDEV if the binding cannot be enforced
>
>
> Any other reason for a binding failure? The available memory is enough.
>
> 2012/9/5 Brice Goglin
>
> Hello Gabriele,
>
> The only limit that I would think of is the available physical
> memory on each NUMA node (numactl -H will tell you how much of
> each NUMA node memory is still available).
> malloc usually only fails (it returns NULL?) when there is no
> *virtual* memory left; that's different. If you don't allocate
> tons of terabytes of virtual memory, this shouldn't happen easily.
>
> Brice
>
>
>
>
>> On 05/09/2012 14:27, Gabriele Fatigati wrote:
>> Dear Hwloc users and developers,
>>
>>
>> I'm using hwloc 1.4.1 in a multithreaded program on a Linux
>> platform, where each thread binds many non-contiguous pieces of
>> a big matrix, calling the hwloc_set_area_membind_nodeset
>> function very intensively:
>>
>> hwloc_set_area_membind_nodeset(topology, punt+offset, len,
>> nodeset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD |
>> HWLOC_MEMBIND_MIGRATE);
>>
>> Binding seems to work well, since the function returns 0 for
>> every call.
>>
>> The problem is that after binding, a simple small new malloc
>> fails, without any apparent reason.
>>
>> Disabling memory binding, the allocations work well.  Is there
>> any known problem if hwloc_set_area_membind_nodeset is used
>> intensively?
>>
>> Is there some operating system limit on memory page binding?
>>
>> Thanks in advance.
>>
>> -- 
>> Ing. Gabriele Fatigati
>>
>> HPC specialist
>>
>> SuperComputing Applications and Innovation Department
>>
>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>
>> www.cineca.it    Tel: +39 051 6171722
>>
>> g.fatigati [AT] cineca.it   
>>
>>
>> ___
>> hwloc-users mailing list
>> hwloc-us...@open-mpi.org 
>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>
>
>
>
> -- 
> Ing. Gabriele Fatigati
>
> HPC specialist
>
> SuperComputing Applications and Innovation Department
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it    Tel: +39 051 6171722
>
> g.fatigati [AT] cineca.it   
>
>
>
>
> -- 
> Ing. Gabriele Fatigati
>
> HPC specialist
>
> SuperComputing Applications and Innovation Department
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it    Tel: +39 051 6171722
>
> g.fatigati [AT] cineca.it   



Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Gabriele Fatigati
An update:

placing strerror(errno) after hwloc_set_area_membind_nodeset  gives:
"Cannot allocate memory"

2012/9/5 Gabriele Fatigati 

> Hi,
>
> I've noticed that hwloc_set_area_membind_nodeset returns -1 but errno is
> not equal to EXDEV or ENOSYS. I assumed those two were the only possible
> cases.
>
> From the hwloc documentation:
>
> -1 with errno set to ENOSYS if the action is not supported
> -1 with errno set to EXDEV if the binding cannot be enforced
>
>
> Any other reason for a binding failure? The available memory is enough.
>
> 2012/9/5 Brice Goglin 
>
>>  Hello Gabriele,
>>
>> The only limit that I would think of is the available physical memory on
>> each NUMA node (numactl -H will tell you how much of each NUMA node memory
>> is still available).
>> malloc usually only fails (it returns NULL?) when there is no *virtual*
>> memory left; that's different. If you don't allocate tons of terabytes
>> of virtual memory, this shouldn't happen easily.
>>
>> Brice
>>
>>
>>
>>
>> On 05/09/2012 14:27, Gabriele Fatigati wrote:
>>
>> Dear Hwloc users and developers,
>>
>>
>>  I'm using hwloc 1.4.1 in a multithreaded program on a Linux platform,
>> where each thread binds many non-contiguous pieces of a big matrix,
>> calling the hwloc_set_area_membind_nodeset function very intensively:
>>
>>  hwloc_set_area_membind_nodeset(topology, punt+offset, len, nodeset,
>> HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD | HWLOC_MEMBIND_MIGRATE);
>>
>>  Binding seems to work well, since the function returns 0 for every
>> call.
>>
>>  The problem is that after binding, a simple small new malloc fails,
>> without any apparent reason.
>>
>>  Disabling memory binding, the allocations work well.  Is there any
>> known problem if hwloc_set_area_membind_nodeset is used intensively?
>>
>>  Is there some operating system limit on memory page binding?
>>
>>  Thanks in advance.
>>
>>  --
>> Ing. Gabriele Fatigati
>>
>> HPC specialist
>>
>> SuperComputing Applications and Innovation Department
>>
>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>
>> www.cineca.it    Tel: +39 051 6171722
>>
>> g.fatigati [AT] cineca.it
>>
>>
>> ___
>> hwloc-users mailing list
>> hwloc-users@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>>
>>
>>
>
>
> --
> Ing. Gabriele Fatigati
>
> HPC specialist
>
> SuperComputing Applications and Innovation Department
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it    Tel: +39 051 6171722
>
> g.fatigati [AT] cineca.it
>



-- 
Ing. Gabriele Fatigati

HPC specialist

SuperComputing Applications and Innovation Department

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it    Tel: +39 051 6171722

g.fatigati [AT] cineca.it


Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Brice Goglin
What does errno contain?
Aside from ENOSYS and EXDEV, you may also get the "usual" error codes such
as ENOMEM, EPERM or EINVAL.
We didn't document all of them; it mostly depends on the underlying
kernel and mbind implementation.
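For reference, a minimal way to log which code you actually get. This is a
sketch only: the call matches the snippet quoted below, the variable names
(punt, offset, len, nodeset) come from that snippet, and it assumes
errno.h, stdio.h and string.h are included:

  int err = hwloc_set_area_membind_nodeset(topology, punt+offset, len, nodeset,
                                           HWLOC_MEMBIND_BIND,
                                           HWLOC_MEMBIND_THREAD | HWLOC_MEMBIND_MIGRATE);
  if (err == -1) {
      switch (errno) {
      case ENOSYS:  /* documented: action not supported */
      case EXDEV:   /* documented: binding cannot be enforced */
          fprintf(stderr, "membind unsupported or unenforceable\n");
          break;
      default:      /* anything else comes straight from the kernel, e.g. ENOMEM */
          fprintf(stderr, "membind failed: %s\n", strerror(errno));
      }
  }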
Brice



On 05/09/2012 15:44, Gabriele Fatigati wrote:
> Hi,
>
> I've noticed that hwloc_set_area_membind_nodeset returns -1 but errno
> is not equal to EXDEV or ENOSYS. I assumed those two were the only
> possible cases.
>
> From the hwloc documentation:
>
> -1 with errno set to ENOSYS if the action is not supported
> -1 with errno set to EXDEV if the binding cannot be enforced
>
>
> Any other reason for a binding failure? The available memory is enough.
>
> 2012/9/5 Brice Goglin
>
> Hello Gabriele,
>
> The only limit that I would think of is the available physical
> memory on each NUMA node (numactl -H will tell you how much of
> each NUMA node memory is still available).
> malloc usually only fails (it returns NULL?) when there is no
> *virtual* memory left; that's different. If you don't allocate
> tons of terabytes of virtual memory, this shouldn't happen easily.
>
> Brice
>
>
>
>
>> On 05/09/2012 14:27, Gabriele Fatigati wrote:
>> Dear Hwloc users and developers,
>>
>>
>> I'm using hwloc 1.4.1 in a multithreaded program on a Linux
>> platform, where each thread binds many non-contiguous pieces of a
>> big matrix, calling the hwloc_set_area_membind_nodeset function
>> very intensively:
>>
>> hwloc_set_area_membind_nodeset(topology, punt+offset, len,
>> nodeset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD |
>> HWLOC_MEMBIND_MIGRATE);
>>
>> Binding seems to work well, since the function returns 0 for
>> every call.
>>
>> The problem is that after binding, a simple small new malloc
>> fails, without any apparent reason.
>>
>> Disabling memory binding, the allocations work well.  Is there
>> any known problem if hwloc_set_area_membind_nodeset is used
>> intensively?
>>
>> Is there some operating system limit on memory page binding?
>>
>> Thanks in advance.
>>
>> -- 
>> Ing. Gabriele Fatigati
>>
>> HPC specialist
>>
>> SuperComputing Applications and Innovation Department
>>
>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>
>> www.cineca.it    Tel: +39 051 6171722
>>
>> g.fatigati [AT] cineca.it   
>>
>>
>> ___
>> hwloc-users mailing list
>> hwloc-us...@open-mpi.org 
>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>
>
>
>
> -- 
> Ing. Gabriele Fatigati
>
> HPC specialist
>
> SuperComputing Applications and Innovation Department
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it    Tel: +39 051 6171722
>
> g.fatigati [AT] cineca.it   



Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Gabriele Fatigati
Hi,

I've noticed that hwloc_set_area_membind_nodeset returns -1 but errno is
not equal to EXDEV or ENOSYS. I assumed those two were the only possible
cases.

From the hwloc documentation:

-1 with errno set to ENOSYS if the action is not supported
-1 with errno set to EXDEV if the binding cannot be enforced


Any other reason for a binding failure? The available memory is enough.

2012/9/5 Brice Goglin 

>  Hello Gabriele,
>
> The only limit that I would think of is the available physical memory on
> each NUMA node (numactl -H will tell you how much of each NUMA node memory
> is still available).
> malloc usually only fails (it returns NULL?) when there is no *virtual*
> memory left; that's different. If you don't allocate tons of terabytes
> of virtual memory, this shouldn't happen easily.
>
> Brice
>
>
>
>
> On 05/09/2012 14:27, Gabriele Fatigati wrote:
>
> Dear Hwloc users and developers,
>
>
>  I'm using hwloc 1.4.1 in a multithreaded program on a Linux platform,
> where each thread binds many non-contiguous pieces of a big matrix,
> calling the hwloc_set_area_membind_nodeset function very intensively:
>
>  hwloc_set_area_membind_nodeset(topology, punt+offset, len, nodeset,
> HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD | HWLOC_MEMBIND_MIGRATE);
>
>  Binding seems to work well, since the function returns 0 for every
> call.
>
>  The problem is that after binding, a simple small new malloc fails,
> without any apparent reason.
>
>  Disabling memory binding, the allocations work well.  Is there any
> known problem if hwloc_set_area_membind_nodeset is used intensively?
>
>  Is there some operating system limit on memory page binding?
>
>  Thanks in advance.
>
>  --
> Ing. Gabriele Fatigati
>
> HPC specialist
>
> SuperComputing Applications and Innovation Department
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it    Tel: +39 051 6171722
>
> g.fatigati [AT] cineca.it
>
>
> ___
> hwloc-users mailing list
> hwloc-users@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>
>
>


-- 
Ing. Gabriele Fatigati

HPC specialist

SuperComputing Applications and Innovation Department

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it    Tel: +39 051 6171722

g.fatigati [AT] cineca.it