epends on numa devel.
Ok!
2012/9/7 Brice Goglin
> On 07/09/2012 09:43, Gabriele Fatigati wrote:
> > Hi,
> >
> > Good, you found the kernel limit that is being exceeded.
> >
> > /proc/meminfo reports MemFree as 47834588 kB
> >
> > numactl -H:
> >
Brice Goglin
> On 06/09/2012 14:51, Gabriele Fatigati wrote:
>
> Hi Brice,
>
> the initial grep is:
>
> numa_policy  65671  65952  24  144  1 : tunables  120  60  8 : slabdata  458  458  0
>
> When set_membind fails is:
>
> numa_pol
?
2012/9/6 Brice Goglin
> On 06/09/2012 12:19, Gabriele Fatigati wrote:
>
> I didn't find any strange number in /proc/meminfo.
>
> I've noted that the program fails exactly
> every 65479 hwloc_set_area_membind calls. So it sounds like some kernel limit.
> You can
uous memory a few times, instead of small and non-contiguous pieces of
memory many, many times.. :(
2012/9/6 Brice Goglin
> On 06/09/2012 10:44, Samuel Thibault wrote:
> > Gabriele Fatigati, on Thu 06 Sep 2012 10:12:38 +0200, wrote:
> >> mbind hwloc_linux_set_area_membi
Oops,
I forgot the hwloc_topology_destroy() and also hwloc_bitmap_free(cpuset);
I've added them. I attach new code calling the hwloc_set_area_membind function
directly, and the new Valgrind output.
2012/9/6 Brice Goglin
> On 06/09/2012 10:13, Gabriele Fatigati wrote:
>
> Downsizing the array,
Downsizing the array to 4 GB,
valgrind gives many warnings, reported in the attached file.
2012/9/6 Gabriele Fatigati
> Sorry,
>
> I used a wrong hwloc installation. Using the hwloc with the printf
> controls:
>
> mbind hwloc_linux_set_area_membind() fails:
=output_valgrind --leak-check=full
--tool=memcheck --show-reachable=yes ./main_hybrid_bind_mem
2012/9/6 Gabriele Fatigati
> Hi Brice, hi Jeff,
>
> >Can you add some printf inside hwloc_linux_set_area_membind() in
> src/topology-linux.c to see if ENOMEM comes from the mbind
ng pure OpenMP code. Very mysterious.
2012/9/5 Jeff Squyres
> On Sep 5, 2012, at 2:36 PM, Gabriele Fatigati wrote:
>
> > I don't think it is simply out of memory, since the NUMA node has 48 GB, and
> I'm allocating just 8 GB.
>
> Mmm. Probably right.
>
> H
ple.
>
> You might want to check the output of numastat to see if one or more of
> your NUMA nodes have run out of memory.
>
>
> On Sep 5, 2012, at 12:58 PM, Gabriele Fatigati wrote:
>
> > I've reproduced the problem in a small MPI + OpenMP code.
> >
> >
I've reproduced the problem in a small MPI + OpenMP code.
The error is the same: after a number of memory binds, it gives "Cannot allocate
memory".
Thanks.
2012/9/5 Gabriele Fatigati
> Downscaling the matrix size, binding works well, but the memory available
> is enough also usin
other extra allocation that
are resilient after the call?
2012/9/5 Brice Goglin
> An internal malloc failed then. That would explain why your malloc failed
> too.
> It looks like you malloc'ed too much memory in your program?
>
> Brice
>
> Le 05/09/2012 1
An update:
placing strerror(errno) after hwloc_set_area_membind_nodeset gives:
"Cannot allocate memory"
2012/9/5 Gabriele Fatigati
> Hi,
>
> I've noted that hwloc_set_area_membind_nodeset returns -1 but errno is not
> equal to EXDEV or ENOSYS. I supposed that these
de memory
> is still available).
> malloc usually only fails (returning NULL) when there is no *virtual*
> memory anymore; that's different. If you don't allocate tons of terabytes
> of virtual memory, this shouldn't happen easily.
>
> Brice
>
> L
, the allocations work well. Is there any known
problem if hwloc_set_area_membind_nodeset is used intensively?
Is there some operating-system limit on memory page binding?
Thanks in advance.
--
Ing. Gabriele Fatigati
HPC specialist
SuperComputing Applications and Innovation Department
Just one more thing:
is the id shown in the GPU box from lstopo the same device_id numbering CUDA
uses in functions like setDevice(), for example?
Put differently:
GPU 1 from lstopo = GPU 1 for the CUDA runtime?
Thanks.
2012/8/29 Gabriele Fatigati
> Good!
>
> Now it works well.
Good!
Now it works well.
Many thanks!
2012/8/28 Samuel Thibault
> Gabriele Fatigati, on Tue 28 Aug 2012 18:10:41 +0200, wrote:
> > How can the cuda branch help me? The lstopo output of that branch is the
> > same as the trunk's.
>
> You need to make sure that hwloc found
> are plenty of such platforms where the GPU is indeed connected to both
> > sockets. Or it could be a buggy BIOS.
>
> Agreed.
>
> Samuel
> ___
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> http://www.open-mpi.o
Dear hwloc user,
I'm using hwloc 1.5. I would like to see how the GPUs are connected to the
processor sockets using the lstopo command.
I attach the figure. The system has two GPUs, but I don't understand how to
find that information from the PCI boxes.
Thanks in advance.
--
Ing. Gabriele Fat
Ok,
so set_membind() combined with HWLOC_MEMBIND_BIND is useless?
Is the behaviour I want possible?
2011/9/25 Brice Goglin
> On 25/09/2011 20:57, Gabriele Fatigati wrote:
> > after doing this, memory is allocated not in a local node of the thread
> > that does set_mem
d and malloc, but in the node of the thread that touches it. And I don't
understand this behaviour :(
2011/9/25 Brice Goglin
> On 25/09/2011 20:27, Gabriele Fatigati wrote:
>
> if(tid==0){
>
> set_membind(HWLOC_MEMBIND_BIND, node 0)
> malloc(array)...
>
>
set_membind(HWLOC_MEMBIND_BIND, node 1)
for(i...)
array(i)
}
end parallel region
The array is allocated on node 1, not node 0 as I expected. So it seems the
second thread's set_membind() somehow influences the array allocation through
first touch.
2011/9/25 Brice Goglin
> On 25/09/2011 12:41, Ga
2011/9/25 Gabriele Fatigati
>
>> * doing two set_area_membind on the same entire array is useless, the
> second one will overwrite the first one.
>
> But set_area_membind is for memory in general, not for a particular malloc.
> (Is that right?)
>
> set_membind done b
me allocations, and set_area_membind for thread 2 for future
allocations.
The set_membind done by thread 2 has no connection to the malloc(array) done by
the first thread, so why does it influence the actual allocation of this array?
2011/9/25 Brice Goglin
> On 25/09/2011 12:19, Gabriele Fatigati wrote:
ow it works, because in the second
example only first touch appears to have some effect, independently of which
hwloc function I'm using.
Sorry, but it is quite difficult to understand .. :(
2011/9/25 Brice Goglin
> On 25/09/2011 11:14, Gabriele Fatigati wrote:
>
>
> I
o replicate the behaviour of set_area_membind_nodeset() in
some manner for all future allocations without calling this function each time
I allocate some memory. Is it possible to do this?
Thanks.
2011/9/22 Gabriele Fatigati
> Hi,
>
> some questions:
>
> 1) I don't understand the
for all future allocations without calling this function each time
I allocate some memory. Is it possible to do this?
2011/9/22 Brice Goglin
> On 22/09/2011 12:20, Gabriele Fatigati wrote:
> > NUMA node(s) near the specified cpuset.
> >
> > What does "nodes near the s
e memory on the nodes decreases only on the node where the second
thread is. Is that right?
Does hwloc_set_membind affect all future allocations?
Thanks in advance.
B, what you
should now see with get_cpubind is that process X is now bound to cores
A+B, thread Y to B, and all other threads to A.
2011/9/12 Brice Goglin
> On 12/09/2011 14:17, Gabriele Fatigati wrote:
> > Mm, and why? In a hybrid code ( MPI + OpenMP), my idea is to bind a
&g
d why? In a hybrid code (MPI + OpenMP), my idea is to bind a single
MPI process to one core, and its threads to other cores. Otherwise all the
threads run on a single core..
2011/9/12 Brice Goglin
> On 12/09/2011 13:58, Gabriele Fatigati wrote:
>
> Hi Brice,
&g
rocess
and thread are on the same NUMA node, it works well, even on different cores.
If the NUMA node of the process is different from the NUMA node of the threads,
there is a problem.
2011/9/12 Brice Goglin
> On 12/09/2011 13:29, Gabriele Fatigati wrote:
>
> Hi Brice,
>
> I'
ad on cpus given in the bitmap set.
Why are you saying that process binding is not possible? I'm using it and it
works well!
2011/9/12 Brice Goglin
> On 12/09/2011 12:52, Gabriele Fatigati wrote:
> > Dear hwloc users,
> >
> > I'm binding process in a NUMA node and also
nd on NUMA node 1, and not 0.
Why is this? Does thread binding influence the binding of the main process?
Thanks in advance.
segfault when checking).
>
> If you really need something like this, put an integer value on the side of
> the topology variable, and set it to 0 or 1 depending on whether the topology
> was initialized or not.
>
> Brice
>
>
> ----- Reply message -----
> From: "Gabriele Fatigati"
ill be changed
> into something else when init() is called.
>
> Brice
>
> ----- Reply message -----
> From: "Gabriele Fatigati"
> To: "Hardware locality user list"
> Subject: [hwloc-users] hwloc topology check initializing
> Date: Sat., Sept. 3, 2011
Dear hwloc users,
how can I check whether my hwloc topology is initialized? Do I have to use
hwloc_topology_check()? This code does not work:
hwloc_topology_t topology;
if( topology==NULL)
exit(-1);
Dear hwloc users,
hwloc_get_last_cpu_location() returns the last CPU where the process/thread
ran. On an SMT machine, does it return the PU where the process/thread ran?
Thanks a lot.
MA-aware or not (not sure we should remove this possible
> > difference).
>
> The useful difference is that 0 means we don't know, while 1 means we do
> know there is only one node.
>
> Samuel
>
c uses mbind.
> But I checked the hwloc code again, things look ok, and the kernel is happy
> with our mbind parameters.
> Brice
>
>
> ----- Reply message -----
> From: "Gabriele Fatigati"
> To: "Brice Goglin"
> Cc: "Hardware locality use
: bind_memory_tonode (main.c:97)
>
> valgrind has --tool=memcheck --leak-check=full exec flags.
>
> It gives me the same warning even with just one byte of memory bound.
>
> Is it a hwloc warning or my application's warning?
>
> Thanks in advance.
>
nd_nodeset (bind.c:396)
==2904==    by 0x401CBB: bind_memory_tonode (main.c:97)
valgrind has --tool=memcheck --leak-check=full exec flags.
It gives me the same warning even with just one byte of memory bound.
Is it a hwloc warning or my application's warning?
Thanks in advance.
Of course,
with gettid() works well.
Thanks so much!
2011/8/11 Samuel Thibault
> Gabriele Fatigati, on Thu 11 Aug 2011 18:05:25 +0200, wrote:
> > char* bitmap_string=(char*)malloc(256);
> >
> > hwloc_bitmap_t set = hwloc_bitmap_alloc();
> >
> > hwloc_lin
om tid: %d \n", bitmap_string, tid);
------
2011/8/11 Gabriele Fatigati
> Hi Samuel,
>
> I'm using as it in OpenMP parallel region:
>
>
> -
>
> char* bitmap_string=(char*)malloc(256);
>
> hwloc_bitmap_t set = hwloc_bitmap_al
ap_string: %s from tid: %d \n", bitmap_string[0], tid);
2011/8/11 Samuel Thibault
> Gabriele Fatigati, on Thu 11 Aug 2011 10:32:23 +0200, wrote:
> > I'm using hwloc-1.3a1r3606. Now hwloc_get_last_cpu_location() works
> well:
> >
>
ve me:
thread 0 bind: 0x0008 as core number 3
thread 1 bind: "0x00ff" as all available cores!!
2011/8/10 Gabriele Fatigati
> Ok,
>
> thanks!
>
> 2011/8/10 Samuel Thibault
>
>> Samuel Thibault, on Wed 10 Aug 2011 16:24:39 +0200, wrote:
>>
Ok,
thanks!
2011/8/10 Samuel Thibault
> Samuel Thibault, on Wed 10 Aug 2011 16:24:39 +0200, wrote:
> > Gabriele Fatigati, on Wed 10 Aug 2011 16:13:27 +0200, wrote:
> > > there is something wrong. I'm using two thread, the first one is bound
> on
> > >
ee CPU 2 and 10 working, so bind has worked
well.
2011/8/10 Samuel Thibault
> Gabriele Fatigati, on Wed 10 Aug 2011 15:41:19 +0200, wrote:
> > hwloc_cpuset_t set = hwloc_bitmap_alloc();
> >
> > int return_value = hwloc_get_last_cpu_location(topology, set,
> &
o, CPU 0 I suppose, but that is not where I bound my thread .. :(
2011/8/10 Samuel Thibault
> Gabriele Fatigati, on Wed 10 Aug 2011 15:29:43 +0200, wrote:
> > hwloc_obj_t core = hwloc_get_obj_by_type(topology, HWLOC_OBJ_MACHINE, 0);
> >
> > int return_value = hwloc_get_last_cpu_
ess/threads
runs. Is it right?
2011/8/10 Samuel Thibault
> Gabriele Fatigati, on Wed 10 Aug 2011 09:35:19 +0200, wrote:
> > these lines don't work:
> >
> > set = hwloc_bitmap_alloc();
> > hwloc_get_cpubind(topology, &set, 0);
> >
> > h
) gives me the cpuset, and hwloc_get_last_cpu_location()
gives me the CPU index where the process/thread runs, from the passed cpuset.
Is that right? The philosophy of these functions is
2011/8/9 Samuel Thibault
> Gabriele Fatigati, on Tue 09 Aug 2011 18:14:55 +0200, wrote:
> > hwloc_get_cpubind() funct
to use it?
> There is no difference concerning the cpuset.
Does that mean they have the same logical index?
2011/8/9 Samuel Thibault
> Gabriele Fatigati, on Tue 09 Aug 2011 16:58:33 +0200, wrote:
> > in a non SMT machine, what's the difference between HWLOC_OBJ_CORE
> > and HWLOC_
Dear hwloc users,
in a non-SMT machine, what's the difference between HWLOC_OBJ_CORE
and HWLOC_OBJ_PU?
Can I exchange one for the other?
Thanks.
Well,
now it's clearer.
Thanks for the information!
Regards.
2011/8/4 Samuel Thibault
> Gabriele Fatigati, on Thu 04 Aug 2011 16:56:22 +0200, wrote:
> > L#0 and L#1 are physically near because hwloc considers the shared-cache
> > map when building the topology?
>
>
>
L#0 and L#1 are physically near because hwloc considers the shared-cache map
when building the topology? Because if not, I don't know how hwloc understands
the physical proximity of the cores :(
2011/8/4 Samuel Thibault
> Gabriele Fatigati, on Thu 04 Aug 2011 16:35:36 +0200, wrote:
> > so phy
mance.
2011/8/4 Samuel Thibault
> Gabriele Fatigati, on Thu 04 Aug 2011 16:14:35 +0200, wrote:
> > Socket:
> >  _______________
> > |               |
> > | |core| |core| |
> > |
e L#1 in a single socket are
physically near.
2011/8/4 Samuel Thibault
> Gabriele Fatigati, on Thu 04 Aug 2011 15:52:09 +0200, wrote:
> > how is the topology given by lstopo built? In particular, how are the
> > logical indexes P# initialized?
>
> P# are not logic
Hi Samuel,
how is the topology given by lstopo built? In particular, how are the logical
indexes P# initialized?
2011/8/4 Samuel Thibault
> Hello,
>
> > Gabriele Fatigati, on Mon 01 Aug 2011 12:32:44 +0200, wrote:
> > So they are not physically near. I expect that with Hypert
Hi all,
in the hwloc manual, the return values of many functions in the error case are:
-1 with errno set to ENOSYS if the action is not supported
-1 with errno set to EXDEV if the binding cannot be enforced
What's the difference?
Thanks
--
Ing. Gabriele Fatigati
Parallel programmer
CINECA Sy
ind(*topology, set, HWLOC_CPUBIND_THREAD |
HWLOC_CPUBIND_NOMEMBIND);
}
2011/8/2 Gabriele Fatigati
> Mm, I'm not sure. Suppose this:
>
> #pragma omp parallel num_threads(1)
> {
> hwloc_set_cpubind(*topology, set, HWLOC_CPUBIND_THREAD |
> HWLOC_CPUBIND_STRICT | HWLOC_CPUB
amuel Thibault
> Gabriele Fatigati, on Tue 02 Aug 2011 16:23:12 +0200, wrote:
> > hwloc_set_cpubind(*topology, set, HWLOC_CPUBIND_THREAD |
> HWLOC_CPUBIND_STRICT
> > | HWLOC_CPUBIND_NOMEMBIND);
> >
> > is it possible do multiple call to hwloc_set_cpubi
);
hwloc_set_cpubind(*topology, set, HWLOC_CPUBIND_STRICT);
hwloc_set_cpubind(*topology, set, HWLOC_CPUBIND_NOMEMBIND);
or does only the last one have effect?
Thanks in advance.
using. "Processing Unit" is less confusing, that's why it's the official
> name for the smallest objects in hwloc.
>
> Brice
>
> On 01/08/2011 15:04, Gabriele Fatigati wrote:
>
> Hi Brice,
>
> you said:
>
> "
8/1 Brice Goglin
> "PU P#0" means "PU object with physical index 0".
> "P#" prefix means "physical index".
> "L#" prefix means "logical index" (the one you want to use in
> get_obj_by_type).
> Use -l or -p to s
Hi Brice,
so, if I understand correctly, the PU P# numbers are not the same as those
specified with the HWLOC_OBJ_PU flag?
2011/8/1 Brice Goglin
> On 01/08/2011 12:16, Gabriele Fatigati wrote:
> > Hi,
> >
> > reading the hwloc-v1.2-a4 manual, on page 15, I see an example
> > with 4
But I think it does not work..
2011/7/29 Samuel Thibault
> Gabriele Fatigati, on Fri 29 Jul 2011 13:34:29 +0200, wrote:
> > I forgot to tell you this code block is inside a parallel OpenMP region.
> > This is the complete code:
> >
> > #pragma om
cutive and not
exclusive, I suppose it is better and safer to use the PU id. Or not?
2011/7/29 Samuel Thibault
> Gabriele Fatigati, on Fri 29 Jul 2011 13:24:17 +0200, wrote:
> > thanks for your quick reply!
> >
> > But I have a little doubt. In a non-SMT mac
1/7/29 Samuel Thibault
> Hello,
>
> Gabriele Fatigati, on Fri 29 Jul 2011 12:43:47 +0200, wrote:
> > I'm so confused. I see pairs of cores with the same core id! (Core#8 for
> > example) How is that possible?
>
> That's because they are on different sock
well with:
hwloc_set_cpubind(topology, set, HWLOC_CPUBIND_THREAD);
and crash with:
hwloc_set_thread_cpubind(topology, tid, set, HWLOC_CPUBIND_THREAD);
Thanks in advance.