Hello,

Ah yes, I see. I made a very basic mistake: it slipped my mind that the
machine only has two NUMA nodes, and that memory binding only concerns
itself with NUMA nodes, since that is the level at which non-uniform
memory access comes into play.

Thanks for your time again.

Mike

On Wed., 2 March 2022 at 1
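What Mike describes is visible directly in the API: hwloc converts a
cpuset into a nodeset before binding memory, so every cpuset inside one
socket collapses to that socket's single NUMA node. A minimal sketch of
that conversion (the printed nodeset depends on the machine; error
handling omitted):

#include <hwloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    hwloc_topology_t topology;
    hwloc_bitmap_t cpuset = hwloc_bitmap_alloc();
    hwloc_bitmap_t nodeset = hwloc_bitmap_alloc();
    char *str;

    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);

    /* PUs 0-3, i.e. the "0xf" mask discussed below */
    hwloc_bitmap_set_range(cpuset, 0, 3);
    /* membind works on NUMA nodes: the cpuset is reduced to a nodeset */
    hwloc_cpuset_to_nodeset(topology, cpuset, nodeset);

    hwloc_bitmap_asprintf(&str, nodeset);
    printf("cpuset 0xf -> nodeset %s\n", str); /* e.g. 0x1 = NUMA node 0 */
    free(str);

    hwloc_bitmap_free(cpuset);
    hwloc_bitmap_free(nodeset);
    hwloc_topology_destroy(topology);
    return 0;
}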
On 02/03/2022 at 12:31, Mike wrote:
Hello,
> Can you display both masks, before set_area_membind and after
> get_area_membind, and send the entire output of all processes and
> threads? If you can prefix each line with the PID, it'd help a lot :)
>
What do you mean by the output of all processes and threads?
If I execute with 1 MPI rank a
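A sketch of the kind of debugging output requested here, with every line
prefixed by the PID. The helper name is illustrative, not the poster's
actual code; it assumes the buffer and the requested mask already exist:

#include <hwloc.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void debug_membind(hwloc_topology_t topology, void *buf, size_t len,
                   hwloc_const_bitmap_t requested)
{
    hwloc_bitmap_t actual = hwloc_bitmap_alloc();
    hwloc_membind_policy_t policy;
    char *str;

    /* mask we are about to pass to set_area_membind */
    hwloc_bitmap_asprintf(&str, requested);
    printf("[%ld] before set_area_membind: %s\n", (long)getpid(), str);
    free(str);

    hwloc_set_area_membind(topology, buf, len, requested,
                           HWLOC_MEMBIND_BIND, 0);

    /* mask the kernel actually applied to the area */
    hwloc_get_area_membind(topology, buf, len, actual, &policy, 0);
    hwloc_bitmap_asprintf(&str, actual);
    printf("[%ld] after  get_area_membind: %s\n", (long)getpid(), str);
    free(str);

    hwloc_bitmap_free(actual);
}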
On 02/03/2022 at 11:38, Mike wrote:
Hello,
If you print the set that is built before calling set_area_membind, you
should only see 4 bits in there, right? (since threadcount=4 in your code)

I'd say 0xf for rank0, 0xf0 for rank1, etc.

set_area_membind() will translate that into a single NUMA node, before
asking the kernel
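To make that concrete, a hedged sketch of building such a per-rank cpuset
(4 consecutive PUs per rank: 0xf for rank 0, 0xf0 for rank 1, ...) and
binding an existing buffer with it. The helper name is made up, and rank
and threadcount are assumed to come from the MPI setup:

#include <hwloc.h>

int bind_buffer(hwloc_topology_t topology, void *buf, size_t len,
                int rank, int threadcount)
{
    hwloc_bitmap_t set = hwloc_bitmap_alloc();
    int i, err;

    /* one bit per PU used by this rank:
     * PUs rank*threadcount .. rank*threadcount+threadcount-1 */
    for (i = 0; i < threadcount; i++)
        hwloc_bitmap_set(set, rank * threadcount + i);

    /* hwloc reduces this cpuset to the single NUMA node containing
     * those PUs before asking the kernel */
    err = hwloc_set_area_membind(topology, buf, len, set,
                                 HWLOC_MEMBIND_BIND, 0);
    hwloc_bitmap_free(set);
    return err;
}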
On 02/03/2022 at 10:09, Mike wrote:
Ok, then your mask 0x,0x,,,0x,0x
corresponds exactly to NUMA node 0 (socket 0). Object cpusets can be
displayed on the command-line with "lstopo --cpuset" or "hwloc-calc numa:0".

This would be OK if you're only spawning threads to the first socket. Do
yo
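The same check can be done from C instead of the command line. A small
sketch that prints the cpuset of NUMA node 0, i.e. what "hwloc-calc
numa:0" computes:

#include <hwloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    hwloc_topology_t topology;
    hwloc_obj_t node;
    char *str;

    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);

    node = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NUMANODE, 0);
    if (node) {
        hwloc_bitmap_asprintf(&str, node->cpuset);
        printf("NUMA node 0 cpuset: %s\n", str);
        free(str);
    }

    hwloc_topology_destroy(topology);
    return 0;
}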
On 02/03/2022 at 09:39, Mike wrote:
Hello,
> Please run "lstopo -.synthetic" to compress the output a lot. I will be
> able to reuse it from here and understand your binding mask.
>
Package:2 [NUMANode(memory=270369247232)] L3Cache:8(size=33554432)
L2Cache:8(size=524288) L1dCache:1(size=32768) L1iCache:1(size=32768) Core:1
PU:2(indexe
On 01/03/2022 at 17:34, Mike wrote:
Hello,
> Usually you would rather allocate and bind at the same time so that the
> memory doesn't need to be migrated when bound. However, if you do not touch
> the memory after allocation, pages are not actually physically allocated,
> hence there's nothing to migrate. Might work, but keep this in mind.
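A hedged sketch of that allocate-and-bind-at-once approach, using
hwloc_alloc_membind() from hwloc 2.x. The helper name is illustrative,
and it assumes a NUMA node object was already looked up elsewhere:

#include <hwloc.h>

void *alloc_on_node(hwloc_topology_t topology, hwloc_obj_t node, size_t len)
{
    /* memory comes back already bound to the node, so no page ever
     * needs to be migrated; free it later with hwloc_free() */
    return hwloc_alloc_membind(topology, len, node->nodeset,
                               HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_BYNODESET);
}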
On 01/03/2022 at 15:17, Mike wrote:
Dear list,
I have a program that uses Open MPI + multithreading, and I want the
freedom to decide on which hardware cores my threads should run. Using
hwloc_set_cpubind() that already works, so now I also want to bind memory
to the hardware cores. But I just can't get it to work.
Basically,
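For context, a minimal sketch of the per-thread CPU binding that already
works for the poster: hwloc_set_cpubind() with HWLOC_CPUBIND_THREAD binds
only the calling thread. The helper name and the one-PU-per-thread
assumption are illustrative:

#include <hwloc.h>

int bind_current_thread_to_pu(hwloc_topology_t topology, unsigned pu_os_index)
{
    hwloc_bitmap_t set = hwloc_bitmap_alloc();
    int err;

    hwloc_bitmap_only(set, pu_os_index); /* exactly one PU in the set */
    err = hwloc_set_cpubind(topology, set, HWLOC_CPUBIND_THREAD);
    hwloc_bitmap_free(set);
    return err;
}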