Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released

2017-12-22 Thread Samuel Thibault
In the end, I'm wondering what we will do for the Debian packages: a
separate libhwloc2-dev package (which is a pain for various reasons) or
not.  It depends on whether the reverse dependencies are ready by the
time we really want hwloc2 in Debian. FYI, here are the rdeps:

Package: gridengine
Package: htop
Package: librsb
Package: mpich
Package: openmpi
Package: pocl
Package: slurm-llnl
Package: starpu
Package: trafficserver

For small fixups like field changes, maintainers will probably be
fine with patches; but for more involved changes, such as the memory-node
rework, it will probably take more time, since maintainers may not be
happy to backport heavy changes.

Samuel
___
hwloc-devel mailing list
hwloc-devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-devel


Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released

2017-12-22 Thread Samuel Thibault
Brice Goglin, on Fri 22 Dec 2017 12:35:35 +0100, wrote:
> That won't work. You can have memory attached at different levels of the
> hierarchy (things like HBM inside a die, normal memory attached to a
> package, and slow memory attached to the memory interconnect). The
> notion of NUMA node and proximity domain is changing. It's not a set of
> CPU+memory anymore. Things are moving towards the separation of "memory
> initiator" (CPUs) and "memory target" (memory banks, possibly behind
> memory-side caches). And those targets can be attached to different things.

Alright.

So hwloc might be a lever there to push people into thinking that way :)

Samuel

Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released

2017-12-22 Thread Brice Goglin
On 22/12/2017 at 11:42, Samuel Thibault wrote:
> Hello,
>
> Brice Goglin, on Tue 19 Dec 2017 11:48:39 +0100, wrote:
>>   + Memory, I/O and Misc objects are now stored in dedicated children lists,
>> not in the usual children list, which is now only used for CPU-side objects.
>> - hwloc_get_next_child() may still be used to iterate over these 4 lists
>>   of children at once.
> I hadn't realized this before: so the NUMA-related hierarchy level
> cannot be easily obtained with hwloc_get_type_depth and such; that's
> really a concern. For instance in slurm-llnl one can find
>
>   if (hwloc_get_type_depth(topology, HWLOC_OBJ_NODE) >
>       hwloc_get_type_depth(topology, HWLOC_OBJ_SOCKET)) {
>
> and probably others are doing this too; e.g. looking up from a CPU to
> find the NUMA level becomes very different from looking up from a CPU to
> find the L3 level etc.
>
> Instead of moving these objects to another place that has to be looked
> up very differently, can't we rather create another type of object, e.g.
> HWLOC_OBJ_MEMORY, to represent the different kinds of memories that can
> be found in a given NUMA level, and keep HWLOC_OBJ_NODE as it is?
>

That won't work. You can have memory attached at different levels of the
hierarchy (things like HBM inside a die, normal memory attached to a
package, and slow memory attached to the memory interconnect). The
notion of NUMA node and proximity domain is changing. It's not a set of
CPU+memory anymore. Things are moving towards the separation of "memory
initiator" (CPUs) and "memory target" (memory banks, possibly behind
memory-side caches). And those targets can be attached to different things.



I agree that finding local NUMA nodes is harder now. I thought about
having an explicit type saying "I have memory children, others don't"
(you propose NUMA with MEMORY children; I rather thought about MEMORY
with NUMA children, because people are used to NUMA node numbers and to
memory-binding to NUMA nodes). But again, there's no guarantee that they
will be at the same depth in the hierarchy, since they might be attached
to different kinds of resources. Things like comparing their depth with
the socket depth won't work either. So we'd end up with multiple levels,
just like Groups.

I will add helpers to simplify the lookup (give me my local NUMA node if
there's a single one, give me the number of "normal" NUMA nodes so I can
split the machine in parts, ...) but it's too early to add these, we
need more feedback first.



About slurm-llnl: their code is obsolete anyway. NUMA is inside the
socket on all modern architectures, so they expose a "Socket" resource
that is actually a NUMA node. They used an easy way to detect whether
there are multiple NUMA nodes per socket or the contrary. We can still
detect that in v2.0, even if the code is different. Once we understand
what they *really* want to do, we'll help them update that code to v2.0.

Brice


Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released

2017-12-22 Thread Samuel Thibault
BTW, I find conflicting information about the soname that hwloc2 will
have: https://github.com/open-mpi/hwloc/wiki/Upgrading-to-v2.0-API
mentions version 6, but VERSION uses 12:0:0 (and thus the soname uses 12).
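For reference (assuming the usual libtool rules on Linux), `-version-info CURRENT:REVISION:AGE` maps to a soname of libNAME.so.(CURRENT - AGE), so the two numbers would correspond to different VERSION settings:

```shell
# libtool -version-info CURRENT:REVISION:AGE
# Linux soname = libNAME.so.(CURRENT - AGE)
#    6:0:0  ->  libhwloc.so.6    (the number the wiki mentions)
#   12:0:0  ->  libhwloc.so.12   (what VERSION currently sets)
```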

Samuel


Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v2.0.0-beta1 released

2017-12-22 Thread Samuel Thibault
Hello,

Brice Goglin, on Tue 19 Dec 2017 11:48:39 +0100, wrote:
>   + Memory, I/O and Misc objects are now stored in dedicated children lists,
> not in the usual children list that is now only used for CPU-side objects.
> - hwloc_get_next_child() may still be used to iterate over these 4 lists
>   of children at once.

I hadn't realized this before: so the NUMA-related hierarchy level
cannot be easily obtained with hwloc_get_type_depth and such; that's
really a concern. For instance in slurm-llnl one can find

    if (hwloc_get_type_depth(topology, HWLOC_OBJ_NODE) >
        hwloc_get_type_depth(topology, HWLOC_OBJ_SOCKET)) {

and probably others are doing this too; e.g. looking up from a CPU to
find the NUMA level becomes very different from looking up from a CPU to
find the L3 level etc.

Instead of moving these objects to another place that has to be looked
up very differently, can't we rather create another type of object, e.g.
HWLOC_OBJ_MEMORY, to represent the different kinds of memories that can
be found in a given NUMA level, and keep HWLOC_OBJ_NODE as it is?

I'm really afraid that otherwise this change will hurt a lot of people
and remain a pain to program around for a long time.

Samuel