Re: [hwloc-users] Using distances

2012-04-21 Thread Brice Goglin

On 21/04/2012 13:15, Jeffrey Squyres wrote:

> On Apr 21, 2012, at 7:09 AM, Brice Goglin wrote:
>
>> I assume you have the entire distance (latency) matrix between all NUMA nodes
>> as usually reported by the BIOS.
>>
>> const struct hwloc_distances_s *distances =
>>     hwloc_get_whole_distance_matrix_by_type(topology, HWLOC_OBJ_NODE);
>> assert(distances);
>> assert(distances->latency);
>
> Is this stored on the topology object?


No, it's stored in the object that covers all objects connected by the
distance matrix, so usually in the root object.



> Hence, if this distance data is already covered by the XML export/import, then
> I should have this data.


Yes it should be there.


>> Now distances->latency[a+b*distances->nbobjs] contains the latency between NUMA
>> nodes whose *logical* indexes are a and b (it may be asymmetrical).
>>
>> Now get the NUMA node object close to your PUs and the NUMA objects close to each
>> OFED device, take their ->logical_index and you'll get the latencies.
>
> Ah, ok.  This is what I didn't understand from the docs -- is there no distance
> to actual PCI devices?  I.e., distance is only measured between NUMA nodes?
>
> I ask because the functions allow measuring distance by depth and type -- are
> those effectively ignored, and really all you can check is the distance between
> NUMA nodes?


You can have distance matrices between any sets of objects of any
type/depth. It depends on what the BIOS reports or what the user adds.
The BIOS usually only reports NUMA node distances.


We could extend them by saying that the distance between any child of 
NUMA node X and any child of NUMA node Y is equal to the distance 
between NUMA nodes X and Y, but we don't do that.


One reason is that the current distance stuff lets the user add a 
distance matrix between NUMA nodes and another one between sockets, even 
if they are incompatible. When this happens, which one do you use to 
generate the distance between two cores?
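
To make that ambiguity concrete, here is a minimal self-contained C sketch of
the "inherit from the ancestor" rule described above. It does not use hwloc at
all: toy_inherited_latency() and its parameters are hypothetical names invented
for this illustration, and hwloc does not implement this rule. The matrix is
indexed the same way as in the original reply (a + b * nbobjs).

```c
#include <assert.h>

/* Hypothetical rule from the discussion: two cores inherit the latency
 * between the ancestors (NUMA nodes, sockets, ...) that contain them.
 * matrix holds nbobjs * nbobjs values, flattened; core_to_ancestor[c]
 * is the logical index of the ancestor that contains core c. */
float toy_inherited_latency(const float *matrix, unsigned nbobjs,
                            const unsigned *core_to_ancestor,
                            unsigned core_x, unsigned core_y)
{
    unsigned a = core_to_ancestor[core_x];
    unsigned b = core_to_ancestor[core_y];

    assert(a < nbobjs && b < nbobjs);
    return matrix[a + b * nbobjs];
}
```

For example, take four cores mapped to ancestors {0, 0, 1, 1}, a NUMA matrix
{1, 2, 2, 1} and a user-added socket matrix {1, 3, 3, 1}. The same pair of
cores (0 and 2) then gets 2.0 from the NUMA matrix and 3.0 from the socket
matrix, and nothing says which of the two should win.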


There are some tickets in Trac that will help clarify all this mess.

Brice




Re: [hwloc-users] Using distances

2012-04-21 Thread Jeffrey Squyres
On Apr 21, 2012, at 7:09 AM, Brice Goglin wrote:

> I assume you have the entire distance (latency) matrix between all NUMA nodes 
> as usually reported by the BIOS.
> 
> const struct hwloc_distances_s *distances = 
> hwloc_get_whole_distance_matrix_by_type(topology, HWLOC_OBJ_NODE);
> assert(distances);
> assert(distances->latency);

Is this stored on the topology object?

I ask because we've already done stuff to ensure that there's only 1 hwloc 
discovery per machine.  If you recall, we do that in the ORTE daemon, export it 
to XML, and then locally send it to each MPI process on the same machine.  
They, in turn, import the XML to create their own topology object.

Hence, if this distance data is already covered by the XML export/import, then 
I should have this data.

> Now distances->latency[a+b*distances->nbobjs] contains the latency between 
> NUMA nodes whose *logical* indexes are a and b (it may be asymmetrical).
> 
> Now get the NUMA node object close to your PUs and the NUMA objects close to 
> each OFED device, take their ->logical_index and you'll get the latencies.

Ah, ok.  This is what I didn't understand from the docs -- is there no distance 
to actual PCI devices?  I.e., distance is only measured between NUMA nodes?

I ask because the functions allow measuring distance by depth and type -- are 
those effectively ignored, and really all you can check is the distance between 
NUMA nodes?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [hwloc-users] Using distances

2012-04-21 Thread Brice Goglin

On 21/04/2012 12:23, Jeffrey Squyres wrote:

> I'm trying to use hwloc distances in Open MPI (e.g., find the distance from
> each OpenFabrics device to the PU(s) where this process is bound), and I'm a
> bit confused by the distances documentation.
>
> If I have a WHOLE_SYSTEM topology, and I know that this process is bound to one
> or more PUs (e.g., both PUs in a core), can you summarize how I use the hwloc
> distances functionality to determine the distance from my process to each of
> the OF devices?



I assume you have the entire distance (latency) matrix between all NUMA 
nodes as usually reported by the BIOS.


const struct hwloc_distances_s *distances =
    hwloc_get_whole_distance_matrix_by_type(topology, HWLOC_OBJ_NODE);

assert(distances);
assert(distances->latency);

Now distances->latency[a+b*distances->nbobjs] contains the latency 
between NUMA nodes whose *logical* indexes are a and b (it may be 
asymmetrical).



Now get the NUMA node object close to your PUs and the NUMA objects 
close to each OFED device, take their ->logical_index and you'll get the 
latencies.
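
The lookup described above can be sketched in plain C without linking against
hwloc. Everything named toy_* below is a stand-in invented for this
illustration: struct toy_distances only mirrors the nbobjs and latency fields
used above, and the caller is assumed to have already collected the
->logical_index of the NUMA node near its PUs and of the NUMA node near each
device. The thread does not say which of a and b is the source when the matrix
is asymmetrical, so check the hwloc documentation before relying on the
direction.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the two fields of the distance structure used in this
 * thread. This is NOT the hwloc type. */
struct toy_distances {
    unsigned nbobjs;       /* number of NUMA nodes covered by the matrix */
    const float *latency;  /* nbobjs * nbobjs values, flattened */
};

/* Latency for logical indexes a and b, using the a + b * nbobjs
 * indexing given above. */
float toy_latency(const struct toy_distances *d, unsigned a, unsigned b)
{
    assert(d && d->latency && a < d->nbobjs && b < d->nbobjs);
    return d->latency[a + b * d->nbobjs];
}

/* Return the position in dev_nodes[] of the device whose NUMA node has
 * the lowest latency with respect to pu_node. ndev must be at least 1;
 * on ties the earliest device wins. */
size_t toy_closest_device(const struct toy_distances *d, unsigned pu_node,
                          const unsigned *dev_nodes, size_t ndev)
{
    size_t best = 0;

    assert(ndev > 0);
    for (size_t i = 1; i < ndev; i++) {
        if (toy_latency(d, pu_node, dev_nodes[i]) <
            toy_latency(d, pu_node, dev_nodes[best]))
            best = i;
    }
    return best;
}
```

With real hwloc data, the matrix pointer and nbobjs would come from the
structure returned by hwloc_get_whole_distance_matrix_by_type() as shown
above.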


Brice