On 06/08/2012 23:47, Wheeler, Kyle Bruce wrote:
> Hello,
>
> I'm failing to understand what hwloc (v1.5) is doing. I'm trying to use 
> hwloc_get_latency() to determine the distance between two cores.
>
> The two cores are on different sockets. According to libnuma's numactl, the 
> latency between the two sockets is 20, whereas between cores on the same 
> socket is 10. According to hwloc-ls -v, the latency is 2.0, whereas between 
> cores on the same socket is 1.0. Thus, I know that hwloc is getting topology 
> information.
>
> However, programmatically, hwloc_get_latency() just returns -1. I tried using 
> hwloc_get_whole_distance_matrix_by_depth(), and found that the distance 
> matrix is only defined for depth 0

Hello Kyle,
The distance/latency API is indeed difficult to understand because it
tries to be (too) generic.
You should not be getting a distance matrix for depth 0. You should get
one for depth 1 (the depth of the NUMA nodes in your topology).

>  which, according to hwloc_obj_type_string(hwloc_get_depth_type(topology, 0)) 
> is "Machine". Now, the documentation for 
> hwloc_get_whole_distance_matrix_by_depth() says it returns "a distances 
> structure containing a matrix with all distances between all objects at the 
> given depth". Given that I only have one object at depth 0 (just the one 
> machine), what does this mean? If I try with depth 1 (aka "NUMANode" or 
> HWLOC_OBJ_NODE), I get NULL back, suggesting that there is no matrix of 
> distances between NUMANodes. Of course, that's not true; hwloc-ls reports 
> that matrix! So what's going on here?

hwloc-ls uses hwloc_get_whole_distance_matrix_by_depth():

    for (depth = 0; depth < topodepth; depth++) {
      distances = hwloc_get_whole_distance_matrix_by_depth(topology, depth);
      if (!distances || !distances->latency)
        continue;
      printf("latency matrix between %ss (depth %u) by %s indexes:\n",
             hwloc_obj_type_string(hwloc_get_depth_type(topology, depth)),
             depth,
             logical ? "logical" : "physical");
      hwloc_utils_print_distance_matrix(topology, hwloc_get_root_obj(topology),
                                        distances->nbobjs, depth,
                                        distances->latency, logical);
    }


So I don't see how you could be seeing something different. Can you send
your code and your XML topology?

> I would add that the hwloc_distances_s returned by 
> hwloc_get_whole_distance_matrix_by_depth(topology, 0) is: { 0, 0, 0x0, 0, 0 }

That's strange, I need to look at this.

> And why is hwloc_get_latency() failing?

If you pass Core objects to hwloc_get_latency(), it is expected to fail
because the topology only has latencies between NUMA nodes. You should
walk up the parent links of each object until you reach a NUMA node
object, then ask for the latency between those two objects.
We've been thinking of handling this case inside hwloc but we're not
sure it's always a good idea to do so.
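
To make the parent walk concrete, here is a minimal sketch. It does not
use the hwloc headers: the sketch_* types and functions are hypothetical
stand-ins that only model the fields needed for the walk (object type,
logical index, parent link), and the latency matrix is assumed to be a
flat nbobjs x nbobjs array indexed by the logical index of each NUMA
node. With real hwloc objects you would follow obj->parent in the same
way and hand the two NUMA node objects to hwloc_get_latency().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for hwloc object types and objects. */
enum sketch_type { SKETCH_MACHINE, SKETCH_NODE, SKETCH_SOCKET, SKETCH_CORE };

struct sketch_obj {
    enum sketch_type type;
    unsigned logical_index;     /* index among objects of the same type */
    struct sketch_obj *parent;  /* NULL for the root (Machine) */
};

/* Follow parent links, starting at obj itself, until an object of the
 * wanted type is found. Returns NULL if there is none. */
struct sketch_obj *
sketch_find_ancestor(struct sketch_obj *obj, enum sketch_type type)
{
    while (obj != NULL && obj->type != type)
        obj = obj->parent;
    return obj;
}

/* Look up the latency between the NUMA nodes above two objects.
 * matrix is assumed to be a flat nbobjs x nbobjs array.
 * Returns 0 on success, -1 if either object has no NUMA node ancestor
 * or the node index is outside the matrix. */
int
sketch_latency(struct sketch_obj *a, struct sketch_obj *b,
               const float *matrix, unsigned nbobjs, float *latency)
{
    struct sketch_obj *na, *nb;

    assert(matrix != NULL && latency != NULL);

    na = sketch_find_ancestor(a, SKETCH_NODE);
    nb = sketch_find_ancestor(b, SKETCH_NODE);
    if (na == NULL || nb == NULL)
        return -1;
    if (na->logical_index >= nbobjs || nb->logical_index >= nbobjs)
        return -1;

    *latency = matrix[na->logical_index * nbobjs + nb->logical_index];
    return 0;
}
```

With two NUMA nodes and the matrix { 1.0, 2.0, 2.0, 1.0 } that hwloc-ls
reports on your machine, two cores under different nodes would resolve
to 2.0 and two cores under the same node to 1.0. Passing the Machine
object returns -1, since it has no NUMA node above it.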


We have several tickets open against the distance code. We know it's not
perfect so we'll be happy to hear your feedback. There are so many
things involved in this case that it's hard to figure out what's
actually important to users.

Brice
