On 04/09/2011 23:30, Brice Goglin wrote:
> On 04/09/2011 22:35, Ake Sandgren wrote:
>> On Sun, 2011-09-04 at 22:13 +0200, Brice Goglin wrote: 
>>> Hello,
>>>
>>> Could you log again on this node (with same cgroups enabled), run
>>>     hwloc-gather-topology <name>
>>> and send the resulting <name>.output and <name>.tar.bz2?
>>>
>>> Send them to the hwloc-devel list or open a ticket on
>>> https://svn.open-mpi.org/trac/hwloc (or send them to me in private if
>>> you don't want to subscribe).
>> Since it's a bit late here, I'm being lazy and sending them to you directly.
>>
>> Attached is the output from both nodes involved in the batch job.
>> slurm -N 2 --ntasks-per-node=1 ... was what I was using.
>>
>> Hope it helps. If not, let me know if there is anything else I can do.
>>
>> /Åke S.
> Thanks, I understand the problem, but it's not easy to fix. To work around
> the crash until I come up with a real fix, you can comment out the call to
>     hwloc_topology__set_distance_matrix()
> at the end of look_sysfsnode() in topology-linux.c.

Dear Ake,
Could you try the attached patch? It's not optimized, but it's probably
going in the right direction.
(And don't forget to undo the workaround above if you applied it.)
Thanks
Brice
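
For anyone reading along, here is a rough sketch of what "restricting" a distance matrix could mean. It assumes the crash comes from a matrix that still covers NUMA nodes the cgroup has removed from the topology, which is only a guess from the context. The restrict_distances() helper and its parameters are hypothetical. This is not hwloc's API and not what the attached patch does internally.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, NOT part of hwloc.
 * Copies the rows and columns of an n x n row-major distance matrix that
 * belong to kept nodes into `out`.
 *   keep[i] != 0  means node i is still present in the topology.
 *   out           must have room for k*k values, k = number of kept nodes.
 * Returns k.
 */
size_t restrict_distances(const float *in, size_t n,
                          const int *keep, float *out)
{
    assert(in != NULL && keep != NULL && out != NULL);

    /* Count the nodes that survive the restriction. */
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (keep[i])
            k++;

    /* Copy only entries whose row AND column both survive. */
    size_t row = 0;
    for (size_t i = 0; i < n; i++) {
        if (!keep[i])
            continue;
        size_t col = 0;
        for (size_t j = 0; j < n; j++) {
            if (!keep[j])
                continue;
            out[row * k + col] = in[i * n + j];
            col++;
        }
        row++;
    }
    return k;
}
```

With a 3x3 matrix and only nodes 0 and 2 kept, the result is the 2x2 matrix built from rows and columns 0 and 2 of the input, so every remaining index refers to a node that still exists.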

Index: src/topology.c
===================================================================
--- src/topology.c	(revision 3750)
+++ src/topology.c	(working copy)
@@ -1856,6 +1856,8 @@
   /*
    * Now that objects are numbered, take distance matrices from backends and put them in the main topology
    */
+  hwloc_restrict_distances(topology, HWLOC_RESTRICT_FLAG_ADAPT_DISTANCES);
+  hwloc_convert_distances_indexes_into_objects(topology);
   hwloc_finalize_logical_distances(topology);

 #  ifdef HWLOC_HAVE_XML
