Re: [hwloc-devel] Bug report: topology strange on SGI UltraViolet
Brice Goglin wrote on Thu 29 Jul 2010 13:01:10 +0200:
> > To my opinion, the job hwloc does in forming "groups" is basically OK.
> > Also the group content makes sense.
>
> We're lucky that it somehow matches the physical ordering,
> but it is really meaningless given the distance matrix.

Well, it does: even if it is arbitrary, it corresponds to distances and can be useful for binding applications. It could be an optional module in hwloc.

Samuel
Re: [hwloc-devel] Bug report: topology strange on SGI UltraViolet
> To my opinion, the job hwloc does in forming "groups" is basically OK.
> Also the group content makes sense.

We're lucky that it somehow matches the physical ordering, but it is really meaningless given the distance matrix. That's why Group2 matches nothing in reality. Group3 matches nothing either, from what I understand. This meaningless part will disappear in hwloc 1.1. You will only see 24 Group0 objects.

> The only "strange" thing is, that the grouping code becomes disturbed on
> this special machine, which only contains 3/4 of the NUMA nodes that are
> found in a fully equipped rack.

It's an artifact of the aforementioned meaningless grouping code. If you have 2^N objects with such a distance matrix, the grouping code will create a binary tree. If it's not 2^N, you'll get artifacts like here since the binary tree isn't complete.

> Physically, the 2nd enclosure is only
> half filled. I'm wondering what would happen in a fully equipped rack.
>
> Will there be 4xGroup3, or 2xGroup4 with 2xGroup3 each? From my feeling
> the latter should be happen.

Yes, the latter would happen. Such a distance matrix always groups pairs of consecutive objects starting from #0. So you'll get two pairs of Group3 grouped in 2 Group4 objects.

Brice
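The pairing of consecutive objects described above can be illustrated with a small simulation. This is only a sketch, not hwloc's code: `pair_consecutive` and `level_counts` are hypothetical names, the full rack is taken to hold 32 Group0 objects (inferred from the 3/4 figure), and stopping once two objects remain (their parent being the Machine root) is an assumption made for the illustration.

```python
def pair_consecutive(objs):
    """Pair objs[0] with objs[1], objs[2] with objs[3], ...

    A trailing odd object is returned separately because it has no partner.
    """
    pairs = [tuple(objs[i:i + 2]) for i in range(0, len(objs) - 1, 2)]
    leftover = objs[-1:] if len(objs) % 2 else []
    return pairs, leftover


def level_counts(n):
    """Return (groups_created, objects_left_alone) for each grouping level.

    Assumption for this sketch: pairing stops once only two objects remain,
    since their common parent would simply be the Machine root.
    """
    objs = list(range(n))
    levels = []
    while len(objs) > 2:
        pairs, leftover = pair_consecutive(objs)
        levels.append((len(pairs), len(leftover)))
        objs = pairs + leftover
    return levels


partial_rack = level_counts(24)  # 24 Group0 objects, as on this machine
full_rack = level_counts(32)     # assumed size of a fully equipped rack

print(partial_rack)
print(full_rack)
```

Reading the entries as Group1, Group2, Group3, Group4 in turn, the last entry for 24 objects is one Group4 plus one Group3 left alone, while 32 objects end with two Group4 objects and nothing left over.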
Re: [hwloc-devel] Bug report: topology strange on SGI UltraViolet
On 28/07/2010 18:53, Brice Goglin wrote:
> Actually no, but it's very hard to see :)
> lstopo - | egrep "(NUMA|Group)"
> shows that Group4#0 only contains Group3#0 and #1.
> Group3#2 is directly a child of the machine (the indentation is smaller).
>
>

For the record, this is caused by the fact that Group objects are ignored when they bring no new hierarchy (when they have a single child or are the only child of another object). Group4#1 is created first and removed later.

I don't think there's any way to change this default behavior with the current API. Maybe we should have something intermediate such as "ignore what brings no new hierarchy if you can remove the entire level" so that we don't get only half of the Group4 level.

Brice
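The removal of Group4#1 can be pictured with a toy tree. The sketch below is not hwloc code: the `(name, children)` tuples and the `prune` function are hypothetical, and only the "single child" rule is modelled.

```python
def prune(node):
    """Drop every Group that has exactly one child, keeping that child.

    node is a (name, children) tuple; a new tree is returned.
    """
    name, children = node
    kept = []
    for child in children:
        child = prune(child)
        child_name, grandchildren = child
        if child_name.startswith("Group") and len(grandchildren) == 1:
            # The group adds no hierarchy: attach its only child directly.
            kept.append(grandchildren[0])
        else:
            kept.append(child)
    return (name, kept)


# Tree as first built: two Group4 objects, the second one half empty.
before = ("Machine", [
    ("Group4#0", [("Group3#0", []), ("Group3#1", [])]),
    ("Group4#1", [("Group3#2", [])]),
])

after = prune(before)
print(after)
```

In the pruned tree Group3#2 hangs directly below Machine, next to Group4#0, which is the asymmetric shape lstopo shows.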
Re: [hwloc-devel] Bug report: topology strange on SGI UltraViolet
On 28/07/2010 20:59, Bernd Kallies wrote:
> So it seems to me that you basically get a distance matrix of PU objects

NUMA node objects actually. That's what Linux and Solaris report.

> from the system (the machine vendor), and probably you do agglomerative
> average linkage cluster analysis on it to determine the number and
> hierarchy of HWLOC_OBJ_GROUP objects (beyond what can be named by some
> hardware building block like core or cache etc). Is this right?
> I'm wondering if this is the right approach. Did you try other distance
> functions (e.g. single linkage)?
>

In 1.0.x we look at "complete graphs with minimal distances" and then at "transitive graphs with minimal distances". One problem with this old code is that it finds that Group0#0 and #1 have minimal distance between them (22) but ignores the fact that Group0#2 is also at the same distance from #1. And so on. This code actually gives completely invalid groups on some strange HP machines.

In trunk, the code was reworked/cleaned to only look for transitive graphs. Given your distance matrix, everybody is transitively connected to everybody through one or several minimal distance links, so everybody is grouped together in the end.

> Besides that, and from the viewpoint of a tree representation of the
> result of clustering, I would expect that every pair of two objects of
> same type have common anchestors of the same type. For the given UV
> topology I would not expect that there are two Group3 that have a Group4
> ancestor, while the 3rd Group3 is direct child of Machine. I would
> expect EITHER that the 3rd Group3 is also child of a Group4 (maybe a
> second one), OR that there is no Group4.
>

Right, I'll see if I can fix this without changing too many things in the 1.0 branch.

Brice
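A rough way to see why the "transitive graphs with minimal distances" approach puts everything in one group on this machine: link every pair of objects that sits at the smallest off-diagonal distance, then take connected components. The sketch below is an illustration only, not the trunk implementation; `minimal_distance_groups` is a hypothetical name, and the Group0 matrix is regenerated from the pattern visible in the posted numbers (13 on the diagonal, 20 + 2*|i-j| elsewhere).

```python
def minimal_distance_groups(dist):
    """Connected components of the graph whose edges are the minimal distances."""
    n = len(dist)
    dmin = min(dist[i][j] for i in range(n) for j in range(n) if i != j)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] == dmin:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Group0 matrix of the UV machine, rebuilt from its regular pattern.
uv_group0 = [[13 if i == j else 20 + 2 * abs(i - j) for j in range(24)]
             for i in range(24)]

# Made-up contrast case: two clearly separated pairs.
two_pairs = [[10, 20, 40, 40],
             [20, 10, 40, 40],
             [40, 40, 10, 20],
             [40, 40, 20, 10]]

uv_groups = minimal_distance_groups(uv_group0)
pair_groups = minimal_distance_groups(two_pairs)
print(len(uv_groups), pair_groups)
```

On the UV matrix every object is chained to its neighbour at distance 22, so a single component covers all 24 objects; the contrast matrix yields two groups.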
Re: [hwloc-devel] Bug report: topology strange on SGI UltraViolet
On Wed, 2010-07-28 at 20:36 +0200, Brice Goglin wrote:
> On 28/07/2010 18:53, Brice Goglin wrote:
> > Distance matrix between Group0 objects:
> > 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66
> > 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64
> > 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62
> > 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60
> > 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58
> > 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56
> > 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54
> > 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52
> > 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50
> > 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48
> > 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46
> > 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44
> > 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42
> > 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40
> > 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38
> > 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36
> > 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34
> > 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32
> > 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30
> > 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28
> > 60 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26
> > 62 60 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24
> > 64 62 60 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22
> > 66 64 62 60 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13
> >
> > Between Group1:
> > 17 24 28 32 36 40 44 48 52 56 60 64
> > 24 17 24 28 32 36 40 44 48 52 56 60
> > 28 24 17 24 28 32 36 40 44 48 52 56
> > 32 28 24 17 24 28 32 36 40 44 48 52
> > 36 32 28 24 17 24 28 32 36 40 44 48
> > 40 36 32 28 24 17 24 28 32 36 40 44
> > 44 40 36 32 28 24 17 24 28 32 36 40
> > 48 44 40 36 32 28 24 17 24 28 32 36
> > 52 48 44 40 36 32 28 24 17 24 28 32
> > 56 52 48 44 40 36 32 28 24 17 24 28
> > 60 56 52 48 44 40 36 32 28 24 17 24
> > 64 60 56 52 48 44 40 36 32 28 24 17
> >
> > Group2:
> > 20 28 36 44 52 60
> > 28 20 28 36 44 52
> > 36 28 20 28 36 44
> > 44 36 28 20 28 36
> > 52 44 36 28 20 28
> > 60 52 44 36 28 20
> >
> > Group3:
> > 24 36 52
> > 36 24 36
> > 52 36 24
> >
>
> Actually, all these distance matrices (except the NUMA nodes' one, the
> one not included above) show a ring topology without the link between
> the first and the last object. So grouping makes no sense there. hwloc
> 1.0.x groups object #2N with object #2N+1 because its grouping algorithm
> isn't very clever. It could also link #2N-1 with #2N, it wouldn't be
> worse. The grouping algorithm is more clever in svn trunk. It detects
> this ring properly and does not group anything (except pairs of NUMA nodes).
>
> It's actually surprising that this machine doesn't show a better
> distance matrix. I guess SGI still has a hypercube or whatever nice
> topology interconnecting IRUs and blades. Older Altix machines had very
> nice distance matrices where we would detect multiple levels of groups
> that really showed the physical hierarchy of blades/IRUs/... I wonder if
> your SGI BIOS is buggy :)

It would not be the first case of a buggy BIOS. I'll forward our discussion to our SGI representatives and Alexis Cousein and Rüdiger Wolff from SGI (M. Raymond may know him). Let's see what they say. We are one of the early UltraViolet customers.

From my point of view, having the groupings beyond the blade level in the hwloc topology is good for our purposes.
We want to use the hwloc topology to calculate pinning maps for MPI applications. Currently we use the distance map obtained via hwloc to scatter tasks according to a maximum-distance approach between HWLOC_OBJ_PU objects. I also gave our current algorithm to the MVAPICH2 dev team, which wants to put it into the next 1.5.x release.

With the example UV topology we discuss here, our pinning map starts with PU objects os_index 0,256,128,320,... That means 1st task on 1st CPU of 1st Group3, 2nd task on 1st CPU of 3rd Group3 (which is the lonely one), 3rd task on 1st CPU of 2nd Group3. Keep in mind that an MPI application that got all CPUs of this topology may start only 3 tasks, with each task allocating far more memory than a single NUMA node has directly attached. Reducing the topology to the NUMA-node or blade level would then be a bad idea, because our pinning map would start with 0,16,32,48,... (when having only the Group0 level but not the higher groupings).

Comments appreciated!

> Michael Raymond, anything
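The start of the pinning map quoted above (0, 256, 128) is what a greedy "farthest first" ordering over the Group3 distance matrix would give. The sketch below is only a guess at the idea, not the algorithm handed to the MVAPICH2 team: `farthest_first` is a hypothetical name, and the first-PU indices per Group3 are read off the description in the message.

```python
def farthest_first(dist, start=0):
    """Order objects so that each new one is as far as possible from those already placed."""
    order = [start]
    remaining = [i for i in range(len(dist)) if i != start]
    while remaining:
        # Candidate whose closest already-placed object is farthest away.
        nxt = max(remaining, key=lambda c: min(dist[c][p] for p in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Group3 distance matrix from the thread.
group3_dist = [[24, 36, 52],
               [36, 24, 36],
               [52, 36, 24]]

# os_index of the first PU in each Group3, as described in the message.
first_pu = {0: 0, 1: 128, 2: 256}

order = farthest_first(group3_dist)
pinning = [first_pu[g] for g in order]
print(order, pinning)
```

With only three tasks this places one task per Group3, starting with the two that are farthest apart.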
Re: [hwloc-devel] Bug report: topology strange on SGI UltraViolet
On 28/07/2010 18:53, Brice Goglin wrote:
> Distance matrix between Group0 objects:
> 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66
> 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64
> 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62
> 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60
> 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58
> 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56
> 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54
> 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52
> 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50
> 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46 48
> 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44 46
> 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42 44
> 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40 42
> 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38 40
> 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36 38
> 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34 36
> 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32 34
> 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30 32
> 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28 30
> 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26 28
> 60 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24 26
> 62 60 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22 24
> 64 62 60 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13 22
> 66 64 62 60 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26 24 22 13
>
> Between Group1:
> 17 24 28 32 36 40 44 48 52 56 60 64
> 24 17 24 28 32 36 40 44 48 52 56 60
> 28 24 17 24 28 32 36 40 44 48 52 56
> 32 28 24 17 24 28 32 36 40 44 48 52
> 36 32 28 24 17 24 28 32 36 40 44 48
> 40 36 32 28 24 17 24 28 32 36 40 44
> 44 40 36 32 28 24 17 24 28 32 36 40
> 48 44 40 36 32 28 24 17 24 28 32 36
> 52 48 44 40 36 32 28 24 17 24 28 32
> 56 52 48 44 40 36 32 28 24 17 24 28
> 60 56 52 48 44 40 36 32 28 24 17 24
> 64 60 56 52 48 44 40 36 32 28 24 17
>
> Group2:
> 20 28 36 44 52 60
> 28 20 28 36 44 52
> 36 28 20 28 36 44
> 44 36 28 20 28 36
> 52 44 36 28 20 28
> 60 52 44 36 28 20
>
> Group3:
> 24 36 52
> 36 24 36
> 52 36 24
>

Actually, all these distance matrices (except the NUMA nodes' one, the one not included above) show a ring topology without the link between the first and the last object. So grouping makes no sense there. hwloc 1.0.x groups object #2N with object #2N+1 because its grouping algorithm isn't very clever. It could also link #2N-1 with #2N, it wouldn't be worse. The grouping algorithm is more clever in svn trunk. It detects this ring properly and does not group anything (except pairs of NUMA nodes).

It's actually surprising that this machine doesn't show a better distance matrix. I guess SGI still has a hypercube or whatever nice topology interconnecting IRUs and blades. Older Altix machines had very nice distance matrices where we would detect multiple levels of groups that really showed the physical hierarchy of blades/IRUs/... I wonder if your SGI BIOS is buggy :)

Michael Raymond, anything to say about this?

Brice
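The "ring without the link between the first and the last object" can be checked mechanically: in each matrix the objects at minimal distance from object #i are exactly #i-1 and #i+1. The sketch below is an illustration, not hwloc's detection code; `is_open_chain` and `chain_matrix` are hypothetical names, and the matrices are regenerated from the pattern in the posted numbers (off-diagonal value 20 + step*|i-j| with step 2, 4, 8, 16 for Group0 to Group3).

```python
def is_open_chain(dist):
    """True if each object's minimal-distance neighbours are exactly i-1 and i+1."""
    n = len(dist)
    dmin = min(dist[i][j] for i in range(n) for j in range(n) if i != j)
    for i in range(n):
        neighbours = {j for j in range(n) if j != i and dist[i][j] == dmin}
        expected = {j for j in (i - 1, i + 1) if 0 <= j < n}
        if neighbours != expected:
            return False
    return True


def chain_matrix(n, diag, step):
    """Rebuild a matrix of the posted form: diag on the diagonal, 20 + step*|i-j| elsewhere."""
    return [[diag if i == j else 20 + step * abs(i - j) for j in range(n)]
            for i in range(n)]


posted = {
    "Group0": chain_matrix(24, 13, 2),
    "Group1": chain_matrix(12, 17, 4),
    "Group2": chain_matrix(6, 20, 8),
    "Group3": chain_matrix(3, 24, 16),
}
results = {name: is_open_chain(m) for name, m in posted.items()}

# Made-up closed ring of 4 objects for contrast: #0 and #3 are also neighbours.
closed_ring = [[10, 20, 30, 20],
               [20, 10, 20, 30],
               [30, 20, 10, 20],
               [20, 30, 20, 10]]
print(results, is_open_chain(closed_ring))
```

All four posted matrices pass the open-chain test, while the closed ring fails it because its end objects are direct neighbours.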
Re: [hwloc-devel] Bug report: topology strange on SGI UltraViolet
Bernd Kallies wrote on Wed 28 Jul 2010 18:09:28 +0200:
> > > topology is understandeable. I'm wondering about "Group4", which
> > > contains the three "Group3" objects. lstopo should print "1534GB"
> > > instead of "1022GB". There is only one "Group4" object, and there are no
> > > other direct children of the root object.
> >
> > Indeed, there's something wrong.
> > Can you send the output of tests/linux/gather_topology.sh so that I try
> > to debug this from here?
>
> Is attached.

Actually the Group4 object doesn't contain the three Group3 objects:

€ grep 'Group[34]' gather-topology-uv.tar.gz.output
Group4 #0 (total=1071374336KB)
Group3 #0 (total=534634496KB)
Group3 #1 (total=536739840KB)
Group3 #2 (total=536739840KB)

You can also see it using lstopo --gridsize 2 --fontsize 5 for instance. So it seems all good to me.

> We have one UV rack, which is filled with 3/4 of the max. number of
> blades. According to the specs, two NUMA nodes form one "blade". This
> level corresponds to "Group0" in the hwloc topology. Two blades are
> cross-linked via the NUMAlink, forming "paired nodes" = "Group1". What
> "Group2" might correspond to - I don't know. "Group3" corresponds to one
> "chassis" or IRU. "Group4" may be an "enclosure", and "Machine" is the
> "rack".

Wow, it's impressive that hwloc actually finds out all this just from the distance matrix :)

> From my opinion the hwloc topology for our machine should contain 2x
> Group4.

hwloc cannot find a second Group4: it derives groups from the distance matrix. Since there are no two remaining Group3 objects to group, it doesn't know that some notion of a second Group4 exists there.

> However, when walking the topology tree via the API, then it seems to
> contain correct details.

Yep :)

Samuel
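The totals in the grep output can be checked by hand, which also explains the "1022GB" versus "1534GB" figures. The short sketch below only redoes that arithmetic with the numbers from the message; the variable names are made up for the example.

```python
KB_PER_GB = 1024 * 1024

# Totals reported in the grep output, in KB.
group4_kb = 1071374336
group3_kb = [534634496, 536739840, 536739840]

first_two_kb = group3_kb[0] + group3_kb[1]
all_three_kb = sum(group3_kb)

# Group4 #0 matches the first two Group3 objects exactly, not all three.
matches_two = (first_two_kb == group4_kb)

first_two_gb = round(first_two_kb / KB_PER_GB)
all_three_gb = round(all_three_kb / KB_PER_GB)
print(matches_two, first_two_gb, all_three_gb)
```

So the 1022GB printed for Group4 is the memory of Group3 #0 and #1 only, and 1534GB is what all three Group3 objects hold together.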