Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2
"Samuel Thibault" wrote:
> What do you mean by "module support"?

http://modules.sourceforge.net/

They make managing multiple software installations on clusters much, much easier.

--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency
Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2
Michael Raymond wrote on Mon 30 Nov 2009 09:56:23 -0600:
> Software modules, eg on SuSE see the Modules RPM. The way that a lot
> of software installations used to be managed was to throw them all under
> /usr in the standard directories.

Ah, ok, right. (module is so generic a name, yesterday I was working on multiboot modules :) )

Samuel
Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2
Software modules, e.g. on SuSE see the Modules RPM. The way that a lot of software installations used to be managed was to throw them all under /usr in the standard directories. This kept library and include paths in default places but breaks down when you have multiple installations of a product, conflicting names, want to keep installations on central file servers, etc. Modules let you install the package wherever is convenient and then easily load the environment when you actually want the package. For example:

    module load intel-compilers-11
    module load mpt/1.25
    module unload mpt/1.25 intel-compilers-11
    module load intel-compilers-9
    module load mpt/1.22
    module load totalview

This easily lets me modify which versions of the various products I'm using. The `module load` command modifies my shell's LD_LIBRARY_PATH, LD_PATH, PATH, MANPATH, MPIROOT, CPATH, etc. variables. For hwloc I could:

    module load hwloc
    lstopo

This keeps hwloc out of /usr, thus keeping the Linux Filesystem Hierarchy Standard (FHS) happy.

Samuel Thibault wrote:
> Michael Raymond wrote on Mon 30 Nov 2009 09:23:02 -0600:
>> At the moment I'm thinking SLES11 (and RHEL6) RPMs of 0.9.3 / TOT
>> installed in /opt[/sgi]/hwloc. I'd also add module support.
>
> What do you mean by "module support"?
>
> Samuel
> ___
> hwloc-devel mailing list
> hwloc-de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-devel

--
Michael A. Raymond
Message Passing Toolkit Team
Silicon Graphics Inc
(651) 683-3434
Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2
Michael Raymond wrote on Mon 30 Nov 2009 09:23:02 -0600:
> At the moment I'm thinking SLES11 (and RHEL6) RPMs of 0.9.3 / TOT
> installed in /opt[/sgi]/hwloc. I'd also add module support.

What do you mean by "module support"?

Samuel
Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2
I don't know what has been announced and what hasn't, but hwloc nicely solves a problem for us and I intend to make sure that it works on all our hardware.

On 20 Nov 2009, at 23:55, Chris Samuel wrote:
> Hi Michael,
>
> "Michael Raymond" wrote:
>> Our architecture has blades with two Nehalems on them, and the blades
>> are connected together in a CC-NUMA fashion.
>
> I've heard on the grapevine that there will be memory only blades too,
> which will have a Nehalem EX on them but with all cores disabled (just
> its memory controller active instead).
>
> Are you able to test on that sort of config too?
>
> cheers,
> Chris
> --
> Christopher Samuel - (03) 9925 4751 - Systems Manager
> The Victorian Partnership for Advanced Computing
> P.O. Box 201, Carlton South, VIC 3053, Australia
> VPAC is a not-for-profit Registered Research Agency

Michael Raymond
Message Passing Toolkit Team
Silicon Graphics Inc
(651) 683-3434
Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2
Hi Michael,

"Michael Raymond" wrote:
> Our architecture has blades with two Nehalems on
> them, and the blades are connected together in a
> CC-NUMA fashion.

I've heard on the grapevine that there will be memory only blades too, which will have a Nehalem EX on them but with all cores disabled (just its memory controller active instead).

Are you able to test on that sort of config too?

cheers,
Chris

--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency
Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2
Samuel Thibault wrote on Fri 20 Nov 2009 15:54:43 +0100:
> Introduce several numagroup types? How many? That's not easy to
> answer.

Or maybe we can add an "ignore" configuration function that also takes a pair of depth parameters, so that a range of depths can be ignored for a given type. Here you would ignore the NUMA level and every NUMAGROUP depth other than the one you're interested in. This would also make it possible to ignore, e.g., the L1 and L2 caches but not the L3 cache, etc.

Samuel
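The range-based "ignore" idea floated in this message could look roughly like the sketch below. This is only an illustration under stated assumptions: the `range_*` names, the toy type enum and the rule table are all hypothetical and are not part of hwloc's API, and "depth" is taken here to mean the level within a type (1 for L1 cache, 2 for L2, and so on).

```c
#include <assert.h>
#include <stddef.h>

/* Toy object types for illustration; these are NOT hwloc's enum values. */
enum range_type { RANGE_NODE, RANGE_NUMAGROUP, RANGE_CACHE };

/* Hypothetical rule: ignore objects of `type` whose depth lies in
 * the inclusive range [min_depth, max_depth]. */
struct range_ignore_rule {
    enum range_type type;
    unsigned min_depth;
    unsigned max_depth;
};

/* Return 1 if any rule covers (type, depth), 0 otherwise. */
int range_is_ignored(const struct range_ignore_rule *rules, size_t nrules,
                     enum range_type type, unsigned depth)
{
    assert(rules != NULL || nrules == 0);
    for (size_t i = 0; i < nrules; i++) {
        if (rules[i].type == type &&
            depth >= rules[i].min_depth &&
            depth <= rules[i].max_depth)
            return 1;
    }
    return 0;
}
```

With a single rule `{ RANGE_CACHE, 1, 2 }`, L1 and L2 caches would be dropped while L3 is kept, which matches the cache example given in the message.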
Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2
NUMAGROUP sounds fine to me. Misc appears to be working for me though, and I'd like to start shipping hwloc on all our boxes in the next few months.

Samuel Thibault wrote:
> Hello,
>
> Michael Raymond wrote on Fri 20 Nov 2009 08:43:10 -0600:
>> In one pattern I might want to place processes on all the Cores in a
>> Misc and then move to the next Misc. A topology tree that looks like
>> System -> Misc -> Core makes that easy. Having Nodes in there just adds
>> unneeded complexity.
>
> Ok, I see. What I'd see is instead of using the MISC type for numa
> groups, introducing a NUMAGROUP object type. In that case, ignoring
> NUMA but not NUMAGROUP makes sense and would provide that result.
>
> However, with a better version of hwloc you may still get
>
> System -> Numagroup -> Numagroup -> Core
>
> because e.g. thanks to more precise distances hwloc has noticed that the
> first Numagroup level itself is hierarchical, forming another Numagroup
> level.
>
> Introduce several numagroup types? How many? That's not easy to
> answer.
>
> Samuel

--
Michael A. Raymond
Message Passing Toolkit Team
Silicon Graphics Inc
(651) 683-3434
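The placement pattern quoted in this message (fill every Core under one Misc, then move to the next Misc) amounts to a depth-first walk of the topology tree. The sketch below uses a toy tree rather than hwloc itself; `struct tree_obj`, the `TREE_*` types and `tree_collect_cores` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object types; these are NOT hwloc's enum values. */
enum tree_type { TREE_SYSTEM, TREE_MISC, TREE_NODE, TREE_CORE };

/* Minimal stand-in for a topology object. */
struct tree_obj {
    enum tree_type type;
    unsigned os_index;
    size_t arity;                 /* number of children */
    struct tree_obj **children;   /* NULL when arity is 0 */
};

/* Depth-first walk that appends the os_index of every Core to `out`.
 * `count` is the number of cores already seen; the return value is the
 * updated total. At most `cap` entries are written, but cores beyond
 * `cap` are still counted so the caller can detect truncation. */
size_t tree_collect_cores(const struct tree_obj *obj,
                          unsigned *out, size_t cap, size_t count)
{
    assert(obj != NULL);
    if (obj->type == TREE_CORE) {
        if (count < cap)
            out[count] = obj->os_index;
        return count + 1;
    }
    for (size_t i = 0; i < obj->arity; i++)
        count = tree_collect_cores(obj->children[i], out, cap, count);
    return count;
}
```

Because the walk descends through whatever sits between a Misc and its Cores, the resulting order is the same whether or not Node objects are present in the tree; whether that is an acceptable substitute for dropping the Nodes is exactly what the thread is debating.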
Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2
Yes. Here's output from a small one of those with only 2 pre-release blades.

Brice Goglin wrote:
> Michael Raymond wrote:
>> Our architecture has blades with two Nehalems on them, and the blades
>> are connected together in a CC-NUMA fashion. Each Nehalem shows up as a
>> Node and the blades show up as Miscs.
>
> So you're running on the Altix UV with Nehalem-EX that SGI announced at
> SC? Is there any chance we get the tarball generated by
> tests/linux/gather_topology.sh? :)
>
> Brice

--
Michael A. Raymond
Message Passing Toolkit Team
Silicon Graphics Inc
(651) 683-3434

Attachment: foo.tar.gz
Description: GNU Zip compressed data
Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2
Michael Raymond wrote:
> Our architecture has blades with two Nehalems on them, and the blades
> are connected together in a CC-NUMA fashion. Each Nehalem shows up as a
> Node and the blades show up as Miscs.

So you're running on the Altix UV with Nehalem-EX that SGI announced at SC? Is there any chance we get the tarball generated by tests/linux/gather_topology.sh? :)

Brice
Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2
Michael Raymond wrote on Fri 20 Nov 2009 08:18:53 -0600:
> It looks like I spoke too soon on the fix. That solves the problem
> but it keeps the Miscs from being created and in some situations I'd
> like to keep the Miscs but not the nodes.

Oh? In which situation? Can't you just ignore them when parsing the tree? What I don't see is why you would care about the structure that nodes provide but not about the nodes themselves.

Your patch makes the code quite convoluted :)

Samuel
Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2
Michael Raymond wrote on Thu 19 Nov 2009 14:33:49 -0600:
> --- hwloc-0.9.2/src/topology-linux.c 2009-11-03 16:40:31.0 -0600
> +++ hwloc-new//src/topology-linux.c 2009-11-19 14:20:43.630035434 -0600
> @@ -536,6 +536,10 @@
>    struct dirent *dirent;
>    hwloc_obj_t node;
>
> +  if (topology->ignored_types[HWLOC_OBJ_NODE] == HWLOC_IGNORE_TYPE_ALWAYS) {
> +    return;
> +  }
> +
>    dir = hwloc_opendir(path, topology->backend_params.sysfs.root_fd);
>    if (dir)
>      {

Mmm, indeed. And it will happen on other OSes where we get the distances too, e.g. Solaris. Does the attached, more generic patch properly fix it too?

> Also I'm concerned about the value of CPUSET_MASK_LEN in
> hwloc_admin_disable_set_from_cpuset(). It's only 64 characters but our
> Linux boxes can have up to 2048 processors. I don't think there's any harm
> in bumping that up a little.

Mmm, even better, we can avoid using a constant size completely. I've committed a fix.

Samuel

Index: src/topology.c
===
--- src/topology.c (revision 1364)
+++ src/topology.c (working copy)
@@ -298,6 +298,9 @@
   if (getenv("HWLOC_IGNORE_DISTANCES"))
     return;

+  if (topology->ignored_types[HWLOC_OBJ_NODE] == HWLOC_IGNORE_TYPE_ALWAYS)
+    return;
+
 #ifdef HWLOC_DEBUG
   hwloc_debug("node distance matrix:\n");
   hwloc_debug(" ");
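On the CPUSET_MASK_LEN point in this message: one common way to avoid a constant-size buffer is to grow it while reading. The sketch below is a generic illustration of that technique and is not the fix that was committed to hwloc; `read_all_text` is a hypothetical helper name, and the caller is assumed to own and free the returned buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read everything remaining in `f` into a NUL-terminated heap buffer.
 * The buffer starts small and doubles whenever it fills up, so no
 * compile-time limit on the input length is needed.
 * Returns NULL on allocation failure or read error; the caller frees. */
char *read_all_text(FILE *f)
{
    size_t cap = 64;   /* initial capacity, grown on demand */
    size_t len = 0;
    char *buf;

    assert(f != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        size_t want = cap - len - 1;   /* keep one byte for the terminator */
        size_t got = fread(buf + len, 1, want, f);
        len += got;
        if (got < want)
            break;                     /* end of file or read error */

        /* Buffer is full: double it and keep reading. */
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (ferror(f)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

A cpuset description for a 2048-processor machine can easily exceed 64 characters, so a reader along these lines sidesteps the question of how far to bump the constant.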