Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2

2009-11-30 Thread Chris Samuel

"Samuel Thibault" wrote:

> What do you mean by "module support"?

http://modules.sourceforge.net/

They make managing multiple software installations
on clusters much, much easier.

-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency


Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2

2009-11-30 Thread Samuel Thibault
Michael Raymond wrote on Mon 30 Nov 2009 09:56:23 -0600:
>   Software modules, eg on SuSE see the Modules RPM.  The way that a lot
> of software installations used to be managed was to throw them all under
> /usr in the standard directories.

Ah, ok, right.
("module" is such a generic name; yesterday I was working on multiboot
modules :) )

Samuel


Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2

2009-11-30 Thread Michael Raymond
  Software modules; e.g. on SuSE, see the Modules RPM.  A lot of
software installations used to be managed by throwing them all under
/usr in the standard directories.  This kept library and include paths
in default places, but it breaks down when you have multiple
installations of a product, conflicting names, want to keep
installations on central file servers, etc.  Modules let you install
the package wherever is convenient and then easily load the environment
when you actually want the package.

  For example:

module load intel-compilers-11
module load mpt/1.25


module unload mpt/1.25 intel-compilers-11
module load intel-compilers-9
module load mpt/1.22
module load totalview


  This lets me easily switch which versions of the various products I'm
using.  The `module load` command modifies my shell's LD_LIBRARY_PATH,
LD_PATH, PATH, MANPATH, MPIROOT, CPATH, etc. variables.  For hwloc I could do:

module load hwloc
lstopo

  This keeps hwloc out of /usr, thus keeping the Filesystem Hierarchy
Standard (FHS) happy.

Samuel Thibault wrote:
> Michael Raymond wrote on Mon 30 Nov 2009 09:23:02 -0600:
>>   At the moment I'm thinking SLES11 (and RHEL6) RPMs of 0.9.3 / TOT
>> installed in /opt[/sgi]/hwloc.  I'd also add module support.
> 
> What do you mean by "module support"?
> 
> Samuel
> ___
> hwloc-devel mailing list
> hwloc-de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-devel

-- 
Michael A. Raymond
Message Passing Toolkit Team
Silicon Graphics Inc
(651) 683-3434



Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2

2009-11-30 Thread Samuel Thibault
Michael Raymond wrote on Mon 30 Nov 2009 09:23:02 -0600:
>   At the moment I'm thinking SLES11 (and RHEL6) RPMs of 0.9.3 / TOT
> installed in /opt[/sgi]/hwloc.  I'd also add module support.

What do you mean by "module support"?

Samuel


Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2

2009-11-21 Thread Michael Raymond
  I don't know what has been announced and what hasn't, but hwloc  
nicely solves a problem for us and I intend to make sure that it works  
on all our hardware.


On 20 Nov 2009, at 23:55, Chris Samuel wrote:

> Hi Michael,
>
> "Michael Raymond" wrote:
>
>> Our architecture has blades with two Nehalems on
>> them, and the blades are connected together in a
>> CC-NUMA fashion.
>
> I've heard on the grapevine that there will be memory
> only blades too, which will have a Nehalem EX on them
> but with all cores disabled (just its memory controller
> active instead).
>
> Are you able to test on that sort of config too ?
>
> cheers,
> Chris

-- 
Michael Raymond
Message Passing Toolkit Team
Silicon Graphics Inc
(651) 683-3434




Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2

2009-11-21 Thread Chris Samuel
Hi Michael,

"Michael Raymond" wrote:

> Our architecture has blades with two Nehalems on
> them, and the blades are connected together in a
> CC-NUMA fashion.

I've heard on the grapevine that there will be memory
only blades too, which will have a Nehalem EX on them
but with all cores disabled (just its memory controller
active instead).

Are you able to test on that sort of config too ?

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency


Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2

2009-11-20 Thread Samuel Thibault
Samuel Thibault wrote on Fri 20 Nov 2009 15:54:43 +0100:
> Introduce several numagroup types?  How many?  That's not easy to
> answer.

Or maybe we can add an "ignore" configuration function that also takes a
pair of depth parameters, so that a range of depths can be ignored for a
given type.  Here you would ignore the NUMA level and the NUMAGROUP depths
other than the one you're interested in; this would also make it possible
to ignore, e.g., the L1 and L2 caches but not the L3 cache, etc.
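
To make the range idea concrete, here is a minimal, self-contained C
sketch.  It is only an illustration under assumptions: every name in it
(ign_type, ign_config, ign_ignore_range, ign_is_ignored) is hypothetical
and not part of the hwloc API, and "depth" for caches is simply read as
the cache level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object types for this sketch; not hwloc's real enum. */
enum ign_type { IGN_SYSTEM, IGN_NUMAGROUP, IGN_NODE, IGN_CACHE, IGN_CORE };

/* One rule: ignore objects of `type` whose depth lies in [first, last]. */
struct ign_rule {
  enum ign_type type;
  unsigned first;
  unsigned last;
};

#define IGN_MAX_RULES 16

struct ign_config {
  struct ign_rule rules[IGN_MAX_RULES];
  size_t nrules;
};

static void ign_config_init(struct ign_config *cfg)
{
  assert(cfg != NULL);
  cfg->nrules = 0;
}

/* Register a rule.  Returns 0 on success, -1 if the range is empty
 * (first > last) or the rule table is full. */
static int ign_ignore_range(struct ign_config *cfg, enum ign_type type,
                            unsigned first, unsigned last)
{
  assert(cfg != NULL);
  if (first > last || cfg->nrules == IGN_MAX_RULES)
    return -1;
  cfg->rules[cfg->nrules].type = type;
  cfg->rules[cfg->nrules].first = first;
  cfg->rules[cfg->nrules].last = last;
  cfg->nrules++;
  return 0;
}

/* True if any registered rule covers this (type, depth) pair. */
static bool ign_is_ignored(const struct ign_config *cfg, enum ign_type type,
                           unsigned depth)
{
  size_t i;
  assert(cfg != NULL);
  for (i = 0; i < cfg->nrules; i++) {
    const struct ign_rule *r = &cfg->rules[i];
    if (r->type == type && depth >= r->first && depth <= r->last)
      return true;
  }
  return false;
}
```

With this shape, ignoring the L1 and L2 caches but keeping L3 would be a
single call such as ign_ignore_range(&cfg, IGN_CACHE, 1, 2), and keeping
only one NUMAGROUP depth would take one rule for the depths below it and
one for the depths above it.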

Samuel


Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2

2009-11-20 Thread Michael Raymond
  NUMAGROUP sounds fine to me.  Misc appears to be working for me though
and I'd like to start shipping hwloc on all our boxes in the next few
months.

Samuel Thibault wrote:
> Hello,
> 
> Michael Raymond wrote on Fri 20 Nov 2009 08:43:10 -0600:
>>   In one pattern I might want to place processes on all the Cores in a
>> Misc and then move to the next Misc.  A topology tree that looks like
>> System -> Misc -> Core makes that easy.  Having Nodes in there just adds
>> unneeded complexity.
> 
> Ok, I see.  What I'd see is instead of using the MISC type for numa
> groups, introducing a NUMAGROUP object type.  In that case, ignoring
> NUMA but not NUMAGROUP makes sense and would provide that result.
> 
> However, with a better version of hwloc you may still get
> 
> System -> Numagroup -> Numagroup -> Core
> 
> because e.g. thanks to more precise distances hwloc has noticed that the
> first Numagroup level itself is hierarchical, forming another Numagroup
> level.
> 
> Introduce several numagroup types?  How many?  That's not easy to
> answer.
> 
> Samuel

-- 
Michael A. Raymond
Message Passing Toolkit Team
Silicon Graphics Inc
(651) 683-3434



Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2

2009-11-20 Thread Michael Raymond
  Yes.  Here's output from a small one of those with only 2 pre-release
blades.

Brice Goglin wrote:
> Michael Raymond wrote:
>>   Our architecture has blades with two Nehalems on them, and the blades
>> are connected together in a CC-NUMA fashion.  Each Nehalem shows up as a
>> Node and the blades show up as Miscs.
> 
> So you're running on the Altix UV with Nehalem-EX that SGI announced at
> SC? Is there any chance we get the tarball generated by
> tests/linux/gather_topology.sh? :)
> 
> Brice
> 

-- 
Michael A. Raymond
Message Passing Toolkit Team
Silicon Graphics Inc
(651) 683-3434



foo.tar.gz
Description: GNU Zip compressed data


Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2

2009-11-20 Thread Brice Goglin
Michael Raymond wrote:
>   Our architecture has blades with two Nehalems on them, and the blades
> are connected together in a CC-NUMA fashion.  Each Nehalem shows up as a
> Node and the blades show up as Miscs.

So you're running on the Altix UV with Nehalem-EX that SGI announced at
SC? Is there any chance we get the tarball generated by
tests/linux/gather_topology.sh? :)

Brice



Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2

2009-11-20 Thread Samuel Thibault
Michael Raymond wrote on Fri 20 Nov 2009 08:18:53 -0600:
>   It looks like I spoke too soon on the fix.  That solves the problem
> but it keeps the Miscs from being created and in some situations I'd
> like to keep the Miscs but not the nodes.

Oh?  In which situation?  Can't you just ignore them when parsing the
tree?

What I don't see is why you would care about the structure that nodes
provide but not about the nodes themselves.  Your patch makes the code
quite convoluted :)
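
As a rough sketch of what "ignoring them when parsing the tree" could
look like on the caller's side, the following self-contained C example
walks a toy tree and collects the cores below a given object while
treating any intermediate level (such as Node) as transparent.  The
types and names (walk_type, walk_obj, walk_collect_cores) are
hypothetical and do not correspond to hwloc's own structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object types for this sketch; not hwloc's real enum. */
enum walk_type { WALK_SYSTEM, WALK_MISC, WALK_NODE, WALK_CORE };

struct walk_obj {
  enum walk_type type;
  int id;                       /* illustrative identifier */
  size_t nchildren;
  struct walk_obj **children;
};

/* Depth-first walk below `obj`.  Core ids are stored into out[] in tree
 * order, starting at index n, as long as they fit within `cap`.  Every
 * non-core level is simply descended through, so callers never need to
 * know whether Nodes sit between a Misc and its Cores.
 * Returns the updated count of cores seen, which may exceed `cap`. */
static size_t walk_collect_cores(const struct walk_obj *obj, int *out,
                                 size_t cap, size_t n)
{
  size_t i;
  assert(obj != NULL);
  if (obj->type == WALK_CORE) {
    if (n < cap)
      out[n] = obj->id;
    return n + 1;
  }
  for (i = 0; i < obj->nchildren; i++)
    n = walk_collect_cores(obj->children[i], out, cap, n);
  return n;
}
```

Calling walk_collect_cores() once per Misc child of the System object
gives the "fill all the Cores in one Misc, then move to the next Misc"
ordering without removing Nodes from the topology itself.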

Samuel


Re: [hwloc-devel] Crash with ignoring HWLOC_OBJ_NODE in 0.9.2

2009-11-19 Thread Samuel Thibault
Michael Raymond wrote on Thu 19 Nov 2009 14:33:49 -0600:
> --- hwloc-0.9.2/src/topology-linux.c  2009-11-03 16:40:31.0 -0600
> +++ hwloc-new//src/topology-linux.c   2009-11-19 14:20:43.630035434 -0600
> @@ -536,6 +536,10 @@
>    struct dirent *dirent;
>    hwloc_obj_t node;
>  
> +  if (topology->ignored_types[HWLOC_OBJ_NODE] == HWLOC_IGNORE_TYPE_ALWAYS) {
> +    return;
> +  }
> +
>    dir = hwloc_opendir(path, topology->backend_params.sysfs.root_fd);
>    if (dir)
>      {

Mmm, indeed.  And it will happen on other OSes where we get the
distances too, e.g. Solaris.  Does the attached, more generic patch
properly fix it too?

>   Also I'm concerned about the value of CPUSET_MASK_LEN in
> hwloc_admin_disable_set_from_cpuset().  It's only 64 characters but our
> Linux boxes can have up to 2048 processors.  I don't think there's any harm
> in bumping that up a little.

Mmm, even better, we can avoid using a constant size completely.  I've
committed a fix.
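
For readers wondering what "avoiding a constant size" can look like in
general, here is a minimal, self-contained C sketch that reads one line
of arbitrary length by growing a heap buffer.  It is not the committed
hwloc change, whose details are not shown in this thread; the function
name sketch_read_line and the initial capacity are assumptions made for
illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one line of any length from `fp` into a heap buffer, without the
 * trailing newline.  Returns NULL at end of file (or on a read error)
 * when no data was read, and on allocation failure.  The caller frees
 * the result. */
static char *sketch_read_line(FILE *fp)
{
  size_t cap = 64;   /* initial capacity only; grows as needed */
  size_t len = 0;
  char *buf;

  assert(fp != NULL);
  buf = malloc(cap);
  if (!buf)
    return NULL;

  while (fgets(buf + len, (int)(cap - len), fp)) {
    len += strlen(buf + len);
    if (len > 0 && buf[len - 1] == '\n') {
      buf[len - 1] = '\0';          /* complete line: strip the newline */
      return buf;
    }
    if (len + 1 == cap) {           /* buffer full, no newline yet: grow */
      char *tmp = realloc(buf, cap * 2);
      if (!tmp) {
        free(buf);
        return NULL;
      }
      buf = tmp;
      cap *= 2;
    }
  }

  if (len == 0) {                   /* nothing read before EOF or error */
    free(buf);
    return NULL;
  }
  return buf;                       /* last line had no trailing newline */
}
```

A reader built this way places no fixed limit on how long a cpuset
description can be, so it would not need revisiting when machines grow
beyond 2048 processors.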

Samuel
Index: src/topology.c
===================================================================
--- src/topology.c  (revision 1364)
+++ src/topology.c  (working copy)
@@ -298,6 +298,9 @@
   if (getenv("HWLOC_IGNORE_DISTANCES"))
     return;
 
+  if (topology->ignored_types[HWLOC_OBJ_NODE] == HWLOC_IGNORE_TYPE_ALWAYS)
+    return;
+
 #ifdef HWLOC_DEBUG
   hwloc_debug("node distance matrix:\n");
   hwloc_debug("   ");