Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-30 Thread Brice Goglin
On 30/11/2011 08:44, Stefan Eilemann wrote:
> Let me know if I can help. We would be quite interested in this feature.

You can help by asking the relevant people for help :)
* ask the OpenCL board to add a device query property that tells us the
locality of a device. If they return the BusID of a PCI device, that's
OK (that's what Nvidia added to CUDA for us). If they give the set of
closest CPUs or NUMA nodes, that would help too.
* ask the Xorg guys to give you a way to retrieve the PCI BusID from an
X display at runtime.
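Until the vendors expose this, there is a Linux-only workaround: given a PCI BusID, sysfs already publishes the closest CPUs and NUMA node. A rough sketch (the sysfs tree is simulated below for illustration; on a real machine SYSFS would be /sys and the BusID would come from the vendor query):

```shell
# Simulated sysfs tree for illustration; on a real Linux box use SYSFS=/sys.
SYSFS=$(mktemp -d)
mkdir -p "$SYSFS/bus/pci/devices/0000:02:00.0"
echo "0-5" > "$SYSFS/bus/pci/devices/0000:02:00.0/local_cpulist"
echo "0"   > "$SYSFS/bus/pci/devices/0000:02:00.0/numa_node"

# Given a device's PCI BusID, the kernel reports its locality directly:
BUSID="0000:02:00.0"
cat "$SYSFS/bus/pci/devices/$BUSID/local_cpulist"   # closest CPUs
cat "$SYSFS/bus/pci/devices/$BUSID/numa_node"       # closest NUMA node
```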

> Ideally there should be the following fields. I'll use the Equalizer terms, 
> but feel free to use others if you don't like them:
>
> - port: the X server number or unused (Windows/Mac)
> - GL device: The X screen, affinity device (Windows) or CGL renderer ID (OS X)
> - Cuda/OpenCL device
>
> The latter is interesting to establish a mapping between GL and Cuda device 
> numbers, which are not necessarily symmetric.

I created a trac ticket about this
https://svn.open-mpi.org/trac/hwloc/ticket/54

Brice



Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Jeff Squyres
On Nov 29, 2011, at 1:04 PM, Brice Goglin wrote:

> "XML output" should be "XML input/output" or "XML support".

Done:

-
Hwloc optional build support status (more details can be found above):

Probe / display PCI devices: yes
Graphical output (Cairo):    yes
XML input / output:          full
Memory support:              binding, set policy, migrate pages
-

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Brice Goglin

> Hwloc optional build support status (more details can be found above):
>
> Probe / display PCI devices: yes
> Graphical output (Cairo):    yes
> XML output:                  full

"XML output" should be "XML input/output" or "XML support".

> Memory support:              binding, set policy, migrate pages

Looks ok otherwise.

Brice



Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Jeff Squyres
On Nov 29, 2011, at 12:01 PM, Brice Goglin wrote:

> Yes, always installed. There are some configure checks for verbs, but
> it's only used for enabling verbs-related helper testing.

Ok, how's this for output at the end of configure? 

Linux:

-
Hwloc optional build support status (more details can be found above):

Probe / display PCI devices: yes
Graphical output (Cairo):    yes
XML output:                  full
Memory support:              binding, set policy, migrate pages
-

OS X:

-
Hwloc optional build support status (more details can be found above):

Probe / display PCI devices: no
Graphical output (Cairo):    yes
XML output:                  full
Memory support:              none
-

XML support will show "basic" if libxml2 is not found.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Jeff Squyres
On Nov 29, 2011, at 11:53 AM, Brice Goglin wrote:

>> What about MX, verbs, Cuda, ...?
> 
> MX and verbs are not used internally, we just have public helpers to
> interoperate with them (and tests).

I forget -- are the helpers installed/available even if the MX 
headers/libraries are not found at configure time?  (ditto for verbs, cuda, 
etc.)

> Same for cuda in trunk (until Samuel's cuda branch gets merged).
> 
> Brice
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Guy Streeter
On 11/29/2011 02:57 AM, Stefan Eilemann wrote:
> Bonjour Brice,
> 
> On 29. Nov 2011, at 9:45, Brice Goglin wrote:
> 
>> hwloc 1.3 already has support for PCI device detection. These new
>> objects contain a "class" field that can help you know if it's a NIC/GPU/...
> 
> Ok, time to upgrade my installation. The cluster has RHEL6.1 which ships with 
> an older version.
> 

There is a request pending to have hwloc updated to 1.3 in RHEL6. I do not yet
have a schedule for it.

--Guy



Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Stefan Eilemann
Hi Jeff,

On 29. Nov 2011, at 15:28, Jeff Squyres wrote:

>> I think messages about found/not-found optional modules could be more prominent 
>> at the end of the configure process.
> 
> FWIW, I've traditionally been against such things for two reasons:

Your call, really. The information is there and not too hard to find, but I 
missed it on the first run. Most software I know provides this in a very 
concise list at the end (Supported: A B C\n Unsupported: D E F).


Cheers,

Stefan.
-- 
http://www.eyescale.ch
http://www.equalizergraphics.com
http://www.linkedin.com/in/eilemann






Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Jeff Squyres
On Nov 29, 2011, at 7:25 AM, Stefan Eilemann wrote:

>> You are probably missing the libpci-devel package.
> 
> Thanks, that either doesn't exist or wasn't installed on Red Hat. It works now.
> 
> I think messages about found/not-found optional modules could be more prominent 
> at the end of the configure process.

FWIW, I've traditionally been against such things for two reasons:

1. The information *was* displayed above (i.e., that pci-devel wasn't 
found/wasn't usable/whatever).  I realize that most people don't read the 
stdout of configure at all, but all the information you need is already there.

2. A list of what will/will not be built tends to grow so lengthy that repeating 
the information at the end loses its value.

That being said, I can *somewhat* see the value of displaying a user-friendly 
"PCI device support will not be built" vs. the output of a configure test, 
which might be somewhat obscure.  However, in hwloc's case, the configure test 
output is pretty self-evident.  Examples:

checking for PCI... no
checking pci/pci.h usability... no
checking pci/pci.h presence... no
checking for pci/pci.h... no
checking for LIBXML2... yes
checking for xmlNewDoc... yes
checking for final LIBXML2 support... yes

A simple string search for "pci" and "xml" will find these lines in the 
configure output.  Presumably, if you're building from source, you have 
at least *some* experience, and it isn't unreasonable to ask you to go 
look in the output of configure.
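Concretely, that search might look like this (self-contained below with a sample transcript; on a real build you would capture configure's output first, e.g. `./configure 2>&1 | tee configure.out`):

```shell
# Sample configure transcript (lines quoted from the message above);
# in practice: ./configure 2>&1 | tee configure.out
cat > configure.out <<'EOF'
checking for PCI... no
checking for pci/pci.h... no
checking for LIBXML2... yes
checking for final LIBXML2 support... yes
EOF

# A case-insensitive search pulls out the PCI and XML results:
grep -iE 'pci|xml' configure.out
```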

Don't get me wrong -- I'm not dead-set against a listing at the bottom.  I just 
find it redundant and somewhat of a maintenance hassle.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Stefan Eilemann

On 29. Nov 2011, at 11:41, Samuel Thibault wrote:

> You are probably missing the libpci-devel package.

Thanks, that either doesn't exist or wasn't installed on Red Hat. It works now.

I think messages about found/not-found optional modules could be more prominent 
at the end of the configure process.


Cheers,

Stefan.
-- 
http://www.eyescale.ch
http://www.equalizergraphics.com
http://www.linkedin.com/in/eilemann






Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Samuel Thibault
Stefan Eilemann wrote on Tue 29 Nov 2011 11:40:18 +0100:
> Maybe I'm missing something, but I don't see any PCI-related output with 
> lstopo.

You are probably missing the libpci-devel package.

Samuel


Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Stefan Eilemann
Hi Brice,

On 29. Nov 2011, at 9:45, Brice Goglin wrote:

> hwloc 1.3 already has support for PCI device detection. These new
> objects contain a "class" field that can help you know if it's a NIC/GPU/...
> 
> Just run lstopo
> on your machine to see what I am talking about.

Maybe I'm missing something, but I don't see any PCI-related output with lstopo.

I just compiled 1.3 from scratch and ran lstopo as user and hwloc-info as root:

$ sudo ./local/bin/hwloc-info -v
[sudo] password for eilemann: 
Machine (24GB)
  NUMANode L#0 (P#0 12GB) + Socket L#0 + L3 L#0 (12MB)
L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1 + PU L#1 (P#1)
L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2 + PU L#2 (P#2)
L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3 + PU L#3 (P#3)
L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4 + PU L#4 (P#4)
L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5 + PU L#5 (P#5)
  NUMANode L#1 (P#1 12GB) + Socket L#1 + L3 L#1 (12MB)
L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6 + PU L#6 (P#6)
L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)
L2 L#8 (256KB) + L1 L#8 (32KB) + Core L#8 + PU L#8 (P#8)
L2 L#9 (256KB) + L1 L#9 (32KB) + Core L#9 + PU L#9 (P#9)
L2 L#10 (256KB) + L1 L#10 (32KB) + Core L#10 + PU L#10 (P#10)
L2 L#11 (256KB) + L1 L#11 (32KB) + Core L#11 + PU L#11 (P#11)
[eilemann@node01 ~]$ 

The lstopo graphical output contains the same information.


Cheers,

Stefan.
-- 
http://www.eyescale.ch
http://www.equalizergraphics.com
http://www.linkedin.com/in/eilemann






Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Stefan Eilemann
Bonjour Brice,

On 29. Nov 2011, at 9:45, Brice Goglin wrote:

> hwloc 1.3 already has support for PCI device detection. These new
> objects contain a "class" field that can help you know if it's a NIC/GPU/...

Ok, time to upgrade my installation. The cluster has RHEL6.1 which ships with 
an older version.

> How are you using GPUs and NICs in your software? Which libraries or
> ways do you use to access them?

I use them mostly with OpenGL ('XOpenDisplay(":0.")') and RDMA in 
Equalizer/Collage (see links in signature). Is there a straightforward way to 
associate the GPUs with the corresponding X screen? I guess at least the path 
through the Xorg PCI ID should work, but it would be nice to have that in hwloc.

We also use Cuda/OpenMPI here, but I guess this will be easier to support. I'll 
look into the latest source of lstopo to see how it's done.


BTW, I recently created a library for ZeroConf GPU discovery[1]; it might be 
of interest to you.


Cheers,

Stefan.

[1] http://www.equalizergraphics.com/gpu-sd
-- 
http://www.eyescale.ch
http://www.equalizergraphics.com
http://www.linkedin.com/in/eilemann






Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Brice Goglin
Hello Stefan,

hwloc 1.3 already has support for PCI device detection. These new
objects contain a "class" field that can help you know if it's a NIC/GPU/...

However it's hard to know which PCI device is eth0 or eth1, so we also
try to add OS device objects inside the PCI devices. If you're using Linux, you
will see which network device (eth0, ...), IB device (mlx4_0, ...), or
disk (sda, ...) corresponds to each PCI device (if any). Just run lstopo
on your machine to see what I am talking about. Then you should read the
I/O devices section in the doc.
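As a side note, on Linux the raw mapping behind this is visible in sysfs itself: each OS device has a 'device' symlink pointing at its PCI device. A small sketch (the tree is simulated here; on a real machine use /sys and a real interface name):

```shell
# Simulated sysfs tree for illustration; on a real system SYSFS=/sys.
SYSFS=$(mktemp -d)
mkdir -p "$SYSFS/devices/pci0000:00/0000:02:00.0" "$SYSFS/class/net/eth0"
ln -s "$SYSFS/devices/pci0000:00/0000:02:00.0" "$SYSFS/class/net/eth0/device"

# The 'device' symlink ties the OS device (eth0) to its PCI device:
basename "$(readlink "$SYSFS/class/net/eth0/device")"
```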

There's also some work to insert CUDA device information inside those
PCI devices.

Additionally, we have some helpers to retrieve the locality of objects
from certain libraries (OFED, CUDA, ...). See the interoperability section in
the doc.

How are you using GPUs and NICs in your software? Which libraries or
ways do you use to access them?

hope this helps.
Brice




On 29/11/2011 09:32, Stefan Eilemann wrote:
> All,
>
> We need to discover which GPUs and NICs are close to which CPUs[1], 
> independently of CUDA. The overview page hints that some kind of support is 
> planned, but it's unclear to me how much of this is implemented.
>
> Is there support for this in hwloc, and in which version? If yes, can you 
> give me a hint/code snippet on how to do it? If not, what would it take to 
> get this support into hwloc?
>
>
> Cheers,
>
> Stefan.
>
> [1] https://github.com/Eyescale/Equalizer/issues/57
>