Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Christopher Samuel

On 30/11/11 03:24, Guy Streeter wrote:

> There is a request pending to have hwloc updated to 1.3 in
> RHEL6. I do not yet have a schedule for it.

I wouldn't hold your breath; I'm still waiting for a nasty
kernel bug to be fixed in RHEL5 (Ethernet packets delivered
on the wrong interface of a dual-ported 10GigE NIC) that I
reported about a year ago against 5.5.

They're now arguing about whether to fix it in RHEL 5.8
or put it off for yet another release (even though it was
already fixed upstream in the Mellanox drivers when I
reported it).

cheers,
Chris
-- 
Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/



Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Jeff Squyres
On Nov 29, 2011, at 1:04 PM, Brice Goglin wrote:

> "XML output" should be "XML input/output" or "XML support".

Done:

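-----------------------------------------------------------------------------
Hwloc optional build support status (more details can be found above):

Probe / display PCI devices: yes
Graphical output (Cairo):    yes
XML input / output:          full
Memory support:              binding, set policy, migrate pages
-----------------------------------------------------------------------------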

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Brice Goglin

> Hwloc optional build support status (more details can be found above):
>
> Probe / display PCI devices: yes
> Graphical output (Cairo):    yes
> XML output:                  full

"XML output" should be "XML input/output" or "XML support".

> Memory support:              binding, set policy, migrate pages

Looks ok otherwise.

Brice



Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Jeff Squyres
On Nov 29, 2011, at 12:01 PM, Brice Goglin wrote:

> Yes, always installed. There are some configure checks for verbs, but
> they are only used to enable testing of the verbs-related helpers.

Ok, how's this for output at the end of configure? 

Linux:

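-----------------------------------------------------------------------------
Hwloc optional build support status (more details can be found above):

Probe / display PCI devices: yes
Graphical output (Cairo):    yes
XML output:                  full
Memory support:              binding, set policy, migrate pages
-----------------------------------------------------------------------------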

OS X:

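-----------------------------------------------------------------------------
Hwloc optional build support status (more details can be found above):

Probe / display PCI devices: no
Graphical output (Cairo):    yes
XML output:                  full
Memory support:              none
-----------------------------------------------------------------------------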

XML support will show "basic" if libxml2 is not found.

-- 
Jeff Squyres
jsquy...@cisco.com



Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Brice Goglin
On 29/11/2011 17:58, Jeff Squyres wrote:
> On Nov 29, 2011, at 11:53 AM, Brice Goglin wrote:
>
>>> What about MX, verbs, Cuda, ...?
>> MX and verbs are not used internally; we just have public helpers to
>> interoperate with them (and tests).
> I forget -- are the helpers installed/available even if the MX 
> headers/libraries are not found at configure time?  (ditto for verbs, cuda, 
> etc.)

Yes, always installed. There are some configure checks for verbs, but
they are only used to enable testing of the verbs-related helpers.

Brice



Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Jeff Squyres
On Nov 29, 2011, at 11:53 AM, Brice Goglin wrote:

>> What about MX, verbs, Cuda, ...?
> 
> MX and verbs are not used internally; we just have public helpers to
> interoperate with them (and tests).

I forget -- are the helpers installed/available even if the MX 
headers/libraries are not found at configure time?  (ditto for verbs, cuda, 
etc.)

> Same for CUDA in trunk (until Samuel's CUDA branch gets merged).
> 
> Brice


-- 
Jeff Squyres
jsquy...@cisco.com




Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Brice Goglin
On 29/11/2011 17:50, Jeff Squyres wrote:
> On Nov 29, 2011, at 10:33 AM, Brice Goglin wrote:
>
>>> - Kerrighed
>>> - PCI device support
>>> - XML support
>> I would put XML, PCI, Cairo and libnuma
> What about MX, verbs, Cuda, ...?

MX and verbs are not used internally; we just have public helpers to
interoperate with them (and tests).

Same for CUDA in trunk (until Samuel's CUDA branch gets merged).

Brice



Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Jeff Squyres
On Nov 29, 2011, at 10:33 AM, Brice Goglin wrote:

>> - Kerrighed
>> - PCI device support
>> - XML support
> 
> I would put XML, PCI, Cairo and libnuma

What about MX, verbs, Cuda, ...?

-- 
Jeff Squyres
jsquy...@cisco.com



Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Guy Streeter
On 11/29/2011 02:57 AM, Stefan Eilemann wrote:
> Hello Brice,
> 
> On 29. Nov 2011, at 9:45, Brice Goglin wrote:
> 
>> hwloc 1.3 already has support for PCI device detection. These new
>> objects contain a "class" field that can help you know if it's a NIC/GPU/...
> 
> OK, time to upgrade my installation. The cluster runs RHEL 6.1, which ships 
> with an older version.
> 

There is a request pending to have hwloc updated to 1.3 in RHEL6. I do not yet
have a schedule for it.

--Guy



Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Guy Streeter
On 11/29/2011 06:25 AM, Stefan Eilemann wrote:
> 
> On 29. Nov 2011, at 11:41, Samuel Thibault wrote:
> 
>> You are probably missing the libpci-devel package.
> 
> Thanks, that either doesn't exist or wasn't installed on Red Hat. It works now.
> 
> I think the found/not-found messages for optional modules could be more 
> prominent at the end of the configure process.
> 
> 
> Cheers,
> 
> Stefan.

The package is called pciutils-devel on RHEL.

--Guy


Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Brice Goglin
On 29/11/2011 16:19, Jeff Squyres wrote:
> On Nov 29, 2011, at 10:16 AM, Stefan Eilemann wrote:
>
>>> FWIW, I've traditionally been against such things for two reasons:
>> Your call, really. The information is there and not too hard to find, but I 
>> missed it on the first run. Most software I know provides this in a very 
>> concise list at the end (Supported: A B C\n Unsupported: D E F).
> Let me throw this back to Brice / Samuel...
>
> If we had such a thing at the bottom of configure, what items should we show? 
>  I can think of the following obvious ones offhand:
>
> - Kerrighed
> - PCI device support
> - XML support
>

I would put XML, PCI, Cairo and libnuma

Brice



Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Jeff Squyres
On Nov 29, 2011, at 10:16 AM, Stefan Eilemann wrote:

>> FWIW, I've traditionally been against such things for two reasons:
> 
> Your call, really. The information is there and not too hard to find, but I 
> missed it on the first run. Most software I know provides this in a very 
> concise list at the end (Supported: A B C\n Unsupported: D E F).

Let me throw this back to Brice / Samuel...

If we had such a thing at the bottom of configure, what items should we show?  
I can think of the following obvious ones offhand:

- Kerrighed
- PCI device support
- XML support

-- 
Jeff Squyres
jsquy...@cisco.com




Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Stefan Eilemann
Hi Jeff,

On 29. Nov 2011, at 15:28, Jeff Squyres wrote:

>> I think the found/not-found messages for optional modules could be more 
>> prominent at the end of the configure process.
> 
> FWIW, I've traditionally been against such things for two reasons:

Your call, really. The information is there and not too hard to find, but I 
missed it on the first run. Most software I know provides this in a very 
concise list at the end (Supported: A B C\n Unsupported: D E F).


Cheers,

Stefan.
-- 
http://www.eyescale.ch
http://www.equalizergraphics.com
http://www.linkedin.com/in/eilemann






Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Jeff Squyres
On Nov 29, 2011, at 7:25 AM, Stefan Eilemann wrote:

>> You are probably missing the libpci-devel package.
> 
> Thanks, that either doesn't exist or wasn't installed on Red Hat. It works now.
> 
> I think the found/not-found messages for optional modules could be more 
> prominent at the end of the configure process.

FWIW, I've traditionally been against such things for two reasons:

1. The information *was* displayed above (i.e., that pci-devel wasn't 
found/wasn't usable/whatever).  I realize that most people don't read the 
stdout of configure at all, but all the information you need is already there.

2. A list of what will/will not be built tends to grow so long that it 
dilutes the value of repeating the information at the end.

That being said, I can *somewhat* see the value of displaying a user-friendly 
"PCI device support will not be built" vs. the output of a configure test, 
which might be somewhat obscure.  However, in hwloc's case, the configure test 
output is pretty self-evident.  Examples:

checking for PCI... no
checking pci/pci.h usability... no
checking pci/pci.h presence... no
checking for pci/pci.h... no
checking for LIBXML2... yes
checking for xmlNewDoc... yes
checking for final LIBXML2 support... yes

A simple string search for "pci" and "xml" will find these lines in the 
configure output.  Presumably, if you're building from source, you've got at 
least *some* experience, and it shouldn't be unreasonable to ask you to go 
look in the output of configure.
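The string search described above can also be scripted. A minimal sketch in C, assuming the autoconf "checking for NAME... RESULT" line format shown in the examples; the helper names are hypothetical, and the keyword match is deliberately loose (case-insensitive, anywhere in the line).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive substring test (strcasestr is not standard C). */
static int contains_ci(const char *hay, const char *needle)
{
    size_t n = strlen(needle);

    if (n == 0)
        return 1;
    for (; *hay; hay++) {
        size_t i = 0;
        while (i < n && hay[i] &&
               tolower((unsigned char)hay[i]) == tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return 1;
    }
    return 0;
}

/* For a configure line such as "checking for PCI... no" (no trailing newline),
 * return the result text ("no") if it is a "checking" line that mentions
 * `keyword` anywhere, else NULL. */
static const char *check_result(const char *line, const char *keyword)
{
    const char *dots;

    if (strncmp(line, "checking ", 9) != 0 || !contains_ci(line, keyword))
        return NULL;
    dots = strstr(line, "... ");
    return dots ? dots + 4 : NULL;
}
```

Feeding every line of the configure output through check_result() with "pci" and "xml" would reproduce the search by hand, without a separately maintained summary list.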

Don't get me wrong -- I'm not dead-set against a listing at the bottom.  I just 
find it redundant and somewhat of a maintenance hassle.

-- 
Jeff Squyres
jsquy...@cisco.com




Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Stefan Eilemann

On 29. Nov 2011, at 11:41, Samuel Thibault wrote:

> You are probably missing the libpci-devel package.

Thanks, that either doesn't exist or wasn't installed on Red Hat. It works now.

I think the found/not-found messages for optional modules could be more 
prominent at the end of the configure process.


Cheers,

Stefan.






Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Samuel Thibault
Stefan Eilemann wrote on Tue 29 Nov 2011 11:40:18 +0100:
> Maybe I'm missing something, but I don't see any PCI-related output with 
> lstopo.

You are probably missing the libpci-devel package.

Samuel


Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Stefan Eilemann
Hi Brice,

On 29. Nov 2011, at 9:45, Brice Goglin wrote:

> hwloc 1.3 already has support for PCI device detection. These new
> objects contain a "class" field that can help you know if it's a NIC/GPU/...
> 
> Just run lstopo
> on your machine to see what I am talking about.

Maybe I'm missing something, but I don't see any PCI-related output with lstopo.

I just compiled 1.3 from scratch, and ran lstopo as a user and hwloc-info as root:

$ sudo ./local/bin/hwloc-info -v
[sudo] password for eilemann: 
Machine (24GB)
  NUMANode L#0 (P#0 12GB) + Socket L#0 + L3 L#0 (12MB)
    L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
    L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1 + PU L#1 (P#1)
    L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2 + PU L#2 (P#2)
    L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3 + PU L#3 (P#3)
    L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4 + PU L#4 (P#4)
    L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5 + PU L#5 (P#5)
  NUMANode L#1 (P#1 12GB) + Socket L#1 + L3 L#1 (12MB)
    L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6 + PU L#6 (P#6)
    L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)
    L2 L#8 (256KB) + L1 L#8 (32KB) + Core L#8 + PU L#8 (P#8)
    L2 L#9 (256KB) + L1 L#9 (32KB) + Core L#9 + PU L#9 (P#9)
    L2 L#10 (256KB) + L1 L#10 (32KB) + Core L#10 + PU L#10 (P#10)
    L2 L#11 (256KB) + L1 L#11 (32KB) + Core L#11 + PU L#11 (P#11)
[eilemann@node01 ~]$ 

The lstopo graphical output contains the same information.


Cheers,

Stefan.






Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Brice Goglin
On 29/11/2011 09:57, Stefan Eilemann wrote:
>
> I use them mostly with OpenGL ('XOpenDisplay(":0.")') and RDMA in 
> Equalizer/Collage (see links in signature). Is there a straightforward way to 
> associate the GPUs with the corresponding X screens? I guess at least the path 
> through the Xorg PCI ID should work, but it would be nice to have that in 
> hwloc.

I need to think about it, it doesn't look very easy to implement.

Brice



Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Stefan Eilemann
Hello Brice,

On 29. Nov 2011, at 9:45, Brice Goglin wrote:

> hwloc 1.3 already has support for PCI device detection. These new
> objects contain a "class" field that can help you know if it's a NIC/GPU/...

OK, time to upgrade my installation. The cluster runs RHEL 6.1, which ships 
with an older version.

> How are you using GPUs and NICs in your software? Which libraries or
> ways do you use to access them?

I use them mostly with OpenGL ('XOpenDisplay(":0.")') and RDMA in 
Equalizer/Collage (see links in signature). Is there a straightforward way to 
associate the GPUs with the corresponding X screens? I guess at least the path 
through the Xorg PCI ID should work, but it would be nice to have that in hwloc.
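For the Xorg route, one sketch is to normalize a screen's BusID into the hexadecimal domain:bus:dev.func string used by lspci and sysfs, and then look that PCI device up in the topology (recent hwloc releases appear to provide hwloc_get_pcidev_by_busidstring() in hwloc/helper.h for this; check the installed headers). The code below assumes the decimal "PCI:bus:dev:func" form used in xorg.conf and PCI domain 0; the function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Convert an xorg.conf-style BusID such as "PCI:3:0:0" (decimal fields,
 * PCI domain 0 assumed) into the hexadecimal "domain:bus:dev.func" form
 * used by lspci and sysfs, e.g. "0000:03:00.0".
 * Returns 0 on success, -1 on malformed input or a too-small buffer. */
static int xorg_busid_to_pci(const char *busid, char *out, size_t outlen)
{
    unsigned bus, dev, func;
    char extra;
    int n;

    if (strncmp(busid, "PCI:", 4) != 0)
        return -1;
    /* Exactly three fields; a 4th conversion means trailing garbage. */
    if (sscanf(busid + 4, "%u:%u:%u%c", &bus, &dev, &func, &extra) != 3)
        return -1;
    /* PCI limits: 256 buses, 32 devices per bus, 8 functions per device. */
    if (bus > 0xff || dev > 0x1f || func > 0x7)
        return -1;
    n = snprintf(out, outlen, "0000:%02x:%02x.%x", bus, dev, func);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For example, "PCI:129:31:7" becomes "0000:81:1f.7". The resulting string can also locate the device directly under /sys/bus/pci/devices/ on Linux.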

We also use CUDA/Open MPI here, but I guess this will be easier to support. 
I'll look into the latest lstopo source to see how it's done.


BTW, I recently created a library for ZeroConf GPU discovery[1]; it might be 
of interest to you.


Cheers,

Stefan.

[1] http://www.equalizergraphics.com/gpu-sd






Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Brice Goglin
Hello Stefan,

hwloc 1.3 already has support for PCI device detection. These new
objects contain a "class" field that can help you know if it's a NIC/GPU/...
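A minimal sketch of using that class field, assuming it holds the 16-bit PCI class code with the base class in the high byte and the subclass in the low byte (the value exposed as obj->attr->pcidev.class_id). The enum and function names are hypothetical; the class values themselves are standard PCI class codes.

```c
/* Coarse categories for "is this PCI device a NIC or a GPU?". */
enum dev_kind { DEV_OTHER, DEV_NIC, DEV_GPU };

/* Classify a 16-bit PCI class code laid out as (base class << 8) | subclass. */
static enum dev_kind classify_pci_class(unsigned short class_id)
{
    unsigned base = class_id >> 8;
    unsigned sub = class_id & 0xff;

    if (base == 0x02)                 /* network controller: Ethernet, InfiniBand, ... */
        return DEV_NIC;
    if (base == 0x0c && sub == 0x06)  /* serial bus controller, InfiniBand subclass */
        return DEV_NIC;
    if (base == 0x03)                 /* display controller: VGA, 3D, ... */
        return DEV_GPU;
    return DEV_OTHER;
}
```

In an hwloc program this would be applied to each PCI object found while walking the I/O objects. Note that I/O objects must be enabled when the topology is loaded (in the 1.x series this appears to be the HWLOC_TOPOLOGY_FLAG_IO_DEVICES flag; lstopo enables it for you), so check the I/O devices section of the doc.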

However, it's hard to know which PCI device is eth0 or eth1, so we also
try to add OS devices inside PCI devices. If you're using Linux, you
will see which network device (eth0, ...), IB device (mlx4_0, ...), or
disk (sda, ...) corresponds to each PCI device (if any). Just run lstopo
on your machine to see what I am talking about. Then read the
I/O devices section in the doc.
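On Linux, the kernel exposes the CPUs that are local to a PCI device in sysfs, for example /sys/class/net/eth0/device/local_cpulist for the PCI device behind eth0; this is the kind of locality information that ends up attached to the objects lstopo shows. A minimal sketch of parsing that cpulist format ("0-5,12-17") into a bitmask, assuming at most 64 CPUs; the function name is hypothetical, and real code would use a growable bitmap such as hwloc's hwloc_bitmap_t.

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse a Linux cpulist string such as "0-5,12-17\n" into a 64-bit mask.
 * Returns 0 on success, -1 on malformed input or CPU numbers >= 64. */
static int parse_cpulist(const char *s, uint64_t *mask)
{
    uint64_t m = 0;
    char *end;

    while (*s && *s != '\n') {
        unsigned long lo = strtoul(s, &end, 10);
        unsigned long hi = lo;

        if (end == s)
            return -1;                   /* expected a CPU number */
        s = end;
        if (*s == '-') {                 /* range "lo-hi" */
            const char *p = s + 1;
            hi = strtoul(p, &end, 10);
            if (end == p || hi < lo)
                return -1;
            s = end;
        }
        if (hi >= 64)
            return -1;                   /* does not fit this sketch's mask */
        for (unsigned long c = lo; c <= hi; c++)
            m |= UINT64_C(1) << c;
        if (*s == ',')
            s++;
        else if (*s && *s != '\n')
            return -1;                   /* unexpected character */
    }
    *mask = m;
    return 0;
}
```

For "0-5,12-17" this sets bits 0 to 5 and 12 to 17, i.e. the CPUs of one socket on a machine numbered that way.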

There's also some work to insert CUDA device information inside those
PCI devices.

Additionally, we have some helpers to retrieve the locality of objects from
some external libraries (OFED, CUDA, ...). See the interoperability section in
the doc.
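Once such a helper (for example hwloc_ibv_get_device_cpuset() from hwloc/openfabrics-verbs.h for an OFED device) has returned the CPUs close to a device, a common next step is to intersect that set with the CPUs the process is allowed to use before binding. A minimal sketch with plain 64-bit masks standing in for hwloc cpusets; the function names and the fall-back-to-all-allowed policy are assumptions for illustration, not hwloc API.

```c
#include <stdint.h>

/* Prefer the CPUs local to the device that the process may use; if the two
 * sets do not overlap, fall back to every allowed CPU. */
static uint64_t pick_binding(uint64_t device_local, uint64_t allowed)
{
    uint64_t close_cpus = device_local & allowed;
    return close_cpus ? close_cpus : allowed;
}

/* Index of the lowest CPU in a mask, or -1 if the mask is empty. */
static int first_cpu(uint64_t mask)
{
    for (int i = 0; i < 64; i++)
        if (mask & (UINT64_C(1) << i))
            return i;
    return -1;
}
```

With hwloc bitmaps the same steps map onto intersecting two cpusets and testing the result for emptiness; the fallback keeps the process runnable when the device sits on a socket the process cannot use.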

How are you using GPUs and NICs in your software? Which libraries or
ways do you use to access them?

Hope this helps.
Brice




On 29/11/2011 09:32, Stefan Eilemann wrote:
> All,
>
> We need to discover which GPUs and NICs are close to which CPUs[1], 
> independently of CUDA. From the overview page there are hints that some 
> kind of support is planned, but it's unclear to me how much of this is 
> implemented.
>
> Is there support for this in hwloc, and in which version? If yes, can you 
> give me a hint/code snippet on how to do this? If not, what would it take to 
> get this support into hwloc?
>
>
> Cheers,
>
> Stefan.
>
> [1] https://github.com/Eyescale/Equalizer/issues/57
>



[hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Stefan Eilemann
All,

We need to discover which GPUs and NICs are close to which CPUs[1], 
independently of CUDA. From the overview page there are hints that some 
kind of support is planned, but it's unclear to me how much of this is 
implemented.

Is there support for this in hwloc, and in which version? If yes, can you give 
me a hint/code snippet on how to do this? If not, what would it take to get this 
support into hwloc?


Cheers,

Stefan.

[1] https://github.com/Eyescale/Equalizer/issues/57
