Re: [hwloc-devel] pcilib error messages w/ rhl8 and hwloc-1.3.1rc1

2011-12-15 Thread Paul H. Hargrove



On 12/14/2011 10:36 PM, Brice Goglin wrote:
> I committed the silence-warning patch but I will keep the other part for
> now. I am a bit afraid of changing that much code in 1.3.1 without being
> sure whether it's necessary.


Sounds good to me.
I certainly have no grounds to argue that RHL8 support is "vital".

-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [hwloc-devel] pcilib error messages w/ rhl8 and hwloc-1.3.1rc1

2011-12-15 Thread Brice Goglin
On 14/12/2011 22:42, Paul H. Hargrove wrote:
>
>
> On 12/14/2011 1:21 PM, Brice Goglin wrote:
>> The attached patch might work. I am not sure all this is actually
>> necessary because things have been working fine so far, apart from your
>> warnings.
>
> Yup, the patch silences the warnings.  That was simpler than I had
> anticipated.
>
>> By the way, does lstopo show PCI devices on your machine even when you
>> have these warnings?
>
> Yes, the PCI device info looks complete even with the warnings.

OK thanks again for all the testing. We'll have to make sure your name
appears in the 1.3.1 changelog :)
I committed the silence-warning patch but I will keep the other part for
now. I am a bit afraid of changing that much code in 1.3.1 without being
sure whether it's necessary.

Brice



Re: [hwloc-devel] pcilib error messages w/ rhl8 and hwloc-1.3.1rc1

2011-12-14 Thread Paul H. Hargrove



On 12/14/2011 1:21 PM, Brice Goglin wrote:
> The attached patch might work. I am not sure all this is actually
> necessary because things have been working fine so far, apart from your
> warnings.


Yup, the patch silences the warnings.  That was simpler than I had 
anticipated.



> By the way, does lstopo show PCI devices on your machine even when you
> have these warnings?


Yes, the PCI device info looks complete even with the warnings.

-Paul




Re: [hwloc-devel] pcilib error messages w/ rhl8 and hwloc-1.3.1rc1

2011-12-14 Thread Brice Goglin
On 14/12/2011 21:56, Paul H. Hargrove wrote:
> Now that I think of it, this situation seems to imply that running the
> code in topology-libpci.c as root on a system w/ an Intel PIIX4
> controller could lock up one's machine.  Thoughts?

I can't know for sure. I would be surprised if sudo lspci -xxx could
lock up an entire machine. If it were that dangerous, I would hope that the
kernel developers had added a quirk for this device to prevent people
from killing the machine. What do they mean by "random location" in
the comment? Something between 64 and 256? Or something really random
after 256?

FWIW, I increased the cached config space from 64 bytes to 256 when
adding support for getting the PCIe link speed, which is indeed only
available to root (extended PCIe capability).
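
As an illustration of why the first 64 bytes are not enough for the link speed, here is a minimal sketch of walking the PCI capability list in a cached config-space buffer and decoding the PCIe Link Status register. The function names find_capability() and pcie_link_status() are hypothetical and this is not hwloc's actual code; the register offsets are the standard PCI ones. Capabilities live at offset 0x40 and above, so a 64-byte cache can never contain them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_STATUS          0x06  /* status register (low byte) */
#define PCI_STATUS_CAP_LIST 0x10  /* "capability list present" bit */
#define PCI_CAPABILITY_LIST 0x34  /* offset of first capability pointer */
#define PCI_CAP_ID_EXP      0x10  /* PCI Express capability ID */
#define PCI_EXP_LNKSTA      0x12  /* Link Status, relative to the capability */

/* Hypothetical helper: return the offset of capability `cap_id` inside the
 * cached config space `cfg` of `len` bytes, or 0 if it cannot be found. */
static size_t find_capability(const uint8_t *cfg, size_t len, uint8_t cap_id)
{
    size_t pos;
    int ttl = 48; /* guard against malformed, looping lists */

    assert(cfg != NULL);
    if (len < 64 || !(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
        return 0;
    pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
    while (pos >= 0x40 && pos + 2 <= len && ttl-- > 0) {
        if (cfg[pos] == cap_id)
            return pos;
        pos = cfg[pos + 1] & 0xfc; /* next-capability pointer */
    }
    return 0;
}

/* Hypothetical helper: decode link speed (bits 3:0) and width (bits 9:4)
 * from the Link Status register. Returns 1 on success, 0 if the PCIe
 * capability is missing or lies outside the cached bytes. */
static int pcie_link_status(const uint8_t *cfg, size_t len,
                            unsigned *speed, unsigned *width)
{
    size_t cap = find_capability(cfg, len, PCI_CAP_ID_EXP);
    unsigned lnksta;

    if (cap == 0 || cap + PCI_EXP_LNKSTA + 2 > len)
        return 0;
    lnksta = cfg[cap + PCI_EXP_LNKSTA]
           | ((unsigned)cfg[cap + PCI_EXP_LNKSTA + 1] << 8);
    *speed = lnksta & 0xf;
    *width = (lnksta >> 4) & 0x3f;
    return 1;
}
```

With only 64 cached bytes, pcie_link_status() returns 0 because the walk stops before the first capability, which matches what a non-root user would see.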

The attached patch might work. I am not sure all this is actually
necessary because things have been working fine so far, apart from your
warnings.

By the way, does lstopo show PCI devices on your machine even when you
have these warnings?

Brice

diff --git a/src/topology-libpci.c b/src/topology-libpci.c
index 4564319..01f8e03 100644
--- a/src/topology-libpci.c
+++ b/src/topology-libpci.c
@@ -560,6 +560,11 @@ hwloc_pci_error(char *msg, ...)
   longjmp(err_buf, 1);
 }

+static void
+hwloc_pci_warning(char *msg __hwloc_attribute_unused, ...)
+{
+}
+
 void
 hwloc_look_libpci(struct hwloc_topology *topology)
 {
@@ -576,6 +581,7 @@ hwloc_look_libpci(struct hwloc_topology *topology)

   pciaccess = pci_alloc();
   pciaccess->error = hwloc_pci_error;
+  pciaccess->warning = hwloc_pci_warning;

   if (setjmp(err_buf)) {
     pci_cleanup(pciaccess);
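
For readers unfamiliar with the callback pattern this patch relies on, here is a small standalone sketch. It assumes a hypothetical struct fake_access standing in for libpci's struct pci_access (only the two callbacks are modeled), and probe() is an invented driver function. The error callback escapes through longjmp(), while the warning callback swallows the message; unlike the empty handler in the patch, this one counts suppressed warnings so the behavior can be observed.

```c
#include <setjmp.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for the callbacks of libpci's struct pci_access. */
struct fake_access {
    void (*error)(char *msg, ...);
    void (*warning)(char *msg, ...);
};

static jmp_buf err_buf;
static int suppressed_warnings;

/* Fatal errors unwind back to the setjmp() point instead of exiting. */
static void on_error(char *msg, ...)
{
    (void)msg;
    longjmp(err_buf, 1);
}

/* Warnings are dropped; the counter exists only for this demonstration. */
static void on_warning(char *msg, ...)
{
    (void)msg;
    suppressed_warnings++;
}

/* Hypothetical probe: always emits one warning, and one error if `fail`
 * is nonzero. Returns 0 on success, -1 if the error handler fired. */
static int probe(struct fake_access *a, int fail)
{
    if (setjmp(err_buf))
        return -1; /* reached via longjmp() from on_error() */
    a->warning("tried to read %d bytes at 0, but got only %d", 256, 64);
    if (fail)
        a->error("cannot access device");
    return 0;
}
```

Calling probe() with fail set to 0 returns 0 and prints nothing, even though a warning was raised; with fail set to 1 it returns -1 without terminating the process.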


Re: [hwloc-devel] pcilib error messages w/ rhl8 and hwloc-1.3.1rc1

2011-12-14 Thread Paul H. Hargrove



On 12/14/2011 12:03 PM, Brice Goglin wrote:
> On 14/12/2011 20:45, Paul H. Hargrove wrote:
>> I can run "lspci -vv" to get plenty of correct information and no such
>> messages.
>
> What about lspci -vvxxx? These options make the pci lib read 256
> instead of 64 bytes of config space.


Yup, that reproduces the problem:
$ lspci -vvxxx
pcilib: proc_read: tried to read 256 bytes at 0, but got only 64
lspci: Unable to read 256 bytes of configuration space.

However, my lspci manpage says only root can use -xxx:
   -xxx   Show hexadecimal dump of whole PCI configuration space. Available
          only for root as several PCI devices crash when you try to read
          undefined portions of the config space (this behaviour probably
          doesn't violate the PCI standard, but it's at least very stupid).

So, I tried "sudo lspci -vvxxx" and got lots of output and no error.

Following up on that, I find "sudo hwloc-calc" runs w/o the errors.

Here is a portion of proc_bus_pci_read() from my 
/usr/src/linux-2.4.21-60.EL/drivers/pci/proc.c:

    /*
     * Normal users can read only the standardized portion of the
     * configuration space as several chips lock up when trying to read
     * undefined locations (think of Intel PIIX4 as a typical example).
     */
    if (capable(CAP_SYS_ADMIN))
        size = PCI_CFG_SPACE_SIZE;
    else if (dev->hdr_type == PCI_HEADER_TYPE_CARDBUS)
        size = 128;
    else
        size = 64;

I find that this exact code is still present in linux-2.6.39.  So, the 
kernel behavior has not changed in this respect and I suspect therefore 
that more recent libpci is simply masking the short read.
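
To make the "short read" concrete, here is a rough sketch of how a user-space reader could detect it, assuming the config space is exposed as a file that simply ends at the size the kernel allows for the caller. The helpers read_config_prefix() and report_short_read() are hypothetical and are not libpci functions.

```c
#define _XOPEN_SOURCE 700 /* for pread() and fileno() */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to `want` bytes starting at offset 0 of `fd` into `buf`.
 * Returns the number of bytes obtained (possibly fewer than `want`),
 * or -1 on a real I/O error. */
static ssize_t read_config_prefix(int fd, unsigned char *buf, size_t want)
{
    size_t got = 0;

    assert(buf != NULL);
    while (got < want) {
        ssize_t n = pread(fd, buf + got, want - got, (off_t)got);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted, retry */
            return -1;
        }
        if (n == 0)
            break; /* nothing more is exposed to this caller */
        got += (size_t)n;
    }
    return (ssize_t)got;
}

/* Report a short read in the style of the pcilib message quoted above.
 * Returns 1 if the read was short, 0 otherwise. */
static int report_short_read(FILE *out, ssize_t got, size_t want)
{
    if (got >= 0 && (size_t)got < want) {
        fprintf(out, "tried to read %zu bytes at 0, but got only %zd\n",
                want, got);
        return 1;
    }
    return 0;
}
```

A caller that treats a short read as a soft condition, rather than a failure, can still use the bytes it did receive; that is presumably what a library "masking" the short read would amount to.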


I would suggest logic in topology-libpci that first tries to read 256
bytes, then retries with 128 and then 64.
With a smart pciaccess->error handler one would NOT produce this error 
message until 256, 128 and 64-byte reads all fail.
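
A minimal sketch of that fallback, under stated assumptions: the reader is passed in as a callback with all-or-nothing semantics (in real code it would presumably wrap libpci's pci_read_block()), and struct fake_dev with fake_read() only simulates the kernel limits quoted above. The names read_config_largest(), cfg_read_fn, fake_dev and fake_read are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader callback: returns 1 only if all `len` bytes were
 * read into `buf`, 0 otherwise. */
typedef int (*cfg_read_fn)(void *dev, unsigned char *buf, size_t len);

/* Try 256, then 128, then 64 bytes. Returns the size that succeeded,
 * or 0 if every attempt failed; only then should an error be reported. */
static size_t read_config_largest(cfg_read_fn rd, void *dev,
                                  unsigned char *buf, size_t buflen)
{
    static const size_t sizes[] = { 256, 128, 64 };
    size_t i;

    assert(rd != NULL && buf != NULL);
    for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        if (sizes[i] > buflen)
            continue; /* caller's buffer is too small for this size */
        if (rd(dev, buf, sizes[i]))
            return sizes[i];
    }
    return 0;
}

/* Simulated device: `readable` models how many bytes the kernel exposes
 * (256 for root, 128 for CardBus bridges, 64 otherwise). */
struct fake_dev {
    size_t readable;
};

static int fake_read(void *dev, unsigned char *buf, size_t len)
{
    const struct fake_dev *d = dev;

    if (len > d->readable)
        return 0; /* the whole request fails, as with a short read */
    memset(buf, 0, len);
    return 1;
}
```

The caller learns how many bytes are valid from the return value, so later code can skip anything that needs offsets beyond it instead of failing outright.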


-Paul




Re: [hwloc-devel] pcilib error messages w/ rhl8 and hwloc-1.3.1rc1

2011-12-14 Thread Brice Goglin
On 14/12/2011 20:45, Paul H. Hargrove wrote:
> I can run "lspci -vv" to get plenty of correct information and no such
> messages.

What about lspci -vvxxx? These options make the pci lib read 256
instead of 64 bytes of config space.

I may need to check something in the first 64 bytes before I can safely
try to read the next 192.

Brice