Re: [hwloc-devel] pcilib error messages w/ rhl8 and hwloc-1.3.1rc1
On 12/14/2011 10:36 PM, Brice Goglin wrote:
> I committed the silence-warning patch but I will keep the other part for
> now. I am a bit afraid of changing that much code in 1.3.1 without being
> sure whether it's necessary.

Sounds good to me.
I certainly have no grounds to argue that RHL8 support is "vital".

-Paul

-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
On 14/12/2011 22:42, Paul H. Hargrove wrote:
> On 12/14/2011 1:21 PM, Brice Goglin wrote:
>> The attached patch might work. I am not sure all this is actually
>> necessary because things have been working fine so far, apart from your
>> warnings.
>
> Yup, the patch silences the warnings. That was simpler than I had
> anticipated.
>
>> By the way, does lstopo show PCI devices on your machine even when you
>> have these warnings?
>
> Yes, the PCI device info looks complete even with the warnings.

OK, thanks again for all the testing. We'll have to make sure your name
appears in the 1.3.1 changelog :)

I committed the silence-warning patch but I will keep the other part for
now. I am a bit afraid of changing that much code in 1.3.1 without being
sure whether it's necessary.

Brice
On 12/14/2011 1:21 PM, Brice Goglin wrote:
> The attached patch might work. I am not sure all this is actually
> necessary because things have been working fine so far, apart from your
> warnings.

Yup, the patch silences the warnings. That was simpler than I had
anticipated.

> By the way, does lstopo show PCI devices on your machine even when you
> have these warnings?

Yes, the PCI device info looks complete even with the warnings.

-Paul
On 14/12/2011 21:56, Paul H. Hargrove wrote:
> Now that I think of it, this situation seems to imply that running the
> code in topology-libpci.c as root on a system with an Intel PIIX4
> controller could lock up one's machine. Thoughts?

I can't know for sure. I would be surprised if "sudo lspci -xxx" could
lock up an entire machine. If it were that dangerous, I would hope the
kernel guys had added a quirk for this device to keep people from killing
the machine.

What do they mean by "random location" in the comment? Something between
64 and 256? Or something really random after 256?

FWIW, I increased the cached config space from 64 bytes to 256 when adding
support for getting the PCIe link speed, which is indeed only available to
root (extended PCIe capability).

The attached patch might work. I am not sure all this is actually
necessary because things have been working fine so far, apart from your
warnings.

By the way, does lstopo show PCI devices on your machine even when you
have these warnings?

Brice

diff --git a/src/topology-libpci.c b/src/topology-libpci.c
index 4564319..01f8e03 100644
--- a/src/topology-libpci.c
+++ b/src/topology-libpci.c
@@ -560,6 +560,11 @@ hwloc_pci_error(char *msg, ...)
   longjmp(err_buf, 1);
 }
 
+static void
+hwloc_pci_warning(char *msg __hwloc_attribute_unused, ...)
+{
+}
+
 void
 hwloc_look_libpci(struct hwloc_topology *topology)
 {
@@ -576,6 +581,7 @@ hwloc_look_libpci(struct hwloc_topology *topology)
 
   pciaccess = pci_alloc();
   pciaccess->error = hwloc_pci_error;
+  pciaccess->warning = hwloc_pci_warning;
 
   if (setjmp(err_buf)) {
     pci_cleanup(pciaccess);
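The patch relies on libpci routing problems through two callbacks on the access structure: error for fatal problems (which hwloc answers with longjmp() back to its setjmp() point) and warning for non-fatal ones (which the patch turns into a no-op). The sketch below models that mechanism without libpci. struct fake_pci_access, fake_scan() and look_pci() are hypothetical stand-ins, not hwloc or libpci code, and the warning handler counts calls instead of doing nothing so the behavior can be checked.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for libpci's struct pci_access: only the two
 * callback fields touched by the patch are modeled. */
struct fake_pci_access {
  void (*error)(char *msg, ...);
  void (*warning)(char *msg, ...);
};

static jmp_buf err_buf;
static int warnings_seen = 0;

/* Fatal errors: report, then unwind to the setjmp() in look_pci(),
 * as hwloc_pci_error() does with longjmp(err_buf, 1). */
static void sketch_pci_error(char *msg, ...)
{
  va_list args;
  va_start(args, msg);
  fputs("pcilib: ", stderr);
  vfprintf(stderr, msg, args);
  fputc('\n', stderr);
  va_end(args);
  longjmp(err_buf, 1);
}

/* Warnings: nothing is printed (the patch's handler is empty);
 * this sketch only counts them so the effect is observable. */
static void sketch_pci_warning(char *msg, ...)
{
  assert(msg != NULL);
  warnings_seen++;
}

/* Hypothetical scan: always emits the short-read warning seen on RHL8,
 * and raises an error only when fail is nonzero. */
static void fake_scan(struct fake_pci_access *acc, int fail)
{
  acc->warning("proc_read: tried to read %d bytes at %d, but got only %d",
               256, 0, 64);
  if (fail)
    acc->error("cannot read device list");
}

/* Returns 0 on success, -1 if the error handler unwound the scan. */
static int look_pci(int fail)
{
  struct fake_pci_access acc;
  acc.error = sketch_pci_error;
  acc.warning = sketch_pci_warning;
  if (setjmp(err_buf))
    return -1;
  fake_scan(&acc, fail);
  return 0;
}
```

With this split, the short-read warning no longer reaches the user, while a genuine error still aborts the PCI discovery cleanly through the longjmp() path.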
On 12/14/2011 12:03 PM, Brice Goglin wrote:
> On 14/12/2011 20:45, Paul H. Hargrove wrote:
>> I can run "lspci -vv" to get plenty of correct information and no such
>> messages.
>
> What about lspci -vvxxx? These options make the pci lib read 256 instead
> of 64 bytes of config space.

Yup, that reproduces the problem:

$ lspci -vvxxx
pcilib: proc_read: tried to read 256 bytes at 0, but got only 64
lspci: Unable to read 256 bytes of configuration space.

However, my lspci manpage says only root can use -xxx:

  -xxx   Show hexadecimal dump of whole PCI configuration space. Available
         only for root as several PCI devices crash when you try to read
         undefined portions of the config space (this behaviour probably
         doesn't violate the PCI standard, but it's at least very stupid).

So, I tried "sudo lspci -vvxxx" and got lots of output and no error.
Following up on that, I find "sudo hwloc-calc" runs w/o the errors.

Here is a portion of proc_bus_pci_read() from my
/usr/src/linux-2.4.21-60.EL/drivers/pci/proc.c:

    /*
     * Normal users can read only the standardized portion of the
     * configuration space as several chips lock up when trying to read
     * undefined locations (think of Intel PIIX4 as a typical example).
     */
    if (capable(CAP_SYS_ADMIN))
        size = PCI_CFG_SPACE_SIZE;
    else if (dev->hdr_type == PCI_HEADER_TYPE_CARDBUS)
        size = 128;
    else
        size = 64;

I find that this exact code is still present in linux-2.6.39. So the
kernel behavior has not changed in this respect, and I therefore suspect
that more recent libpci is simply masking the short read.

I would suggest logic in topology-libpci that first tries to read 256
bytes, then tries again to read 128, and then 64. With a smart
pciaccess->error handler one would NOT produce this error message until
the 256-, 128- and 64-byte reads all fail.

-Paul
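Paul's suggested retry order (256, then 128, then 64 bytes, complaining only if every size fails) can be sketched in C. This is a minimal model, not hwloc code: visible_size() and fake_proc_read() are hypothetical stand-ins that mimic the kernel limit quoted above (256 bytes for root, 128 for CardBus bridges, 64 otherwise); a real implementation would go through libpci's own read functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the proc_bus_pci_read() limit quoted above. */
static int visible_size(int is_admin, int is_cardbus)
{
  if (is_admin)
    return 256;           /* PCI_CFG_SPACE_SIZE */
  if (is_cardbus)
    return 128;
  return 64;
}

/* Hypothetical read: fills up to len bytes and returns how many were
 * actually available, mimicking a short read from /proc/bus/pci. */
static int fake_proc_read(int is_admin, int is_cardbus,
                          unsigned char *buf, int len)
{
  int n = visible_size(is_admin, is_cardbus);
  if (len < n)
    n = len;
  memset(buf, 0xff, (size_t) n);  /* placeholder config-space bytes */
  return n;
}

/* Try 256, then 128, then 64 bytes; accept the first complete read and
 * report a problem only if every size comes back short.
 * buf must hold at least 256 bytes. Returns bytes read, or -1. */
static int read_config_with_fallback(int is_admin, int is_cardbus,
                                     unsigned char *buf)
{
  static const int sizes[] = { 256, 128, 64 };
  size_t i;
  assert(buf != NULL);
  for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
    if (fake_proc_read(is_admin, is_cardbus, buf, sizes[i]) == sizes[i])
      return sizes[i];
  }
  fprintf(stderr, "unable to read even 64 bytes of config space\n");
  return -1;
}
```

Under this model root gets the full 256 bytes, an unprivileged CardBus bridge settles at 128 and any other device at 64, with no message printed along the way.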
On 14/12/2011 20:45, Paul H. Hargrove wrote:
> I can run "lspci -vv" to get plenty of correct information and no such
> messages.

What about lspci -vvxxx? These options make the pci lib read 256 instead
of 64 bytes of config space.

I may need to check something in the first 64 bytes before I can safely
try to read the next 192.

Brice
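One plausible candidate for "something in the first 64 bytes" is the header-type register at offset 0x0E, since that is what the kernel code quoted in Paul's reply keys on (CardBus bridges, layout type 2, expose 128 bytes to normal users). The sketch below is an assumption, not what hwloc does: safe_config_size() is a hypothetical helper that applies the kernel's rule to an already-read 64-byte header. Offset 0x0E, the 0x7f layout mask (bit 7 is the multi-function flag) and layout value 2 for CardBus come from the standard PCI configuration header.

```c
#include <assert.h>
#include <stddef.h>

#define CFG_HEADER_TYPE      0x0e  /* header-type register offset */
#define HEADER_LAYOUT_MASK   0x7f  /* bit 7 = multi-function flag */
#define HEADER_TYPE_CARDBUS  2     /* CardBus bridge layout */

/* Hypothetical helper: given the first 64 bytes of a device's config
 * space, return how many bytes this sketch treats as safe to read,
 * mirroring the rule in the kernel's proc_bus_pci_read(). */
static int safe_config_size(const unsigned char hdr[64], int is_admin)
{
  int layout;
  assert(hdr != NULL);
  if (is_admin)
    return 256;
  layout = hdr[CFG_HEADER_TYPE] & HEADER_LAYOUT_MASK;
  if (layout == HEADER_TYPE_CARDBUS)
    return 128;
  return 64;
}
```

Masking off bit 7 matters: a multi-function CardBus bridge reports 0x82 in that register, which still has to be recognized as layout type 2.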