RE: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86
From: Benjamin Herrenschmidt > Sent: 15 July 2020 23:49 > On Wed, 2020-07-15 at 17:12 -0500, Bjorn Helgaas wrote: > > > I've 'played' with PCIe error handling - without much success. > > > What might be useful is for a driver that has just read ~0u to > > > be able to ask 'has there been an error signalled for this device?'. > > > > In many cases a driver will know that ~0 is not a valid value for the > > register it's reading. But if ~0 *could* be valid, an interface like > > you suggest could be useful. I don't think we have anything like that > > today, but maybe we could. It would certainly be nice if the PCI core > > noticed, logged, and cleared errors. We have some of that for AER, > > but that's an optional feature, and support for the error bits in the > > garden-variety PCI_STATUS register is pretty haphazard. As you note > > below, this sort of SERR/PERR reporting is frequently hard-wired in > > ways that takes it out of our purview. > > We do have pci_channel_state (via pci_channel_offline()) which covers > the cases where the underlying error handling (such as EEH or unplug) > results in the device being offlined though this tend to be > asynchronous so it might take a few ~0's before you get it. On one of my systems I don't think the error TLP from the target made its way past the first bridge - I could see the error in it's status registers. But I couldn't find any of the AER status registers in the root bridge. So I think you'd need a software poll of the bridge registers to find out (and clear) the error. The NMI on the dell system (which is supposed to meet some special NEBS? server requirements) is just stupid. Too late to be synchronous and impossible for the OS to handle. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86
On Wed, 2020-07-15 at 17:12 -0500, Bjorn Helgaas wrote: > > I've 'played' with PCIe error handling - without much success. > > What might be useful is for a driver that has just read ~0u to > > be able to ask 'has there been an error signalled for this device?'. > > In many cases a driver will know that ~0 is not a valid value for the > register it's reading. But if ~0 *could* be valid, an interface like > you suggest could be useful. I don't think we have anything like that > today, but maybe we could. It would certainly be nice if the PCI core > noticed, logged, and cleared errors. We have some of that for AER, > but that's an optional feature, and support for the error bits in the > garden-variety PCI_STATUS register is pretty haphazard. As you note > below, this sort of SERR/PERR reporting is frequently hard-wired in > ways that takes it out of our purview. We do have pci_channel_state (via pci_channel_offline()) which covers the cases where the underlying error handling (such as EEH or unplug) results in the device being offlined though this tend to be asynchronous so it might take a few ~0's before you get it. It's typically used to break potentially infinite loops in some drivers. There is no interface to check whether *an* error happened though for the most cases it will be captured in the status register, which is harvested (and cleared ?) by some EDAC drivers iirc... All this lacks coordination, I agree. Cheers, Ben.
Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86
On Wed, 2020-07-15 at 08:47 +0200, Arnd Bergmann wrote: > drivers/misc/cardreader/rts5261.c: retval = > rtsx_pci_write_config_dword(pcr, PCR_SETTING_REG2, lval); > > That last one is interesting because I think this is a case in which > we > actually want to check for errors, as the driver seems to use it > to ensure that accessing extended config space at offset 0x814 > works before relying on the value. Unfortunately the implementation > seems buggy as it a) keeps using the possibly uninitialized value > after > printing a warning and b) returns the PCIBIOS_* value in place of a > negative errno and then ignores it in the caller. In cases like this, usually checking against ~0 is sufficient Cheers, Ben.
Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86
On Wed, Jul 15, 2020 at 02:38:29PM +, David Laight wrote: > From: Oliver O'Halloran > > Sent: 15 July 2020 05:19 > > > > On Wed, Jul 15, 2020 at 8:03 AM Arnd Bergmann wrote: > ... > > > - config space accesses are very rare compared to memory > > > space access and on the hardware side the error handling > > > would be similar, but readl/writel don't return errors, they just > > > access wrong registers or return 0x. > > > arch/powerpc/kernel/eeh.c has a ton extra code written to > > > deal with it, but no other architectures do. > > > > TBH the EEH MMIO hooks were probably a mistake to begin with. Errors > > detected via MMIO are almost always asynchronous to the error itself > > so you usually just wind up with a misleading stack trace rather than > > any kind of useful synchronous error reporting. It seems like most > > drivers don't bother checking for 0xFFs either and rely on the > > asynchronous reporting via .error_detected() instead, so I have to > > wonder what the point is. I've been thinking of removing the MMIO > > hooks and using a background poller to check for errors on each PHB > > periodically (assuming we don't have an EEH interrupt) instead. That > > would remove the requirement for eeh_dev_check_failure() to be > > interrupt safe too, so it might even let us fix all the godawful races > > in EEH. > > I've 'played' with PCIe error handling - without much success. > What might be useful is for a driver that has just read ~0u to > be able to ask 'has there been an error signalled for this device?'. In many cases a driver will know that ~0 is not a valid value for the register it's reading. But if ~0 *could* be valid, an interface like you suggest could be useful. I don't think we have anything like that today, but maybe we could. It would certainly be nice if the PCI core noticed, logged, and cleared errors. We have some of that for AER, but that's an optional feature, and support for the error bits in the garden-variety PCI_STATUS register is pretty haphazard. As you note below, this sort of SERR/PERR reporting is frequently hard-wired in ways that takes it out of our purview. Bjorn
RE: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86
From: Oliver O'Halloran > Sent: 15 July 2020 05:19 > > On Wed, Jul 15, 2020 at 8:03 AM Arnd Bergmann wrote: ... > > - config space accesses are very rare compared to memory > > space access and on the hardware side the error handling > > would be similar, but readl/writel don't return errors, they just > > access wrong registers or return 0x. > > arch/powerpc/kernel/eeh.c has a ton extra code written to > > deal with it, but no other architectures do. > > TBH the EEH MMIO hooks were probably a mistake to begin with. Errors > detected via MMIO are almost always asynchronous to the error itself > so you usually just wind up with a misleading stack trace rather than > any kind of useful synchronous error reporting. It seems like most > drivers don't bother checking for 0xFFs either and rely on the > asynchronous reporting via .error_detected() instead, so I have to > wonder what the point is. I've been thinking of removing the MMIO > hooks and using a background poller to check for errors on each PHB > periodically (assuming we don't have an EEH interrupt) instead. That > would remove the requirement for eeh_dev_check_failure() to be > interrupt safe too, so it might even let us fix all the godawful races > in EEH. I've 'played' with PCIe error handling - without much success. What might be useful is for a driver that has just read ~0u to be able to ask 'has there been an error signalled for this device?'. I got an error generated by doing an MMIO access that was inside the address range forwarded to the slave, but outside any of its BARs. (Two BARs of different sizes leaves a nice gap.) This got reported up to the bridge nearest the slave (which supported error handling), but not to the root bridge (which I don't think does). ISTR a message about EEH being handled by the hardware (the machine is up but dmesg is full of messages from a bouncing USB mouse). With such partial error reporting useful info can still be extracted. Of course, what actually happens on a PCIe error is that the signal gets routed to some 'board support logic' and then passed back into the kernel as an NMI - which then crashes the kernel! This even happens when the PCIe link goes down after we've done a soft-remove of the device itself! Rather makes updating the board's FPGA without a reboot tricky. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86
On Wed, Jul 15, 2020 at 1:46 AM Bjorn Helgaas wrote: So something like: > > void pci_read_config_word(struct pci_dev *dev, int where, u16 *val) > > and where we used to return anything non-zero, we just set *val = ~0 > instead? I think we do that already in most, maybe all, cases. Right, this is what I had in mind. If we start by removing the handling of the return code in all files that clearly don't need it, looking at whatever remains will give a much better idea of what a good interface should be. > git grep -E "(if|return|=).*\ finds about 400 matches. Right, and this is some 112 files to look at. I had a slightly different regex, which found more false-positives, but also these: arch/x86/kernel/amd_nb.c: : pci_read_config_dword(root, 0x64, value)); drivers/i2c/busses/i2c-sis630.c: pci_write_config_byte(sis630_dev, SIS630_BIOS_CTL_REG, b | 0x80)) { drivers/i2c/busses/i2c-viapro.c: !pci_read_config_word(pdev, SMBBA2, _smba) && drivers/ide/rz1000.c: !pci_write_config_word(dev, 0x40, reg & 0xdfff)) { drivers/net/ethernet/realtek/r8169_main.c: pci_write_config_byte(pdev, 0x070f, val) == PCIBIOS_SUCCESSFUL) include/linux/rtsx_pci.h:#define rtsx_pci_read_config_dword(pcr, where, val) pci_read_config_dword((pcr)->pci, where, val) include/linux/rtsx_pci.h:#define rtsx_pci_write_config_dword(pcr, where, val) pci_write_config_dword((pcr)->pci, where, val) drivers/misc/cardreader/rts5261.c: retval = rtsx_pci_read_config_dword(pcr, drivers/misc/cardreader/rts5261.c: retval = rtsx_pci_write_config_dword(pcr, PCR_SETTING_REG2, lval); That last one is interesting because I think this is a case in which we actually want to check for errors, as the driver seems to use it to ensure that accessing extended config space at offset 0x814 works before relying on the value. Unfortunately the implementation seems buggy as it a) keeps using the possibly uninitialized value after printing a warning and b) returns the PCIBIOS_* value in place of a negative errno and then ignores it in the caller. Arnd
Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86
On Wed, Jul 15, 2020 at 8:03 AM Arnd Bergmann wrote: > > - Most error checking is static: PCIBIOS_BAD_REGISTER_NUMBER > only happens if you pass an invalid register number, but most > callers pass a compile-time constant register number that is > known to be correct, or the driver would never work. Similarly, > PCIBIOS_DEVICE_NOT_FOUND wouldn't normally happen > since you pass a valid pci_device pointer that was already > probed. Having some feedback about obvious programming errors is still useful when doing driver development. Reporting those via printk() would probably be more useful to those who care though. > - config space accesses are very rare compared to memory > space access and on the hardware side the error handling > would be similar, but readl/writel don't return errors, they just > access wrong registers or return 0x. > arch/powerpc/kernel/eeh.c has a ton extra code written to > deal with it, but no other architectures do. TBH the EEH MMIO hooks were probably a mistake to begin with. Errors detected via MMIO are almost always asynchronous to the error itself so you usually just wind up with a misleading stack trace rather than any kind of useful synchronous error reporting. It seems like most drivers don't bother checking for 0xFFs either and rely on the asynchronous reporting via .error_detected() instead, so I have to wonder what the point is. I've been thinking of removing the MMIO hooks and using a background poller to check for errors on each PHB periodically (assuming we don't have an EEH interrupt) instead. That would remove the requirement for eeh_dev_check_failure() to be interrupt safe too, so it might even let us fix all the godawful races in EEH. > - If we add code to detect errors in pci_read_config_* > and do some of the stuff from powerpc's > eeh_dev_check_failure(), we are more likely to catch > intermittent failures when drivers don't check, or bugs > with invalid arguments in device drivers than relying on > drivers to get their error handling right when those code > paths don't ever get covered in normal testing. Adding some kind of error detection to the generic config accessors wouldn't hurt, but detection is only half the problem. The main job of eeh_dev_check_failure() is waking up the EEH recovery thread which actually handles notifying drivers, device resets, etc and you'd want something in the PCI core. Right now there's two implementations of that reporting logic: one for EEH in arch/powerpc/eeh_driver.c and one for AER/DPC in drivers/pci/pcie/err.c. I think the latter could be moved into the PCI core easily enough since there's not much about it that's really specific to PCIe. Ideally we could drop the EEH specific one too, but I'm not sure how to implement that without it devolving into callback spaghetti. Oliver
Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86
On Tue, 2020-07-14 at 18:46 -0500, Bjorn Helgaas wrote: > Yes. I have no problem with that. There are a few cases where it's > important to check for errors, e.g., we read a status register and do > something based on a bit being set. A failure will return all bits > set, and we may do the wrong thing. But most of the errors we care > about will be on MMIO reads, not config reads, so we can probably > ignore most config read errors. And in both cases, we don't have the plumbing to provide accurate and reliable error returns for all platforms anyways (esp. not for MMIO). I think it makes sense to stick to the good old "if all 1's, then go out of line" including for config space. ../.. > Yep, except for things like device removal or other PCI errors. A whole bunch of these are reported asynchronously, esp for writes (and yes, including config writes, they are supposed to be non-posted but more often than not, the path from the CPU to the PCI bridge remains posted for writes including config ones). > So maybe a good place to start is by removing some of the useless > error checking for pci_read_config_*() and pci_write_config_*(). > That's a decent-sized but not impractical project that could be done > per subsystem or something: > > git grep -E "(if|return|=).*\ > finds about 400 matches. > > Some of those callers probably really *do* want to check for errors, > and I guess we'd have to identify them and do them separately as you > mentioned. I'd be curious about these considering how unreliable our error return is accross the board. Cheers, Ben.
Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86
On Tue, 2020-07-14 at 23:02 +0200, Kjetil Oftedal wrote: > > > > > For b), it might be nice to also change other aspects of the > > > interface, e.g. passing a pci_host_bridge pointer plus bus number > > > instead of a pci_bus pointer, or having the callback in the > > > pci_host_bridge structure. > > > > I like this idea a lot, too. I think the fact that > > pci_bus_read_config_word() requires a pci_bus * complicates things in > > a few places. > > > > I think it's completely separate, as you say, and we should defer it > > for now because even part a) is a lot of work. I added it to my list > > of possible future projects. > > > > What about strange PCI devices such as Non-Transparent bridges? > They will require their own PCI Config space accessors that is not > connected to a host bridge if one wants to do some sort of > punch-through enumeration. > I guess the kernel doesn't care much about them? Well, today they would require a pci_bus anyway.. . so if you want to do that sort of funny trick you may as well create a "virtual" host bridge. Cheers, Ben.
Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86
On Tue, 2020-07-14 at 13:45 -0500, Bjorn Helgaas wrote: > > > fail for valid arguments on a valid pci_device* ? > > I really like this idea. > > pci_write_config_*() has one return value, and only 100ish of 2500 > callers check for errors. It's sometimes possible for config > accessors to detect PCI errors and return failure, e.g., device was > removed or didn't respond, but most of them don't, and detecting > these > errors is not really that valuable. > > pci_read_config_*() is much more interesting because it returns two > things, the function return value and the value read from the PCI > device, and it's complicated to check both. .../... I agree. It's a mess at the moment. We have separate mechanism to convey PCI errors (among other things the channel state) which should apply to config space when detection is possible. I think returning all 1's is the right thing to do here and avoids odd duplicate error detection logic which I bet you is never properly tested. > > For b), it might be nice to also change other aspects of the > > interface, e.g. passing a pci_host_bridge pointer plus bus number > > instead of a pci_bus pointer, or having the callback in the > > pci_host_bridge structure. > > I like this idea a lot, too. I think the fact that > pci_bus_read_config_word() requires a pci_bus * complicates things in > a few places. > > I think it's completely separate, as you say, and we should defer it > for now because even part a) is a lot of work. I added it to my list > of possible future projects. Agreed on both points. Cheers, Ben.
Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86
[+cc Kjetil] On Wed, Jul 15, 2020 at 12:01:56AM +0200, Arnd Bergmann wrote: > On Tue, Jul 14, 2020 at 8:45 PM Bjorn Helgaas wrote: > > On Mon, Jul 13, 2020 at 05:08:10PM +0200, Arnd Bergmann wrote: > > > On Mon, Jul 13, 2020 at 3:22 PM Saheed O. Bolarinwa > > > Starting with a), my first question is whether any high-level > > > drivers even need to care about errors from these functions. I see > > > 4913 callers that ignore the return code, and 576 that actually > > > check it, and almost none care about the specific error (as you > > > found as well). Unless we conclude that most PCI drivers are wrong, > > > could we just change the return type to 'void' and assume they never > > > fail for valid arguments on a valid pci_device* ? > > > > I really like this idea. > > > > pci_write_config_*() has one return value, and only 100ish of 2500 > > callers check for errors. It's sometimes possible for config > > accessors to detect PCI errors and return failure, e.g., device was > > removed or didn't respond, but most of them don't, and detecting these > > errors is not really that valuable. > > > > pci_read_config_*() is much more interesting because it returns two > > things, the function return value and the value read from the PCI > > device, and it's complicated to check both. > > > > Again it's sometimes possible for config read accessors to detect PCI > > errors, but in most cases a PCI error means the accessor returns > > success and the value from PCI is ~0. > > > > Checking the function return value catches programming errors (bad > > alignment, etc) but misses most of the interesting errors (device was > > unplugged or reported a PCI error). > > My thinking was more that most of the time the error checking may > be completely bogus to start with, and I would just not check for > errors at all. Yes. I have no problem with that. There are a few cases where it's important to check for errors, e.g., we read a status register and do something based on a bit being set. A failure will return all bits set, and we may do the wrong thing. But most of the errors we care about will be on MMIO reads, not config reads, so we can probably ignore most config read errors. > > Checking the value returned from PCI is tricky because ~0 is a valid > > value for some config registers, and only the driver knows for sure. > > If the driver knows that ~0 is a possible value, it would have to do > > something else, e.g., another config read of a register that *cannot* > > be ~0, to see whether it's really an error. > > > > I suspect that if we had a single value to look at it would be easier > > to get right. Error checking with current interface would look like > > this: > > > > err = pci_read_config_word(dev, addr, ); > > if (err) > > return -EINVAL; > > > > if (PCI_POSSIBLE_ERROR(val)) { > > /* if driver knows ~0 is invalid */ > > return -EINVAL; > > > > /* if ~0 is potentially a valid value */ > > err = pci_read_config_word(dev, PCI_VENDOR_ID, ); > > if (err) > > return -EINVAL; > > > > if (PCI_POSSIBLE_ERROR(val2)) > > return -EINVAL; > > } > > > > Error checking with a possible interface that returned only a single > > value could look like this: > > > > val = pci_config_read_word(dev, addr); > > if (PCI_POSSIBLE_ERROR(val)) { > > /* if driver knows ~0 is invalid */ > > return -EINVAL; > > > > /* if ~0 is potentially a valid value */ > > val2 = pci_config_read_word(dev, PCI_VENDOR_ID); > > if (PCI_POSSIBLE_ERROR(val2)) > > return -EINVAL; > > } > > > > Am I understanding you correctly? > > That would require changing all callers of the function, which > I think would involve changing some 700 files. Yeah, that would be a disaster. So something like: void pci_read_config_word(struct pci_dev *dev, int where, u16 *val) and where we used to return anything non-zero, we just set *val = ~0 instead? I think we do that already in most, maybe all, cases. > What I was suggesting was to only change the return type to void and > categorize all drivers that today check it as either > > a) checking the return code is not helpful, or possibly even > wrong, so we just stop doing it. I expect those to be the > vast majority of callers, but that could be wrong. > > b) Code that legitimately check the error code and need to >take an appropriate action. These could be changed to >calling a different interface such as 'pci_bus_read_config_word' >or a new 'pci_device_last_error()' function. Yep, makes sense. > The reasons I suspect that most callers don't actually need > to check for errors are: > > - Most error checking is static: PCIBIOS_BAD_REGISTER_NUMBER > only happens if you pass an invalid register number, but most > callers pass a compile-time constant register number that is > known to be correct, or the driver would never work. Similarly, > PCIBIOS_DEVICE_NOT_FOUND wouldn't normally happen > since
Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86
On Tue, Jul 14, 2020 at 12:45 PM Bjorn Helgaas wrote: > > [trimmed the cc list; it's still too large but maybe arch folks care] > > On Mon, Jul 13, 2020 at 05:08:10PM +0200, Arnd Bergmann wrote: > > On Mon, Jul 13, 2020 at 3:22 PM Saheed O. Bolarinwa > > wrote: > > > This goal of these series is to move the definition of *all* > > > PCIBIOS* from include/linux/pci.h to arch/x86 and limit their use > > > within there. All other tree specific definition will be left for > > > intact. Maybe they can be renamed. > > > > > > PCIBIOS* is an x86 concept as defined by the PCI spec. The > > > returned error codes of PCIBIOS* are positive values and this > > > introduces some complexities which other archs need not incur. > > > > I think the intention is good, but I find the series in its current > > form very hard to review, in particular the way you touch some > > functions three times with trivial changes. Instead of > > > > 1) replace PCIBIOS_SUCCESSFUL with 0 > > 2) drop pointless 0-comparison > > 3) reformat whitespace > > > > I would suggest to combine the first two steps into one patch per > > subsystem and drop the third step. > > I agree. BUT please don't just run out and post new patches to do > this. Let's talk about Arnd's further ideas below first. > > > ... > > Maybe the work can be split up differently, with a similar end > > result but fewer and easier reviewed patches. The way I'd look at > > the problem, there are three main areas that can be dealt with one > > at a time: > > > > a) callers of the high-level config space accessors > >pci_{write,read}_config_{byte,word,dword}, mostly in device > >drivers. > > b) low-level implementation of the config space accessors > > through struct pci_ops > > c) all other occurrences of these constants > > > > Starting with a), my first question is whether any high-level > > drivers even need to care about errors from these functions. I see > > 4913 callers that ignore the return code, and 576 that actually > > check it, and almost none care about the specific error (as you > > found as well). Unless we conclude that most PCI drivers are wrong, > > could we just change the return type to 'void' and assume they never > > fail for valid arguments on a valid pci_device* ? > > I really like this idea. > > pci_write_config_*() has one return value, and only 100ish of 2500 > callers check for errors. It's sometimes possible for config > accessors to detect PCI errors and return failure, e.g., device was > removed or didn't respond, but most of them don't, and detecting these > errors is not really that valuable. > > pci_read_config_*() is much more interesting because it returns two > things, the function return value and the value read from the PCI > device, and it's complicated to check both. > > Again it's sometimes possible for config read accessors to detect PCI > errors, but in most cases a PCI error means the accessor returns > success and the value from PCI is ~0. > > Checking the function return value catches programming errors (bad > alignment, etc) but misses most of the interesting errors (device was > unplugged or reported a PCI error). > > Checking the value returned from PCI is tricky because ~0 is a valid > value for some config registers, and only the driver knows for sure. > If the driver knows that ~0 is a possible value, it would have to do > something else, e.g., another config read of a register that *cannot* > be ~0, to see whether it's really an error. > > I suspect that if we had a single value to look at it would be easier > to get right. Error checking with current interface would look like > this: > > err = pci_read_config_word(dev, addr, ); > if (err) > return -EINVAL; > > if (PCI_POSSIBLE_ERROR(val)) { > /* if driver knows ~0 is invalid */ > return -EINVAL; > > /* if ~0 is potentially a valid value */ > err = pci_read_config_word(dev, PCI_VENDOR_ID, ); > if (err) > return -EINVAL; > > if (PCI_POSSIBLE_ERROR(val2)) > return -EINVAL; > } > > Error checking with a possible interface that returned only a single > value could look like this: > > val = pci_config_read_word(dev, addr); > if (PCI_POSSIBLE_ERROR(val)) { > /* if driver knows ~0 is invalid */ > return -EINVAL; > > /* if ~0 is potentially a valid value */ > val2 = pci_config_read_word(dev, PCI_VENDOR_ID); > if (PCI_POSSIBLE_ERROR(val2)) > return -EINVAL; > } > > Am I understanding you correctly? > > > For b), it might be nice to also change other aspects of the > > interface, e.g. passing a pci_host_bridge pointer plus bus number > > instead of a pci_bus pointer, or having the callback in the > > pci_host_bridge structure. > > I like this idea a lot, too. I think the fact that > pci_bus_read_config_word() requires a pci_bus * complicates things in > a few places. I've been looking at the various host implementations of config accessors as well as probe functions. Needing the pci_bus
Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86
On Tue, Jul 14, 2020 at 8:45 PM Bjorn Helgaas wrote: > On Mon, Jul 13, 2020 at 05:08:10PM +0200, Arnd Bergmann wrote: > > On Mon, Jul 13, 2020 at 3:22 PM Saheed O. Bolarinwa > > Starting with a), my first question is whether any high-level > > drivers even need to care about errors from these functions. I see > > 4913 callers that ignore the return code, and 576 that actually > > check it, and almost none care about the specific error (as you > > found as well). Unless we conclude that most PCI drivers are wrong, > > could we just change the return type to 'void' and assume they never > > fail for valid arguments on a valid pci_device* ? > > I really like this idea. > > pci_write_config_*() has one return value, and only 100ish of 2500 > callers check for errors. It's sometimes possible for config > accessors to detect PCI errors and return failure, e.g., device was > removed or didn't respond, but most of them don't, and detecting these > errors is not really that valuable. > > pci_read_config_*() is much more interesting because it returns two > things, the function return value and the value read from the PCI > device, and it's complicated to check both. > > Again it's sometimes possible for config read accessors to detect PCI > errors, but in most cases a PCI error means the accessor returns > success and the value from PCI is ~0. > > Checking the function return value catches programming errors (bad > alignment, etc) but misses most of the interesting errors (device was > unplugged or reported a PCI error). My thinking was more that most of the time the error checking may be completely bogus to start with, and I would just not check for errors at all. > Checking the value returned from PCI is tricky because ~0 is a valid > value for some config registers, and only the driver knows for sure. > If the driver knows that ~0 is a possible value, it would have to do > something else, e.g., another config read of a register that *cannot* > be ~0, to see whether it's really an error. > > I suspect that if we had a single value to look at it would be easier > to get right. Error checking with current interface would look like > this: > > err = pci_read_config_word(dev, addr, ); > if (err) > return -EINVAL; > > if (PCI_POSSIBLE_ERROR(val)) { > /* if driver knows ~0 is invalid */ > return -EINVAL; > > /* if ~0 is potentially a valid value */ > err = pci_read_config_word(dev, PCI_VENDOR_ID, ); > if (err) > return -EINVAL; > > if (PCI_POSSIBLE_ERROR(val2)) > return -EINVAL; > } > > Error checking with a possible interface that returned only a single > value could look like this: > > val = pci_config_read_word(dev, addr); > if (PCI_POSSIBLE_ERROR(val)) { > /* if driver knows ~0 is invalid */ > return -EINVAL; > > /* if ~0 is potentially a valid value */ > val2 = pci_config_read_word(dev, PCI_VENDOR_ID); > if (PCI_POSSIBLE_ERROR(val2)) > return -EINVAL; > } > > Am I understanding you correctly? That would require changing all callers of the function, which I think would involve changing some 700 files. What I was suggesting was to only change the return type to void and categorize all drivers that today check it as either a) checking the return code is not helpful, or possibly even wrong, so we just stop doing it. I expect those to be the vast majority of callers, but that could be wrong. b) Code that legitimately check the error code and need to take an appropriate action. These could be changed to calling a different interface such as 'pci_bus_read_config_word' or a new 'pci_device_last_error()' function. The reasons I suspect that most callers don't actually need to check for errors are: - Most error checking is static: PCIBIOS_BAD_REGISTER_NUMBER only happens if you pass an invalid register number, but most callers pass a compile-time constant register number that is known to be correct, or the driver would never work. Similarly, PCIBIOS_DEVICE_NOT_FOUND wouldn't normally happen since you pass a valid pci_device pointer that was already probed. - config space accesses are very rare compared to memory space access and on the hardware side the error handling would be similar, but readl/writel don't return errors, they just access wrong registers or return 0x. arch/powerpc/kernel/eeh.c has a ton extra code written to deal with it, but no other architectures do. - If we add code to detect errors in pci_read_config_* and do some of the stuff from powerpc's eeh_dev_check_failure(), we are more likely to catch intermittent failures when drivers don't check, or bugs with invalid arguments in device drivers than relying on drivers to get their error handling right when those code paths don't ever get covered in normal testing. Looking at a couple of random drivers that do check the return codes, I find: drivers/edac/amd8131_edac.c: prints the register number, then keeps
Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86
On 14/07/2020, Bjorn Helgaas wrote: >> >> a) callers of the high-level config space accessors >>pci_{write,read}_config_{byte,word,dword}, mostly in device >>drivers. >> b) low-level implementation of the config space accessors >> through struct pci_ops >> c) all other occurrences of these constants >> >> Starting with a), my first question is whether any high-level >> drivers even need to care about errors from these functions. I see >> 4913 callers that ignore the return code, and 576 that actually >> check it, and almost none care about the specific error (as you >> found as well). Unless we conclude that most PCI drivers are wrong, >> could we just change the return type to 'void' and assume they never >> fail for valid arguments on a valid pci_device* ? > > I really like this idea. > > pci_write_config_*() has one return value, and only 100ish of 2500 > callers check for errors. It's sometimes possible for config > accessors to detect PCI errors and return failure, e.g., device was > removed or didn't respond, but most of them don't, and detecting these > errors is not really that valuable. > > pci_read_config_*() is much more interesting because it returns two > things, the function return value and the value read from the PCI > device, and it's complicated to check both. > > Again it's sometimes possible for config read accessors to detect PCI > errors, but in most cases a PCI error means the accessor returns > success and the value from PCI is ~0. > > Checking the function return value catches programming errors (bad > alignment, etc) but misses most of the interesting errors (device was > unplugged or reported a PCI error). > > Checking the value returned from PCI is tricky because ~0 is a valid > value for some config registers, and only the driver knows for sure. > If the driver knows that ~0 is a possible value, it would have to do > something else, e.g., another config read of a register that *cannot* > be ~0, to see whether it's really an error. > > I suspect that if we had a single value to look at it would be easier > to get right. Error checking with current interface would look like > this: > > err = pci_read_config_word(dev, addr, ); > if (err) > return -EINVAL; > > if (PCI_POSSIBLE_ERROR(val)) { > /* if driver knows ~0 is invalid */ > return -EINVAL; > > /* if ~0 is potentially a valid value */ > err = pci_read_config_word(dev, PCI_VENDOR_ID, ); > if (err) > return -EINVAL; > > if (PCI_POSSIBLE_ERROR(val2)) > return -EINVAL; > } > > Error checking with a possible interface that returned only a single > value could look like this: > > val = pci_config_read_word(dev, addr); > if (PCI_POSSIBLE_ERROR(val)) { > /* if driver knows ~0 is invalid */ > return -EINVAL; > > /* if ~0 is potentially a valid value */ > val2 = pci_config_read_word(dev, PCI_VENDOR_ID); > if (PCI_POSSIBLE_ERROR(val2)) > return -EINVAL; > } > > Am I understanding you correctly? Let us not do this. Reading config space is really expensive on some architectures. Requiring a driver to do it twice on some values does not improve upon that situation. And is quite redundant if the Root Complex driver already knows that the first access has failed. Additionally since multiple config accesses to the same devices is not allowed in the spec, the hardware must block and wait for a timeout if a config access does not get a response. (Can happen if a intermediate link between the RC and endpoint has to retrain) Having to block twice is very much not ideal. And in the case with retraining the secondary access might even succeed. As the link might recover between reading the first config word and reading PCI_VENDOR_ID. Thus allowing the driver to accept invalid data from the device. > >> For b), it might be nice to also change other aspects of the >> interface, e.g. passing a pci_host_bridge pointer plus bus number >> instead of a pci_bus pointer, or having the callback in the >> pci_host_bridge structure. > > I like this idea a lot, too. I think the fact that > pci_bus_read_config_word() requires a pci_bus * complicates things in > a few places. > > I think it's completely separate, as you say, and we should defer it > for now because even part a) is a lot of work. I added it to my list > of possible future projects. > What about strange PCI devices such as Non-Transparent bridges? They will require their own PCI Config space accessors that is not connected to a host bridge if one wants to do some sort of punch-through enumeration. I guess the kernel doesn't care much about them? Best regards, Kjetil Oftedal
Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86
[trimmed the cc list; it's still too large but maybe arch folks care] On Mon, Jul 13, 2020 at 05:08:10PM +0200, Arnd Bergmann wrote: > On Mon, Jul 13, 2020 at 3:22 PM Saheed O. Bolarinwa > wrote: > > This goal of these series is to move the definition of *all* > > PCIBIOS* from include/linux/pci.h to arch/x86 and limit their use > > within there. All other tree specific definition will be left for > > intact. Maybe they can be renamed. > > > > PCIBIOS* is an x86 concept as defined by the PCI spec. The > > returned error codes of PCIBIOS* are positive values and this > > introduces some complexities which other archs need not incur. > > I think the intention is good, but I find the series in its current > form very hard to review, in particular the way you touch some > functions three times with trivial changes. Instead of > > 1) replace PCIBIOS_SUCCESSFUL with 0 > 2) drop pointless 0-comparison > 3) reformat whitespace > > I would suggest to combine the first two steps into one patch per > subsystem and drop the third step. I agree. BUT please don't just run out and post new patches to do this. Let's talk about Arnd's further ideas below first. > ... > Maybe the work can be split up differently, with a similar end > result but fewer and easier reviewed patches. The way I'd look at > the problem, there are three main areas that can be dealt with one > at a time: > > a) callers of the high-level config space accessors >pci_{write,read}_config_{byte,word,dword}, mostly in device >drivers. > b) low-level implementation of the config space accessors > through struct pci_ops > c) all other occurrences of these constants > > Starting with a), my first question is whether any high-level > drivers even need to care about errors from these functions. I see > 4913 callers that ignore the return code, and 576 that actually > check it, and almost none care about the specific error (as you > found as well). Unless we conclude that most PCI drivers are wrong, > could we just change the return type to 'void' and assume they never > fail for valid arguments on a valid pci_device* ? I really like this idea. pci_write_config_*() has one return value, and only 100ish of 2500 callers check for errors. It's sometimes possible for config accessors to detect PCI errors and return failure, e.g., device was removed or didn't respond, but most of them don't, and detecting these errors is not really that valuable. pci_read_config_*() is much more interesting because it returns two things, the function return value and the value read from the PCI device, and it's complicated to check both. Again it's sometimes possible for config read accessors to detect PCI errors, but in most cases a PCI error means the accessor returns success and the value from PCI is ~0. Checking the function return value catches programming errors (bad alignment, etc) but misses most of the interesting errors (device was unplugged or reported a PCI error). Checking the value returned from PCI is tricky because ~0 is a valid value for some config registers, and only the driver knows for sure. If the driver knows that ~0 is a possible value, it would have to do something else, e.g., another config read of a register that *cannot* be ~0, to see whether it's really an error. I suspect that if we had a single value to look at it would be easier to get right. Error checking with current interface would look like this: err = pci_read_config_word(dev, addr, ); if (err) return -EINVAL; if (PCI_POSSIBLE_ERROR(val)) { /* if driver knows ~0 is invalid */ return -EINVAL; /* if ~0 is potentially a valid value */ err = pci_read_config_word(dev, PCI_VENDOR_ID, ); if (err) return -EINVAL; if (PCI_POSSIBLE_ERROR(val2)) return -EINVAL; } Error checking with a possible interface that returned only a single value could look like this: val = pci_config_read_word(dev, addr); if (PCI_POSSIBLE_ERROR(val)) { /* if driver knows ~0 is invalid */ return -EINVAL; /* if ~0 is potentially a valid value */ val2 = pci_config_read_word(dev, PCI_VENDOR_ID); if (PCI_POSSIBLE_ERROR(val2)) return -EINVAL; } Am I understanding you correctly? > For b), it might be nice to also change other aspects of the > interface, e.g. passing a pci_host_bridge pointer plus bus number > instead of a pci_bus pointer, or having the callback in the > pci_host_bridge structure. I like this idea a lot, too. I think the fact that pci_bus_read_config_word() requires a pci_bus * complicates things in a few places. I think it's completely separate, as you say, and we should defer it for now because even part a) is a lot of work. I added it to my list of possible future projects. Bjorn