Gitweb:     
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6b09ff9d787911b0b46a4d286e68f1f84e8b0b94
Commit:     6b09ff9d787911b0b46a4d286e68f1f84e8b0b94
Parent:     4f4aeeabc061826376c9a72b4714d062664999ea
Author:     Bryan Boatright <[EMAIL PROTECTED]>
AuthorDate: Thu Feb 7 00:14:58 2008 -0800
Committer:  Linus Torvalds <[EMAIL PROTECTED]>
CommitDate: Thu Feb 7 08:42:23 2008 -0800

    drivers/edac: pci: broken parity regression
    
    Using the EDAC code in kernel.org kernel version 2.6.23.8 I am seeing the
    following problem:
    
        In the kernel there is a pci device attribute located in sysfs that is
        checked by the EDAC PCI scanning code. If that attribute is set,
        PCI parity/error scannining is skipped for that device. The attribute
        is:
    
                broken_parity_status
    
        as is located in /sys/devices/pci<XXX>/0000:XX:YY.Z directorys for
        PCI devices.
    
    I don't think this check was actually implemented.  I have a misbehaved card
    that reports a parity error every 1000 ms:
    
    Nov 25 07:28:43 beta kernel: EDAC PCI: Master Data Parity Error on 
0000:05:01.0
    Nov 25 07:28:44 beta kernel: EDAC PCI: Master Data Parity Error on 
0000:05:01.0
    Nov 25 07:28:45 beta kernel: EDAC PCI: Master Data Parity Error on 
0000:05:01.0
    
    Setting that card's broken_parity_status bit did not mask the error:
    
    echo "1" > /sys/bus/pci/devices/0000:05:01.0/broken_parity_status
    
    I looked through the EDAC code and did not readily see any reference to
    broken_parity_status at all (which makes sense based on the behavior I am
    seeing).  I applied the following patch as a proof-of-concept and now EDAC's
    PCI parity error reporting behaves as documented:
    
    bryan
    
    Good regression find, bryan. It used to work. sigh.
    I added more logic to your patch, for more coverage of the error.
    
    Doug T
    
    Signed-off-by: Bryan Boatright <[EMAIL PROTECTED]>
    Signed-off-by: Doug Thompson <[EMAIL PROTECTED]>
    Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
    Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
---
 drivers/edac/edac_pci_sysfs.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/edac/edac_pci_sysfs.c b/drivers/edac/edac_pci_sysfs.c
index 5b075da..71c3195 100644
--- a/drivers/edac/edac_pci_sysfs.c
+++ b/drivers/edac/edac_pci_sysfs.c
@@ -558,8 +558,10 @@ static void edac_pci_dev_parity_test(struct pci_dev *dev)
 
        debugf4("PCI STATUS= 0x%04x %s\n", status, dev->dev.bus_id);
 
-       /* check the status reg for errors */
-       if (status) {
+       /* check the status reg for errors on boards NOT marked as broken
+        * if broken, we cannot trust any of the status bits
+        */
+       if (status && !dev->broken_parity_status) {
                if (status & (PCI_STATUS_SIG_SYSTEM_ERROR)) {
                        edac_printk(KERN_CRIT, EDAC_PCI,
                                "Signaled System Error on %s\n",
@@ -593,8 +595,10 @@ static void edac_pci_dev_parity_test(struct pci_dev *dev)
 
                debugf4("PCI SEC_STATUS= 0x%04x %s\n", status, dev->dev.bus_id);
 
-               /* check the secondary status reg for errors */
-               if (status) {
+               /* check the secondary status reg for errors,
+                * on NOT broken boards
+                */
+               if (status && !dev->broken_parity_status) {
                        if (status & (PCI_STATUS_SIG_SYSTEM_ERROR)) {
                                edac_printk(KERN_CRIT, EDAC_PCI, "Bridge "
                                        "Signaled System Error on %s\n",
-
To unsubscribe from this list: send the line "unsubscribe git-commits-head" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to