Re: [PATCH] natsemi: netpoll fixes
On Tue, Mar 13, 2007 at 04:53:54PM +0300, Sergei Shtylyov wrote: > Mark Brown wrote: > >confused and eventually locks up. Before locking up it will usually > >report one or more oversided packets so this is a useful hint that we > >should reset the recieve state machine in order to recover from this. >That's all good by why we need to completely lose TX and other interrupts > in the meantime? High inbound traffic doesn't necessarily mean a high > outbound one, does it? While the code in the driver can cope if the chip takes a while to respond to the reset as far as I have been able to tell in testing it does so close enough to immediately to avoid repeating the loop at all. The effect on transmit processing should be minimal. -- "You grabbed my hand and we fell into it, like a daydream - or a fever." signature.asc Description: Digital signature
Re: [PATCH] natsemi: netpoll fixes
Hello. Mark Brown wrote: Moving netdev_rx() would fix that one but there's some others too - there's one in the timer routine if the chip crashes. In the case you Erm, sorry, I'm not seeing it -- could you "point with finger" please? :-) In netdev_timer() when the device is using PORT_TP if the DspCfg read back from the chip differs from the one we think we programmed into it then the driver thinks the PHY fell over. It then goes through an init sequence, including init_registers() which will reset IntrEnable among other things. What's more important for us, it will also clear IntrStatus (and ignore all pending interrupts). Well, as it will also reinit the whole TX/RX rings, so that all packets will be lost... describe above the consequences shouldn't be too bad since it tends to only occur at high volume so further traffic will tend to occur and cause things to recover - all the testing of that patch was done with the bug present and no ill effects. Oversized packets occur only at high volume? Is it some errata? It's an errata - AN 1287 which you can get from the National web site. It's not actually that chip that's getting oversided packets, what happens is that the state machine which reads data off the wire gets confused and eventually locks up. Before locking up it will usually report one or more oversided packets so this is a useful hint that we should reset the recieve state machine in order to recover from this. That's all good by why we need to completely lose TX and other interrupts in the meantime? High inbound traffic doesn't necessarily mean a high outbound one, does it? WBR, Sergei - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] natsemi: netpoll fixes
Hello. Mark Brown wrote: Subject: natsemi: Fix NAPI for interrupt sharing The interrupt status register for the natsemi chips is clear on read and was read unconditionally from both the interrupt and from the NAPI poll routine, meaning that if the interrupt service routine was called (for example, due to a shared interrupt) while a NAPI poll was scheduled interrupts could be missed. This patch fixes that by ensuring that the interrupt status register is only read by the interrupt handler when interrupts are enabled from the chip. It also reverts a workaround for this problem from the netpoll hook and improves the trace for interrupt events. Thanks to Sergei Shtylyov <[EMAIL PROTECTED]> for spotting the issue, Mark Huth <[EMAIL PROTECTED]> for a simpler method and Simon Blake <[EMAIL PROTECTED]> for testing resources. Signed-Off-By: Mark Brown <[EMAIL PROTECTED]> Index: linux-2.6/drivers/net/natsemi.c === --- linux-2.6.orig/drivers/net/natsemi.c2007-03-11 02:32:43.0 + +++ linux-2.6/drivers/net/natsemi.c 2007-03-13 00:12:29.0 + [...] @@ -2131,17 +2133,23 @@ dev->name, np->intr_status, readl(ioaddr + IntrMask)); - if (!np->intr_status) - return IRQ_NONE; - - prefetch(&np->rx_skbuff[np->cur_rx % RX_RING_SIZE]); + if (np->intr_status) { + prefetch(&np->rx_skbuff[np->cur_rx % RX_RING_SIZE]); - if (netif_rx_schedule_prep(dev)) { /* Disable interrupts and register for poll */ - natsemi_irq_disable(dev); - __netif_rx_schedule(dev); + if (netif_rx_schedule_prep(dev)) { + natsemi_irq_disable(dev); + __netif_rx_schedule(dev); + } else + printk(KERN_WARNING + "%s: Ignoring interrupt, status %#08x, mask %#08x.\n", + dev->name, np->intr_status, + readl(ioaddr + IntrMask)); + + return IRQ_HANDLED; } - return IRQ_HANDLED; + + return IRQ_NONE; } The only "complaint" I have is that this "restructuring" seems unnecessary: the only real change it does is an addition of else to the if statement. WBR, Sergei - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] natsemi: netpoll fixes
On Mon, Mar 12, 2007 at 12:05:46PM -0700, Mark Huth wrote: > Since the interrupts are enabled as the NAPI-callback exits, and the > interrupts are disabled in the isr after the callback is scheduled, this > fully avoids the potential race conditions, and requires no locking. If I've benchmarked both approaches and they appear to be much of a muchness for performance in my tests so I've updated my patch to use this approach instead. Thanks for the suggestion, it is simpler. > If you would like me to prepare formal patches based on any of these > concepts, let me know. That's OK - unless any further suggestions I'll submit the patch below after further testing. The improvements to trace and correctness of the interupt handler seem useful. Subject: natsemi: Fix NAPI for interrupt sharing The interrupt status register for the natsemi chips is clear on read and was read unconditionally from both the interrupt and from the NAPI poll routine, meaning that if the interrupt service routine was called (for example, due to a shared interrupt) while a NAPI poll was scheduled interrupts could be missed. This patch fixes that by ensuring that the interrupt status register is only read by the interrupt handler when interrupts are enabled from the chip. It also reverts a workaround for this problem from the netpoll hook and improves the trace for interrupt events. Thanks to Sergei Shtylyov <[EMAIL PROTECTED]> for spotting the issue, Mark Huth <[EMAIL PROTECTED]> for a simpler method and Simon Blake <[EMAIL PROTECTED]> for testing resources. Signed-Off-By: Mark Brown <[EMAIL PROTECTED]> Index: linux-2.6/drivers/net/natsemi.c === --- linux-2.6.orig/drivers/net/natsemi.c2007-03-11 02:32:43.0 + +++ linux-2.6/drivers/net/natsemi.c 2007-03-13 00:12:29.0 + @@ -2119,10 +2119,12 @@ struct netdev_private *np = netdev_priv(dev); void __iomem * ioaddr = ns_ioaddr(dev); - if (np->hands_off) + /* Reading IntrStatus automatically acknowledges so don't do +* that while interrupts are disabled, (for example, while a +* poll is scheduled). */ + if (np->hands_off || !readl(ioaddr + IntrEnable)) return IRQ_NONE; - /* Reading automatically acknowledges. */ np->intr_status = readl(ioaddr + IntrStatus); if (netif_msg_intr(np)) @@ -2131,17 +2133,23 @@ dev->name, np->intr_status, readl(ioaddr + IntrMask)); - if (!np->intr_status) - return IRQ_NONE; - - prefetch(&np->rx_skbuff[np->cur_rx % RX_RING_SIZE]); + if (np->intr_status) { + prefetch(&np->rx_skbuff[np->cur_rx % RX_RING_SIZE]); - if (netif_rx_schedule_prep(dev)) { /* Disable interrupts and register for poll */ - natsemi_irq_disable(dev); - __netif_rx_schedule(dev); + if (netif_rx_schedule_prep(dev)) { + natsemi_irq_disable(dev); + __netif_rx_schedule(dev); + } else + printk(KERN_WARNING + "%s: Ignoring interrupt, status %#08x, mask %#08x.\n", + dev->name, np->intr_status, + readl(ioaddr + IntrMask)); + + return IRQ_HANDLED; } - return IRQ_HANDLED; + + return IRQ_NONE; } /* This is the NAPI poll routine. As well as the standard RX handling @@ -2156,6 +2164,12 @@ int work_done = 0; do { + if (netif_msg_intr(np)) + printk(KERN_DEBUG + "%s: Poll, status %#08x, mask %#08x.\n", + dev->name, np->intr_status, + readl(ioaddr + IntrMask)); + if (np->intr_status & (IntrTxDone | IntrTxIntr | IntrTxIdle | IntrTxErr)) { spin_lock(&np->lock); @@ -2399,19 +2413,8 @@ #ifdef CONFIG_NET_POLL_CONTROLLER static void natsemi_poll_controller(struct net_device *dev) { - struct netdev_private *np = netdev_priv(dev); - disable_irq(dev->irq); - - /* -* A real interrupt might have already reached us at this point -* but NAPI might still haven't called us back. As the interrupt -* status register is cleared by reading, we should prevent an -* interrupt loss in this case... -*/ - if (!np->intr_status) - intr_handler(dev->irq, dev); - + intr_handler(dev->irq, dev); enable_irq(dev->irq); } #endif -- "You grabbed my hand and we fell into it, like a daydream - or a fever." signature.asc Description: Digital signature
Re: [PATCH] natsemi: netpoll fixes
Hello, I wrote: Subject: natsemi: Fix NAPI for interrupt sharing To: Jeff Garzik <[EMAIL PROTECTED]> Cc: Sergei Shtylyov <[EMAIL PROTECTED]>, Simon Blake <[EMAIL PROTECTED]>, John Philips <[EMAIL PROTECTED]>, netdev@vger.kernel.org The interrupt status register for the natsemi chips is clear on read and was read unconditionally from both the interrupt and from the NAPI poll routine, meaning that if the interrupt service routine was called (for example, due to a shared interrupt) while a NAPI poll was scheduled interrupts could be missed. This patch fixes that by ensuring that the interrupt status register is only read when there is no poll scheduled. It also reverts a workaround for this problem from the netpoll hook. Thanks to Sergei Shtylyov <[EMAIL PROTECTED]> for spotting the Well, I've blithely overlooked it, and it's you who did spot it. :-) issue and Simon Blake <[EMAIL PROTECTED]> for testing resources. Thanks for the patch! (If I only knew somebody else was working on that issue, it could have saved my cycles, sigh... but well, I should have said that I was going to recast the patch. :-) Signed-Off-By: Mark Brown <[EMAIL PROTECTED]> Index: linux-2.6/drivers/net/natsemi.c === --- linux-2.6.orig/drivers/net/natsemi.c2007-03-11 02:32:43.0 + +++ linux-2.6/drivers/net/natsemi.c2007-03-11 12:09:14.0 + @@ -571,6 +571,8 @@ int oom; /* Interrupt status */ u32 intr_status; +int poll_active; +spinlock_t intr_lock; /* Do not touch the nic registers */ int hands_off; /* Don't pay attention to the reported link state. */ @@ -812,9 +814,11 @@ pci_set_drvdata(pdev, dev); np->iosize = iosize; spin_lock_init(&np->lock); +spin_lock_init(&np->intr_lock); np->msg_enable = (debug >= 0) ? (1intr_status = 0; +np->poll_active = 0; np->eeprom_size = natsemi_pci_info[chip_idx].eeprom_size; if (natsemi_pci_info[chip_idx].flags & NATSEMI_FLAG_IGNORE_PHY) np->ignore_phy = 1; @@ -1406,6 +1410,8 @@ writel(rfcr, ioaddr + RxFilterAddr); } +/* MUST be called so that both NAPI poll and ISR are excluded due to + * use of intr_status. */ static void reset_rx(struct net_device *dev) { int i; @@ -2118,30 +2124,45 @@ struct net_device *dev = dev_instance; struct netdev_private *np = netdev_priv(dev); void __iomem * ioaddr = ns_ioaddr(dev); +unsigned long flags; +irqreturn_t status = IRQ_NONE; if (np->hands_off) return IRQ_NONE; -/* Reading automatically acknowledges. */ -np->intr_status = readl(ioaddr + IntrStatus); - -if (netif_msg_intr(np)) -printk(KERN_DEBUG - "%s: Interrupt, status %#08x, mask %#08x.\n", - dev->name, np->intr_status, - readl(ioaddr + IntrMask)); +spin_lock_irqsave(&np->intr_lock, flags); Yeah, I've suspected that we need to grab np->lock here... but does that separate spinlock actually protect us from anything? I'm also not sure that we need to disable interrupts here. -if (!np->intr_status) -return IRQ_NONE; +/* Reading IntrStatus automatically acknowledges so don't do + * that while a poll is scheduled. */ +if (!np->poll_active) { +np->intr_status = readl(ioaddr + IntrStatus); -prefetch(&np->rx_skbuff[np->cur_rx % RX_RING_SIZE]); +if (netif_msg_intr(np)) +printk(KERN_DEBUG + "%s: Interrupt, status %#08x, mask %#08x.\n", + dev->name, np->intr_status, + readl(ioaddr + IntrMask)); + +if (np->intr_status) { +prefetch(&np->rx_skbuff[np->cur_rx % RX_RING_SIZE]); + +/* Disable interrupts and register for poll */ +if (netif_rx_schedule_prep(dev)) { +natsemi_irq_disable(dev); +__netif_rx_schedule(dev); +np->poll_active = 1; +} else +printk(KERN_WARNING + "%s: Ignoring interrupt, status %#08x, mask %#08x.\n", + dev->name, np->intr_status, + readl(ioaddr + IntrMask)); -if (netif_rx_schedule_prep(dev)) { -/* Disable interrupts and register for poll */ -natsemi_irq_disable(dev); -__netif_rx_schedule(dev); +status = IRQ_HANDLED; +} } -return IRQ_HANDLED; + +spin_unlock_irqrestore(&np->intr_lock, flags); +return status; } /* This is the NAPI poll routine. As well as the standard RX handling @@ -2154,8 +2175,15 @@ int work_to_do = min(*budget, dev->quota); int work_done = 0; +unsigned long flags; do { +if (netif_msg_intr(np)) +printk(KERN_DEBUG + "%s: Poll, status %#08x, mask %#08x.\n", + dev->
Re: [PATCH] natsemi: netpoll fixes
On Mon, Mar 12, 2007 at 04:05:48PM +0300, Sergei Shtylyov wrote: > Mark Brown wrote: > >hands_off is stronger than that - it's used for sync with some of the > >other code paths like suspend/resume and means "don't touch the chip". > >I've added a new driver local flag instead. >I'm not sure it was worth it -- we already had __LINK_STATE_RX_SCHED... It would be nice but as well as being a general layering violation it could cause problems when trying to quiesce the hardware since netif_poll_disable() sets this without actually scheduling the poll. > >> Yeah, it seems currently unjustified. However IntrEnable would have > >> been an ultimate criterion on taking or ignoring an interrupt > >> otherwise... >>>I guess the tradeoff depends on the probability >>>of getting the isr called when NAPI is active for the device. >I mean I didn't understand why there's tradeoff and how it depends on >the probability... Reading device registers means going over the PCI bus, which is expensive. Shadowing the interrupt mask state in the driver makes the driver more complicated but means that we avoid synchronous PCI accesses, reducing the number of cycles the CPU spends stalled doing those. The performance benefit from the extra code complexity depends on how often we end up doing the extra read. Since one of the things NAPI does is provide some interrupt mitigation the extra work is most likely to be noticable if some other device is generating interrupts that trigger the natsemi interrupt handler. > >Moving netdev_rx() would fix that one but there's some others too - > >there's one in the timer routine if the chip crashes. In the case you >Erm, sorry, I'm not seeing it -- could you "point with finger" please? >:-) In netdev_timer() when the device is using PORT_TP if the DspCfg read back from the chip differs from the one we think we programmed into it then the driver thinks the PHY fell over. It then goes through an init sequence, including init_registers() which will reset IntrEnable among other things. > >describe above the consequences shouldn't be too bad since it tends to > >only occur at high volume so further traffic will tend to occur and > >cause things to recover - all the testing of that patch was done with > >the bug present and no ill effects. >Oversized packets occur only at high volume? Is it some errata? It's an errata - AN 1287 which you can get from the National web site. It's not actually that chip that's getting oversided packets, what happens is that the state machine which reads data off the wire gets confused and eventually locks up. Before locking up it will usually report one or more oversided packets so this is a useful hint that we should reset the recieve state machine in order to recover from this. >(If I only knew somebody else was working on that issue, it could have > saved my cycles, sigh... but well, I should have said that I was going to > recast the patch. :-) Yeah, me too. I'll submit this one once I've finished testing, then audit other IntrStatus accesses. > >- readl(ioaddr + IntrMask)); > >+spin_lock_irqsave(&np->intr_lock, flags); >Yeah, I've suspected that we need to grab np->lock here... but does that > separate spinlock actually protect us from anything? ... > >+/* We need to ensure that the ISR doesn't run between telling > >+ * NAPI we're done and enabling the interrupt. */ > >Why? :-O This is to ensure that the interrupt handler can't run between us marking that the poll has complete and reenabling the interrupt from the chip. That would mean that the chip would end up with interrupts from the chip enabled while the poll is scheduled. -- "You grabbed my hand and we fell into it, like a daydream - or a fever." signature.asc Description: Digital signature
Re: [PATCH] natsemi: netpoll fixes
Mark Brown wrote: On Sat, Mar 10, 2007 at 11:25:05PM +0300, Sergei Shtylyov wrote: Oops, I was going to recast the patch but my attention switched elsewhere for couple of days, and it "slipped" into mainline. I'm now preparing a better patch to also protect... Ah, I was also looking at it. I enclose my current patch which appears to work although I have not finished testing it yet. interrupt handler, check the dev->state __LINK_STATE_SCHED flag - if it's set, leave immediately, it can't be our interrupt. If it's clear, read the irq enable hardware register. If enabled, do the rest of the interrupt handler. It seems that there's no need to read it, as it gets set to 0 "synchronously" with setting the 'hands_off' flag (except in NAPI callback)... hands_off is stronger than that - it's used for sync with some of the other code paths like suspend/resume and means "don't touch the chip". I've added a new driver local flag instead. Yeah, it seems currently unjustified. However IntrEnable would have been an ultimate criterion on taking or ignoring an interrupt otherwise... I guess the tradeoff depends on the probability of getting the isr called when NAPI is active for the device. Didn't get it... :-/ Some systems can provoke this fairly easily - Sokeris have some boards where at least three natsemis share a single interrupt line, for example (the model I'm looking at has three, they look to have another configuration where 5 would end up on the line). BTW, it seems I've found another interrupt lossage path in the driver: netdev_poll() -> netdev_rx() -> reset_rx() If the netdev_rx() detects an oversized packet, it will call reset_rx() which will spin in a loop "collecting" interrupt status until it sees RxResetDone there. The issue is 'intr_status' field will get overwritten and interrupt status lost after netdev_rx() returns to netdev_poll(). How do you think, is this corner case worth fixing (by moving netdev_rx() call to the top of a do/while loop)? Moving netdev_rx() would fix that one but there's some others too - there's one in the timer routine if the chip crashes. In the case you describe above the consequences shouldn't be too bad since it tends to only occur at high volume so further traffic will tend to occur and cause things to recover - all the testing of that patch was done with the bug present and no ill effects. The proposed patch would seem to be way overkill. The problem, as solved in this patch, at least is just the prevention of the ISR from reading the intr_status while another context (NAPI) is still active. The simplest fix is to revert the netpoll attempted fix and then modify the isr etnry criteria as follows: --- natsemi.c 2006-09-19 20:42:06.0 -0700 +++ natsemi.c.new 2007-03-12 11:35:28.0 -0700 @@ -2094,7 +2094,7 @@ struct netdev_private *np = netdev_priv(dev); void __iomem * ioaddr = ns_ioaddr(dev); -if (np->hands_off) + if (np->hands_off || !readl(ioaddr + IntrEnable)) return IRQ_NONE; /* Reading automatically acknowledges. */ Since the interrupts are enabled as the NAPI-callback exits, and the interrupts are disabled in the isr after the callback is scheduled, this fully avoids the potential race conditions, and requires no locking. If we want to avoid the ioaccess, then this could be augmented with a flag in the device private area which would be tested first, short-circuiting the ioaccess most of the time when the callback is already scheduled. But the IntrEnable still would need to be check to avoid the race on the enable/disable condition between the callback and the isr. It is possible to entirely avoid the IntrEnable access and use only a flag, but that requires a local_irqsave/restore around the calback's enabling of the interrupt and and clearing of the poll_active flag. That is fully up-safe, but may have a theoretical possibility of an unserviced (spurious) interrupt on SMP systems. Such a possibility would only exist if an architecture is able to accept an interrupt on a second processor and get to the natsemi isr before the the first proccesor can clear the poll_active flag. That can be further minimized by omitting the readback of the IntrEnable register, or deferring it until after the flag is cleared. If you would like me to prepare formal patches based on any of these concepts, let me know. Mark Huth Subject: natsemi: Fix NAPI for interrupt sharing To: Jeff Garzik <[EMAIL PROTECTED]> Cc: Sergei Shtylyov <[EMAIL PROTECTED]>, Simon Blake <[EMAIL PROTECTED]>, John Philips <[EMAIL PROTECTED]>, netdev@vger.kernel.org The interrupt status register for the natsemi chips is clear on read and was read unconditionally from both the interrupt and from the NAPI poll routine, meaning that if the interrupt service routine w
Re: [PATCH] natsemi: netpoll fixes
Hello. Mark Brown wrote: Oops, I was going to recast the patch but my attention switched elsewhere for couple of days, and it "slipped" into mainline. I'm now preparing a better patch to also protect... Ah, I was also looking at it. I enclose my current patch which appears to work although I have not finished testing it yet. interrupt handler, check the dev->state __LINK_STATE_SCHED flag - if it's set, leave immediately, it can't be our interrupt. If it's clear, read the irq enable hardware register. If enabled, do the rest of the interrupt handler. It seems that there's no need to read it, as it gets set to 0 "synchronously" with setting the 'hands_off' flag (except in NAPI callback)... hands_off is stronger than that - it's used for sync with some of the other code paths like suspend/resume and means "don't touch the chip". I've added a new driver local flag instead. I'm not sure it was worth it -- we already had __LINK_STATE_RX_SCHED... Yeah, it seems currently unjustified. However IntrEnable would have been an ultimate criterion on taking or ignoring an interrupt otherwise... I guess the tradeoff depends on the probability of getting the isr called when NAPI is active for the device. Didn't get it... :-/ I mean I didn't understand why there's tradeoff and how it depends on the probability... Some systems can provoke this fairly easily - Sokeris have some boards where at least three natsemis share a single interrupt line, for example (the model I'm looking at has three, they look to have another configuration where 5 would end up on the line). PCI means IRQ sharing, generally. So, it should have been fixed anyway. :-) BTW, it seems I've found another interrupt lossage path in the driver: netdev_poll() -> netdev_rx() -> reset_rx() If the netdev_rx() detects an oversized packet, it will call reset_rx() which will spin in a loop "collecting" interrupt status until it sees RxResetDone there. The issue is 'intr_status' field will get overwritten and interrupt status lost after netdev_rx() returns to netdev_poll(). How do you think, is this corner case worth fixing (by moving netdev_rx() call to the top of a do/while loop)? Moving netdev_rx() would fix that one but there's some others too - there's one in the timer routine if the chip crashes. In the case you Erm, sorry, I'm not seeing it -- could you "point with finger" please? :-) describe above the consequences shouldn't be too bad since it tends to only occur at high volume so further traffic will tend to occur and cause things to recover - all the testing of that patch was done with the bug present and no ill effects. Oversized packets occur only at high volume? Is it some errata? Subject: natsemi: Fix NAPI for interrupt sharing To: Jeff Garzik <[EMAIL PROTECTED]> Cc: Sergei Shtylyov <[EMAIL PROTECTED]>, Simon Blake <[EMAIL PROTECTED]>, John Philips <[EMAIL PROTECTED]>, netdev@vger.kernel.org The interrupt status register for the natsemi chips is clear on read and was read unconditionally from both the interrupt and from the NAPI poll routine, meaning that if the interrupt service routine was called (for example, due to a shared interrupt) while a NAPI poll was scheduled interrupts could be missed. This patch fixes that by ensuring that the interrupt status register is only read when there is no poll scheduled. It also reverts a workaround for this problem from the netpoll hook. Thanks to Sergei Shtylyov <[EMAIL PROTECTED]> for spotting the issue and Simon Blake <[EMAIL PROTECTED]> for testing resources. Thanks for the patch! (If I only knew somebody else was working on that issue, it could have saved my cycles, sigh... but well, I should have said that I was going to recast the patch. :-) Signed-Off-By: Mark Brown <[EMAIL PROTECTED]> Index: linux-2.6/drivers/net/natsemi.c === --- linux-2.6.orig/drivers/net/natsemi.c2007-03-11 02:32:43.0 + +++ linux-2.6/drivers/net/natsemi.c 2007-03-11 12:09:14.0 + @@ -571,6 +571,8 @@ int oom; /* Interrupt status */ u32 intr_status; + int poll_active; + spinlock_t intr_lock; /* Do not touch the nic registers */ int hands_off; /* Don't pay attention to the reported link state. */ @@ -812,9 +814,11 @@ pci_set_drvdata(pdev, dev); np->iosize = iosize; spin_lock_init(&np->lock); + spin_lock_init(&np->intr_lock); np->msg_enable = (debug >= 0) ? (1intr_status = 0; + np->poll_active = 0; np->eeprom_size = natsemi_pci_info[chip_idx].eeprom_size; if (natsemi_pci_info[chip_idx].flags & NATSEMI_FLAG_IGNORE_PHY) np->ignore_phy = 1; @@ -1406,6 +1410,8 @@ writel(rfcr, ioaddr + RxFilterAddr); } +/* MUST be called so that both NAPI poll and IS
Re: [PATCH] natsemi: netpoll fixes
On Sat, Mar 10, 2007 at 11:25:05PM +0300, Sergei Shtylyov wrote: >Oops, I was going to recast the patch but my attention switched >elsewhere for couple of days, and it "slipped" into mainline. I'm now > preparing a better patch to also protect... Ah, I was also looking at it. I enclose my current patch which appears to work although I have not finished testing it yet. > >interrupt handler, check the dev->state __LINK_STATE_SCHED flag - if > >it's set, leave immediately, it can't be our interrupt. If it's clear, > >read the irq enable hardware register. If enabled, do the rest of the > >interrupt handler. >It seems that there's no need to read it, as it gets set to 0 > "synchronously" with setting the 'hands_off' flag (except in NAPI > callback)... hands_off is stronger than that - it's used for sync with some of the other code paths like suspend/resume and means "don't touch the chip". I've added a new driver local flag instead. >Yeah, it seems currently unjustified. However IntrEnable would have >been an ultimate criterion on taking or ignoring an interrupt otherwise... > >I guess the tradeoff depends on the probability > >of getting the isr called when NAPI is active for the device. >Didn't get it... :-/ Some systems can provoke this fairly easily - Sokeris have some boards where at least three natsemis share a single interrupt line, for example (the model I'm looking at has three, they look to have another configuration where 5 would end up on the line). >BTW, it seems I've found another interrupt lossage path in the driver: > netdev_poll() -> netdev_rx() -> reset_rx() > If the netdev_rx() detects an oversized packet, it will call reset_rx() > which will spin in a loop "collecting" interrupt status until it sees > RxResetDone there. The issue is 'intr_status' field will get overwritten > and interrupt status lost after netdev_rx() returns to netdev_poll(). How > do you think, is this corner case worth fixing (by moving netdev_rx() call > to the top of a do/while loop)? Moving netdev_rx() would fix that one but there's some others too - there's one in the timer routine if the chip crashes. In the case you describe above the consequences shouldn't be too bad since it tends to only occur at high volume so further traffic will tend to occur and cause things to recover - all the testing of that patch was done with the bug present and no ill effects. Subject: natsemi: Fix NAPI for interrupt sharing To: Jeff Garzik <[EMAIL PROTECTED]> Cc: Sergei Shtylyov <[EMAIL PROTECTED]>, Simon Blake <[EMAIL PROTECTED]>, John Philips <[EMAIL PROTECTED]>, netdev@vger.kernel.org The interrupt status register for the natsemi chips is clear on read and was read unconditionally from both the interrupt and from the NAPI poll routine, meaning that if the interrupt service routine was called (for example, due to a shared interrupt) while a NAPI poll was scheduled interrupts could be missed. This patch fixes that by ensuring that the interrupt status register is only read when there is no poll scheduled. It also reverts a workaround for this problem from the netpoll hook. Thanks to Sergei Shtylyov <[EMAIL PROTECTED]> for spotting the issue and Simon Blake <[EMAIL PROTECTED]> for testing resources. Signed-Off-By: Mark Brown <[EMAIL PROTECTED]> Index: linux-2.6/drivers/net/natsemi.c === --- linux-2.6.orig/drivers/net/natsemi.c2007-03-11 02:32:43.0 + +++ linux-2.6/drivers/net/natsemi.c 2007-03-11 12:09:14.0 + @@ -571,6 +571,8 @@ int oom; /* Interrupt status */ u32 intr_status; + int poll_active; + spinlock_t intr_lock; /* Do not touch the nic registers */ int hands_off; /* Don't pay attention to the reported link state. */ @@ -812,9 +814,11 @@ pci_set_drvdata(pdev, dev); np->iosize = iosize; spin_lock_init(&np->lock); + spin_lock_init(&np->intr_lock); np->msg_enable = (debug >= 0) ? (1intr_status = 0; + np->poll_active = 0; np->eeprom_size = natsemi_pci_info[chip_idx].eeprom_size; if (natsemi_pci_info[chip_idx].flags & NATSEMI_FLAG_IGNORE_PHY) np->ignore_phy = 1; @@ -1406,6 +1410,8 @@ writel(rfcr, ioaddr + RxFilterAddr); } +/* MUST be called so that both NAPI poll and ISR are excluded due to + * use of intr_status. */ static void reset_rx(struct net_device *dev) { int i; @@ -2118,30 +2124,45 @@ struct net_device *dev = dev_instance; struct netdev_private *np = netdev_priv(dev); void __iomem * ioaddr = ns_ioaddr(dev); + unsigned long flags; + irqreturn_t status = IRQ_NONE; if (np->hands_off) return IRQ_NONE; - /* Reading automatically acknowledges. */ - np->intr_status = readl(ioaddr + IntrStatus); - - if (n
Re: [PATCH] natsemi: netpoll fixes
Hello. Mark Huth wrote: #ifdef CONFIG_NET_POLL_CONTROLLER static void natsemi_poll_controller(struct net_device *dev) { + struct netdev_private *np = netdev_priv(dev); + disable_irq(dev->irq); - intr_handler(dev->irq, dev); + + /* + * A real interrupt might have already reached us at this point + * but NAPI might still haven't called us back. As the interrupt + * status register is cleared by reading, we should prevent an + * interrupt loss in this case... + */ + if (!np->intr_status) + intr_handler(dev->irq, dev); + enable_irq(dev->irq); Oops, I was going to recast the patch but my attention switched elsewhere for couple of days, and it "slipped" into mainline. I'm now preparing a better patch to also protect... Is it possible for this to run at the same time as the NAPI poll? If so then it is possible for the netpoll poll to run between np->intr_status being cleared and netif_rx_complete() being called. If the hardware asserts an interrupt at the wrong moment then this could cause the Well, there is a whole task of analyzing the netpoll conditions under smp. There appears to me to be a race with netpoll and NAPI on another processor, given that netpoll can be called with virtually any system condition on a debug breakpoint or crash dump initiation. I'm spending some time looking into it, but don't have a smoking gun immediately. Regardless, if such a condition does exist, it is shared across many or all of the potential netpolled devices. Since that is exactly the condition the suggested patch purports to solve, it is pointless if the whole NAPI/netpoll race exists. Such a race would lead to various and imaginative failures in the system. So don't fix that problem in a particular driver. If it exists, fix it generally in the netpoll/NAPI infrastructure. Any takers? :-) In any case, this is a problem independently of netpoll if the chip shares an interrupt with anything so the interrupt handler should be fixed to cope with this situation instead. Yes, that would appear so. If an interrupt line is shared with this device, then the interrupt handler can be called again, even though the device's interrupts are disabled on the interface. So, in the actual interrupt handler, check the dev->state __LINK_STATE_SCHED flag - if it's set, leave immediately, it can't be our interrupt. If it's clear, read the irq enable hardware register. If enabled, do the rest of the interrupt handler. It seems that there's no need to read it, as it gets set to 0 "synchronously" with setting the 'hands_off' flag (except in NAPI callback)... Since the isr is disabled only by the interrupt handler, and enabled only by the poll routine, the race on the interrupt cause register is prevented. And, as a byproduct, the netpoll race is also prevented. You could just always read the isr enable hardware register, but that means you always do an operation to the chip, which can be painfully slow. Yeah, it seems currently unjustified. However IntrEnable would have been an ultimate criterion on taking or ignoring an interrupt otherwise... I guess the tradeoff depends on the probability of getting the isr called when NAPI is active for the device. Didn't get it... :-/ If this results in netpoll not getting a packet right away, that's okay, since the netpoll users should try again. Well, in certain stuations (like KGDBoE), netpoll callback being called *while* NAPI callback is being executed would mean a deadlock, I think (as NAPI callback will never complete)... BTW, it seems I've found another interrupt lossage path in the driver: netdev_poll() -> netdev_rx() -> reset_rx() If the netdev_rx() detects an oversized packet, it will call reset_rx() which will spin in a loop "collecting" interrupt status until it sees RxResetDone there. The issue is 'intr_status' field will get overwritten and interrupt status lost after netdev_rx() returns to netdev_poll(). How do you think, is this corner case worth fixing (by moving netdev_rx() call to the top of a do/while loop)? Mark Huth WBR, Sergei - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] natsemi: netpoll fixes
Sergei Shtylyov wrote: Fix two issues in this driver's netpoll path: one usual, with spin_unlock_irq() enabling interrupts which nobody asks it to do (that has been fixed recently in a number of drivers) and one unusual, with poll_controller() method possibly causing loss of interrupts due to the interrupt status register being cleared by a simple read and the interrpupt handler simply storing it, not accumulating. Signed-off-by: Sergei Shtylyov <[EMAIL PROTECTED]> --- drivers/net/natsemi.c | 24 +++- 1 files changed, 19 insertions(+), 5 deletions(-) applied - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] natsemi: netpoll fixes
Mark Brown wrote: [Once more with CCs] On Tue, Mar 06, 2007 at 12:10:08AM +0400, Sergei Shtylyov wrote: #ifdef CONFIG_NET_POLL_CONTROLLER static void natsemi_poll_controller(struct net_device *dev) { + struct netdev_private *np = netdev_priv(dev); + disable_irq(dev->irq); - intr_handler(dev->irq, dev); + + /* + * A real interrupt might have already reached us at this point + * but NAPI might still haven't called us back. As the interrupt + * status register is cleared by reading, we should prevent an + * interrupt loss in this case... + */ + if (!np->intr_status) + intr_handler(dev->irq, dev); + enable_irq(dev->irq); Is it possible for this to run at the same time as the NAPI poll? If so then it is possible for the netpoll poll to run between np->intr_status being cleared and netif_rx_complete() being called. If the hardware asserts an interrupt at the wrong moment then this could cause the Well, there is a whole task of analyzing the netpoll conditions under smp. There appears to me to be a race with netpoll and NAPI on another processor, given that netpoll can be called with virtually any system condition on a debug breakpoint or crash dump initiation. I'm spending some time looking into it, but don't have a smoking gun immediately. Regardless, if such a condition does exist, it is shared across many or all of the potential netpolled devices. Since that is exactly the condition the suggested patch purports to solve, it is pointless if the whole NAPI/netpoll race exists. Such a race would lead to various and imaginative failures in the system. So don't fix that problem in a particular driver. If it exists, fix it generally in the netpoll/NAPI infrastructure. In any case, this is a problem independently of netpoll if the chip shares an interrupt with anything so the interrupt handler should be fixed to cope with this situation instead. Yes, that would appear so. If an interrupt line is shared with this device, then the interrupt handler can be called again, even though the device's interrupts are disabled on the interface. So, in the actual interrupt handler, check the dev->state __LINK_STATE_SCHED flag - if it's set, leave immediately, it can't be our interrupt. If it's clear, read the irq enable hardware register. If enabled, do the rest of the interrupt handler. Since the isr is disabled only by the interrupt handler, and enabled only by the poll routine, the race on the interrupt cause register is prevented. And, as a byproduct, the netpoll race is also prevented. You could just always read the isr enable hardware register, but that means you always do an operation to the chip, which can be painfully slow. I guess the tradeoff depends on the probability of getting the isr called when NAPI is active for the device. If this results in netpoll not getting a packet right away, that's okay, since the netpoll users should try again. Mark Huth - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] natsemi: netpoll fixes
[Once more with CCs] On Tue, Mar 06, 2007 at 12:10:08AM +0400, Sergei Shtylyov wrote: > #ifdef CONFIG_NET_POLL_CONTROLLER > static void natsemi_poll_controller(struct net_device *dev) > { > + struct netdev_private *np = netdev_priv(dev); > + > disable_irq(dev->irq); > - intr_handler(dev->irq, dev); > + > + /* > + * A real interrupt might have already reached us at this point > + * but NAPI might still haven't called us back. As the > interrupt > + * status register is cleared by reading, we should prevent an > + * interrupt loss in this case... > + */ > + if (!np->intr_status) > + intr_handler(dev->irq, dev); > + > enable_irq(dev->irq); Is it possible for this to run at the same time as the NAPI poll? If so then it is possible for the netpoll poll to run between np->intr_status being cleared and netif_rx_complete() being called. If the hardware asserts an interrupt at the wrong moment then this could cause the In any case, this is a problem independently of netpoll if the chip shares an interrupt with anything so the interrupt handler should be fixed to cope with this situation instead. -- "You grabbed my hand and we fell into it, like a daydream - or a fever." signature.asc Description: Digital signature
Re: [PATCH] natsemi: netpoll fixes
On Tue, Mar 06, 2007 at 12:10:08AM +0400, Sergei Shtylyov wrote: > #ifdef CONFIG_NET_POLL_CONTROLLER > static void natsemi_poll_controller(struct net_device *dev) > { > + struct netdev_private *np = netdev_priv(dev); > + > disable_irq(dev->irq); > - intr_handler(dev->irq, dev); > + > + /* > + * A real interrupt might have already reached us at this point > + * but NAPI might still haven't called us back. As the interrupt > + * status register is cleared by reading, we should prevent an > + * interrupt loss in this case... > + */ > + if (!np->intr_status) > + intr_handler(dev->irq, dev); > + > enable_irq(dev->irq); Is it possible for this to run at the same time as the NAPI poll? If so then it is possible for the netpoll poll to run between np->intr_status being cleared and netif_rx_complete() being called. If the hardware asserts an interrupt at the wrong moment then this could cause the In any case, this is a problem independently of netpoll if the chip shares an interrupt with anything so the interrupt handler should be fixed to cope with this situation instead. -- "You grabbed my hand and we fell into it, like a daydream - or a fever." signature.asc Description: Digital signature
[PATCH] natsemi: netpoll fixes
Fix two issues in this driver's netpoll path: one usual, with spin_unlock_irq() enabling interrupts which nobody asks it to do (that has been fixed recently in a number of drivers) and one unusual, with poll_controller() method possibly causing loss of interrupts due to the interrupt status register being cleared by a simple read and the interrpupt handler simply storing it, not accumulating. Signed-off-by: Sergei Shtylyov <[EMAIL PROTECTED]> --- drivers/net/natsemi.c | 24 +++- 1 files changed, 19 insertions(+), 5 deletions(-) Index: linux-2.6/drivers/net/natsemi.c === --- linux-2.6.orig/drivers/net/natsemi.c +++ linux-2.6/drivers/net/natsemi.c @@ -2024,6 +2024,7 @@ static int start_tx(struct sk_buff *skb, struct netdev_private *np = netdev_priv(dev); void __iomem * ioaddr = ns_ioaddr(dev); unsigned entry; + unsigned long flags; /* Note: Ordering is important here, set the field with the "ownership" bit last, and only then increment cur_tx. */ @@ -2037,7 +2038,7 @@ static int start_tx(struct sk_buff *skb, np->tx_ring[entry].addr = cpu_to_le32(np->tx_dma[entry]); - spin_lock_irq(&np->lock); + spin_lock_irqsave(&np->lock, flags); if (!np->hands_off) { np->tx_ring[entry].cmd_status = cpu_to_le32(DescOwn | skb->len); @@ -2056,7 +2057,7 @@ static int start_tx(struct sk_buff *skb, dev_kfree_skb_irq(skb); np->stats.tx_dropped++; } - spin_unlock_irq(&np->lock); + spin_unlock_irqrestore(&np->lock, flags); dev->trans_start = jiffies; @@ -,6 +2223,8 @@ static void netdev_rx(struct net_device pkt_len = (desc_status & DescSizeMask) - 4; if ((desc_status&(DescMore|DescPktOK|DescRxLong)) != DescPktOK){ if (desc_status & DescMore) { + unsigned long flags; + if (netif_msg_rx_err(np)) printk(KERN_WARNING "%s: Oversized(?) Ethernet " @@ -2236,12 +2239,12 @@ static void netdev_rx(struct net_device * reset procedure documented in * AN-1287. */ - spin_lock_irq(&np->lock); + spin_lock_irqsave(&np->lock, flags); reset_rx(dev); reinit_rx(dev); writel(np->ring_dma, ioaddr + RxRingPtr); check_link(dev); - spin_unlock_irq(&np->lock); + spin_unlock_irqrestore(&np->lock, flags); /* We'll enable RX on exit from this * function. */ @@ -2396,8 +2399,19 @@ static struct net_device_stats *get_stat #ifdef CONFIG_NET_POLL_CONTROLLER static void natsemi_poll_controller(struct net_device *dev) { + struct netdev_private *np = netdev_priv(dev); + disable_irq(dev->irq); - intr_handler(dev->irq, dev); + + /* +* A real interrupt might have already reached us at this point +* but NAPI might still haven't called us back. As the interrupt +* status register is cleared by reading, we should prevent an +* interrupt loss in this case... +*/ + if (!np->intr_status) + intr_handler(dev->irq, dev); + enable_irq(dev->irq); } #endif - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html