Re: -rc5: e1000 resume weirdness
Ingo Molnar wrote:
> hm, on a T60, after suspend/resume, i get an e1000 timeout:
>
> e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
> e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
>   Tx Queue             <0>
>   TDH                  <ec>
>   TDT                  <ec>
>   next_to_use          <ec>
>   next_to_clean        <82>
> buffer_info[next_to_clean]
>   time_stamp           <fffcc3db>
>   next_to_watch        <82>
>   jiffies              <fffd5da0>
>   next_to_watch.status <1>
>
> it works fine after that reset. The e1000 driver didn't do this before;
> after resume the network was always available immediately. So this
> appears to be a relatively new regression (post-rc3 or so). high-res
> timers was disabled.
>
>       Ingo

TDT == TDH -> this is a 'bogus' tx hang indicating that one or more parts
of the TX path are not properly enabled. Most likely, I suspect that we
haven't enabled something because the ordering of irq free/alloc was
messed up and nobody cared before, but with all the pci_save_state fixes
going in we hit a bump. The reset kicks it all back up in order so it's
something silly like this for sure.

The attached patch fixes that and has been sitting in my queue for a few
days. Can you see if that works?

Auke

---
e1000: Free interrupts symmetrically with resume

From: Auke Kok <[EMAIL PROTECTED]>

Free interrupts symmetrically with resume allocation to prevent pci
save/restore state from possibly failing or warning.

Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
---

 drivers/net/e1000/e1000_main.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 55ef148..93d41f0 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -5190,6 +5190,7 @@ e1000_suspend(struct pci_dev *pdev, pm_message_t state)
 	if (netif_running(netdev)) {
 		WARN_ON(test_bit(__E1000_RESETTING, &adapter->flags));
 		e1000_down(adapter);
+		e1000_free_irq(adapter);
 	}
 
 #ifdef CONFIG_PM
@@ -5257,9 +5258,6 @@ e1000_suspend(struct pci_dev *pdev, pm_message_t state)
 	if (adapter->hw.phy.type == e1000_phy_igp_3)
 		e1000_igp3_phy_powerdown_workaround_ich8lan(&adapter->hw);
 
-	if (netif_running(netdev))
-		e1000_free_irq(adapter);
-
 	/* Release control of h/w to f/w.  If f/w is AMT enabled, this
 	 * would have already happened in close and is redundant. */
 	e1000_release_hw_control(adapter);

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
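[Editor's illustration] The "TDT == TDH" observation can be checked against the numbers in the hang report. The sketch below is not driver code: the struct and both helper names are hypothetical, and the ring size of 256 used in the note after it is an assumption, since the report does not print it. It only shows that the hardware head and tail agree (nothing left to transmit) while the driver's own bookkeeping still has descriptors it has not reclaimed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the fields printed in the hang report. */
struct tx_ring_snapshot {
	uint32_t tdh;           /* hardware head register */
	uint32_t tdt;           /* hardware tail register */
	uint32_t next_to_use;   /* driver: next descriptor to fill */
	uint32_t next_to_clean; /* driver: next descriptor to reclaim */
	uint32_t count;         /* ring size; not in the report, assumed */
};

/* Head == tail: from the hardware's view the ring holds no pending work. */
static bool hw_ring_idle(const struct tx_ring_snapshot *s)
{
	return s->tdh == s->tdt;
}

/* Descriptors queued by the driver that it has not reclaimed yet,
 * computed modulo the ring size so index wraparound is handled. */
static uint32_t unreclaimed(const struct tx_ring_snapshot *s)
{
	assert(s->count > 0);
	return (s->next_to_use + s->count - s->next_to_clean) % s->count;
}
```

With the reported values (TDH = TDT = next_to_use = 0xec, next_to_clean = 0x82) and an assumed 256-entry ring, hw_ring_idle() is true and unreclaimed() gives 0x6a, i.e. 106 descriptors awaiting cleanup.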
Re: -rc5: e1000 resume weirdness
On 3/26/07, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> hm, on a T60, after suspend/resume, i get an e1000 timeout:
>
> e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
> e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
>   Tx Queue             <0>
>   TDH                  <ec>
>   TDT                  <ec>
>   next_to_use          <ec>
>   next_to_clean        <82>
> buffer_info[next_to_clean]
>   time_stamp           <fffcc3db>
>   next_to_watch        <82>
>   jiffies              <fffd5da0>
>   next_to_watch.status <1>
>
> it works fine after that reset. The e1000 driver didn't do this before;
> after resume the network was always available immediately. So this
> appears to be a relatively new regression (post-rc3 or so). high-res
> timers was disabled.

was there a "NETDEV WATCHDOG" message that follows this? If not it is
a harmless debug print. Note the time_stamp and jiffies difference,
very large, consistent with a resume. I think we need to disable the
internal e1000 tx hang code that causes this debug print when we are
suspending. I'll work with auke to generate a short patch.
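[Editor's illustration] The time_stamp/jiffies gap mentioned above can be worked out from the two values in the report. This is a minimal sketch, not driver code: both helper names are hypothetical, and the tick rate (HZ) is an assumption because the thread never states the kernel's configured value.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed ticks between two 32-bit jiffies samples. Unsigned
 * subtraction stays correct even if the counter wrapped in between. */
static uint32_t jiffies_delta(uint32_t now, uint32_t then)
{
	return now - then;
}

/* Convert ticks to whole seconds for a given tick rate (HZ).
 * The tick rate is caller-supplied; it is not known from the report. */
static uint32_t ticks_to_seconds(uint32_t ticks, uint32_t hz)
{
	assert(hz > 0);
	return ticks / hz;
}
```

For jiffies = 0xfffd5da0 and time_stamp = 0xfffcc3db the delta is 0x99c5, or 39365 ticks. That would be about 39 seconds if HZ were 1000, about 157 seconds at HZ=250, and about 393 seconds at HZ=100. Under any of these assumed rates the buffer's timestamp is tens of seconds to minutes old.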
Re: -rc5: e1000 resume weirdness
Jesse Brandeburg wrote:
> On 3/26/07, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>> hm, on a T60, after suspend/resume, i get an e1000 timeout:
>>
>> e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
>> e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
>>   Tx Queue             <0>
>>   TDH                  <ec>
>>   TDT                  <ec>
>>   next_to_use          <ec>
>>   next_to_clean        <82>
>> buffer_info[next_to_clean]
>>   time_stamp           <fffcc3db>
>>   next_to_watch        <82>
>>   jiffies              <fffd5da0>
>>   next_to_watch.status <1>
>>
>> it works fine after that reset. The e1000 driver didn't do this before;
>> after resume the network was always available immediately. So this
>> appears to be a relatively new regression (post-rc3 or so). high-res
>> timers was disabled.
>
> was there a "NETDEV WATCHDOG" message that follows this? If not it is
> a harmless debug print. Note the time_stamp and jiffies difference,
> very large, consistent with a resume. I think we need to disable the
> internal e1000 tx hang code that causes this debug print when we are
> suspending. I'll work with auke to generate a short patch.

hmm, yeah, it appears that the patch I sent just a second ago isn't
applicable in this case, since the irq handler is obviously enabled (the
Link Up message proves that).

thanks to Jesse for being awake :)

Auke
Re: -rc5: e1000 resume weirdness
* Jesse Brandeburg <[EMAIL PROTECTED]> wrote:

> was there a "NETDEV WATCHDOG" message that follows this? If not it is
> a harmless debug print. Note the time_stamp and jiffies difference,
> very large, consistent with a resume. I think we need to disable the
> internal e1000 tx hang code that causes this debug print when we are
> suspending. I'll work with auke to generate a short patch.

there was no "NETDEV WATCHDOG" message. But still there was a ~30 second
delay until i got the first few packets through the interface, while
normally it's available almost instantly after resume. But ... this
condition seems sporadic, i haven't seen it on subsequent suspend+resume
attempts.

	Ingo