Re: [Intel-wired-lan] [PATCH] igb: add more checks for disconnected adapter

2015-10-06 Thread Jarod Wilson

Alexander Duyck wrote:

On 09/21/2015 09:14 PM, Jarod Wilson wrote:

Alexander Duyck wrote:

On 09/21/2015 10:11 AM, Jarod Wilson wrote:

Some pci changes upcoming in 4.3 seem to cause additional disconnects,
which can happen at unfortuitous times for igb, leading to issues
such as
this, where the disconnect happened just before
igb_configure_tx_ring():


...

Note: this is a follow-up patch in addition to the previously submitted
"igb: don't unmap NULL hw_addr"

drivers/net/ethernet/intel/igb/igb_main.c | 15 +--
1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
b/drivers/net/ethernet/intel/igb/igb_main.c
index 6369f9e..7060edf 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -952,6 +952,11 @@ static int igb_request_msix(struct igb_adapter
*adapter)
if (err)
goto err_out;

+ if (E1000_REMOVED(hw->hw_addr)) {
+ err = -EIO;
+ goto err_free;
+ }
+
for (i = 0; i < adapter->num_q_vectors; i++) {
struct igb_q_vector *q_vector = adapter->q_vector[i];




Instead of using E1000_REMOVED we should just replace the
adapter->hw.hw_addr in the setup of the itr_register with
adapter->io_addr like you did for Rx/Tx below.


I just tried that, and it reliably blows up horrifically, wedging the
machine to the point where all I could get was a screen shot with my
phone thus far, when we jump from igb_request_msix() to
igb_configure_msix() to igb_assign_vector() and finally to
igb_write_ivar(), at least as best as I can tell from what I was able to
see in the trace remnants still on-screen.


Take a look at array_rd32, that is buggy and doesn't match up with the
rd32 implementation. If you fix that then the blow-up should go away.

You shouldn't need to worry about itr_register since it will only get
triggered if the interrupt can be enabled and that shouldn't be able to
happen if the device is not present.

...

Just switching to adapter->io_addr everywhere seems to not work as noted
above. :\ Note that I'm also chasing this from the other end with the
author of the pci patches that seem to have triggered this, so the real
bug might be over in pci-land, but hardening against explosions in igb
still seems like a worthwhile effort here.


I am pretty sure array_rd32 is the problem. If you fix that then I
suspect you should quit seeing any further issues.


Okay, I've reworked array_rd32 a bit to call igb_rd32 as well, and with 
that and some other minor tweaks, nothing blowing up. The hardware still 
doesn't work when hotplugged, due to disappearing off the pci bus 
temporarily, but I think that's a pci hotplug issue, not igb's fault. 
Things seem to behave just fine if the device is connected at boot. 
Updated patch forthcoming for dissection...


--
Jarod Wilson
ja...@redhat.com


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Intel-wired-lan] [PATCH] igb: add more checks for disconnected adapter

2015-09-21 Thread Mark Rustad
On 9/21/15 9:14 PM, Jarod Wilson wrote:
> Just switching to adapter->io_addr everywhere seems to not work as 
> noted above. :\ Note that I'm also chasing this from the other end 
> with the author of the pci patches that seem to have triggered this, 
> so the real bug might be over in pci-land, but hardening against 
> explosions in igb still seems like a worthwhile effort here.

My understanding is that there can be problems if too many writes to a
removed device happen. That is why ixgbe avoids doing that by testing
for removal in some places. The io_addr does get used in the transmit
path simply to avoid adding a test to that hot path. That approach
seems to be working well for ixgbe.



signature.asc
Description: OpenPGP digital signature


Re: [Intel-wired-lan] [PATCH] igb: add more checks for disconnected adapter

2015-09-21 Thread Jarod Wilson

Alexander Duyck wrote:

On 09/21/2015 10:11 AM, Jarod Wilson wrote:

Some pci changes upcoming in 4.3 seem to cause additional disconnects,
which can happen at unfortuitous times for igb, leading to issues such as
this, where the disconnect happened just before igb_configure_tx_ring():

[ 414.440115] igb :15:00.0: enabling device ( -> 0002)
[ 414.474934] pps pps0: new PPS source ptp1
[ 414.474937] igb :15:00.0: added PHC on eth0
[ 414.474938] igb :15:00.0: Intel(R) Gigabit Ethernet Network
Connection
[ 414.474940] igb :15:00.0: eth0: (PCIe:2.5Gb/s:Width x1)
e8:ea:6a:00:1b:2a
[ 414.475072] igb :15:00.0: eth0: PBA No: 000200-000
[ 414.475073] igb :15:00.0: Using MSI-X interrupts. 4 rx queue(s),
4 tx queue(s)
[ 414.478453] igb :15:00.0 enp21s0: renamed from eth0
[ 414.497747] IPv6: ADDRCONF(NETDEV_UP): enp21s0: link is not ready
[ 414.536745] igb :15:00.0 enp21s0: PCIe link lost, device now
detached
[ 414.854808] BUG: unable to handle kernel paging request at
3818
[ 414.854827] IP: []
igb_configure_tx_ring+0x14c/0x250 [igb]
[ 414.854846] PGD 0
[ 414.854849] Oops: 0002 [#1] SMP
[ 414.854856] Modules linked in: firewire_ohci firewire_core crc_itu_t
igb dca ctr ccm arc4 iwlmvm mac80211 fuse xt_CHECKSUM ipt_MASQUERADE
nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute
bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6
nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security
ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle
iptable_security iptable_raw iptable_filter bnep dm_mirror
dm_region_hash dm_log dm_mod snd_hda_codec_hdmi coretemp
x86_pkg_temp_thermal intel_powerclamp kvm_intel iTCO_wdt ppdev kvm
iTCO_vendor_support hp_wmi sparse_keymap crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel
[ 414.855073] drbg ansi_cprng snd_hda_codec_realtek
snd_hda_codec_generic aesni_intel aes_x86_64 lrw gf128mul glue_helper
ablk_helper cryptd snd_hda_intel snd_hda_codec microcode snd_hda_core
snd_hwdep snd_seq snd_seq_device snd_pcm iwlwifi uvcvideo btusb
cfg80211 videobuf2_vmalloc videobuf2_memops btrtl btbcm videobuf2_core
btintel bluetooth v4l2_common snd_timer videodev snd parport_pc
rtsx_pci_ms joydev pcspkr input_leds i2c_i801 media sg memstick rfkill
soundcore lpc_ich 8250_fintek parport mei_me hp_accel ie31200_edac
shpchp lis3lv02d mei edac_core input_polldev hp_wireless tpm_infineon
sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs
libcrc32c sr_mod sd_mod cdrom rtsx_pci_sdmmc mmc_core crc32c_intel
serio_raw rtsx_pci nouveau mxm_wmi ahci hwmon libahci e1000e
drm_kms_helper
[ 414.855309] ptp xhci_pci pps_core ttm xhci_hcd wmi video ipv6 autofs4
[ 414.855331] CPU: 2 PID: 875 Comm: NetworkManager Not tainted
4.2.0-5.el7_UNSUPPORTED.x86_64 #1
[ 414.855348] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS
M70 Ver. 01.07 02/26/2015
[ 414.855365] task: 880484698c00 ti: 88005859c000 task.ti:
88005859c000
[ 414.855380] RIP: 0010:[] []
igb_configure_tx_ring+0x14c/0x250 [igb]
[ 414.855401] RSP: 0018:88005859f608 EFLAGS: 00010246
[ 414.855410] RAX: 3818 RBX:  RCX:
3818
[ 414.855424] RDX:  RSI: 0008 RDI:
002a9fe6
[ 414.855437] RBP: 88005859f638 R08: 03030300 R09:
ffe7
[ 414.855451] R10: 81fa91b4 R11: 07e3 R12:

[ 414.855464] R13: 880471c98840 R14: 8804670a1180 R15:
000483cce000
[ 414.855478] FS: 7f389c6fb8c0() GS:88049dc8()
knlGS:
[ 414.855493] CS: 0010 DS:  ES:  CR0: 80050033
[ 414.855504] CR2: 3818 CR3: 0004875da000 CR4:
001406e0
[ 414.855518] Stack:
[ 414.855520] 88005859f638 880471c98840 880471c98df8
0001
[ 414.855538] 880471c98848 0001 88005859f698
a0b99cb0
[ 414.85] 88005859f678 59ab02179a7fe4d0 f3ce6b27ad46225f
f5454218094e72d1
[ 414.855572] Call Trace:
[ 414.855577] [] igb_configure+0x240/0x400 [igb]
[ 414.855590] [] __igb_open+0xc2/0x560 [igb]
[ 414.855602] [] ? notifier_call_chain+0x4d/0x80
[ 414.855614] [] igb_open+0x10/0x20 [igb]
[ 414.855625] [] __dev_open+0xb1/0x130
[ 414.855636] [] __dev_change_flags+0xa1/0x160
[ 414.855647] [] dev_change_flags+0x29/0x60
[ 414.855658] [] do_setlink+0x5d3/0xaa0
[ 414.855679] [] ? nla_parse+0xa3/0x100
[ 414.855689] [] rtnl_newlink+0x4f0/0x880
[ 414.855700] [] ? rtnl_newlink+0xf3/0x880
[ 414.855721] [] ? netlink_unicast+0x1ae/0x220
[ 414.855734] [] ? security_capable+0x48/0x60
[ 414.855746] [] ? ns_capable+0x2d/0x60
[ 414.855756] [] rtnetlink_rcv_msg+0x95/0x240
[ 414.855768] [] ? sock_has_perm+0x70/0x90
[ 414.855779] [] ? rtnetlink_rcv+0x40/0x40
[ 414.855789] [] netlink_rcv_skb+0xaf/0xc0
[ 414.855800] [] rtnetlink_rcv+0x2c/0x40
[ 

Re: [Intel-wired-lan] [PATCH] igb: add more checks for disconnected adapter

2015-09-21 Thread Alexander Duyck

On 09/21/2015 10:11 AM, Jarod Wilson wrote:

Some pci changes upcoming in 4.3 seem to cause additional disconnects,
which can happen at unfortuitous times for igb, leading to issues such as
this, where the disconnect happened just before igb_configure_tx_ring():

[  414.440115] igb :15:00.0: enabling device ( -> 0002)
[  414.474934] pps pps0: new PPS source ptp1
[  414.474937] igb :15:00.0: added PHC on eth0
[  414.474938] igb :15:00.0: Intel(R) Gigabit Ethernet Network Connection
[  414.474940] igb :15:00.0: eth0: (PCIe:2.5Gb/s:Width x1) e8:ea:6a:00:1b:2a
[  414.475072] igb :15:00.0: eth0: PBA No: 000200-000
[  414.475073] igb :15:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx 
queue(s)
[  414.478453] igb :15:00.0 enp21s0: renamed from eth0
[  414.497747] IPv6: ADDRCONF(NETDEV_UP): enp21s0: link is not ready
[  414.536745] igb :15:00.0 enp21s0: PCIe link lost, device now detached
[  414.854808] BUG: unable to handle kernel paging request at 3818
[  414.854827] IP: [] igb_configure_tx_ring+0x14c/0x250 [igb]
[  414.854846] PGD 0
[  414.854849] Oops: 0002 [#1] SMP
[  414.854856] Modules linked in: firewire_ohci firewire_core crc_itu_t igb dca 
ctr ccm arc4 iwlmvm mac80211 fuse xt_CHECKSUM ipt_MASQUERADE 
nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT 
nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc 
ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 
nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter 
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter bnep 
dm_mirror dm_region_hash dm_log dm_mod snd_hda_codec_hdmi coretemp 
x86_pkg_temp_thermal intel_powerclamp kvm_intel iTCO_wdt ppdev kvm 
iTCO_vendor_support hp_wmi sparse_keymap crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel
[  414.855073]  drbg ansi_cprng snd_hda_codec_realtek snd_hda_codec_generic 
aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd 
snd_hda_intel snd_hda_codec microcode snd_hda_core snd_hwdep snd_seq 
snd_seq_device snd_pcm iwlwifi uvcvideo btusb cfg80211 videobuf2_vmalloc 
videobuf2_memops btrtl btbcm videobuf2_core btintel bluetooth v4l2_common 
snd_timer videodev snd parport_pc rtsx_pci_ms joydev pcspkr input_leds i2c_i801 
media sg memstick rfkill soundcore lpc_ich 8250_fintek parport mei_me hp_accel 
ie31200_edac shpchp lis3lv02d mei edac_core input_polldev hp_wireless 
tpm_infineon sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables 
xfs libcrc32c sr_mod sd_mod cdrom rtsx_pci_sdmmc mmc_core crc32c_intel 
serio_raw rtsx_pci nouveau mxm_wmi ahci hwmon libahci e1000e drm_kms_helper
[  414.855309]  ptp xhci_pci pps_core ttm xhci_hcd wmi video ipv6 autofs4
[  414.855331] CPU: 2 PID: 875 Comm: NetworkManager Not tainted 
4.2.0-5.el7_UNSUPPORTED.x86_64 #1
[  414.855348] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS M70 
Ver. 01.07 02/26/2015
[  414.855365] task: 880484698c00 ti: 88005859c000 task.ti: 
88005859c000
[  414.855380] RIP: 0010:[]  [] 
igb_configure_tx_ring+0x14c/0x250 [igb]
[  414.855401] RSP: 0018:88005859f608  EFLAGS: 00010246
[  414.855410] RAX: 3818 RBX:  RCX: 3818
[  414.855424] RDX:  RSI: 0008 RDI: 002a9fe6
[  414.855437] RBP: 88005859f638 R08: 03030300 R09: ffe7
[  414.855451] R10: 81fa91b4 R11: 07e3 R12: 
[  414.855464] R13: 880471c98840 R14: 8804670a1180 R15: 000483cce000
[  414.855478] FS:  7f389c6fb8c0() GS:88049dc8() 
knlGS:
[  414.855493] CS:  0010 DS:  ES:  CR0: 80050033
[  414.855504] CR2: 3818 CR3: 0004875da000 CR4: 001406e0
[  414.855518] Stack:
[  414.855520]  88005859f638 880471c98840 880471c98df8 
0001
[  414.855538]  880471c98848 0001 88005859f698 
a0b99cb0
[  414.85]  88005859f678 59ab02179a7fe4d0 f3ce6b27ad46225f 
f5454218094e72d1
[  414.855572] Call Trace:
[  414.855577]  [] igb_configure+0x240/0x400 [igb]
[  414.855590]  [] __igb_open+0xc2/0x560 [igb]
[  414.855602]  [] ? notifier_call_chain+0x4d/0x80
[  414.855614]  [] igb_open+0x10/0x20 [igb]
[  414.855625]  [] __dev_open+0xb1/0x130
[  414.855636]  [] __dev_change_flags+0xa1/0x160
[  414.855647]  [] dev_change_flags+0x29/0x60
[  414.855658]  [] do_setlink+0x5d3/0xaa0
[  414.855679]  [] ? nla_parse+0xa3/0x100
[  414.855689]  [] rtnl_newlink+0x4f0/0x880
[  414.855700]  [] ? rtnl_newlink+0xf3/0x880
[  414.855721]  [] ? netlink_unicast+0x1ae/0x220
[  414.855734]  [] ? security_capable+0x48/0x60
[  414.855746]  [] ? ns_capable+0x2d/0x60
[  414.855756]  [] rtnetlink_rcv_msg+0x95/0x240
[  414.855768]  [] ? sock_has_perm+0x70/0x90
[  414.855779]  [] ? rtnetlink_rcv+0x40/0x40
[ 

Re: [Intel-wired-lan] [PATCH] igb: add more checks for disconnected adapter

2015-09-21 Thread Alexander Duyck

On 09/21/2015 09:14 PM, Jarod Wilson wrote:

Alexander Duyck wrote:

On 09/21/2015 10:11 AM, Jarod Wilson wrote:

Some pci changes upcoming in 4.3 seem to cause additional disconnects,
which can happen at unfortuitous times for igb, leading to issues
such as
this, where the disconnect happened just before igb_configure_tx_ring():

[ 414.440115] igb :15:00.0: enabling device ( -> 0002)
[ 414.474934] pps pps0: new PPS source ptp1
[ 414.474937] igb :15:00.0: added PHC on eth0
[ 414.474938] igb :15:00.0: Intel(R) Gigabit Ethernet Network
Connection
[ 414.474940] igb :15:00.0: eth0: (PCIe:2.5Gb/s:Width x1)
e8:ea:6a:00:1b:2a
[ 414.475072] igb :15:00.0: eth0: PBA No: 000200-000
[ 414.475073] igb :15:00.0: Using MSI-X interrupts. 4 rx queue(s),
4 tx queue(s)
[ 414.478453] igb :15:00.0 enp21s0: renamed from eth0
[ 414.497747] IPv6: ADDRCONF(NETDEV_UP): enp21s0: link is not ready
[ 414.536745] igb :15:00.0 enp21s0: PCIe link lost, device now
detached
[ 414.854808] BUG: unable to handle kernel paging request at
3818
[ 414.854827] IP: []
igb_configure_tx_ring+0x14c/0x250 [igb]
[ 414.854846] PGD 0
[ 414.854849] Oops: 0002 [#1] SMP
[ 414.854856] Modules linked in: firewire_ohci firewire_core crc_itu_t
igb dca ctr ccm arc4 iwlmvm mac80211 fuse xt_CHECKSUM ipt_MASQUERADE
nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute
bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6
nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security
ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle
iptable_security iptable_raw iptable_filter bnep dm_mirror
dm_region_hash dm_log dm_mod snd_hda_codec_hdmi coretemp
x86_pkg_temp_thermal intel_powerclamp kvm_intel iTCO_wdt ppdev kvm
iTCO_vendor_support hp_wmi sparse_keymap crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel
[ 414.855073] drbg ansi_cprng snd_hda_codec_realtek
snd_hda_codec_generic aesni_intel aes_x86_64 lrw gf128mul glue_helper
ablk_helper cryptd snd_hda_intel snd_hda_codec microcode snd_hda_core
snd_hwdep snd_seq snd_seq_device snd_pcm iwlwifi uvcvideo btusb
cfg80211 videobuf2_vmalloc videobuf2_memops btrtl btbcm videobuf2_core
btintel bluetooth v4l2_common snd_timer videodev snd parport_pc
rtsx_pci_ms joydev pcspkr input_leds i2c_i801 media sg memstick rfkill
soundcore lpc_ich 8250_fintek parport mei_me hp_accel ie31200_edac
shpchp lis3lv02d mei edac_core input_polldev hp_wireless tpm_infineon
sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs
libcrc32c sr_mod sd_mod cdrom rtsx_pci_sdmmc mmc_core crc32c_intel
serio_raw rtsx_pci nouveau mxm_wmi ahci hwmon libahci e1000e
drm_kms_helper
[ 414.855309] ptp xhci_pci pps_core ttm xhci_hcd wmi video ipv6 autofs4
[ 414.855331] CPU: 2 PID: 875 Comm: NetworkManager Not tainted
4.2.0-5.el7_UNSUPPORTED.x86_64 #1
[ 414.855348] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS
M70 Ver. 01.07 02/26/2015
[ 414.855365] task: 880484698c00 ti: 88005859c000 task.ti:
88005859c000
[ 414.855380] RIP: 0010:[] []
igb_configure_tx_ring+0x14c/0x250 [igb]
[ 414.855401] RSP: 0018:88005859f608 EFLAGS: 00010246
[ 414.855410] RAX: 3818 RBX:  RCX:
3818
[ 414.855424] RDX:  RSI: 0008 RDI:
002a9fe6
[ 414.855437] RBP: 88005859f638 R08: 03030300 R09:
ffe7
[ 414.855451] R10: 81fa91b4 R11: 07e3 R12:

[ 414.855464] R13: 880471c98840 R14: 8804670a1180 R15:
000483cce000
[ 414.855478] FS: 7f389c6fb8c0() GS:88049dc8()
knlGS:
[ 414.855493] CS: 0010 DS:  ES:  CR0: 80050033
[ 414.855504] CR2: 3818 CR3: 0004875da000 CR4:
001406e0
[ 414.855518] Stack:
[ 414.855520] 88005859f638 880471c98840 880471c98df8
0001
[ 414.855538] 880471c98848 0001 88005859f698
a0b99cb0
[ 414.85] 88005859f678 59ab02179a7fe4d0 f3ce6b27ad46225f
f5454218094e72d1
[ 414.855572] Call Trace:
[ 414.855577] [] igb_configure+0x240/0x400 [igb]
[ 414.855590] [] __igb_open+0xc2/0x560 [igb]
[ 414.855602] [] ? notifier_call_chain+0x4d/0x80
[ 414.855614] [] igb_open+0x10/0x20 [igb]
[ 414.855625] [] __dev_open+0xb1/0x130
[ 414.855636] [] __dev_change_flags+0xa1/0x160
[ 414.855647] [] dev_change_flags+0x29/0x60
[ 414.855658] [] do_setlink+0x5d3/0xaa0
[ 414.855679] [] ? nla_parse+0xa3/0x100
[ 414.855689] [] rtnl_newlink+0x4f0/0x880
[ 414.855700] [] ? rtnl_newlink+0xf3/0x880
[ 414.855721] [] ? netlink_unicast+0x1ae/0x220
[ 414.855734] [] ? security_capable+0x48/0x60
[ 414.855746] [] ? ns_capable+0x2d/0x60
[ 414.855756] [] rtnetlink_rcv_msg+0x95/0x240
[ 414.855768] [] ? sock_has_perm+0x70/0x90
[ 414.855779] [] ? rtnetlink_rcv+0x40/0x40
[ 414.855789] []