[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection
** Changed in: linux (Ubuntu)
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1713553

Title:
  Intel i40e PF reset due to incorrect MDD detection

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released

Bug description:
  [Impact]

  Using an Intel i40e network device, under heavy traffic load with TSO enabled, the device will spontaneously reset itself and issue errors similar to the following:

  Jun 14 14:09:51 hostname kernel: [4253913.851053] i40e :05:00.1: TX driver issue detected, PF reset issued
  Jun 14 14:09:53 hostname kernel: [4253915.476283] i40e :05:00.1: TX driver issue detected, PF reset issued
  Jun 14 14:09:54 hostname kernel: [4253917.411264] i40e :05:00.1: TX driver issue detected, PF reset issued

  This causes a full reset of the PF, which interrupts traffic flow.

  This was partially fixed by Xenial commit 12f8cc59d5886b86372f45290166deca57a60d7a; however, one additional upstream commit is required to fully fix the issue:

  commit 841493a3f64395b60554afbcaa17f4350f90e764
  Author: Alexander Duyck
  Date: Tue Sep 6 18:05:04 2016 -0700

      i40e: Limit TX descriptor count in cases where frag size is greater than 16K

  This fix was never backported into the Xenial 4.4 kernel series, but is already present in the Xenial HWE (and Zesty) 4.10 kernel.

  [Testcase]

  In this case, the issue occurs at a customer site using i40e-based Intel network cards with SR-IOV enabled. Under heavy load, the card will reset itself as described.

  [Regression Potential]

  As with any change to a network card driver, this may cause regressions with network I/O through i40e card(s). However, this specific change only increases the likelihood that any specific large TSO tx will need to be linearized, which will avoid the PF reset. Linearizing a TSO tx that did not need to be linearized will not cause any failures; it may only decrease performance slightly. However, this patch should only cause linearization when required to avoid the MDD detection and PF reset.

  [Other Info]

  The previous bug for this issue is bug 1700834.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1713553/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
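
The reasoning in [Regression Potential] can be illustrated with a small sketch. This is not the driver's code: the constants (a per-descriptor data limit just under 16K and a limit of 8 descriptors per packet) and all function names are assumptions chosen for illustration, and the real check in the driver is more involved. It only shows why counting one descriptor per frag undercounts when a frag is larger than 16K, and why counting correctly makes linearization more likely.

```python
# Illustrative sketch only. The two constants below are assumptions for this
# example, not values taken from the bug report.
MAX_DATA_PER_TXD = 16 * 1024 - 1      # assumed max bytes one TX descriptor can carry
MAX_DESCRIPTORS_PER_PACKET = 8        # assumed max descriptors the NIC accepts per packet


def descriptors_for_frag(frag_size: int) -> int:
    """Descriptors needed for one frag: ceiling division, at least one."""
    return max(1, (frag_size + MAX_DATA_PER_TXD - 1) // MAX_DATA_PER_TXD)


def naive_descriptor_count(frag_sizes) -> int:
    """Undercounting variant: assumes every frag fits in a single descriptor."""
    return len(frag_sizes)


def actual_descriptor_count(frag_sizes) -> int:
    """Counts the extra descriptors needed by frags larger than the limit."""
    return sum(descriptors_for_frag(size) for size in frag_sizes)


def needs_linearize(frag_sizes, counter) -> bool:
    """Hypothetical, simplified decision: linearize when the count exceeds the limit."""
    return counter(frag_sizes) > MAX_DESCRIPTORS_PER_PACKET


# Six 32K frags look harmless when counted per frag, but each one needs
# more than one descriptor once the per-descriptor limit is applied.
frags = [32 * 1024] * 6
print(naive_descriptor_count(frags), needs_linearize(frags, naive_descriptor_count))
print(actual_descriptor_count(frags), needs_linearize(frags, actual_descriptor_count))
```

Under these assumptions the per-frag count lets the packet through unchanged, while the corrected count flags it for linearization, which matches the trade-off described above: slightly more linearization in exchange for avoiding the PF reset.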
[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection
@stefan-n1, please move discussion over to bug 1723127; no more comments should be added to this bug.
[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection
Hi there.

I can confirm this problem still exists in the newest kernels and with the latest Intel drivers as of today:

Jan 19 16:05:19 osd9 kernel: [511271.581413] i40e :02:00.1: TX driver issue detected, PF reset issued
Jan 19 16:09:08 osd9 kernel: [511500.919380] i40e :02:00.0: TX driver issue detected, PF reset issued

driver: i40e-2.4.3 (and xenial / 4.13 shipped driver: 2.1.14-k)
kernel: 4.13.0-25-generic #29~16.04.2-Ubuntu SMP Tue Jan 9 12:16:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Kernel loaded with nopti noibrs noibpb (Meltdown / Spectre mitigation disabled).

We can trigger the issue with high load (benchmarking a Ceph cluster with fio: 4 clients, 8 threads, iodepth 256, 100% random write, 64K block size). We hit this problem only with a relatively large block size (64K); with 4K blocks we do not. We haven't tested large random reads (that test is still to be done).

When using an openvswitch port-channel (as we do) with jumbo frames ... this port-channel will not come back online after the reset. rmmod i40e / modprobe i40e does the trick, though.
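
For anyone comparing how often the reset fires across test runs (for example 64K versus 4K block sizes), a rough helper along these lines may be useful. It is only a sketch: the function name and the regular expression are my own, and the pattern is written against the log lines quoted in this thread, so it may need adjusting for other kernel log formats.

```python
import re
from collections import Counter

# Matches the reset message quoted in this thread and captures the PCI address.
# The 4-digit PCI domain is treated as optional because the quoted lines show
# the address as ":02:00.1".
PF_RESET_RE = re.compile(
    r"i40e\s+((?:[0-9a-fA-F]{4})?:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]):\s+"
    r"TX driver issue detected, PF reset issued"
)


def count_pf_resets(lines):
    """Return a Counter mapping PCI address -> number of PF reset messages."""
    counts = Counter()
    for line in lines:
        match = PF_RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input: the two reset lines from the report above, plus one unrelated
# bonding line from an earlier report in this thread that must not be counted.
sample = [
    "Jan 19 16:05:19 osd9 kernel: [511271.581413] i40e :02:00.1: TX driver issue detected, PF reset issued",
    "Jan 19 16:09:08 osd9 kernel: [511500.919380] i40e :02:00.0: TX driver issue detected, PF reset issued",
    "[ 1110.013954] bond0: link status up again after 0 ms for interface p1p2",
]

resets = count_pf_resets(sample)
for device, count in sorted(resets.items()):
    print(device, count)
```

In practice the lines would come from a saved kernel log rather than a hard-coded list; passing an open file object to `count_pf_resets` works because it only iterates over lines.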
[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection
> however we're still experiencing problems with the newest kernel

Well, I was afraid of that. As this problem is the NIC firmware complaining but not actually telling us what it's unhappy with, there's a bit of trial and error here in figuring out what exactly it's complaining about.

Since this bug is already 'fix released', I opened a new bug 1723127 to track continuing work on this; let's move the discussion over there.
[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection
Hi,

Thanks for your efforts with this issue; however, we're still experiencing problems with the newest kernel. Sorry about missing the patch-testing window, we should have been there for you :)

After only 20 minutes of runtime with the new kernel, we saw the following, and networking is basically useless:

[2.410644] i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 1.4.25-k
[2.419791] i40e: Copyright (c) 2013 - 2014 Intel Corporation.
[2.483362] i40e :02:00.0: fw 5.40.47690 api 1.5 nvm 5.40 0x80002d35 18.0.16
[2.896678] i40e :02:00.0: MAC address: 3c:fd:fe:1a:b5:e0
[2.903768] i40e :02:00.0: SAN MAC: 3c:fd:fe:1a:b5:e1
[3.189818] i40e :02:00.0: PCI-Express: Speed 8.0GT/s Width x4
[3.193934] i40e :02:00.0: PCI-Express bandwidth available for this device may be insufficient for optimal performance.
[3.202198] i40e :02:00.0: Please move the device to a different PCI-e link with more lanes and/or higher transfer rate.
[3.241095] i40e :02:00.0: Features: PF-id[0] VFs: 64 VSIs: 2 QP: 4 RX: 1BUF RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
[3.279202] i40e :02:00.1: fw 5.40.47690 api 1.5 nvm 5.40 0x80002d35 18.0.16
[3.531346] i40e :02:00.1: MAC address: 3c:fd:fe:1a:b5:e2
[3.539557] i40e :02:00.1: SAN MAC: 3c:fd:fe:1a:b5:e3
[3.761719] i40e :02:00.1: PCI-Express: Speed 8.0GT/s Width x4
[3.765721] i40e :02:00.1: PCI-Express bandwidth available for this device may be insufficient for optimal performance.
[3.773539] i40e :02:00.1: Please move the device to a different PCI-e link with more lanes and/or higher transfer rate.
[3.812022] i40e :02:00.1: Features: PF-id[1] VFs: 64 VSIs: 2 QP: 4 RX: 1BUF RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
[3.855168] i40e :02:00.0 p1p1: renamed from eth2
[3.895278] i40e :02:00.1 p1p2: renamed from eth0
[7.205832] i40e :02:00.1 p1p2: already using mac address 3c:fd:fe:1a:b5:e2
[7.208378] i40e :02:00.1 p1p2: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None
[7.208401] i40e :02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e2 vid=0
[7.208453] i40e :02:00.0 p1p1: set new mac address 3c:fd:fe:1a:b5:e2
[7.217191] i40e :02:00.0 p1p1: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None
[7.217215] i40e :02:00.0 p1p1: adding 3c:fd:fe:1a:b5:e2 vid=0
[7.240919] i40e :02:00.1 p1p2: set new mac address 3c:fd:fe:1a:b5:e0
[7.252720] i40e :02:00.0 p1p1: returning to hw mac address 3c:fd:fe:1a:b5:e0
[7.324791] i40e :02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e0 vid=5
[7.324798] i40e :02:00.0 p1p1: adding 3c:fd:fe:1a:b5:e0 vid=5
[ 1109.574733] i40e :02:00.1: TX driver issue detected, PF reset issued
[ 1110.011152] i40e :02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e0 vid=0
[ 1110.011155] i40e :02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e0 vid=5
[ 1110.013749] i40e :02:00.1: TX driver issue detected, PF reset issued
[ 1110.013773] i40e :02:00.1 p1p2: speed changed to 0 for port p1p2
[ 1110.013954] bond0: link status up again after 0 ms for interface p1p2
[ 1110.983823] i40e :02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e0 vid=0
[ 1110.983825] i40e :02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e0 vid=5
[ 1110.985836] bond0: link status up again after 0 ms for interface p1p2
[ .432231] i40e :02:00.0: TX driver issue detected, PF reset issued
[ .981828] i40e :02:00.0 p1p1: adding 3c:fd:fe:1a:b5:e0 vid=0
[ .981835] i40e :02:00.0 p1p1: adding 3c:fd:fe:1a:b5:e0 vid=5
[ .984816] i40e :02:00.0: TX driver issue detected, PF reset issued
[ .987007] bond0: link status up again after 0 ms for interface p1p1
[ 1112.981796] i40e :02:00.0 p1p1: adding 3c:fd:fe:1a:b5:e0 vid=0
[ 1112.981803] i40e :02:00.0 p1p1: adding 3c:fd:fe:1a:b5:e0 vid=5
[ 1112.985812] bond0: link status up again after 0 ms for interface p1p1
[ 1114.204548] i40e :02:00.1: TX driver issue detected, PF reset issued
[ 1114.983686] i40e :02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e0 vid=0
[ 1114.983688] i40e :02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e0 vid=5
[ 1114.985692] bond0: link status up again after 0 ms for interface p1p2
[ 1115.752686] i40e :02:00.1: TX driver issue detected, PF reset issued
[ 1116.985619] i40e :02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e0 vid=0
[ 1116.985624] i40e :02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e0 vid=5
[ 1116.988361] i40e :02:00.1 p1p2: speed changed to 0 for port p1p2
[ 1116.989607] bond0: link status up again after 0 ms for interface p1p2

# uname -a
Linux lb05 4.4.0-97-generic #120-Ubuntu SMP Tue Sep 19 17:28:18 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

# modinfo i40e
filename: /lib/modules/4.4.0-97-generic/kernel/drivers/net/ethernet/intel/i40e/i40e.ko
version: 1.4.25-k

As a workaround we're using i40e driver v2.0.30 via dkms, which works fine without any issues so far, but it would be nice to have this
[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection
This bug was fixed in the package linux - 4.4.0-97.120

---
linux (4.4.0-97.120) xenial; urgency=low

  * linux: 4.4.0-97.120 -proposed tracker (LP: #1718149)

  * blk-mq: possible deadlock on CPU hot(un)plug (LP: #1670634)
    - [Config] s390x -- disable CONFIG_{DM, SCSI}_MQ_DEFAULT

  * Xenial update to 4.4.87 stable release (LP: #1715678)
    - irqchip: mips-gic: SYNC after enabling GIC region
    - i2c: ismt: Don't duplicate the receive length for block reads
    - i2c: ismt: Return EMSGSIZE for block reads with bogus length
    - ceph: fix readpage from fscache
    - cpumask: fix spurious cpumask_of_node() on non-NUMA multi-node configs
    - cpuset: Fix incorrect memory_pressure control file mapping
    - alpha: uapi: Add support for __SANE_USERSPACE_TYPES__
    - CIFS: remove endian related sparse warning
    - wl1251: add a missing spin_lock_init()
    - xfrm: policy: check policy direction value
    - drm/ttm: Fix accounting error when fail to get pages for pool
    - kvm: arm/arm64: Fix race in resetting stage2 PGD
    - kvm: arm/arm64: Force reading uncached stage2 PGD
    - epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove()
    - crypto: algif_skcipher - only call put_page on referenced and used pages
    - Linux 4.4.87

  * Xenial update to 4.4.86 stable release (LP: #1715430)
    - scsi: isci: avoid array subscript warning
    - ALSA: au88x0: Fix zero clear of stream->resources
    - btrfs: remove duplicate const specifier
    - i2c: jz4780: drop superfluous init
    - gcov: add support for gcc version >= 6
    - gcov: support GCC 7.1
    - lightnvm: initialize ppa_addr in dev_to_generic_addr()
    - p54: memset(0) whole array
    - lpfc: Fix Device discovery failures during switch reboot test.
    - arm64: mm: abort uaccess retries upon fatal signal
    - x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl
    - arm64: fpsimd: Prevent registers leaking across exec
    - scsi: sg: protect accesses to 'reserved' page array
    - scsi: sg: reset 'res_in_use' after unlinking reserved array
    - drm/i915: fix compiler warning in drivers/gpu/drm/i915/intel_uncore.c
    - Linux 4.4.86

  * Xenial update to 4.4.85 stable release (LP: #1714298)
    - af_key: do not use GFP_KERNEL in atomic contexts
    - dccp: purge write queue in dccp_destroy_sock()
    - dccp: defer ccid_hc_tx_delete() at dismantle time
    - ipv4: fix NULL dereference in free_fib_info_rcu()
    - net_sched/sfq: update hierarchical backlog when drop packet
    - ipv4: better IP_MAX_MTU enforcement
    - sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
    - tipc: fix use-after-free
    - ipv6: reset fn->rr_ptr when replacing route
    - ipv6: repair fib6 tree in failure case
    - tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
    - irda: do not leak initialized list.dev to userspace
    - net: sched: fix NULL pointer dereference when action calls some targets
    - net_sched: fix order of queue length updates in qdisc_replace()
    - mei: me: add broxton pci device ids
    - mei: me: add lewisburg device ids
    - Input: trackpoint - add new trackpoint firmware ID
    - Input: elan_i2c - add ELAN0602 ACPI ID to support Lenovo Yoga310
    - ALSA: core: Fix unexpected error at replacing user TLV
    - ALSA: hda - Add stereo mic quirk for Lenovo G50-70 (17aa:3978)
    - ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses
    - i2c: designware: Fix system suspend
    - drm: Release driver tracking before making the object available again
    - drm/atomic: If the atomic check fails, return its value first
    - drm: rcar-du: lvds: Fix PLL frequency-related configuration
    - drm: rcar-du: lvds: Rename PLLEN bit to PLLON
    - drm: rcar-du: Fix crash in encoder failure error path
    - drm: rcar-du: Fix display timing controller parameter
    - drm: rcar-du: Fix H/V sync signal polarity configuration
    - tracing: Fix freeing of filter in create_filter() when set_str is false
    - cifs: Fix df output for users with quota limits
    - cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup()
    - nfsd: Limit end of page list when decoding NFSv4 WRITE
    - perf/core: Fix group {cpu,task} validation
    - Bluetooth: hidp: fix possible might sleep error in hidp_session_thread
    - Bluetooth: cmtp: fix possible might sleep error in cmtp_session
    - Bluetooth: bnep: fix possible might sleep error in bnep_session
    - binder: use group leader instead of open thread
    - binder: Use wake up hint for synchronous transactions.
    - ANDROID: binder: fix proc->tsk check.
    - iio: imu: adis16480: Fix acceleration scale factor for adis16480
    - iio: hid-sensor-trigger: Fix the race with user space powering up sensors
    - staging: rtl8188eu: add RNX-N150NUB support
    - ASoC: simple-card: don't fail if sysclk setting is not supported
    - ASoC: rsnd: disable SRC.out only when stop timing
    - ASoC: rsnd: avoid
[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection
The original reporter verified to me that with the patch the problem has not recurred for several days, whereas previously they could reproduce it within a day. Unfortunately, as this problem is hard to reproduce, that is the best verification I can currently provide.

** Tags removed: verification-needed-xenial
** Tags added: verification-done-xenial
[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: verification-needed-xenial
[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection
** Changed in: linux (Ubuntu Xenial)
       Status: New => Fix Committed
[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection
** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1713553

Title:
  Intel i40e PF reset due to incorrect MDD detection

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  New

Bug description:
  [Impact]

  Using an Intel i40e network device, under heavy traffic load with TSO enabled, the device will spontaneously reset itself and issue errors similar to the following:

  Jun 14 14:09:51 hostname kernel: [4253913.851053] i40e :05:00.1: TX driver issue detected, PF reset issued
  Jun 14 14:09:53 hostname kernel: [4253915.476283] i40e :05:00.1: TX driver issue detected, PF reset issued
  Jun 14 14:09:54 hostname kernel: [4253917.411264] i40e :05:00.1: TX driver issue detected, PF reset issued

  This causes a full reset of the PF, which interrupts traffic flow.

  This was partially fixed by Xenial commit 12f8cc59d5886b86372f45290166deca57a60d7a; however, one additional upstream commit is required to fully fix the issue:

  commit 841493a3f64395b60554afbcaa17f4350f90e764
  Author: Alexander Duyck
  Date: Tue Sep 6 18:05:04 2016 -0700

      i40e: Limit TX descriptor count in cases where frag size is greater than 16K

  This fix was never backported into the Xenial 4.4 kernel series, but is already present in the Xenial HWE (and Zesty) 4.10 kernel.

  [Testcase]

  In this case, the issue occurs at a customer site using i40e-based Intel network cards with SR-IOV enabled. Under heavy load, the card will reset itself as described.

  [Regression Potential]

  As with any change to a network card driver, this may cause regressions with network I/O through i40e card(s). However, this specific change only increases the likelihood that any specific large TSO tx will need to be linearized, which will avoid the PF reset. Linearizing a TSO tx that did not need to be linearized will not cause any failures; it may only decrease performance slightly. However, this patch should only cause linearization when required to avoid the MDD detection and PF reset.

  [Other Info]

  The previous bug for this issue is bug 1700834.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1713553/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
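To check whether a host is hitting the symptom from the [Impact] section, the kernel messages quoted in the bug description can be counted per device. The following is only a minimal sketch: the function name `count_pf_resets` and the regular expression are illustrative assumptions built from the three log lines quoted in this bug, not part of any i40e tooling, and other kernel versions may word the message differently.

```python
import re
from collections import Counter

# Pattern derived from the log lines quoted in the bug description.
# The device address is captured loosely (any non-space run) because the
# quoted lines show it as ":05:00.1".
PF_RESET_RE = re.compile(
    r"i40e\s+(?P<dev>\S+):\s+TX driver issue detected, PF reset issued"
)


def count_pf_resets(lines):
    """Return a Counter mapping device address -> number of PF reset messages."""
    counts = Counter()
    for line in lines:
        match = PF_RESET_RE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jun 14 14:09:51 hostname kernel: [4253913.851053] i40e :05:00.1: "
        "TX driver issue detected, PF reset issued",
        "Jun 14 14:09:52 hostname kernel: [4253914.000000] unrelated message",
    ]
    for dev, n in count_pf_resets(sample).items():
        print(dev, n)
```

In practice the `lines` argument could be any iterable of log lines, for example an open file handle on a saved kernel log.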
[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection
Re: my last comment, testing confirmed that commit 5c4654 is *not* needed to fix this bug, so I am not including it. Only commit 841493a3 as listed in the bug description is required to fix this.
[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection
Note there is one additional upstream commit that improves performance by allowing up to 12k per tx descriptor, instead of 8k per descriptor (the current code in Xenial 4.4 kernel), and its changes are related to the fixes for this issue. However, from my reading of the code, I don't think that commit is actually required to fix this problem, so I am not including it in this bug (yet).

commit 5c4654daf2e2f25dfbd7fa572c59937ea6d4198b
Author: Alexander Duyck
Date: Fri Feb 19 12:17:08 2016 -0800

    i40e/i40evf: Allow up to 12K bytes of data per Tx descriptor instead of 8K
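The effect of the 8K versus 12K per-descriptor limit mentioned in the comment above can be illustrated with simple arithmetic. This is a deliberately simplified model, not the driver's actual logic: it assumes each fragment is split into chunks of at most `max_data_per_desc` bytes with no alignment adjustments, and the function name and the fragment sizes in the example are hypothetical.

```python
def descriptors_needed(frag_sizes, max_data_per_desc):
    """Count Tx descriptors for a list of fragment sizes (in bytes).

    Simplified model: each fragment needs ceil(size / max_data_per_desc)
    descriptors. The real driver may apply alignment rules not shown here.
    """
    if max_data_per_desc <= 0:
        raise ValueError("max_data_per_desc must be positive")
    # -(-a // b) is integer ceiling division for non-negative a.
    return sum(-(-size // max_data_per_desc) for size in frag_sizes)


if __name__ == "__main__":
    # Hypothetical fragment sizes for one large TSO send.
    frags = [32768, 20000, 4096]
    print(descriptors_needed(frags, 8 * 1024))   # 8K per descriptor  -> 8
    print(descriptors_needed(frags, 12 * 1024))  # 12K per descriptor -> 6
```

Under this model a larger per-descriptor limit lowers the descriptor count for the same data, which is consistent with the comment describing commit 5c4654da as a performance improvement rather than a requirement for the fix.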
[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection
** Changed in: linux (Ubuntu)
       Status: Incomplete => In Progress

** Changed in: linux (Ubuntu)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu)
     Assignee: (unassigned) => Dan Streetman (ddstreet)