[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection

2018-05-25 Thread Dan Streetman
** Changed in: linux (Ubuntu)
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1713553

Title:
  Intel i40e PF reset due to incorrect MDD detection

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released

Bug description:
  [Impact]

  Using an Intel i40e network device, under heavy traffic load with
  TSO enabled, the device will spontaneously reset itself and issue errors
  similar to the following:

  Jun 14 14:09:51 hostname kernel: [4253913.851053] i40e :05:00.1: TX driver issue detected, PF reset issued
  Jun 14 14:09:53 hostname kernel: [4253915.476283] i40e :05:00.1: TX driver issue detected, PF reset issued
  Jun 14 14:09:54 hostname kernel: [4253917.411264] i40e :05:00.1: TX driver issue detected, PF reset issued

  This causes a full reset of the PF, which interrupts traffic flow.

  This was partially fixed by Xenial commit
  12f8cc59d5886b86372f45290166deca57a60d7a, but one additional
  upstream commit is required to fully fix the issue:

  commit 841493a3f64395b60554afbcaa17f4350f90e764
  Author: Alexander Duyck 
  Date:   Tue Sep 6 18:05:04 2016 -0700

  i40e: Limit TX descriptor count in cases where frag size is
  greater than 16K

  This fix was never backported into the Xenial 4.4 kernel series, but
  is already present in the Xenial HWE (and Zesty) 4.10 kernel.

  [Testcase]

  In this case, the issue occurs at a customer site using i40e-based
  Intel network cards with SR-IOV enabled. Under heavy load, the card will
  reset itself as described.

  [Regression Potential]

  As with any change to a network card driver, this may cause
  regressions with network I/O through i40e card(s).  However, this
  specific change only increases the likelihood that any specific large
  TSO tx will need to be linearized, which will avoid the PF reset.
  Linearizing a TSO tx that did not need to be linearized will not cause
  any failures; it may only decrease performance slightly.  However, this
  patch should only cause linearization when required to avoid the MDD
  detection and PF reset.
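
The linearization trade-off described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the driver's code: the 16K-1 byte per-descriptor limit is inferred from the commit title, while the cap of 8 descriptors per packet and all function names are hypothetical values chosen for the example. The real driver check is more involved (it works per segment rather than per frame).

```python
# Simplified model of TX descriptor counting. All constants and names are
# assumptions made for illustration; they are not taken from the bug report.
MAX_DATA_PER_TXD = 16 * 1024 - 1  # assumed bytes one descriptor can carry
MAX_BUFFER_TXD = 8                # assumed cap on descriptors per packet


def descriptors_for_frag(size: int) -> int:
    """Descriptors needed for one fragment of `size` bytes (ceiling division)."""
    return max(1, -(-size // MAX_DATA_PER_TXD))


def naive_needs_linearize(frag_sizes) -> bool:
    """Count one descriptor per fragment, ignoring how large each fragment is."""
    return len(frag_sizes) > MAX_BUFFER_TXD


def size_aware_needs_linearize(frag_sizes) -> bool:
    """Count descriptors per fragment, so fragments over 16K cost more than one."""
    total = sum(descriptors_for_frag(size) for size in frag_sizes)
    return total > MAX_BUFFER_TXD


if __name__ == "__main__":
    frags = [32 * 1024] * 5  # five 32K fragments
    print(naive_needs_linearize(frags))       # fragment count alone looks safe
    print(size_aware_needs_linearize(frags))  # descriptor count exceeds the cap
```

In this model, a frame built from a few fragments larger than 16K passes the naive check but exceeds the descriptor cap, so the size-aware check linearizes it. The cost is an extra copy for such frames, which matches the slight performance decrease mentioned above.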

  [Other Info]

  The previous bug for this issue is bug 1700834.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1713553/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection

2018-01-19 Thread Dan Streetman
@stefan-n1, please move discussion over to bug 1723127; no more comments
should be added to this bug.



[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection

2018-01-19 Thread Stefan Kooman
Hi there. I can confirm this problem still exists in the newest kernels
and with the latest Intel drivers as of today:

Jan 19 16:05:19 osd9 kernel: [511271.581413] i40e :02:00.1: TX driver issue detected, PF reset issued
Jan 19 16:09:08 osd9 kernel: [511500.919380] i40e :02:00.0: TX driver issue detected, PF reset issued


driver: i40e-2.4.3 (and xenial / 4.13 shipped driver: 2.1.14-k)
kernel: 4.13.0-25-generic #29~16.04.2-Ubuntu SMP Tue Jan 9 12:16:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Kernel loaded with nopti noibrs noibpb (Meltdown / Spectre mitigation disabled).

We can trigger the issue with high load (benchmarking Ceph cluster with
fio: 4 clients, 8 threads, iodepth 256, 100% random write, 64K block
size).

Only when we use a relatively large block size (64K) do we hit this
problem. With 4K blocks we do not hit this issue. We haven't tested
large random reads (that test is still to be done).

When using an openvswitch port-channel (as we do) with jumbo frames,
the port-channel will not come back online after the reset. Reloading
the driver (rmmod i40e / modprobe i40e) does the trick, though.
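
For anyone comparing reset frequency across ports or test runs, a small log scanner along these lines may help. It is a minimal sketch that assumes each kernel message sits on a single line and matches the message text quoted in this thread; the function name and the regular expression are illustrative, not part of any existing tool.

```python
import re
from collections import Counter

# Matches the reset message quoted in this thread. The PCI domain prefix is
# optional because the addresses in the archived logs appear without it.
RESET_RE = re.compile(
    r"i40e\s+(?P<dev>[0-9a-fA-F]*:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]):\s+"
    r"TX driver issue detected, PF reset issued"
)


def count_pf_resets(lines):
    """Return a Counter mapping PCI address -> number of PF reset messages."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Example: dmesg | python3 count_resets.py  (script name is hypothetical)
    for dev, n in sorted(count_pf_resets(sys.stdin).items()):
        print(f"{dev}\t{n}")
```

Counting per PCI function makes it easy to see whether both ports of a card reset under a given load, as they did in the logs above.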



[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection

2017-10-12 Thread Dan Streetman
> however we're still experiencing problems with the newest kernel

Well, I was afraid of that.  As this problem is the NIC firmware
complaining but not actually telling us what it's unhappy with, there's
a bit of trial and error here in figuring out what exactly it's
complaining about.

Since this bug is already 'fix released', I opened a new bug 1723127 to
track continuing work on this; let's move the discussion over there.



[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection

2017-10-11 Thread Björn Zettergren
Hi,

Thanks for your efforts with this issue, however we're still
experiencing problems with the newest kernel. Sorry about missing the
patch-testing-window, we should have been there for you :)

After only 20 minutes of runtime with the new kernel, we saw the
following, and networking is basically useless:

[2.410644] i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 1.4.25-k
[2.419791] i40e: Copyright (c) 2013 - 2014 Intel Corporation.
[2.483362] i40e :02:00.0: fw 5.40.47690 api 1.5 nvm 5.40 0x80002d35 18.0.16
[2.896678] i40e :02:00.0: MAC address: 3c:fd:fe:1a:b5:e0
[2.903768] i40e :02:00.0: SAN MAC: 3c:fd:fe:1a:b5:e1
[3.189818] i40e :02:00.0: PCI-Express: Speed 8.0GT/s Width x4
[3.193934] i40e :02:00.0: PCI-Express bandwidth available for this device may be insufficient for optimal performance.
[3.202198] i40e :02:00.0: Please move the device to a different PCI-e link with more lanes and/or higher transfer rate.
[3.241095] i40e :02:00.0: Features: PF-id[0] VFs: 64 VSIs: 2 QP: 4 RX: 1BUF RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
[3.279202] i40e :02:00.1: fw 5.40.47690 api 1.5 nvm 5.40 0x80002d35 18.0.16
[3.531346] i40e :02:00.1: MAC address: 3c:fd:fe:1a:b5:e2
[3.539557] i40e :02:00.1: SAN MAC: 3c:fd:fe:1a:b5:e3
[3.761719] i40e :02:00.1: PCI-Express: Speed 8.0GT/s Width x4
[3.765721] i40e :02:00.1: PCI-Express bandwidth available for this device may be insufficient for optimal performance.
[3.773539] i40e :02:00.1: Please move the device to a different PCI-e link with more lanes and/or higher transfer rate.
[3.812022] i40e :02:00.1: Features: PF-id[1] VFs: 64 VSIs: 2 QP: 4 RX: 1BUF RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
[3.855168] i40e :02:00.0 p1p1: renamed from eth2
[3.895278] i40e :02:00.1 p1p2: renamed from eth0
[7.205832] i40e :02:00.1 p1p2: already using mac address 3c:fd:fe:1a:b5:e2
[7.208378] i40e :02:00.1 p1p2: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None
[7.208401] i40e :02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e2 vid=0
[7.208453] i40e :02:00.0 p1p1: set new mac address 3c:fd:fe:1a:b5:e2
[7.217191] i40e :02:00.0 p1p1: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None
[7.217215] i40e :02:00.0 p1p1: adding 3c:fd:fe:1a:b5:e2 vid=0
[7.240919] i40e :02:00.1 p1p2: set new mac address 3c:fd:fe:1a:b5:e0
[7.252720] i40e :02:00.0 p1p1: returning to hw mac address 3c:fd:fe:1a:b5:e0
[7.324791] i40e :02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e0 vid=5
[7.324798] i40e :02:00.0 p1p1: adding 3c:fd:fe:1a:b5:e0 vid=5
[ 1109.574733] i40e :02:00.1: TX driver issue detected, PF reset issued
[ 1110.011152] i40e :02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e0 vid=0
[ 1110.011155] i40e :02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e0 vid=5
[ 1110.013749] i40e :02:00.1: TX driver issue detected, PF reset issued
[ 1110.013773] i40e :02:00.1 p1p2: speed changed to 0 for port p1p2
[ 1110.013954] bond0: link status up again after 0 ms for interface p1p2
[ 1110.983823] i40e :02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e0 vid=0
[ 1110.983825] i40e :02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e0 vid=5
[ 1110.985836] bond0: link status up again after 0 ms for interface p1p2
[ .432231] i40e :02:00.0: TX driver issue detected, PF reset issued
[ .981828] i40e :02:00.0 p1p1: adding 3c:fd:fe:1a:b5:e0 vid=0
[ .981835] i40e :02:00.0 p1p1: adding 3c:fd:fe:1a:b5:e0 vid=5
[ .984816] i40e :02:00.0: TX driver issue detected, PF reset issued
[ .987007] bond0: link status up again after 0 ms for interface p1p1
[ 1112.981796] i40e :02:00.0 p1p1: adding 3c:fd:fe:1a:b5:e0 vid=0
[ 1112.981803] i40e :02:00.0 p1p1: adding 3c:fd:fe:1a:b5:e0 vid=5
[ 1112.985812] bond0: link status up again after 0 ms for interface p1p1
[ 1114.204548] i40e :02:00.1: TX driver issue detected, PF reset issued
[ 1114.983686] i40e :02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e0 vid=0
[ 1114.983688] i40e :02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e0 vid=5
[ 1114.985692] bond0: link status up again after 0 ms for interface p1p2
[ 1115.752686] i40e :02:00.1: TX driver issue detected, PF reset issued
[ 1116.985619] i40e :02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e0 vid=0
[ 1116.985624] i40e :02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e0 vid=5
[ 1116.988361] i40e :02:00.1 p1p2: speed changed to 0 for port p1p2
[ 1116.989607] bond0: link status up again after 0 ms for interface p1p2

# uname -a
Linux lb05 4.4.0-97-generic #120-Ubuntu SMP Tue Sep 19 17:28:18 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

# modinfo i40e
filename:       /lib/modules/4.4.0-97-generic/kernel/drivers/net/ethernet/intel/i40e/i40e.ko
version:        1.4.25-k

As a workaround we're using i40e driver v2.0.30 via dkms, which works
fine without any issues so far, but it would be nice to have this

[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection

2017-10-10 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.4.0-97.120

---
linux (4.4.0-97.120) xenial; urgency=low

  * linux: 4.4.0-97.120 -proposed tracker (LP: #1718149)

  * blk-mq: possible deadlock on CPU hot(un)plug (LP: #1670634)
- [Config] s390x -- disable CONFIG_{DM, SCSI}_MQ_DEFAULT

  * Xenial update to 4.4.87 stable release (LP: #1715678)
- irqchip: mips-gic: SYNC after enabling GIC region
- i2c: ismt: Don't duplicate the receive length for block reads
- i2c: ismt: Return EMSGSIZE for block reads with bogus length
- ceph: fix readpage from fscache
- cpumask: fix spurious cpumask_of_node() on non-NUMA multi-node configs
- cpuset: Fix incorrect memory_pressure control file mapping
- alpha: uapi: Add support for __SANE_USERSPACE_TYPES__
- CIFS: remove endian related sparse warning
- wl1251: add a missing spin_lock_init()
- xfrm: policy: check policy direction value
- drm/ttm: Fix accounting error when fail to get pages for pool
- kvm: arm/arm64: Fix race in resetting stage2 PGD
- kvm: arm/arm64: Force reading uncached stage2 PGD
- epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove()
- crypto: algif_skcipher - only call put_page on referenced and used pages
- Linux 4.4.87

  * Xenial update to 4.4.86 stable release (LP: #1715430)
- scsi: isci: avoid array subscript warning
- ALSA: au88x0: Fix zero clear of stream->resources
- btrfs: remove duplicate const specifier
- i2c: jz4780: drop superfluous init
- gcov: add support for gcc version >= 6
- gcov: support GCC 7.1
- lightnvm: initialize ppa_addr in dev_to_generic_addr()
- p54: memset(0) whole array
- lpfc: Fix Device discovery failures during switch reboot test.
- arm64: mm: abort uaccess retries upon fatal signal
- x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl
- arm64: fpsimd: Prevent registers leaking across exec
- scsi: sg: protect accesses to 'reserved' page array
- scsi: sg: reset 'res_in_use' after unlinking reserved array
- drm/i915: fix compiler warning in drivers/gpu/drm/i915/intel_uncore.c
- Linux 4.4.86

  * Xenial update to 4.4.85 stable release (LP: #1714298)
- af_key: do not use GFP_KERNEL in atomic contexts
- dccp: purge write queue in dccp_destroy_sock()
- dccp: defer ccid_hc_tx_delete() at dismantle time
- ipv4: fix NULL dereference in free_fib_info_rcu()
- net_sched/sfq: update hierarchical backlog when drop packet
- ipv4: better IP_MAX_MTU enforcement
- sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
- tipc: fix use-after-free
- ipv6: reset fn->rr_ptr when replacing route
- ipv6: repair fib6 tree in failure case
- tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
- irda: do not leak initialized list.dev to userspace
- net: sched: fix NULL pointer dereference when action calls some targets
- net_sched: fix order of queue length updates in qdisc_replace()
- mei: me: add broxton pci device ids
- mei: me: add lewisburg device ids
- Input: trackpoint - add new trackpoint firmware ID
- Input: elan_i2c - add ELAN0602 ACPI ID to support Lenovo Yoga310
- ALSA: core: Fix unexpected error at replacing user TLV
- ALSA: hda - Add stereo mic quirk for Lenovo G50-70 (17aa:3978)
- ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses
- i2c: designware: Fix system suspend
- drm: Release driver tracking before making the object available again
- drm/atomic: If the atomic check fails, return its value first
- drm: rcar-du: lvds: Fix PLL frequency-related configuration
- drm: rcar-du: lvds: Rename PLLEN bit to PLLON
- drm: rcar-du: Fix crash in encoder failure error path
- drm: rcar-du: Fix display timing controller parameter
- drm: rcar-du: Fix H/V sync signal polarity configuration
- tracing: Fix freeing of filter in create_filter() when set_str is false
- cifs: Fix df output for users with quota limits
- cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup()
- nfsd: Limit end of page list when decoding NFSv4 WRITE
- perf/core: Fix group {cpu,task} validation
- Bluetooth: hidp: fix possible might sleep error in hidp_session_thread
- Bluetooth: cmtp: fix possible might sleep error in cmtp_session
- Bluetooth: bnep: fix possible might sleep error in bnep_session
- binder: use group leader instead of open thread
- binder: Use wake up hint for synchronous transactions.
- ANDROID: binder: fix proc->tsk check.
- iio: imu: adis16480: Fix acceleration scale factor for adis16480
- iio: hid-sensor-trigger: Fix the race with user space powering up sensors
- staging: rtl8188eu: add RNX-N150NUB support
- ASoC: simple-card: don't fail if sysclk setting is not supported
- ASoC: rsnd: disable SRC.out only when stop timing
- ASoC: rsnd: avoid 

[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection

2017-09-26 Thread Dan Streetman
The original reporter verified to me that, with the patch, the problem
has not recurred for several days, whereas previously they could
reproduce it within a day. Unfortunately, as this problem is hard to
reproduce, that is the best verification I can provide currently.

** Tags removed: verification-needed-xenial
** Tags added: verification-done-xenial



[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection

2017-09-25 Thread Kleber Sacilotto de Souza
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-xenial' to 'verification-done-xenial'. If the
problem still exists, change the tag 'verification-needed-xenial' to
'verification-failed-xenial'.

If verification is not done within 5 working days from today, this fix
will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-xenial



[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection

2017-09-15 Thread Stefan Bader
** Changed in: linux (Ubuntu Xenial)
   Status: New => Fix Committed



[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection

2017-09-15 Thread Stefan Bader
** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1713553

Title:
  Intel i40e PF reset due to incorrect MDD detection

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  New


[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection

2017-09-13 Thread Dan Streetman
Re: my last comment, testing confirmed that commit 5c4654 is *not*
needed to fix this bug, so I am not including it. Only commit 841493a3
as listed in the bug description is required to fix this.

Title:
  Intel i40e PF reset due to incorrect MDD detection

Status in linux package in Ubuntu:
  In Progress


[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection

2017-08-28 Thread Dan Streetman
Note there is one additional upstream commit that improves performance
by allowing up to 12k per tx descriptor, instead of 8k per descriptor
(the current code in Xenial 4.4 kernel), and its changes are related to
the fixes for this issue.  However, from my reading of the code, I don't
think that commit is actually required to fix this problem, so I am not
including it in this bug (yet).

commit 5c4654daf2e2f25dfbd7fa572c59937ea6d4198b
Author: Alexander Duyck 
Date:   Fri Feb 19 12:17:08 2016 -0800

i40e/i40evf: Allow up to 12K bytes of data per Tx descriptor instead
of 8K
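
To give a rough feel for why the per-descriptor limit affects
performance, here is a minimal sketch that counts descriptors per
fragment using plain ceiling division. It is not the driver's code:
the function name is hypothetical, and treating "8K" and "12K" as
exactly 8192 and 12288 bytes is an assumption made for illustration.

```python
def descriptors_needed(frag_size: int, max_per_descriptor: int) -> int:
    """Return how many Tx descriptors cover frag_size bytes (ceiling division)."""
    if frag_size <= 0:
        return 0
    return (frag_size + max_per_descriptor - 1) // max_per_descriptor


EIGHT_K = 8 * 1024    # assumed byte value for "8K per descriptor"
TWELVE_K = 12 * 1024  # assumed byte value for "12K per descriptor"

# Compare descriptor usage for a few illustrative fragment sizes.
for frag_size in (16 * 1024, 32 * 1024, 64 * 1024):
    print(
        frag_size,
        descriptors_needed(frag_size, EIGHT_K),
        descriptors_needed(frag_size, TWELVE_K),
    )
```

Under these assumptions, a 32K fragment needs 4 descriptors at 8K each
but only 3 at 12K each, so the larger limit means fewer descriptors
for the same data.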

Title:
  Intel i40e PF reset due to incorrect MDD detection

Status in linux package in Ubuntu:
  In Progress


[Kernel-packages] [Bug 1713553] Re: Intel i40e PF reset due to incorrect MDD detection

2017-08-28 Thread Dan Streetman
** Changed in: linux (Ubuntu)
   Status: Incomplete => In Progress

** Changed in: linux (Ubuntu)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Dan Streetman (ddstreet)

Title:
  Intel i40e PF reset due to incorrect MDD detection

Status in linux package in Ubuntu:
  In Progress
