[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2018-05-22 Thread Dan Streetman
> Hi, we still encounter this error in the latest 4.4.0 kernel.

Yes, unfortunately.  The last patch seems to have reduced the frequency,
but I recently received another report of it happening again, so it does
not appear to be completely fixed.

To clarify, as I said before, this is an event generated by the i40e NIC
firmware.  It is entirely undocumented, and the firmware event provides
no (useful/documented) information about what exactly happened that it
didn't like.  So, there is literally nothing that I, or any non-Intel
person, can do to fix this.  The only possible way this can be fixed is
to let Intel know (which I have done) and hope they can either point me
to another upstream patch that we have not yet backported or, in the
case that it's still not fixed upstream (which is possible), provide a
new upstream patch to fix it.  Or new firmware, of course.

At this point, please don't add any more comments to this bug, since an
upstream commit was backported and released for this bug (Intel pointed
me to the upstream commit).

I have opened a new bug to continue this, bug 1772675.  Please add new
comments to that bug.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1723127

Title:
  Intel i40e PF reset due to incorrect MDD detection (continues...)

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  Won't Fix
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Artful:
  Fix Released
Status in linux source package in Bionic:
  Fix Released

Bug description:
  [impact]

  The i40e driver sometimes triggers a "Malicious Driver Detection" (MDD)
  event in the NIC firmware, which causes the firmware to reset the NIC
  and interrupts the network connection.  This can cause further
  problems, e.g. if the interface is in a bond; at a minimum, the reset
  causes a temporary interruption in network traffic.

  [fix]

  The upstream patch to fix this adjusts how the driver fragments TX
  data; the "malicious driver" event detected by the firmware is a result
  of incorrectly crafted TX fragment descriptors (the firmware has
  specific, complicated restrictions on these).  The patch is from Intel,
  and they suggested this specific patch to address the problem.
  Additionally, I provided a test kernel with the patch to someone who
  reported this to me, and they have been able to run ~6 weeks so far
  without reproducing the issue; previously they could reproduce it in as
  little as a day, but usually within 2-3 weeks.

  [test case]

  The bug is unfortunately very difficult to reproduce, but as shown in
  this (and previous) bug comments, some users of the i40e have traffic
  that can consistently reproduce the problem (although it usually takes
  on the order of days, or longer, to reproduce).  A reproduction is
  easily detected, as the network traffic will be interrupted and the
  system logs will contain a message like:

  i40e :02:00.1: TX driver issue detected, PF reset issued
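
The log check described above can be scripted.  The following is a
minimal sketch, not part of the original bug report: the function name
count_mdd_resets is hypothetical, and the match pattern is taken only
from the message quoted in this bug, so other driver versions may word
it differently.

```shell
#!/bin/sh
# Count i40e PF resets caused by the firmware's MDD event.
# Reads kernel log text on stdin and prints the number of matching lines.
# Assumption: the log message matches the one quoted in this bug.
count_mdd_resets() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status at success in that case.
    grep -c 'i40e.*TX driver issue detected, PF reset issued' || true
}

# Typical use on a live system (not run here):
#   dmesg | count_mdd_resets
#   journalctl -k | count_mdd_resets
```

A non-zero count means the event has occurred at least once in the log
text that was scanned.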

  [regression potential]

  The patch for this alters how TX is fragmented by the driver, so a
  possible regression would likely cause problems in TX traffic and/or
  additional "Malicious Driver Detection" events.


  [original description]

  This is a continuation from bug 1713553; a patch was added in that bug
  to attempt to fix this, and it may have helped reduce the issue but
  appears not to have fixed it, based on more reports.

  The issue is the i40e driver, when TSO is enabled, sometimes sees the
  NIC firmware issue a "MDD event" where MDD is "Malicious Driver
  Detection".  This is vaguely defined in the i40e spec, but with no way
  to tell what the NIC actually saw that it didn't like.  So, the driver
  can do nothing but print an error message and reset the PF (or VF).
  Unfortunately, this resets the interface, which causes an interruption
  in network traffic flow while the PF is resetting.

  See bug 1713553 for more details.
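
Since the description ties the event to TSO being enabled, it can help
to record the TSO state of an affected interface alongside a report.
The sketch below is an illustration, not part of the original report:
the function name tso_state is hypothetical, and it only parses the
"tcp-segmentation-offload" line printed by "ethtool -k".  Whether
turning TSO off avoids the event is not stated in this bug.

```shell
#!/bin/sh
# Print the TSO state ("on", "off", or "unknown") from `ethtool -k` output
# supplied on stdin.
tso_state() {
    awk -F': *' '
        # The top-level feature line looks like "tcp-segmentation-offload: on".
        $1 == "tcp-segmentation-offload" {
            split($2, v, " ")   # drop a trailing "[fixed]" marker, if any
            print v[1]
            found = 1
        }
        END { if (!found) print "unknown" }
    '
}

# Typical use (eth0 is a placeholder interface name):
#   ethtool -k eth0 | tso_state
```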

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1723127/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2018-05-21 Thread haosdent
Hi, we still encounter this error in the latest 4.4.0 kernel. Our kernel
version is

```
$ uname -r
4.4.0-122

$ dpkg -l|grep linux-image-4.4.0-122-generic
ii  linux-image-4.4.0-122-generic   4.4.0-122.146   amd64   Linux kernel image for version 4.4.0 on 64 bit x86 SMP
```
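
For reference, the package version above can be compared against the
release in which this bug was marked fixed for xenial (4.4.0-121.145,
per the Launchpad notification in this thread).  This is a rough sketch
under stated assumptions: version_at_least is a hypothetical helper, and
it relies on GNU "sort -V", which only approximates Debian version
ordering.

```shell
#!/bin/sh
# Return success if version $1 is at or above version $2.
# Assumption: GNU sort with -V (version sort) is available.
version_at_least() {
    # The smaller of the two versions sorts first; if that is $2,
    # then $1 is greater than or equal to $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example with the versions from this thread:
#   version_at_least 4.4.0-122.146 4.4.0-121.145 && echo "backport included"
```

On Ubuntu itself, "dpkg --compare-versions 4.4.0-122.146 ge 4.4.0-121.145"
is the authoritative comparison.  Either way, 4.4.0-122.146 is newer
than 4.4.0-121.145, so this kernel already contains the backported
patch.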


[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2018-04-23 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.13.0-39.44

---
linux (4.13.0-39.44) artful; urgency=medium

  * linux: 4.13.0-39.44 -proposed tracker (LP: #1761456)

  * intel-microcode 3.20180312.0 causes lockup at login screen(w/ linux-
image-4.13.0-37-generic) (LP: #1759920) // CVE-2017-5715 (Spectre v2
Intel) // CVE-2017-5754
- x86/mm: Reinitialize TLB state on hotplug and resume

  * intel-microcode 3.20180312.0 causes lockup at login screen(w/ linux-
image-4.13.0-37-generic) (LP: #1759920) // CVE-2017-5715 (Spectre v2 Intel)
- Revert "x86/mm: Only set IBPB when the new thread cannot ptrace current
  thread"
- x86/speculation: Use Indirect Branch Prediction Barrier in context switch

  * DKMS driver builds fail with: Cannot use CONFIG_STACK_VALIDATION=y, please
install libelf-dev, libelf-devel or elfutils-libelf-devel (LP: #1760876)
- [Packaging] include the retpoline extractor in the headers

  * retpoline hints: primary infrastructure and initial hints (LP: #1758856)
- [Packaging] retpoline-extract: flag *0xNNN(%reg) branches
- x86/speculation, objtool: Annotate indirect calls/jumps for objtool
- x86/speculation, objtool: Annotate indirect calls/jumps for objtool on 
32bit
- x86/paravirt, objtool: Annotate indirect calls
- [Packaging] retpoline -- add safe usage hint support
- [Packaging] retpoline-check -- only report additions
- [Packaging] retpoline -- widen indirect call/jmp detection
- [Packaging] retpoline -- elide %rip relative indirections
- [Packaging] retpoline -- clear hint information from packages
- KVM: x86: Make indirect calls in emulator speculation safe
- KVM: VMX: Make indirect call speculation safe
- x86/boot, objtool: Annotate indirect jump in secondary_startup_64()
- SAUCE: early/late -- annotate indirect calls in early/late initialisation
  code
- SAUCE: vga_set_mode -- avoid jump tables
- [Config] retpoline -- switch to new format
- [Packaging] retpoline hints -- handle missing files when RETPOLINE not
  enabled
- [Packaging] final-checks -- remove check for empty retpoline files

  * retpoline: ignore %cs:0xNNN constant indirections (LP: #1752655)
- [Packaging] retpoline -- elide %cs:0x constants on i386

  * zfs system process hung on container stop/delete (LP: #1754584)
- SAUCE: Fix non-prefaulted page deadlock (LP: #1754584)

  * zfs-linux 0.6.5.11-1ubuntu5 ADT test failure with linux 4.15.0-1.2
(LP: #1737761)
- SAUCE: (noup) Update zfs to 0.6.5.11-1ubuntu3.2

  * AT_BASE_PLATFORM in AUXV is absent on kernels available on Ubuntu 17.10
(LP: #1759312)
- powerpc/64s: Fix NULL AT_BASE_PLATFORM when using DT CPU features

  * btrfs and tar sparse truncate archives (LP: #1757565)
- Btrfs: move definition of the function btrfs_find_new_delalloc_bytes
- Btrfs: fix reported number of inode blocks after buffered append writes

  * efifb broken on ThunderX-based Gigabyte nodes (LP: #1758375)
- drivers/fbdev/efifb: Allow BAR to be moved instead of claiming it

  * Intel i40e PF reset due to incorrect MDD detection (continues...)
(LP: #1723127)
- i40e/i40evf: Account for frags split over multiple descriptors in check
  linearize

  * Fix an issue that when system in S3, USB keyboard can't wake up the system.
(LP: #1759511)
- ACPI / PM: Allow deeper wakeup power states with no _SxD nor _SxW

  * [8086:3e92] display becomes blank after S3 (LP: #1759188)
- drm/i915: Apply Display WA #1183 on skl, kbl, and cfl

  * add audio kernel patches for Raven (LP: #1758364)
- ALSA: hda: Add Raven PCI ID
- ALSA: hda/realtek - Fix ALC700 family no sound issue

  * Cpu utilization showing system time for kvm guests (performance) (sysstat)
(LP: #1755979)
- KVM: PPC: Book3S HV: Fix guest time accounting with 
VIRT_CPU_ACCOUNTING_GEN

  * Kernel panic on a nfsroot system (LP: #1734327)
- Revert "UBUNTU: SAUCE: LSM stacking: add stacking support to apparmor
  network hooks"
- Revert "UBUNTU: SAUCE: LSM stacking: LSM: Infrastructure management of the
  remaining blobs"

  * can't record sound via front headset port on the Dell Precision 3630
(LP: #1759088)
- ALSA: hda/realtek - Fix Dell headset Mic can't record

  * speaker can't output sound anymore after system resumes from S3 on a lenovo
machine with alc257 (LP: #1758829)
- ALSA: hda/realtek - Fix speaker no sound after system resume

  * hda driver initialization takes too much time on the machine with coffeelake
audio controller [8086:a348] (LP: #1758800)
- ALSA: hda - Force polling mode on CFL for fixing codec communication

  * Let headset-mode initialization be called on Dell Precision 3930
(LP: #1757584)
- ALSA: hda/realtek - Add headset mode support for Dell laptop

  * ubuntu_zram_smoke test will cause soft lockup on Artful ThunderX ARM64
(LP: #1755073)
- SAUCE: crypto: thunderx_zip: Fix 

[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2018-04-23 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.4.0-121.145

---
linux (4.4.0-121.145) xenial; urgency=medium

  * linux: 4.4.0-121.145 -proposed tracker (LP: #1763687)

  * Ubuntu-4.4.0-120.144 fails to boot on arm64* hardware (LP: #1763644)
- [Config] arm64: disable BPF_JIT_ALWAYS_ON

linux (4.4.0-120.144) xenial; urgency=medium

  * linux: 4.4.0-120.144 -proposed tracker (LP: #1761438)

  * intel-microcode 3.20180312.0 causes lockup at login screen(w/ linux-
image-4.13.0-37-generic) (LP: #1759920) // CVE-2017-5715 (Spectre v2 Intel)
- Revert "x86/mm: Only set IBPB when the new thread cannot ptrace current
  thread"
- x86/speculation: Use Indirect Branch Prediction Barrier in context switch

  * DKMS driver builds fail with: Cannot use CONFIG_STACK_VALIDATION=y, please
install libelf-dev, libelf-devel or elfutils-libelf-devel (LP: #1760876)
- [Packaging] include the retpoline extractor in the headers

  * retpoline hints: primary infrastructure and initial hints (LP: #1758856)
- [Packaging] retpoline-extract: flag *0xNNN(%reg) branches
- x86/speculation, objtool: Annotate indirect calls/jumps for objtool
- x86/speculation, objtool: Annotate indirect calls/jumps for objtool on 
32bit
- x86/paravirt, objtool: Annotate indirect calls
- x86/asm: Stop depending on ptrace.h in alternative.h
- [Packaging] retpoline -- add safe usage hint support
- [Packaging] retpoline-check -- only report additions
- [Packaging] retpoline -- widen indirect call/jmp detection
- [Packaging] retpoline -- elide %rip relative indirections
- [Packaging] retpoline -- clear hint information from packages
- SAUCE: modpost: add discard to non-allocatable whitelist
- KVM: x86: Make indirect calls in emulator speculation safe
- KVM: VMX: Make indirect call speculation safe
- x86/boot, objtool: Annotate indirect jump in secondary_startup_64()
- SAUCE: early/late -- annotate indirect calls in early/late initialisation
  code
- SAUCE: vga_set_mode -- avoid jump tables
- [Config] retpoline -- switch to new format
- [Packaging] final-checks -- remove check for empty retpoline files

  * Xenial update to 4.4.117 stable release (LP: #1756860)
- IB/mlx4: Fix incorrectly releasing steerable UD QPs when have only ETH 
ports
- PM / devfreq: Propagate error from devfreq_add_device()
- s390: fix handling of -1 in set{,fs}[gu]id16 syscalls
- ARM: dts: STi: Add gpio polarity for "hdmi,hpd-gpio" property
- arm: spear600: Add missing interrupt-parent of rtc
- arm: spear13xx: Fix dmas cells
- arm: spear13xx: Fix spics gpio controller's warning
- ALSA: seq: Fix regression by incorrect ioctl_mutex usages
- KVM/x86: Reduce retpoline performance impact in slot_handle_level_range(),
  by always inlining iterator helper methods
- x86/cpu: Change type of x86_cache_size variable to unsigned int
- drm/radeon: adjust tested variable
- rtc-opal: Fix handling of firmware error codes, prevent busy loops
- ext4: save error to disk in __ext4_grp_locked_error()
- ext4: correct documentation for grpid mount option
- mm: hide a #warning for COMPILE_TEST
- video: fbdev: atmel_lcdfb: fix display-timings lookup
- console/dummy: leave .con_font_get set to NULL
- rtlwifi: rtl8821ae: Fix connection lost problem correctly
- Btrfs: fix deadlock in run_delalloc_nocow
- Btrfs: fix crash due to not cleaning up tree log block's dirty bits
- Btrfs: fix unexpected -EEXIST when creating new inode
- ALSA: hda - Fix headset mic detection problem for two Dell machines
- ALSA: usb-audio: Fix UAC2 get_ctl request with a RANGE attribute
- ALSA: hda/realtek: PCI quirk for Fujitsu U7x7
- ALSA: usb-audio: add implicit fb quirk for Behringer UFX1204
- ALSA: seq: Fix racy pool initializations
- mvpp2: fix multicast address filter
- dm: correctly handle chained bios in dec_pending()
- x86: fix build warnign with 32-bit PAE
- vfs: don't do RCU lookup of empty pathnames
- ARM: pxa/tosa-bt: add MODULE_LICENSE tag
- ARM: dts: s5pv210: add interrupt-parent for ohci
- media: r820t: fix r820t_write_reg for KASAN
- Linux 4.4.117

  * zfs system process hung on container stop/delete (LP: #1754584)
- SAUCE: (noup) zfs to 0.6.5.6-0ubuntu19
- SAUCE: Fix non-prefaulted page deadlock (LP: #1754584)

  * apparmor: fix bad __initdata tagging on, apparmor_initialized (LP: #1758471)
- SAUCE: apparmor: fix bad __initdata tagging on, apparmor_initialized

  * Xenial update to 4.4.116 stable release (LP: #1756121)
- powerpc/bpf/jit: Disable classic BPF JIT on ppc64le
- powerpc/64: Fix flush_(d|i)cache_range() called from modules
- powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC
- powerpc: Simplify module TOC handling
- ASoC: pcm512x: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
- usbip: vhci_hcd: clear just the 

[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2018-04-10 Thread Dan Streetman
Due to the nature of this bug, being very difficult to reproduce, real 
verification could take weeks instead of only days.  However, one reporter has 
been running with a test kernel I built here
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1723127

which is the base 4.4.0-112 kernel plus the two patches from this bug.
In their testing, now running for 6 weeks, the problem has not reproduced
and they have seen no other issues.  Of course, that test kernel doesn't
have all the other patches that the -proposed kernel has, but that
testing is likely the best verification we can get for this particular
bug.  I have also asked the same reporter to switch their testing from
my test kernel over to the -proposed kernel, and to report any
unexpected issues they see.  If they do report any regression, I'll
communicate that here.

Based on that justification, I'll mark this bug as verified.

** Tags removed: verification-needed-artful verification-needed-xenial
** Tags added: verification-done-artful verification-done-xenial

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1723127

Title:
  Intel i40e PF reset due to incorrect MDD detection (continues...)

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  Won't Fix
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Artful:
  Fix Committed
Status in linux source package in Bionic:
  Fix Released


[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2018-04-10 Thread Brad Figg
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-artful' to 'verification-done-artful'. If the
problem still exists, change the tag 'verification-needed-artful' to
'verification-failed-artful'.

If verification is not done within 5 working days from today, this fix
will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!

Title:
  Intel i40e PF reset due to incorrect MDD detection (continues...)

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  Won't Fix
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Artful:
  Fix Committed
Status in linux source package in Bionic:
  Fix Released


[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2018-04-10 Thread Brad Figg
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-xenial' to 'verification-done-xenial'. If the
problem still exists, change the tag 'verification-needed-xenial' to
'verification-failed-xenial'.

If verification is not done within 5 working days from today, this fix
will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-xenial

** Tags added: verification-needed-artful

Title:
  Intel i40e PF reset due to incorrect MDD detection (continues...)

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  Won't Fix
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Artful:
  Fix Committed
Status in linux source package in Bionic:
  Fix Released


[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2018-04-03 Thread Kleber Sacilotto de Souza
** Changed in: linux (Ubuntu Xenial)
   Status: In Progress => Fix Committed

** Changed in: linux (Ubuntu Artful)
   Status: In Progress => Fix Committed

Title:
  Intel i40e PF reset due to incorrect MDD detection (continues...)

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  Won't Fix
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Artful:
  Fix Committed
Status in linux source package in Bionic:
  Fix Released


[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2018-03-21 Thread Dan Streetman
** Description changed:

+ [impact]
+ 
+ The i40e driver sometimes causes a "malicious device" event that the
+ firmware detects, which causes the firmware to reset the nic, causing an
+ interruption in the network connection - which can cause further
+ problems, e.g. if the interface is in a bond; the reset will at least
+ cause a temporary interruption in network traffic.
+ 
+ [fix]
+ 
+ The upstream patch to fix this adjusts how the driver fragments TX data;
+ the "malicious driver" detected by the firmware is a result of
+ incorrectly crafted TX fragment descriptors (the firmware has specific
+ complicated restrictions on this).  The patch is from Intel, and they
+ suggested this specific patch to address the problem; additionally I
+ have checked with someone who reported this to me and provided a test
+ kernel with the patch to them, and they have been able to run ~6 weeks
+ so far without reproducing the issue; previously they could reproduce it
+ as quickly as a day, but usually within 2-3 weeks.
+ 
+ [test case]
+ 
+ the bug is unfortunately very difficult to reproduce, but as shown in
+ this (and previous) bug comments, some users of the i40e have traffic
+ that can consistently reproduce the problem (although usually on the
+ order of days, or longer, to reproduce).  Reproducing is easily
+ detected, as the nw traffic will be interrupted and the system logs will
+ contain a message like:
+ 
+ i40e :02:00.1: TX driver issue detected, PF reset issued
+ 
+ [regression potential]
+ 
+ the patch for this alters how tx is fragmented by the driver, so a
+ possible regression would likely cause problems in TX traffic and/or
+ additional "malicious device detection" events.
+ 
+ 
+ [original description]
+ 
  This is a continuation from bug 1713553; a patch was added in that bug
  to attempt to fix this, and it may have helped reduce the issue but
  appears not to have fixed it, based on more reports.
  
  The issue is the i40e driver, when TSO is enabled, sometimes sees the
  NIC firmware issue a "MDD event" where MDD is "Malicious Driver
  Detection".  This is vaguely defined in the i40e spec, but with no way
  to tell what the NIC actually saw that it didn't like.  So, the driver
  can do nothing but print an error message and reset the PF (or VF).
  Unfortunately, this resets the interface, which causes an interruption
  in network traffic flow while the PF is resetting.
  
  See bug 1713553 for more details.

Title:
  Intel i40e PF reset due to incorrect MDD detection (continues...)

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  Won't Fix
Status in linux source package in Xenial:
  In Progress
Status in linux source package in Artful:
  In Progress
Status in linux source package in Bionic:
  Fix Released


[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2018-03-20 Thread Dan Streetman
As mentioned, upstream commit 248de22e638f10bd5bfc7624a357f940f66ba137
("i40e/i40evf: Account for frags split over multiple descriptors in
check linearize") appears to finally fix this.  This commit is already
included in bionic, but is required in artful and earlier.

In xenial, the commit 5c4654daf2e2f25dfbd7fa572c59937ea6d4198b
("i40e/i40evf: Allow up to 12K bytes of data per Tx descriptor instead
of 8K") is also required.


** Also affects: linux (Ubuntu Artful)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Artful)
 Assignee: (unassigned) => Dan Streetman (ddstreet)

** Changed in: linux (Ubuntu Artful)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Artful)
   Status: New => Incomplete

** Changed in: linux (Ubuntu Artful)
   Status: Incomplete => In Progress

** Also affects: linux (Ubuntu Bionic)
   Importance: Medium
 Assignee: Dan Streetman (ddstreet)
   Status: In Progress

** Changed in: linux (Ubuntu Bionic)
   Status: In Progress => Fix Released

** Also affects: linux (Ubuntu Trusty)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Trusty)
   Status: New => Won't Fix

Title:
  Intel i40e PF reset due to incorrect MDD detection (continues...)

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  Won't Fix
Status in linux source package in Xenial:
  In Progress
Status in linux source package in Artful:
  In Progress
Status in linux source package in Bionic:
  Fix Released


[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2018-02-21 Thread Dan Streetman
Hello,

Can anyone still experiencing this on the 4.4 kernel please test with
the kernel from this PPA:
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1723127
Test kernel version is 4.4.0-112.135+hf1723127v20180206b2


If anyone would like to test with the 4.13 kernel, please let me know
and I can build it with the recent upstream patch
(248de22e638f10bd5bfc7624a357f940f66ba137) that may finally fix this.

Title:
  Intel i40e PF reset due to incorrect MDD detection (continues...)

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  In Progress


[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2018-01-24 Thread Stefan Kooman
Hi there. I can confirm this problem still exists in the newest kernels
and with the latest Intel drivers as of today:

Jan 19 16:05:19 osd9 kernel: [511271.581413] i40e :02:00.1: TX driver issue detected, PF reset issued
Jan 19 16:09:08 osd9 kernel: [511500.919380] i40e :02:00.0: TX driver issue detected, PF reset issued

driver: i40e-2.4.3 (and xenial / 4.13 shipped driver: 2.1.14-k)
kernel: 4.13.0-25-generic #29~16.04.2-Ubuntu SMP Tue Jan 9 12:16:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux.
Kernel loaded with nopti noibrs noibpb (Meltdown / Spectre mitigation disabled).

We can trigger the issue with high load (benchmarking Ceph cluster with
fio: 4 clients, 8 threads, iodepth 256, 100% random write, 64K block
size).

Only when we use relatively large block size (64K) do we hit this
problem. With 4K blocks we do not hit this issue. We haven't tested
large random reads (that test is still to be done).

When using an openvswitch port-channel (as we do) with jumbo frames,
the port-channel does not come back online after the reset.  Reloading
the driver (rmmod i40e / modprobe i40e) does the trick though.

Title:
  Intel i40e PF reset due to incorrect MDD detection (continues...)

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  In Progress


[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2017-12-06 Thread Björn Zettergren
Sorry for the delay, I've not forgotten about this, just been swamped
with other things. Will hopefully have time to do the tests next week.

Title:
  Intel i40e PF reset due to incorrect MDD detection (continues...)

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  In Progress


[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2017-11-07 Thread Dan Streetman
> I'd say we go with option #2. Please provide information on how to
> proceed, and how to undo any changes we test :)

ok, so first, these instructions may cause the card to hang; the system
may need to be rebooted or the driver reloaded.  The changes here can be
undone by resetting the card, i.e. by rebooting or reloading the driver.

Also please note these instructions are ONLY FOR i40e NICs!

The process here is to clear all the nic's hardware asserts, and then
enable each of them one-by-one and try to reproduce the MDD event.  That
way, when it reproduces, we know exactly which hw assert triggered it.

First, find your nic's pci address, e.g. ethtool -i NIC | grep bus-info

Then (as root) cd to "/sys/kernel/debug/i40e/BUSID" (replace BUSID with
your nic's actual pci addr).  You should see a "command" file there.
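
As a rough sketch of the two steps above, the helper below (a made-up
name, not part of ethtool or the driver) pulls the bus-info value out of
the text printed by "ethtool -i NIC" and builds the debugfs directory
from it.  It assumes the usual "key: value" output format and that
debugfs is mounted at /sys/kernel/debug.

```python
def debugfs_dir_from_ethtool(ethtool_output):
    """Build the i40e debugfs directory from `ethtool -i NIC` output text."""
    for line in ethtool_output.splitlines():
        # Split on the first colon only; the PCI address contains colons too.
        key, _, value = line.partition(":")
        if key.strip() == "bus-info":
            return "/sys/kernel/debug/i40e/" + value.strip()
    raise ValueError("no bus-info line found in ethtool output")
```

The returned directory is where the "command" file should be found.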

Now zero out the registers:

$ echo write 0xe648c 0 > command
$ echo write 0x442f4 0 > command

Then, set a single bit; starting with 0x1 on the first register:

$ echo write 0xe648c 0x1 > command

Do normal testing.  There are 3 possibilities at this step:

a) you test long enough to be sure the problem was avoided
b) your system and/or nic hangs due to an "uncaught" MDD event
c) you reproduce the problem, and see the TX error and PF reset

For either (a) or (b), that means this bit isn't the one we're looking
for, so move to the next bit:

$ echo write 0xe648c 0 > command
$ echo write 0x442f4 0 > command
$ echo write 0xe648c 0x2 > command

Then retest.  Replace "0x2" with incrementing bits, as you test each
bit.  Note this is setting individual bits, so the sequence to test is
(in hex) 1, 2, 4, 8, 10, 20, 40, 80, 100, etc.  This is a 32 bit
register so the highest bit to test is 0x80000000.  If you test all bits
in register 0xe648c without reproducing the problem, then move on to
register 0x442f4, testing bit-by-bit again starting at 0x1.  You
should be able to reproduce the problem with one of the bits set in one
of these two registers, according to what I've been told by Intel.
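
To keep track of the test steps, the sequence of single-bit values and
the matching debugfs commands could be generated with something like the
Python sketch below.  The function names are made up for this example,
and the register width is a parameter in case only part of a register
turns out to be meaningful.  It only builds the command strings; each
one would still have to be written to the "command" file as root.

```python
def single_bit_values(width=32):
    """Yield 0x1, 0x2, 0x4, ... one value per bit of a `width`-bit register."""
    for bit in range(width):
        yield 1 << bit


def commands_for_bit(register, value, registers=(0xE648C, 0x442F4)):
    """Return the command strings for one test step.

    Both registers are zeroed first, then a single bit is set in
    `register`, mirroring the three echo commands shown above.
    """
    cmds = ["write 0x%x 0" % reg for reg in registers]
    cmds.append("write 0x%x 0x%x" % (register, value))
    return cmds
```

Looping over single_bit_values() for 0xe648c and then for 0x442f4 gives
the full list of steps in the order described.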

As you set each bit, you should get output in your dmesg and/or syslog
or kern.log, indicating the current value of the registers, e.g.:

write: 0xe648c = 0x1

You can also manually read the registers at any time with:

$ echo read 0xe648c > command
$ echo read 0x442f4 > command

you should see the results in dmesg/logs, e.g.:

read: 0xe648c = 0x1
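
If it helps with reporting, a small Python sketch like the one below
could collect the register values from the log and list which bits were
set.  It assumes the log lines contain the "read: ADDR = VALUE" or
"write: ADDR = VALUE" text exactly as shown above (possibly after a
prefix); the names are made up for this example.

```python
import re

# Matches the register lines shown above, anywhere within a log line.
REG_LINE_RE = re.compile(
    r"(?:read|write): (0x[0-9a-fA-F]+) = (0x[0-9a-fA-F]+)"
)


def parse_register_lines(lines):
    """Map each register address to the last value logged for it."""
    values = {}
    for line in lines:
        match = REG_LINE_RE.search(line)
        if match:
            values[int(match.group(1), 16)] = int(match.group(2), 16)
    return values


def set_bits(value):
    """Return the positions (0 = lowest) of all bits set in `value`."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]
```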


Once/if you do reproduce the problem, make note of the values for both
registers (i.e. what bit was set), and report that back here.  I'll
check with Intel to find out what problem that specific bit indicates.

Thanks!

Title:
  Intel i40e PF reset due to incorrect MDD detection (continues...)

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  In Progress


[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2017-11-01 Thread Björn Zettergren
No worries, we're not in a hurry.

I'd say we go with option #2. Please provide information on how to
proceed, and how to undo any changes we test :)

Title:
  Intel i40e PF reset due to incorrect MDD detection (continues...)

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  In Progress


[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2017-10-27 Thread Dan Streetman
Sorry for the delay.

So we have 2 options on how to continue debugging here:

1. We can try a traditional git bisect.  This would involve testing
various kernel builds, to eventually narrow the fix down to a specific
commit.  It's a long-ish process, depending on how long testing each
build takes, and it's critical that verification of 'good' or 'bad' at
each step is correct - otherwise the bisect ends at the wrong commit.
At each step I build a new kernel, and you test with it until it fails
or you've tested long enough to be sure that build is 'good'.  With
hard-to-reproduce problems like this, bisecting can be tough: a build
that doesn't fail for a long time isn't necessarily 'good', it may just
not have failed yet, in which case the bisect will end at the wrong
commit, which doesn't help with figuring out how to fix anything.

2. Intel has provided me some undocumented commands that allow
controlling which MDD events the NIC triggers on.  I can provide those
instructions, and you can test with each MDD event bit set individually,
until the problem reproduces - then we know exactly which MDD source
triggered the event, which should help identify what the driver did to
cause the MDD event.  This way has a much better chance of finding the
specific problem, but the downside is you'll need to run undocumented
commands on your hardware.  I believe there should not be any risk in
doing that since the info came from Intel, but I can't personally verify
it, as I don't currently have access to this specific NIC.

If you're willing to try #2, I'll add the specific commands/instructions
and you can get started testing.  Otherwise if you would prefer not to
run the undocumented commands, I can start a kernel bisect.
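
To illustrate the risk described in option 1, here is a minimal sketch
of a bisect that searches for the first 'good' (fixed) commit.  The
commit count (1000), the position of the real fix (700), and the helper
names are all made up for illustration; this is not how git implements
bisect, only the same binary-search idea under the assumption that old
commits are 'bad' and new ones are 'good'.

```python
def bisect_first_good(n_commits, is_good):
    """Binary-search commits 0..n_commits-1 for the first 'good' one.

    Assumes commit 0 is known bad and the last commit is known good.
    Returns (index blamed as the first good commit, builds tested).
    """
    lo, hi = 0, n_commits - 1
    builds_tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        builds_tested += 1
        if is_good(mid):
            hi = mid   # fix is at mid or earlier
        else:
            lo = mid   # fix is after mid
    return hi, builds_tested


def make_tester(first_fixed, false_good=()):
    """Hypothetical test oracle: commits >= first_fixed are really good.

    Commits listed in false_good are bad builds that were reported
    'good' because the failure did not reproduce during the test window.
    """
    def is_good(commit):
        return commit >= first_fixed or commit in false_good
    return is_good


if __name__ == "__main__":
    # Reliable testing: the bisect lands on the real fix.
    print(bisect_first_good(1000, make_tester(700)))
    # One wrong 'good' verdict: the bisect blames an unrelated commit.
    print(bisect_first_good(1000, make_tester(700, false_good={499})))
```

With reliable verdicts the search ends at commit 700.  With a single
wrong 'good' verdict at commit 499, every later step only looks at
older commits, so the search ends at 499 and never revisits the range
that contains the real fix.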



[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2017-10-19 Thread Björn Zettergren
Kernel v4.10 (4.10.0-041000-generic) has been running fine, without any
issues, for 24 hours. I'd say it's OK, as you suspected.



[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2017-10-18 Thread Björn Zettergren
> one bug at a time, please.

Absolutely! I just mentioned the "GRO implementation" because I wondered
if it might have been related. I should have read up on it beforehand;
that would have shown me that it wasn't.

I've tested the v4.4-wily kernel in the first link
(4.4.0-040400-generic), and it failed miserably directly after the
machine came online. I'm attaching a redacted syslog with the relevant
messages in it. One thing you'll note is that the i40e driver (1.3.x)
complains that the firmware is too new, which might be a problem(?), but
there's also a message just before the "TX driver issue detected":

i40e :02:00.1: FD filter programming failed due to incorrect filter
parameters

See the attached file for more details.

We're currently running the second kernel, v4.10
(4.10.0-041000-generic), and it's running fine so far, but the machine
has only been up for 30 minutes. I'll let it run for 24 hours and report
back tomorrow, or as soon as the status changes, if at all.

** Attachment added: "redacted_i40e_syslog.txt"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1723127/+attachment/4974593/+files/redacted_i40e_syslog.txt



[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2017-10-13 Thread Dan Streetman
> How do we proceed? :-)

one bug at a time, please.  As this NIC's "MDD" behavior doesn't
indicate what happened that it disliked, I can't tell whether that is
related to the MDD events or not, but I suspect not, especially if you
have not seen it on the kernels where you did get MDD events.

Since the Ubuntu 4.4.0 kernel isn't an ancestor of the Ubuntu 4.10.0
kernel, to bisect we would need to start at the merge base anyway
(mainline 4.4 kernel); and since there are no changes to the i40e driver
between mainline 4.10 and Ubuntu 4.10.0, a bisect will be a lot easier
if we shift over to the mainline kernel series.

Are you able to test various kernel versions during the bisect process?
It may take a while, and it's important to make sure at each step to
determine for certain if the kernel is 'good' or 'bad' - an incorrect
evaluation at any step leads to an incorrect endpoint.

If you are able to help with a kernel bisect by testing, can you test
each of these kernels:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-wily/

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10/

I expect v4.4 to be 'bad' (encounter the MDD event) and v4.10 to be
'good' (no MDD event), based on your evaluation of the Ubuntu kernels
based on those versions.  If those are good/bad as expected, we can
start the bisection between them.
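
As a rough aid for the good/bad evaluation at each step, the sketch
below scans log lines for the "TX driver issue detected" text mentioned
in this bug.  The function name, the marker tuple, and the sample lines
are assumptions for illustration; the sample lines are built from
phrases quoted in this thread, not copied from a real syslog.

```python
# Marker text taken from the message quoted in this bug; extend the
# tuple if your logs use other wording for the MDD event.
MDD_MARKERS = ("TX driver issue detected",)


def classify_kernel(log_lines, markers=MDD_MARKERS):
    """Return ('bad', matching lines) if any marker appears in the log.

    Otherwise return ('no failure seen', []).  The second verdict only
    covers the lines that were scanned; it does not prove the kernel is
    'good', since the event may simply not have happened yet.
    """
    hits = [line for line in log_lines if any(m in line for m in markers)]
    if hits:
        return "bad", hits
    return "no failure seen", []


if __name__ == "__main__":
    # Synthetic sample lines, for illustration only.
    sample = [
        "i40e: FD filter programming failed due to incorrect filter parameters",
        "i40e: TX driver issue detected",
    ]
    verdict, hits = classify_kernel(sample)
    print(verdict, len(hits))
```

To check a real log, the lines could be read with something like
`open(path, errors="replace")`, where `path` is wherever your system
keeps its kernel messages.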



[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2017-10-13 Thread Björn Zettergren
As of now, we've been running HWE 4.10 for a little more than 16 hours
with no problems so far. Previously we'd hit the problem within the hour.

There is however one new log message that we haven't seen before, with
either the 1.4.x or the 2.0.x driver. But it might be unrelated; we
can't see any particular performance issues in any of our
monitoring/graphs. The message is:

TCP: bond0.5: Driver has suspect GRO implementation, TCP performance may
be compromised.

How do we proceed? :-)



[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2017-10-12 Thread Dan Streetman
> We'll test the regular hwe 4.10 also if you think that narrows the
> bisect.

yes please, it will help to look just between 4.4 and 4.10.  thanks!

** Changed in: linux (Ubuntu Xenial)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Xenial)
   Status: New => In Progress

** Changed in: linux (Ubuntu Xenial)
 Assignee: (unassigned) => Dan Streetman (ddstreet)



[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2017-10-12 Thread Eric Desrochers
** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New



[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2017-10-12 Thread Björn Zettergren
We've been using hwe-edge 4.11 for almost 24 hours without problems.
We'll test the regular hwe 4.10 also if you think that narrows the
bisect.



[Kernel-packages] [Bug 1723127] Re: Intel i40e PF reset due to incorrect MDD detection (continues...)

2017-10-12 Thread Dan Streetman
continuing conversation from previous (fix released) bug.

@bjozet, it would help a lot if you could test with the hwe 4.10 kernel
and let me know if that fails also, or if it seems to be fixed there.
If it works, I can review the changes and possibly find something,
and/or work with you on a bisect.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1723127

Title:
  Intel i40e PF reset due to incorrect MDD detection (continues...)

Status in linux package in Ubuntu:
  In Progress
