from:"Jay Vosburgh"

[Kernel-packages] [Bug 1919154] Re: Enable CONFIG_NO_HZ_FULL on supported architectures

2023-10-27 Thread Jay Vosburgh

Gerald,

Using gettimeofday to test the effect of NO_HZ_FULL on context switch
duration may not measure anything that NO_HZ_FULL changes.
gettimeofday is implemented via the vDSO, and is not an actual system
call that requires a context switch.
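
As a rough illustration of the point above, the sketch below times a
clock read against a call that does enter the kernel.  It assumes
CPython on Linux, where time.time() is normally served through the vDSO,
and it uses os.getppid() as a stand-in for a real system call; the helper
name average_call_ns is hypothetical.  Interpreter overhead makes up much
of both numbers, so treat the output as indicative only.

```python
import os
import time


def average_call_ns(func, iterations=100_000):
    """Return the mean wall-clock cost of calling func(), in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        func()
    return (time.perf_counter_ns() - start) / iterations


if __name__ == "__main__":
    # Assumption: time.time() reads the clock via the vDSO on this system,
    # while os.getppid() has to enter the kernel on every call.
    print(f"time.time():  {average_call_ns(time.time):8.1f} ns/call")
    print(f"os.getppid(): {average_call_ns(os.getppid):8.1f} ns/call")
```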

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1919154

Title:
  Enable CONFIG_NO_HZ_FULL on supported architectures

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Focal:
  In Progress
Status in linux source package in Groovy:
  Won't Fix
Status in linux source package in Hirsute:
  In Progress
Status in linux source package in Jammy:
  In Progress
Status in linux source package in Lunar:
  In Progress
Status in linux source package in Mantic:
  In Progress

Bug description:
  [Impact]

  The CONFIG_NO_HZ_FULL=y Kconfig option causes the kernel to avoid
  sending scheduling-clock interrupts to CPUs with a single runnable task,
  and such CPUs are said to be "adaptive-ticks CPUs".  This is important
  for applications with aggressive real-time response constraints because
  it allows them to improve their worst-case response times by the maximum
  duration of a scheduling-clock interrupt.  It is also important for
  computationally intensive short-iteration workloads:  If any CPU is
  delayed during a given iteration, all the other CPUs will be forced to
  wait idle while the delayed CPU finishes.  Thus, the delay is multiplied
  by one less than the number of CPUs.  In these situations, there is
  again strong motivation to avoid sending scheduling-clock interrupts.
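
  The "multiplied by one less than the number of CPUs" arithmetic can be
  sketched as follows.  The function name and the example figures (64
  CPUs, a 10 microsecond delay) are illustrative assumptions, not
  measurements from this bug.

```python
def idle_cpu_time_us(num_cpus, delay_us):
    """Total CPU time spent waiting when one CPU is delayed by delay_us.

    Every CPU except the delayed one sits idle for the whole delay,
    so the waste is (num_cpus - 1) * delay_us.
    """
    return (num_cpus - 1) * delay_us


if __name__ == "__main__":
    # Hypothetical example: a 10 us delay on one CPU of a 64-CPU job.
    print(idle_cpu_time_us(64, 10))  # 630
```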

  [Test Plan]

  To verify that the change does not cause a performance regression in
  context switching, compare the results of:

  ./stress-ng --seq 0 --metrics-brief -t 15

  Running on a dedicated machine and with the following services
  disabled: smartd.service, iscsid.service, apport.service,
  cron.service, anacron.timer, apt-daily.timer, apt-daily-upgrade.timer,
  fstrim.timer, logrotate.timer, motd-news.timer, man-db.timer.

  The results didn't show any performance regression:

  https://kernel.ubuntu.com/~mhcerri/lp1919154/

  [Where problems could occur]

  Performance degradation might happen for workloads with intensive
  context switching.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1919154/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

[Kernel-packages] [Bug 2036675] [NEW] 5.15.0-85 live migration regression

2023-09-19 Thread Jay Vosburgh

Public bug reported:


Fixes added for LP 2032164 [0] to resolve an issue in live migration have 
unfortunately introduced a regression, causing a previously working live 
migration pattern to fail when tested with the 5.15.0-85 kernel from -proposed.

Specifically, live migration from a PKRU-enabled host running a kernel older
than 5.15.0-85 to a host running the 5.15.0-85 kernel will fail.  The
destination can be either with or without PKRU; both cases fail, although
in different ways (one hangs, the other fails due to a PCID flag issue).

The commits in question are

commit fa9225d64f215e8109de10f6b6c7a08f033d0ec0
Author: Dr. David Alan Gilbert 
Date:   Mon Aug 21 14:47:28 2023 +0800

KVM: x86: Always enable legacy FP/SSE in allowed user XFEATURES

commit 27a189b881278c8ad9c16b0ee05668d724352733
Author: Leonardo Bras 
Date:   Mon Aug 21 14:47:27 2023 +0800

x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0


[0]   https://bugs.launchpad.net/bugs/2032164

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New

-- 
https://bugs.launchpad.net/bugs/2036675

Title:
  5.15.0-85 live migration regression

Status in linux package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036675/+subscriptions



[Kernel-packages] [Bug 2004262] Re: Intel E810 NICs driver in causing hangs when booting and bonds configured

2023-01-31 Thread Jay Vosburgh

https://lore.kernel.org/netdev/20230131213703.1347761-2-anthony.l.ngu...@intel.com/T/#u

A possible fix for this problem.  The patch was posted on
intel-wired-lan a couple of weeks ago and just hit netdev today.

-- 
https://bugs.launchpad.net/bugs/2004262

Title:
  Intel E810 NICs driver in causing hangs when booting and bonds
  configured

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  jammy 22.04.1
  linux-image-generic 5.15.0-58-generic
  Intel E810-XXV Dual Port NICs in Dell PowerEdge R650

  After bonding is enabled on both the switch and the server side, the
  system hangs while initializing Ubuntu.  The kernel loads, but around
  the time Network Services start the system can hang, sometimes for
  5 minutes and in other cases indefinitely.

  The message:

  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"  systemd-resolve
  blocked for more than 120 seconds

  appears, and the Network Services attempt to start but never
  complete.  This happens with or without DHCP enabled.

  Tried this same setup with the hwe-22.04, hwe-20.04, hwe-22.04-edge and
  linux-oem kernels, and all exhibit the same failure.

  As a workaround, installing version 1.10.1.2.2 of the Intel 'ice'
  driver works.  The system no longer hangs at startup and all
  networking functions keep working (ping, DNS, general
  accessibility).

  The driver can be found at
  https://downloadmirror.intel.com/763930/ice-1.10.1.2.2.tar.gz
  --- 
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Jan 31 13:08 seq
   crw-rw 1 root audio 116, 33 Jan 31 13:08 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu82.3
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  CasperMD5json:
   {
     "result": "skip"
   }
  DistroRelease: Ubuntu 22.04
  InstallationDate: Installed on 2023-01-27 (3 days ago)
  InstallationMedia: Ubuntu-Server 22.04.1 LTS "Jammy Jellyfish" - Release amd64 (20220809)
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  MachineType: Dell Inc. PowerEdge R650
  Package: linux (not installed)
  PciMultimedia:
   
  ProcFB: 0 mgag200drmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-58-generic root=UUID=668aab7c-abe9-434b-a810-acc6eab76cbc ro fsck.mode=skip
  ProcVersionSignature: Ubuntu 5.15.0-58.64-generic 5.15.74
  RelatedPackageVersions:
   linux-restricted-modules-5.15.0-58-generic N/A
   linux-backports-modules-5.15.0-58-generic  N/A
   linux-firmware 20220329.git681281e4-0ubuntu3.9
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  Tags:  jammy uec-images
  Uname: Linux 5.15.0-58-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: N/A
  _MarkForUpload: True
  dmi.bios.date: 09/14/2022
  dmi.bios.release: 1.8
  dmi.bios.vendor: Dell Inc.
  dmi.bios.version: 1.8.2
  dmi.board.name: 0PJ7YJ
  dmi.board.vendor: Dell Inc.
  dmi.board.version: A01
  dmi.chassis.type: 23
  dmi.chassis.vendor: Dell Inc.
  dmi.modalias: dmi:bvnDellInc.:bvr1.8.2:bd09/14/2022:br1.8:svnDellInc.:pnPowerEdgeR650:pvr:rvnDellInc.:rn0PJ7YJ:rvrA01:cvnDellInc.:ct23:cvr:skuSKU=0912;ModelName=PowerEdgeR650:
  dmi.product.family: PowerEdge
  dmi.product.name: PowerEdge R650
  dmi.product.sku: SKU=0912;ModelName=PowerEdge R650
  dmi.sys.vendor: Dell Inc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2004262/+subscriptions



[Kernel-packages] [Bug 1959702] Re: Regression: ip6 ndp broken, host bridge doesn't add vlan guest entry to mdb

2022-02-09 Thread Jay Vosburgh

Harry,

I'm still working to reproduce this, without success.  I have set
the .autoconf sysctl to 0 (which controls creation of local addresses in
response to received Router Advertisements), as well as setting
.addr_gen_mode to 1 (to disable SLAAC (fe80::) addresses).

In any event, even with .autoconf=0 and .addr_gen_mode=1 I still fail
to reproduce the issue on my test system.

I find that if I disable mcast_flood on the relevant bridge ports
(i.e., bridge link set dev vnet1 mcast_flood off) I do see the behavior
you describe, but in that case no variant that I've tried (no vid, and all
vids in use) of "bridge mdb add ... grp ff02::1:ff00:2" appears to permit
ND traffic to pass to the VM destination.
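
For reference, the group address used in the mdb command above is the
solicited-node multicast address of the target, which is what Neighbor
Discovery sends its solicitations to.  A minimal sketch of deriving it
(RFC 4291: ff02::1:ff00:0/104 plus the low 24 bits of the unicast
address); the function name and the example addresses are illustrative
assumptions, not taken from the reporter's setup:

```python
import ipaddress


def solicited_node_multicast(address):
    """Return the solicited-node multicast group for an IPv6 address."""
    # Keep only the low-order 24 bits of the unicast address.
    low24 = int(ipaddress.IPv6Address(address)) & 0xFFFFFF
    # Append them to the fixed prefix ff02::1:ff00:0/104.
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return str(ipaddress.IPv6Address(base | low24))


if __name__ == "__main__":
    # Hypothetical guest address ending in ::2.
    print(solicited_node_multicast("2001:db8::2"))  # ff02::1:ff00:2
```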

Can you provide more specifics of how exactly the bridge and ports
are configured?  Ideally, both the method to set it up, as well as the
configuration details when failing (i.e., "ip -s -d link show" for the
bridge and relevant bridge ports, "bridge vlan show", "bridge mdb show",
"bridge fdb show br [bridgename]")

Also, to answer a question from your original report, the default
setting in the kernel for multicast_snooping (enabled, i.e., 1) hasn't
changed recently (and quite possibly ever).

-- 
https://bugs.launchpad.net/bugs/1959702

Title:
  Regression: ip6 ndp broken, host bridge doesn't add vlan guest entry
  to mdb

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Starting at the end: I believe as the bug presently requires each of
  the host's bridge ports to be ipv6 addressable to enable ipv6 to
  function in the guest, and most admins won't think to add special
  entries into their host's nftables.conf to allow for it 'because who
  knew?' it represents what you might call a 'passive security
  vulnerability'.

  A recent kernel upgrade has broken ipv6/ip6 ndp in a host/kvm setup
  using a bridge on the host and vlans for some guests.  I've tracked
  the problem to a failure of the mcast code to add entries to the
  host's mdb table.  Manually adding the entries to the mdb on the
  bridge corrects the problem.

  It's very easy to demonstrate the bug in an all ubuntu setup.

  1. On an ubuntu host, create two vms, I used libvirt, as set up below.

  2. On the host, create a bridge and vlan with two ports, each with the
  chosen vlan as PVID and egress untagged. Assign those ports one each
  to the guests as the interface, use e1000. Be sure to NOT
  autoconfigure the host side of the bridge ports with any ip4 or ip6
  address (including fe80::), it's just an avoidable security risk.  We
  don't want to allow the host any sort of ip access / exposure to the
  vlan.  In other words, treat the host's bridge ports as if a 'real
  off-host switch' without expectation of making each bridge's port
  being ip6 addressable on the bridge itself.   (FWIW: Worth checking if
  the vlan is left tagged and not pvid, and the vlan is decoded in the
  guest as a separate interface, does the problem go away? It imposes
  the burden of vlan management awareness to the guest and so is not
  acceptable as a solution.)

  3. On the host, assign a physical NIC to the bridge and the vlan to
  the nic.  The egress is tagged for the chosen vlan and not PVID.
  Optionally set up an off-host gateway for the vlan, but it isn't
  necessary to show the bug.

  4. On each guest, manually assign a unique ip4 and ip6 address on the
  same subnet (you'll see though dhcp4 could work if there was an off-
  host router providing related services, the bug prevents dhcp6 from
  working).

  5. On one vm, ping the other.  Notice ip4 pings work, ip6 pings do
  not.

  6. For each vm, manually add the ff02::1:ffxx:xxxx (solicited-node)
  entries for the vlan to the host bridge's multicast table. Use 'temp'
  if you're quick enough, otherwise perm.

  7. Notice pings between the guests now work on ip6 and ipv4.

  Using tcpdump and watching icmp6 traffic, you'll notice the packets
  making it across the various bridge ports the moment you manually add
  the appropriate ff02::1:ff... multicast address to the mdb table.
  Beware a false sense of security: Once the ndp completes and the link
  addresses are in the fdb, it can 'seem like' everything is fine until
  the fdb times out and the required mdb entry again must be used to
  allow ndp to refresh the address.

  Setting mcast_querier doesn't help.  Perhaps previous kernels turned
  off the multicast snooping by default and just flooded all the bridge
  ports with all multicast traffic so this bug was avoided.

  It's my hunch the reason there hasn't been more complaint about this
  is it takes an extra step to not autoconfigure the vm ports with
  fe80:: link local addresses on the host.  I believe the existence of
  the fe80 address on the host ports engages ndp code on the host to
  load the mdb as if preparing for the host's side of the

[Kernel-packages] [Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-09 Thread Jay Vosburgh

Thimo,

Thanks for the update; just to clarify, for your "procedure to recover,"
are you saying that that procedure will always resolve the damage, or
that even after that procedure, there may be corruption?

-- 
https://bugs.launchpad.net/bugs/1907262

Title:
  raid10: discard leads to corrupted file system

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Focal:
  Fix Committed
Status in linux source package in Groovy:
  Fix Committed

Bug description:
  Seems to be closely related to
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578

  After updating the Ubuntu 18.04 kernel from 4.15.0-124 to 4.15.0-126
  the fstrim command triggered by fstrim.timer causes a severe number of
  mismatches between two RAID10 component devices.

  This bug affects several machines in our company with different HW
  configurations (All using ECC RAM). Both, NVMe and SATA SSDs are
  affected.

  How to reproduce:
   - Create a RAID10 LVM and filesystem on two SSDs
  mdadm -C -v -l10 -n2 -N "lv-raid" -R /dev/md0 /dev/nvme0n1p2 /dev/nvme1n1p2
  pvcreate -ff -y /dev/md0
  vgcreate -f -y VolGroup /dev/md0
  lvcreate -n root -L 100G -ay -y VolGroup
  mkfs.ext4 /dev/VolGroup/root
  mount /dev/VolGroup/root /mnt
   - Write some data, sync and delete it
  dd if=/dev/zero of=/mnt/data.raw bs=4K count=1M
  sync
  rm /mnt/data.raw
   - Check the RAID device
  echo check >/sys/block/md0/md/sync_action
   - After finishing (see /proc/mdstat), check the mismatch_cnt (should be 0):
  cat /sys/block/md0/md/mismatch_cnt
   - Trigger the bug
  fstrim /mnt
   - Re-Check the RAID device
  echo check >/sys/block/md0/md/sync_action
   - After finishing (see /proc/mdstat), check the mismatch_cnt (probably in the range of N*1):
  cat /sys/block/md0/md/mismatch_cnt

  After investigating this issue on several machines it *seems* that the
  first drive does the trim correctly while the second one goes wild. At
  least the number and severity of errors found by a  USB stick live
  session fsck.ext4 suggests this.

  To perform the single drive evaluation the RAID10 was started using a
  single drive at once:
mdadm --assemble /dev/md127 /dev/nvme0n1p2
mdadm --run /dev/md127
fsck.ext4 -n -f /dev/VolGroup/root

vgchange -a n /dev/VolGroup
mdadm --stop /dev/md127

mdadm --assemble /dev/md127 /dev/nvme1n1p2
mdadm --run /dev/md127
fsck.ext4 -n -f /dev/VolGroup/root

  When starting these fscks without -n, on the first device it seems the
  directory structure is OK while on the second device there is only the
  lost+found folder left.

  Side-note: Another machine using HWE kernel 5.4.0-56 (after using -53
  before) seems to have a quite similar issue.

  Unfortunately the risk/regression assessment in the aforementioned bug
  is not complete: the workaround only mitigates the issues during FS
  creation. This bug on the other hand is triggered by a weekly service
  (fstrim) causing severe file system corruption.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/+subscriptions


[Kernel-packages] [Bug 1894780] Re: Oops and hang when starting LVM snapshots on 5.4.0-47

2020-09-09 Thread Jay Vosburgh

wgrant, you said:

That :a-152 is meant to be /sys/kernel/slab/:a-152. Even a
working kernel shows some trouble there:

  $ uname -a
  Linux  5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  $ ls -l /sys/kernel/slab | grep a-152
  lrwxrwxrwx 1 root root 0 Sep 8 03:20 dm_bufio_buffer -> :a-152

Are you saying that the symlink is "some trouble" here?  Because that
part isn't an error, that's the effect of slab merge (that the kernel
normally treats all slabs of the same size as one big slab with multiple
references, more or less).

Slab merge can be disabled via "slab_nomerge" on the command line.
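
To see which caches have been merged on a given system, the symlinks
under /sys/kernel/slab can be grouped by their target.  This is only a
sketch: it assumes a SLUB kernel that exposes /sys/kernel/slab and that
the directory is readable by the caller, and the function name is
hypothetical.

```python
import os
from collections import defaultdict


def merged_slab_caches(slab_dir="/sys/kernel/slab"):
    """Group cache names by the merged slab they are symlinked to."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(slab_dir)):
        path = os.path.join(slab_dir, name)
        # A symlink such as dm_bufio_buffer -> :a-152 is an alias for a
        # merged slab; real directories are the slabs themselves.
        if os.path.islink(path):
            groups[os.readlink(path)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Only attempt the real sysfs directory if it exists and is readable.
    if os.access("/sys/kernel/slab", os.R_OK):
        for target, names in merged_slab_caches().items():
            print(target, "<-", ", ".join(names))
```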

-- 
https://bugs.launchpad.net/bugs/1894780

Title:
  Oops and hang when starting LVM snapshots on 5.4.0-47

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Focal:
  New

Bug description:
  One of my bionic servers with HWE 5.4.0 hangs on boot (apparently
  while starting LVM snapshots) after upgrading from Linux 5.4.0-42 to
  5.4.0-47, with the following trace:

    [   29.126292] kobject_add_internal failed for :a-152 with -EEXIST, don't try to register things with the same name in the same directory.
    [   29.138854] BUG: kernel NULL pointer dereference, address: 0020
    [   29.145977] #PF: supervisor read access in kernel mode
    [   29.145979] #PF: error_code(0x) - not-present page
    [   29.145981] PGD 0 P4D 0
    [   29.158800] Oops:  [#1] SMP NOPTI
    [   29.162468] CPU: 6 PID: 2532 Comm: lvm Not tainted 5.4.0-46-generic #50~18.04.1-Ubuntu
    [   29.170378] Hardware name: Supermicro AS -2023US-TR4/H11DSU-iN, BIOS 1.3 07/15/2019
    [   29.178038] RIP: 0010:free_percpu+0x120/0x1f0
    [   29.183786] Code: 43 64 48 01 d0 49 39 c4 0f 83 71 ff ff ff 65 8b 05 a5 4e bc 58 48 8b 15 0e 4e 20 01 48 98 48 8b 3c c2 4c 01 e7 e8 f0 97 02 00 <48> 8b 58 20 48 8b 53 38 e9 48 ff ff ff f3 c3 48 8b 43 38 48 89 45
    [   29.202530] RSP: 0018:a2f69c3d38e8 EFLAGS: 00010046
    [   29.209204] RAX:  RBX: 92202ff397c0 RCX: a880a000
    [   29.216336] RDX: cf35c0f24f2cc3c0 RSI: 43817c451b92afcb RDI:
    [   29.223469] RBP: a2f69c3d3918 R08:  R09: a74a5300
    [   29.230609] R10: a2f69c3d3820 R11:  R12: cf35c0f24f14c3c0
    [   29.237745] R13: cf362fb2a054c3c0 R14: 0287 R15: 0008
    [   29.244878] FS:  7f93a04b0900() GS:913faed8() knlGS:
    [   29.252961] CS:  0010 DS:  ES:  CR0: 80050033
    [   29.258707] CR2: 0020 CR3: 003fa9d9 CR4: 003406e0
    [   29.265883] Call Trace:
    [   29.268346]  __kmem_cache_release+0x1a/0x30
    [   29.273913]  __kmem_cache_create+0x4f9/0x550
    [   29.278192]  ? __kmalloc_node+0x1eb/0x320
    [   29.282205]  ? kvmalloc_node+0x31/0x80
    [   29.285962]  create_cache+0x120/0x1f0
    [   29.291003]  kmem_cache_create_usercopy+0x17d/0x270
    [   29.295882]  kmem_cache_create+0x16/0x20
    [   29.300152]  dm_bufio_client_create+0x1af/0x3f0 [dm_bufio]
    [   29.305644]  ? snapshot_map+0x5e0/0x5e0 [dm_snapshot]
    [   29.310693]  persistent_read_metadata+0x1ed/0x500 [dm_snapshot]
    [   29.316627]  ? _cond_resched+0x19/0x40
    [   29.320384]  snapshot_ctr+0x79e/0x910 [dm_snapshot]
    [   29.325276]  dm_table_add_target+0x18d/0x370
    [   29.329552]  table_load+0x12a/0x370
    [   29.333045]  ctl_ioctl+0x1e2/0x590
    [   29.336450]  ? retrieve_status+0x1c0/0x1c0
    [   29.340551]  dm_ctl_ioctl+0xe/0x20
    [   29.343958]  do_vfs_ioctl+0xa9/0x640
    [   29.347547]  ? ksys_semctl.constprop.19+0xf7/0x190
    [   29.352337]  ksys_ioctl+0x75/0x80
    [   29.355663]  __x64_sys_ioctl+0x1a/0x20
    [   29.359421]  do_syscall_64+0x57/0x190
    [   29.363094]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [   29.368144] RIP: 0033:0x7f939f0286d7
    [   29.371732] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48
    [   29.390478] RSP: 002b:7ffe918df168 EFLAGS: 0202 ORIG_RAX: 0010
    [   29.398045] RAX: ffda RBX: 561c107f672c RCX: 7f939f0286d7
    [   29.405175] RDX: 561c1107c610 RSI: c138fd09 RDI: 0009
    [   29.412309] RBP: 7ffe918df220 R08: 7f939f59d120 R09: 7ffe918defd0
    [   29.419442] R10: 561c1107c6c0 R11: 0202 R12: 7f939f59c4e6
    [   29.426623] R13: 7f939f59c4e6 R14: 7f939f59c4e6 R15: 7f939f59c4e6
    [   29.433778] Modules linked in: dm_snapshot dm_bufio dm_zero nls_iso8859_1 ipmi_ssif input_leds amd64_edac_mod edac_mce_amd joydev kvm_amd kvm ccp k10temp ipmi_si

[Kernel-packages] [Bug 1834322] Re: Losing port aggregate with 802.3ad port-channel/bonding aggregation on reboot

2020-07-14 Thread Jay Vosburgh

** Changed in: linux (Ubuntu)
   Status: In Progress => Fix Released

-- 
https://bugs.launchpad.net/bugs/1834322

Title:
  Losing port aggregate with 802.3ad port-channel/bonding aggregation on
  reboot

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Released
Status in linux source package in Disco:
  Fix Released
Status in linux source package in Eoan:
  Fix Released
Status in linux source package in Focal:
  Fix Released

Bug description:
  We are losing port channel aggregation on reboot.

  After the reboot, /var/log/syslog contains the entries:
  [  250.790758] bond2: An illegal loopback occurred on adapter (enp24s0f1np1)
  Check the configuration to verify that all adapters are connected to 802.3ad compliant switch ports
  [  282.029426] bond2: An illegal loopback occurred on adapter (enp24s0f1np1)
  Check the configuration to verify that all adapters are connected to 802.3ad compliant switch ports

  Aggregator IDs of the slave interfaces are different:
  ubuntu@node-6:~$ cat /proc/net/bonding/bond2 
  Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

  Bonding Mode: IEEE 802.3ad Dynamic link aggregation
  Transmit Hash Policy: layer3+4 (1)
  MII Status: up
  MII Polling Interval (ms): 100
  Up Delay (ms): 0
  Down Delay (ms): 0

  802.3ad info
  LACP rate: fast
  Min links: 0
  Aggregator selection policy (ad_select): stable

  Slave Interface: enp24s0f1np1
  MII Status: up
  Speed: 1 Mbps
  Duplex: full
  Link Failure Count: 0
  Permanent HW addr: b0:26:28:48:9f:51
  Slave queue ID: 0
  Aggregator ID: 1
  Actor Churn State: none
  Partner Churn State: none
  Actor Churned Count: 0
  Partner Churned Count: 0

  Slave Interface: enp24s0f0np0
  MII Status: up
  Speed: 1 Mbps
  Duplex: full
  Link Failure Count: 0
  Permanent HW addr: b0:26:28:48:9f:50
  Slave queue ID: 0
  Aggregator ID: 2
  Actor Churn State: churned
  Partner Churn State: churned
  Actor Churned Count: 1
  Partner Churned Count: 1

  The mismatch in "Aggregator ID" on the port is a symptom of the issue.
  If we do 'ip link set dev bond2 down' and 'ip link set dev bond2 up',
  the port with the mismatched ID appears to renegotiate with the port-
  channel and becomes aggregated.

  The other way to work around this issue is to bring the bond ports down
  and then bring up port enp24s0f0np0 first and port enp24s0f1np1 second.

  When I change the order of bringing the ports up (first enp24s0f1np1,
  and second enp24s0f0np0), the issue is still there.

  When the issue occurs, a port on the switch, corresponding to
  interface enp24s0f0np0 is in Suspended state. After applying the
  workaround the port is no longer in Suspended state and Aggregator IDs
  in /proc/net/bonding/bond2 are equal.
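
  The Aggregator ID comparison described above can be scripted.  A
  minimal sketch, assuming the /proc/net/bonding/<bond> layout shown in
  this report; the function name is hypothetical, and any "Aggregator
  ID" line that appears before the first "Slave Interface" line is
  ignored:

```python
def aggregator_ids(bonding_text):
    """Map each slave interface to its 802.3ad Aggregator ID."""
    ids = {}
    current = None
    for line in bonding_text.splitlines():
        # Split on the first colon only, so MAC addresses stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "Aggregator ID" and current is not None:
            ids[current] = int(value)
    return ids


def all_aggregated(bonding_text):
    """Return True when every slave reports the same Aggregator ID."""
    return len(set(aggregator_ids(bonding_text).values())) <= 1
```

  For example, all_aggregated(open("/proc/net/bonding/bond2").read())
  would return False for the output quoted in this report, since the two
  slaves report IDs 1 and 2.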

  I installed the 5.0.0 kernel; the issue is still there.

  Operating System: 
  Ubuntu 18.04.2 LTS (GNU/Linux 4.15.0-52-generic x86_64)

  ubuntu@node-6:~$ uname -a
  Linux node-6 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

  ubuntu@node-6:~$ sudo lspci -vnvn
  https://pastebin.ubuntu.com/p/Dy2CKDbySC/

  Hardware: Dell PowerEdge R740xd
  BIOS version: 2.1.7

  sosreport: https://drive.google.com/open?id=1-eN7cZJIeu-AQBEU7Gw8a_AJTuq0AOZO

  ubuntu@node-6:~$ lspci | grep Ethernet | grep 10G
  https://pastebin.ubuntu.com/p/sqCx79vZWM/

  ubuntu@node-6:~$ lspci -n | grep 18:00
  18:00.0 0200: 14e4:16d8 (rev 01)
  18:00.1 0200: 14e4:16d8 (rev 01)

  ubuntu@node-6:~$ modinfo bnx2x
  https://pastebin.ubuntu.com/p/pkmzsFjK8M/

  ubuntu@node-6:~$ ip -o l 
  https://pastebin.ubuntu.com/p/QpW7TjnT2v/

  ubuntu@node-6:~$ ip -o a
  https://pastebin.ubuntu.com/p/MczKtrnmDR/

  ubuntu@node-6:~$ cat /etc/netplan/98-juju.yaml
  https://pastebin.ubuntu.com/p/9cZpPc7C6P/

  ubuntu@node-6:~$ sudo lshw -c network
  https://pastebin.ubuntu.com/p/gmfgZptzDT/
  --- 
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Jun 26 10:21 seq
   crw-rw 1 root audio 116, 33 Jun 26 10:21 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.6
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 18.04
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  Lsusb:
   Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 001 Device 004: ID 1604:10c0 Tascam 
   Bus 001 Device 003: ID 1604:10c0 Tascam 
   Bus 001 Device 002: ID 1604:10c0 Tascam 
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: Dell Inc. PowerEdge R740xd
  Package: linux (not installed)
  PciMultimedia:

[Kernel-packages] [Bug 1873537] [NEW] PCIe AER device recovery failed due to logic flaw

2020-04-17 Thread Jay Vosburgh

Public bug reported:

SRU Justification

Impact:

During PCI Express Downstream Port Containment (DPC) recovery,
certain types of failures do not recover due to a logic flaw
in pcie_do_recovery().

The upstream git commit log explains the change:

PCI/ERR: Update error status after reset_link()
Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") uses
reset_link() to recover from fatal errors.  But during fatal error
recovery, if the initial value of error status is PCI_ERS_RESULT_DISCONNECT
or PCI_ERS_RESULT_NO_AER_DRIVER then even after successful recovery (using
reset_link()) pcie_do_recovery() will report the recovery result as
failure.  Update the status of error after reset_link().

You can reproduce this issue by triggering a SW DPC using "DPC Software
Trigger" bit in "DPC Control Register".  You should see recovery failed
dmesg log as below:

  pcieport :00:16.0: DPC: containment event, status:0x1f27 source:0x
  pcieport :00:16.0: DPC: software trigger detected
  pci :04:00.0: AER: can't recover (no error_detected callback)
  pcieport :00:16.0: AER: device recovery failed

Fixes: bdb5ac85777d ("PCI/ERR: Handle fatal error recovery")
Link: https://lore.kernel.org/r/a255fcb3a3fdebcd90f84e08b555f1786eb8eba2.158584.git.sathyanarayanan.kuppusw...@linux.intel.com
[bhelgaas: split pci_channel_io_frozen simplification to separate patch]
Signed-off-by: Kuppuswamy Sathyanarayanan 
Signed-off-by: Bjorn Helgaas 
Acked-by: Keith Busch 
Cc: Ashok Raj 

Note that a second prerequisite patch is necessary as well.  This patch,

commit b5dfbeacf74865a8d62a4f70f501cdc61510f8e0
Author: Kuppuswamy Sathyanarayanan 
Date:   Fri Mar 27 17:33:24 2020 -0500

PCI/ERR: Combine pci_channel_io_frozen cases

is a code readability change, and makes no functional changes.


Testcase:

On a system with DPC enabled, setpci may be used to set the DPC Software
Trigger bit (bit 6, value 0x40) in the DPC Control register of a suitable
PCIe device (a PCIe bridge, for example).
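
The value to write is the current DPC Control register contents with bit
6 set.  A minimal sketch of that arithmetic only; it assumes a 16-bit
register, the helper name is hypothetical, and locating the DPC
capability and doing the actual read and write (e.g., with setpci) is
device-specific and not shown.

```python
DPC_SOFTWARE_TRIGGER = 1 << 6  # bit 6, value 0x40


def with_software_trigger(dpc_control):
    """Return the DPC Control value with the Software Trigger bit set.

    dpc_control is the value previously read from the register; all
    other bits are preserved.  Assumption: the register is 16 bits wide.
    """
    return (dpc_control | DPC_SOFTWARE_TRIGGER) & 0xFFFF


if __name__ == "__main__":
    # Hypothetical current register value of 0x0009.
    print(hex(with_software_trigger(0x0009)))  # 0x49
```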

On a system lacking the fix, the output will be as shown above (i.e.,
culminating in the "device recovery failed" message).  With the fix
applied, the device successfully recovers, resulting in a message of the
form

pcieport :d9:01.0: AER: Device recovery successful


Regression Potential:

The risk of regression is low, as (a) the path in question currently does
not work, and (b) the changes are minimal, comprising only a housekeeping
change and the logically correct updating of a status variable that did
not previously occur.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New

-- 
https://bugs.launchpad.net/bugs/1873537

Title:
  PCIe AER device recovery failed due to logic flaw

Status in linux package in Ubuntu:
  New

Bug description:
  SRU Justification

  Impact:

  During PCI Express Downstream Port Containment (DPC) recovery,
  certain types of failures do not recover due to a logic flaw
  in pcie_do_recovery().

  The upstream git commit log explains the change:

  PCI/ERR: Update error status after reset_link()
  Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") uses
  reset_link() to recover from fatal errors.  But during fatal error
  recovery, if the initial value of error status is PCI_ERS_RESULT_DISCONNECT
  or PCI_ERS_RESULT_NO_AER_DRIVER then even after successful recovery (using
  reset_link()) pcie_do_recovery() will report the recovery result as
  failure.  Update the status of error after reset_link().

  You can reproduce this issue by triggering a SW DPC using "DPC Software
  Trigger" bit in "DPC Control Register".  You should see recovery failed
  dmesg log as below:

pcieport :00:16.0: DPC: containment event, status:0x1f27 source:0x
pcieport :00:16.0: DPC: software trigger detected
pci :04:00.0: AER: can't recover (no error_detected callback)
pcieport :00:16.0: AER: device recovery failed

  Fixes: bdb5ac85777d ("PCI/ERR: Handle fatal error recovery")
  Link: https://lore.kernel.org/r/a255fcb3a3fdebcd90f84e08b555f1786eb8eba2.158584.git.sathyanarayanan.kuppusw...@linux.intel.com
  [bhelgaas: split pci_channel_io_frozen simplification to separate patch]
  Signed-off-by: Kuppuswamy Sathyanarayanan
  Signed-off-by: Bjorn Helgaas
  Acked-by: Keith Busch 
  Cc: Ashok Raj 

  Note that a second prerequisite patch is necessary as well.  This
  patch,

  commit b5dfbeacf74865a8d62a4f70f501cdc61510f8e0
  Author: Kuppuswamy Sathyanarayanan
  Date:   Fri Mar 27 17:33:24 2020 -0500

  PCI/ERR: Combine pci_channel_io_frozen cases

  is a code readability change, and makes no functional changes.

  
  Testcase:

  On a system with DPC enabled, setpci may be used to set the DPC Software
  Trigger bit (bit 6, value 0x40) in the DPC Control register of a suitable
  PCIe device (a PCIe bridge, for example).

[Kernel-packages] [Bug 1869423] [NEW] Restore kernel control of PCIe DPC via option

2020-03-27 Thread Jay Vosburgh

Public bug reported:


SRU Justification:

Impact:

Since upstream commit eed85ff4c0da7 (4.16), control of PCIe DPC
(Downstream Port Containment) is coupled with control of AER (Advanced
Error Reporting), eliminating the option for the kernel to separately
manage DPC (which was previously the default behavior).

Fix:

The upstream commit log explains the change:

commit 35a0b2378c199d4f26e458b2ca38ea56aaf2d9b8
Author: Olof Johansson 
Date:   Wed Oct 23 12:22:05 2019 -0700

PCI/DPC: Add "pcie_ports=dpc-native" to allow DPC without AER control

Prior to eed85ff4c0da7 ("PCI/DPC: Enable DPC only if AER is available"),
Linux handled DPC events regardless of whether firmware had granted it
ownership of AER or DPC, e.g., via _OSC.

PCIe r5.0, sec 6.2.10, recommends that the OS link control of DPC to
control of AER, so after eed85ff4c0da7, Linux handles DPC events only if it
has control of AER.

On platforms that do not grant OS control of AER via _OSC, Linux DPC
handling worked before eed85ff4c0da7 but not after.

To make Linux DPC handling work on those platforms the same way they did
before, add a "pcie_ports=dpc-native" kernel parameter that makes Linux
handle DPC events regardless of whether it has control of AER.

[bhelgaas: commit log, move pcie_ports_dpc_native to drivers/pci/]
Link: https://lore.kernel.org/r/20191023192205.97024-1-o...@lixom.net
Signed-off-by: Olof Johansson 
Signed-off-by: Bjorn Helgaas 

Testcase:

Control of DPC can be determined from kernel boot messages when
pciehp probes a capable slot; when the kernel controls DPC, messages
of the format:

pcieport :2d:00.0: pciehp: Slot #0
pcieport :2d:00.0: DPC: error containment capabilities:

will appear; if the kernel does not control DPC, the DPC line will
not be present (only the "pciehp: Slot" message).

Additionally, devices bound to the kernel DPC PCIe port service
driver will be found in the /sys/bus/pci_express/drivers/dpc/ sysfs
directory; this will be empty of devices if the kernel does not control
DPC.
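
The sysfs check above can be sketched in a few lines.  This is a minimal
illustration, assuming that devices bound to the driver show up as symlinks
in the driver directory (the usual sysfs layout); the function name is
hypothetical.

```python
import os

DPC_DRIVER_DIR = "/sys/bus/pci_express/drivers/dpc"  # path given in the testcase


def dpc_bound_devices(driver_dir=DPC_DRIVER_DIR):
    """Return the names of devices bound to the DPC port service driver.

    An empty list means either the directory is absent or no device is
    bound, i.e. the kernel does not control DPC.
    """
    try:
        entries = os.listdir(driver_dir)
    except FileNotFoundError:
        return []
    # Bound devices are symlinks; bind/unbind/uevent are regular files.
    # A "module" symlink (present for modular drivers) is not a device.
    return sorted(
        name for name in entries
        if name != "module" and os.path.islink(os.path.join(driver_dir, name))
    )
```

An empty result on a system booted without "pcie_ports=dpc-native", and a
non-empty one with it, would match the behavior the testcase describes.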

Regression Potential:

The risk of regression is low as (a) by default, the patch has no
effect (the default setting is to not enable the option), and (b) when
enabled, the patch restores functionality that previously worked, and was,
in fact, the default behavior.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1869423

Title:
  Restore kernel control of PCIe DPC via option

Status in linux package in Ubuntu:
  New

Bug description:
  
  SRU Justification:

  Impact:

  Since upstream commit eed85ff4c0da7 (4.16), control of PCIe DPC
  (Downstream Port Containment) is coupled with control of AER (Advanced
  Error Reporting), eliminating the option for the kernel to separately
  manage DPC (which was previously the default behavior).

  Fix:

  The upstream commit log explains the change:

  commit 35a0b2378c199d4f26e458b2ca38ea56aaf2d9b8
  Author: Olof Johansson 
  Date:   Wed Oct 23 12:22:05 2019 -0700

  PCI/DPC: Add "pcie_ports=dpc-native" to allow DPC without AER control
  
  Prior to eed85ff4c0da7 ("PCI/DPC: Enable DPC only if AER is available"),
  Linux handled DPC events regardless of whether firmware had granted it
  ownership of AER or DPC, e.g., via _OSC.
  
  PCIe r5.0, sec 6.2.10, recommends that the OS link control of DPC to
  control of AER, so after eed85ff4c0da7, Linux handles DPC events only if it
  has control of AER.
  
  On platforms that do not grant OS control of AER via _OSC, Linux DPC
  handling worked before eed85ff4c0da7 but not after.
  
  To make Linux DPC handling work on those platforms the same way they did
  before, add a "pcie_ports=dpc-native" kernel parameter that makes Linux
  handle DPC events regardless of whether it has control of AER.
  
  [bhelgaas: commit log, move pcie_ports_dpc_native to drivers/pci/]
  Link: https://lore.kernel.org/r/20191023192205.97024-1-o...@lixom.net
  Signed-off-by: Olof Johansson 
  Signed-off-by: Bjorn Helgaas 

  Testcase:

  Control of DPC can be determined from kernel boot messages when
  pciehp probes a capable slot; when the kernel controls DPC, messages
  of the format:

  pcieport :2d:00.0: pciehp: Slot #0
  pcieport :2d:00.0: DPC: error containment capabilities:

  will appear; if the kernel does not control DPC, the DPC line will
  not be present (only the "pciehp: Slot" message).

  Additionally, devices bound to the kernel DPC PCIe port service
  driver will be found in the /sys/bus/pci_express/drivers/dpc/ sysfs
  directory; this will be empty of devices if the kernel does not control
  DPC.

  Regression

[Kernel-packages] [Bug 1805693] [NEW] User reports a hang on 18.04 LTS(4.15.18) under a heavy I/O load

2018-11-28 Thread Jay Vosburgh

Public bug reported:

User reports a hang under heavy I/O:

The IO hang problem on our cloud is caused by IO hang in block-wbt wbt_wait.
The fix commit id is 2887e41b910bb14fd847cf01ab7a5993db989d88. It is a block 
write buffer throttle queue lock contention and thundering herd issue in 
wbt_wait()

We can recreate the problem easily by running concurrent IO from
multiple VMs with sequential write. We can provide fio workload as
needed for recreate.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1805693

Title:
  User reports a hang on 18.04 LTS(4.15.18) under a heavy I/O load

Status in linux package in Ubuntu:
  New

Bug description:
  User reports a hang under heavy I/O:

  The IO hang problem on our cloud is caused by IO hang in block-wbt wbt_wait.
  The fix commit id is 2887e41b910bb14fd847cf01ab7a5993db989d88. It is a block
  write buffer throttle queue lock contention and thundering herd issue in
  wbt_wait()

  We can recreate the problem easily by running concurrent IO from
  multiple VMs with sequential write. We can provide fio workload as
  needed for recreate.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1805693/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

[Kernel-packages] [Bug 1800254] Re: packet socket panic in Trusty 3.13.0-157 and later

2018-11-07 Thread Jay Vosburgh

** Tags removed: verification-needed-trusty

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1800254

Title:
  packet socket panic in Trusty 3.13.0-157 and later

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Trusty:
  Fix Committed

Bug description:
  SRU Justification:

  Due to changes added as part of c108ac876c02 ("packet: hold bind lock when
  rebinding to fanout hook"), it is possible for fanout_add to add a
  packet_type handler via dev_add_pack and then kfree the memory backing the
  packet_type.  This corrupts the ptype_all list, causing the system to
  panic when network packet processing next traverses ptype_all.  The
  erroneous path is taken when a PACKET_FANOUT setsockopt is performed on a
  packet socket that is bound to an interface that is administratively down.

  This is not due to any flaw in c108ac876c02, but rather because the packet
  socket code base differs subtly in 3.13 as compared to 4.4.

  This affects only the Trusty 3.13 kernel series, starting with
  3.13.0-157.

  Fix:

  The remedy for this is to backport additional changes in the management of
  the dev_add_pack calls from 4.4.  This moves the dev_add_pack and
  dev_remove_pack calls from fanout_add and _release into __fanout_link and
  _unlink.

  Testcase:

  The issue can be reproduced reliably by (a) creating an AF_PACKET socket,
  (b) binding it to an interface that is administratively down, and then (c)
  attempting to set the PACKET_FANOUT sockopt.  The setsockopt call will
  fail, but will corrupt ptype_all in the kernel.  Subsequent network traffic
  will induce a panic when evaluating the corrupted ptype_all entry.  A
  test program is attached.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1800254/+subscriptions

[Kernel-packages] [Bug 1800254] Re: packet socket panic in Trusty 3.13.0-157 and later

2018-10-26 Thread Jay Vosburgh

Reproducer for ptype_all corruption. Pass ifindex of an
administratively down interface on the command line.

** Attachment added: "packet-fry.c"

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1800254/+attachment/5206100/+files/packet-fry.c

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1800254

Title:
packet socket panic in Trusty 3.13.0-157 and later

Status in linux package in Ubuntu:
New

Bug description:
SRU Justification:

Due to changes added as part of c108ac876c02 ("packet: hold bind lock when
rebinding to fanout hook"), it is possible for fanout_add to add a
packet_type handler via dev_add_pack and then kfree the memory backing the
packet_type. This corrupts the ptype_all list, causing the system to
panic when network packet processing next traverses ptype_all. The
erroneous path is taken when a PACKET_FANOUT setsockopt is performed on a
packet socket that is bound to an interface that is administratively down.

This is not due to any flaw in c108ac876c02, but rather because the packet
socket code base differs subtly in 3.13 as compared to 4.4.

This affects only the Trusty 3.13 kernel series, starting with
3.13.0-157.

Fix:

The remedy for this is to backport additional changes in the management of
the dev_add_pack calls from 4.4. This moves the dev_add_pack and
dev_remove_pack calls from fanout_add and _release into __fanout_link and
_unlink.

Testcase:

The issue can be reproduced reliably by (a) creating an AF_PACKET socket,
(b) binding it to an interface that is administratively down, and then (c)
attempting to set the PACKET_FANOUT sockopt. The setsockopt call will
fail, but will corrupt ptype_all in the kernel. Subsequent network traffic
will induce a panic when evaluating the corrupted ptype_all entry. A
test program is attached.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1800254/+subscriptions

[Kernel-packages] [Bug 1800254] [NEW] packet socket panic in Trusty 3.13.0-157 and later

2018-10-26 Thread Jay Vosburgh

Public bug reported:

SRU Justification:

Due to changes added as part of c108ac876c02 ("packet: hold bind lock when
rebinding to fanout hook"), it is possible for fanout_add to add a
packet_type handler via dev_add_pack and then kfree the memory backing the
packet_type.  This corrupts the ptype_all list, causing the system to
panic when network packet processing next traverses ptype_all.  The
erroneous path is taken when a PACKET_FANOUT setsockopt is performed on a
packet socket that is bound to an interface that is administratively down.

This is not due to any flaw in c108ac876c02, but rather because the packet
socket code base differs subtly in 3.13 as compared to 4.4.

This affects only the Trusty 3.13 kernel series, starting with
3.13.0-157.

Fix:

The remedy for this is to backport additional changes in the management of
the dev_add_pack calls from 4.4.  This moves the dev_add_pack and
dev_remove_pack calls from fanout_add and _release into __fanout_link and
_unlink.

Testcase:

The issue can be reproduced reliably by (a) creating an AF_PACKET socket,
(b) binding it to an interface that is administratively down, and then (c)
attempting to set the PACKET_FANOUT sockopt.  The setsockopt call will
fail, but will corrupt ptype_all in the kernel.  Subsequent network traffic
will induce a panic when evaluating the corrupted ptype_all entry.  A
test program is attached.
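
For readers without the attachment, steps (a) through (c) can be sketched
as below.  This is not the attached packet-fry.c; it is a rough Python
equivalent under stated assumptions: the numeric constants are copied from
the Linux UAPI headers, the function names are hypothetical, and the
interface is named rather than given by ifindex.  It needs CAP_NET_RAW, and
on an affected 3.13 kernel it should only be run on a disposable test
system, since the failed setsockopt is what corrupts ptype_all.

```python
import socket

ETH_P_ALL = 0x0003       # <linux/if_ether.h>
SOL_PACKET = 263         # <linux/socket.h>
PACKET_FANOUT = 18       # <linux/if_packet.h>
PACKET_FANOUT_HASH = 0   # <linux/if_packet.h>


def fanout_arg(group_id, fanout_type=PACKET_FANOUT_HASH):
    """Pack the PACKET_FANOUT value: group id in the low 16 bits, type above."""
    return (group_id & 0xFFFF) | ((fanout_type & 0xFFFF) << 16)


def try_fanout_on_down_interface(ifname, group_id=1):
    """Run steps (a)-(c); return True if the setsockopt succeeded."""
    # (a) create an AF_PACKET socket
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                       socket.htons(ETH_P_ALL)) as sock:
        # (b) bind it to the administratively down interface
        sock.bind((ifname, ETH_P_ALL))
        # (c) attempt to set the PACKET_FANOUT sockopt
        try:
            sock.setsockopt(SOL_PACKET, PACKET_FANOUT, fanout_arg(group_id))
        except OSError:
            return False
        return True
```

On an affected kernel the call is expected to return False (the setsockopt
fails), with the panic following once network traffic next walks ptype_all.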

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1800254

Title:
  packet socket panic in Trusty 3.13.0-157 and later

Status in linux package in Ubuntu:
  New

Bug description:
  SRU Justification:

  Due to changes added as part of c108ac876c02 ("packet: hold bind lock when
  rebinding to fanout hook"), it is possible for fanout_add to add a
  packet_type handler via dev_add_pack and then kfree the memory backing the
  packet_type.  This corrupts the ptype_all list, causing the system to
  panic when network packet processing next traverses ptype_all.  The
  erroneous path is taken when a PACKET_FANOUT setsockopt is performed on a
  packet socket that is bound to an interface that is administratively down.

  This is not due to any flaw in c108ac876c02, but rather because the packet
  socket code base differs subtly in 3.13 as compared to 4.4.

  This affects only the Trusty 3.13 kernel series, starting with
  3.13.0-157.

  Fix:

  The remedy for this is to backport additional changes in the management of
  the dev_add_pack calls from 4.4.  This moves the dev_add_pack and
  dev_remove_pack calls from fanout_add and _release into __fanout_link and
  _unlink.

  Testcase:

  The issue can be reproduced reliably by (a) creating an AF_PACKET socket,
  (b) binding it to an interface that is administratively down, and then (c)
  attempting to set the PACKET_FANOUT sockopt.  The setsockopt call will
  fail, but will corrupt ptype_all in the kernel.  Subsequent network traffic
  will induce a panic when evaluating the corrupted ptype_all entry.  A
  test program is attached.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1800254/+subscriptions

[Kernel-packages] [Bug 1771480] Re: WARNING: CPU: 28 PID: 34085 at /build/linux-90Gc2C/linux-3.13.0/net/core/dev.c:1433 dev_disable_lro+0x87/0x90()

2018-05-16 Thread Jay Vosburgh

The dev_disable_lro warning is happening due to some logic issues in the
features code.  The LRO on the VLAN (bond0.2004, e.g.) that's being
warned about does end up being disabled by a NETDEV_FEAT_CHANGE callback
when the underlying bond0's features are updated, so the warning is
spurious.

Tracing the dev_disable_lro -> netdev_update_features for the bond0.2004
VLAN, I see:

name="bond0" feat=219db89 hw_feat=20219cbe9 want_feat=20219cbe9
vlan_feat=198069

NETIF_F_LRO = 0x8000

dev_disable_lro
    wanted_features &= ~NETIF_F_LRO
    bond0.2004 wanted_features = 0x200194869    # no LRO

__netdev_update_features
    features = netdev_get_wanted_features
        return (dev->features & ~dev->hw_features) | dev->wanted_features;
        (0x19d809 & ~0x23839487b) | 0x200194869
         ^LRO        ^no LRO        ^no LRO
        0x9000 | 0x200194869
        $2 = 0x20019d869
                    ^ LRO

vlan_dev_fix_features(dev, 0x20019d869)    # has LRO

    struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
    netdev_features_t old_features = features;

    features &= real_dev->vlan_features;    # 0x198069 has LRO
    features |= NETIF_F_RXCSUM;             # 0x100198069 has LRO
    features &= real_dev->features;         # 0x198009 has LRO

    features |= old_features & NETIF_F_SOFT_FEATURES;    # save GSO / GRO
    features |= NETIF_F_LLTX;

    return features;    # will have LRO

So, basically, LRO is set in the underlying bond0's features, so it ends
up being kept in the VLAN device's features even though it wasn't in
wanted_features.  Later, dev_disable_lro will call dev_disable_lro on
all the lower devices (the bond0 in this case), and the update of
features for the bond0 will issue a NETDEV_FEAT_CHANGE callback to the
bond0.2004 VLAN, which will then set the features correctly.
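
The arithmetic in the trace can be replayed directly.  The sketch below is
only a check of the numbers quoted above, not kernel code: the function
names are illustrative, and the NETIF_F_RXCSUM value is an assumption
inferred from the trace itself (0x198069 becoming 0x100198069).

```python
NETIF_F_LRO = 0x8000          # value given in the trace
NETIF_F_RXCSUM = 0x100000000  # assumption: inferred from the trace values


def netdev_get_wanted_features(features, hw_features, wanted_features):
    """The expression quoted from netdev_get_wanted_features."""
    return (features & ~hw_features) | wanted_features


def vlan_fix_features_masking(features, real_vlan_features, real_features):
    """The three masking steps of vlan_dev_fix_features shown in the trace."""
    features &= real_vlan_features
    features |= NETIF_F_RXCSUM
    features &= real_features
    return features


# bond0.2004: features, hw_features, wanted_features (LRO already cleared)
wanted = netdev_get_wanted_features(0x19d809, 0x23839487b, 0x200194869)
# bond0: vlan_feat=198069, feat=219db89
fixed = vlan_fix_features_masking(wanted, 0x198069, 0x219db89)

print(hex(wanted), bool(wanted & NETIF_F_LRO))
print(hex(fixed), bool(fixed & NETIF_F_LRO))
```

LRO comes back in the first step because it is set in dev->features but
absent from dev->hw_features, so (features & ~hw_features) carries it past
the cleared wanted_features.  The later |= steps can only add bits, so LRO
is still set when vlan_dev_fix_features returns.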

The Ubuntu 3.13 __netdev_update_features (called by dev_disable_lro via
netdev_update_features) lacks additional logic found in later kernels to
sync the features to lower devices.  In those kernels, that logic presumably
triggers the NETDEV_FEAT_CHANGE within the call to __netdev_update_features
so that the bond0.2004 VLAN is updated before we return back to
dev_disable_lro (but I haven't verified this).

I suspect the fix to eliminate the warning is to apply the "sync_lower:"
block from a later kernel __netdev_update_features to 3.13, along with
the netdev_sync_lower_features function it uses.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1771480

Title:
  WARNING: CPU: 28 PID: 34085 at /build/linux-
  90Gc2C/linux-3.13.0/net/core/dev.c:1433 dev_disable_lro+0x87/0x90()

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  I have multiple instances of this dev_disable_lro error in kern.log. Also 
seeing this: 
  systemd-udevd[1452]: timeout: killing 'bridge-network-interface' [2765] 
  <4>May 1 22:56:42 xxx kernel: [ 404.520990] bonding: bond0: Warning: No 
802.3ad response from the link partner for any adapters in the bond 
  <4>May 1 22:56:44 xxx kernel: [ 406.926429] bonding: bond0: Warning: No 
802.3ad response from the link partner for any adapters in the bond 
  <4>May 1 22:56:45 xxx kernel: [ 407.569020] ------------[ cut here ]------------
  <4>May 1 22:56:45 xxx kernel: [ 407.569029] WARNING: CPU: 28 PID: 34085 at 
/build/linux-90Gc2C/linux-3.13.0/net/core/dev.c:1433 
dev_disable_lro+0x87/0x90() 
  <4>May 1 22:56:45 xxx kernel: [ 407.569032] netdevice: bond0.2004 
  <4>May 1 22:56:45 xxx kernel: [ 407.569032] failed to disable LRO! 
  <4>May 1 22:56:45 xxx kernel: [ 407.569035] Modules linked in: 8021q garp mrp 
bridge stp llc bonding iptable_filter ip_tables x_tables nf_conntrack_proto_gre 
nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack 
ipmi_devintf mxm_wmi dcdbas x86_pkg_temp_thermal coretemp kvm_intel kvm 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw 
gf128mul glue_helper ablk_helper cryptd lpc_ich mei_me mei ipmi_si shpchp wmi 
acpi_power_meter mac_hid xfs libcrc32c raid10 usb_storage raid456 
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 
raid0 igb ixgbe i2c_algo_bit multipath ahci dca ptp libahci pps_core linear 
megaraid_sas mdio dm_multipath scsi_dh 
  <4>May 1 22:56:45 xxx kernel: [ 407.569112] CPU: 28 PID: 34085 Comm: brctl 
Not tainted 3.13.0-142-generic #191-Ubuntu 
  <4>May 1 22:56:45 xxx kernel: [ 407.569115] Hardware name: Dell Inc. 
PowerEdge R730xd/072T6D, BIOS 2.7.1 001/22/2018 
  <4>May 1 22:56:45 xxx kernel: [ 407.569118]  881fcc753c70 
8172e7fc 881fcc753cb8 
  <4>May 1 22:56:45 xxx kernel: [ 407.569129] 0009 881fcc753ca8 
8106afad 883fcc6f8000 
  <4>May 1 22:56:45 xxx kernel: [ 407.569139] 883fcc696880 883fcc6f8000 
 881fce82dd40 
  <4>May 1 22:56:45 xxx

[Kernel-packages] [Bug 1765241] Re: virtio_scsi race can corrupt memory, panic kernel

2018-05-01 Thread Jay Vosburgh

** Tags removed: verification-needed-xenial
** Tags added: verification-done-xenial

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1765241

Title:
  virtio_scsi race can corrupt memory, panic kernel

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  Fix Committed

Bug description:
  There's a race in the virtio_scsi driver (for all kernels).

  That race is inadvertently avoided on most kernels due to a
  synchronize_rcu call coincidentally placed in one of the racing code paths.

  By happenstance, the set of patches backported to the Ubuntu
  4.4 kernel ended up without a synchronize_rcu in the relevant place. The
  issue first manifests with

  
  commit be2a20802abbde04ae09846406d7b670731d97d2
  Author: Jan Kara 
  Date:   Wed Feb 8 08:05:56 2017 +0100

  block: Move bdi_unregister() to del_gendisk()
  
  BugLink: http://bugs.launchpad.net/bugs/1659111

  The race can cause a kernel panic due to corruption of a freelist
  pointer in a slab cache.  The panics occur in arbitrary places as
  the failure occurs at an allocation after the corruption of the
  pointer.  However, the most common failure observed has been within
  virtio_scsi itself during probe processing, e.g.:

  [3.111628]  [] kfree_const+0x22/0x30
  [3.112340]  [] kobject_release+0x94/0x190
  [3.113126]  [] kobject_put+0x27/0x50
  [3.113838]  [] put_device+0x17/0x20
  [3.114568]  [] __scsi_remove_device+0x92/0xe0
  [3.115401]  [] scsi_probe_and_add_lun+0x95b/0xe80
  [3.116287]  [] ? kmem_cache_alloc_trace+0x183/0x1f0
  [3.117227]  [] ? __pm_runtime_resume+0x5b/0x80
  [3.118048]  [] __scsi_scan_target+0x10a/0x690
  [3.118879]  [] scsi_scan_channel+0x7e/0xa0
  [3.119653]  [] scsi_scan_host_selected+0xf3/0x160
  [3.120506]  [] do_scsi_scan_host+0x8d/0x90
  [3.121295]  [] do_scan_async+0x1c/0x190
  [3.122048]  [] async_run_entry_fn+0x48/0x150
  [3.122846]  [] process_one_work+0x165/0x480
  [3.123732]  [] worker_thread+0x4b/0x4d0
  [3.124508]  [] ? process_one_work+0x480/0x480

  
  Details on the race:

  CPU A:

  virtscsi_probe
  [...]
  __scsi_scan_target
  scsi_probe_and_add_lun  [on return calls  __scsi_remove_device, below]
  scsi_probe_lun  
  [...]
  blk_execute_rq

  blk_execute_rq waits for the completion event, and then on wakeup
  returns up to scsi_probe_and_add_lun, which calls __scsi_remove_device.
  In order for the race to occur, the wakeup must occur on a CPU other than
  CPU B.

  After being woken up by the completion of the request, the call
  returns up the stack to scsi_probe_and_add_lun, which calls
  __scsi_remove_device:

  __scsi_remove_device
  blk_cleanup_queue
  [ no longer calls bdi_unregister ]
  scsi_target_reap(scsi_target(sdev))
  scsi_target_reap_ref_put
  kref_put
  kref_sub
  scsi_target_reap_ref_release
  scsi_target_destroy
  ->target_destroy() = virtscsi_target_destroy
  kfree(tgt)  <=== FREE TGT

  Note that the removal of the call to bdi_unregister in commit

xenial be2a20802abbde block: Move bdi_unregister() to del_gendisk()

  and annotated above is the change that gates whether the issue
  manifests or not.  The other code change from be2a20802abbde has no effect
  on the race.

  CPU B:

  vring_interrupt
  virtscsi_complete_cmd
  scsi_mq_done (via ->scsi_done())
  scsi_mq_done
  blk_mq_complete_request
  __blk_mq_complete_request
  [...]
  blk_end_sync_rq
  complete
  [ wake up the task from CPU A ]

  After waking the CPU A task, execution returns up the stack, and
  then calls atomic_dec(&tgt->reqs) in virtscsi_complete_cmd immediately
  after returning from the call to ->scsi_done.

  If the activity on CPU A after it is woken up (starting at
  __scsi_remove_device) finishes before CPU B can call atomic_dec() in
  virtscsi_complete_cmd, then the atomic_dec() will modify a free list
  pointer in the freed slab object that contained tgt.  This causes the
  system to panic on a subsequent allocation from the per-cpu slab cache.
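
The ordering dependency can be shown with a toy model.  This is not driver
code: the class and step names are made up for illustration, and it only
captures the point that the decrement on CPU B is harmless when it runs
before the free on CPU A and a write to freed memory when it runs after.

```python
class Target:
    """Stand-in for the per-target state (tgt) holding the reqs counter."""

    def __init__(self):
        self.reqs = 1
        self.freed = False


def run(order):
    """Replay steps in the given order; return True on a use-after-free.

    "A_free" models kfree(tgt) reached via __scsi_remove_device on CPU A.
    "B_dec"  models atomic_dec(&tgt->reqs) in virtscsi_complete_cmd on CPU B.
    """
    tgt = Target()
    use_after_free = False
    for step in order:
        if step == "A_free":
            tgt.freed = True
        elif step == "B_dec":
            if tgt.freed:
                use_after_free = True  # would scribble on the slab freelist
            tgt.reqs -= 1
    return use_after_free


print(run(["B_dec", "A_free"]))  # decrement first: safe
print(run(["A_free", "B_dec"]))  # free first: the failing order
```

In the real system the second ordering requires CPU B to be delayed after
the wakeup, which is why the bug appears only intermittently.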

  The call path on CPU B is significantly shorter than that on CPU A
  after wakeup, so it is likely that an external event delays CPU B.  This
  could be either an interrupt within the VM or scheduling delays for the
  virtual cpu process on the hypervisor.  Whatever the delay is, it is not
  the root cause but merely the triggering event.

  The virtscsi race window described above exists in all kernels
  that have been checked (4.4 upstream LTS, Ubuntu 4.13 and 4.15, and
  current mainline as of this writing).  However, none of those kernels
  exhibit the panic in testing, only the Ubuntu 4.4 kernel after commit
  be2a20802abbde.

  The reason none of those kernels panic is they all have one

[Kernel-packages] [Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-04-30 Thread Jay Vosburgh

We've seen a similar-sounding issue in the past, but couldn't get it
tracked down to the root cause.

Is it possible to enable some instrumentation in /etc/network/interfaces and
obtain some data on a failing occurrence?

What we've used in the past is adding something like

pre-up echo 'file bond_3ad.c +p' > /sys/kernel/debug/dynamic_debug/control 
pre-up echo 'file bond_main.c +p' > /sys/kernel/debug/dynamic_debug/control

to the /etc/network/interfaces section for the bond itself, and

post-up tcpdump -U -p -w /tmp/eth4.td -i eth4 ether proto 0x8809 &

to the sections for each slave in the bond (adjusting the "eth4" above
to the actual interface name).

The bond debug will appear in the kernel log, and the tcpdump data will
have to be copied from the output file specified on the tcpdump command
line (and the tcpdump process terminated if need be).

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1753662

Title:
  [i40e] LACP bonding start up race conditions

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Triaged

Bug description:
  When provisioning Ubuntu servers with MAAS at once, some bonding pairs
  will have unexpected LACP status such as "Expired". It randomly
  happens at each provisioning with the default xenial kernel(4.4), but
  not reproducible with HWE kernel(4.13). I'm using Intel X710 cards
  (Dell-branded).

  Using the HWE kernel works as a workaround for short term, but it's
  not ideal since 4.13 is not covered by Canonical Livepatch service.

  How to reproduce:
  1. configure LACP bonding with MAAS
  2. provision machines
  3. check the bonding status in /proc/net/bonding/bond*

  frequency of occurrence:
  About 5 bond pairs in 22 pairs at each provisioning.

  [reproducible combination]
  $ uname -a
  Linux comp006 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 
x86_64 x86_64 x86_64 GNU/Linux

  $ sudo ethtool -i eno1
  driver: i40e
  version: 1.4.25-k
  firmware-version: 6.00 0x800034e6 18.3.6
  expansion-rom-version: 
  bus-info: :01:00.0
  supports-statistics: yes
  supports-test: yes
  supports-eeprom-access: yes
  supports-register-dump: yes
  supports-priv-flags: yes

  [non-reproducible combination]
  $ uname -a
  Linux comp006 4.13.0-36-generic #40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 
UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  $ sudo ethtool -i eno1
  driver: i40e
  version: 2.1.14-k
  firmware-version: 6.00 0x800034e6 18.3.6
  expansion-rom-version: 
  bus-info: :01:00.0
  supports-statistics: yes
  supports-test: yes
  supports-eeprom-access: yes
  supports-register-dump: yes
  supports-priv-flags: yes

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: linux-image-4.4.0-116-generic 4.4.0-116.140
  ProcVersionSignature: Ubuntu 4.4.0-116.140-generic 4.4.98
  Uname: Linux 4.4.0-116-generic x86_64
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Mar  6 06:37 seq
   crw-rw 1 root audio 116, 33 Mar  6 06:37 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.1-0ubuntu2.15
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  Date: Tue Mar  6 06:46:32 2018
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb:
   Bus 002 Device 002: ID 8087:8002 Intel Corp. 
   Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
   Bus 001 Device 003: ID 413c:a001 Dell Computer Corp. Hub
   Bus 001 Device 002: ID 8087:800a Intel Corp. 
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: Dell Inc. PowerEdge R730
  PciMultimedia:
   
  ProcEnviron:
   TERM=screen
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 EFI VGA
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-116-generic.efi.signed 
root=UUID=0528f88e-cf1a-43e2-813a-e7261b88d460 ro console=tty0 
console=ttyS0,115200n8
  RelatedPackageVersions:
   linux-restricted-modules-4.4.0-116-generic N/A
   linux-backports-modules-4.4.0-116-generic  N/A
   linux-firmware 1.157.17
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 08/16/2017
  dmi.bios.vendor: Dell Inc.
  dmi.bios.version: 2.5.5
  dmi.board.name: 072T6D
  dmi.board.vendor: Dell Inc.
  dmi.board.version: A08
  dmi.chassis.asset.tag: 0018880
  dmi.chassis.type: 23
  dmi.chassis.vendor: Dell Inc.
  dmi.modalias: 
dmi:bvnDellInc.:bvr2.5.5:bd08/16/2017:svnDellInc.:pnPowerEdgeR730:pvr:rvnDellInc.:rn072T6D:rvrA08:cvnDellInc.:ct23:cvr:
  dmi.product.name: PowerEdge R730
  dmi.sys.vendor: Dell Inc.

To manage notifications about this bug go to:

[Kernel-packages] [Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-04-26 Thread Jay Vosburgh

I would suggest testing

commit de77ecd4ef02ca783f7762e04e92b3d0964be66b
Author: Mahesh Bandewar 
Date:   Mon Mar 27 11:37:33 2017 -0700

bonding: improve link-status update in mii-monitoring

and

commit d94708a553022bf012fa95af10532a134eeb5a52
Author: WANG Cong 
Date:   Tue Jul 25 09:44:25 2017 -0700

bonding: commit link status change after propose


backported to 4.4.0-120 (in the order above; the second is a fix to the first).

The first patch initially appears in 4.12-rc1, the second in 4.13.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1753662

Title:
  [i40e] LACP bonding start up race conditions

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Triaged

Bug description:
  When provisioning multiple Ubuntu servers with MAAS at once, some bonding
  pairs will have an unexpected LACP status such as "Expired". It happens
  randomly at each provisioning with the default xenial kernel (4.4), but is
  not reproducible with the HWE kernel (4.13). I'm using Intel X710 cards
  (Dell-branded).

  Using the HWE kernel works as a short-term workaround, but it's
  not ideal since 4.13 is not covered by the Canonical Livepatch service.

  How to reproduce:
  1. configure LACP bonding with MAAS
  2. provision machines
  3. check the bonding status in /proc/net/bonding/bond*

  frequency of occurrence:
  About 5 bond pairs in 22 pairs at each provisioning.

  [reproducible combination]
  $ uname -a
  Linux comp006 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 
x86_64 x86_64 x86_64 GNU/Linux

  $ sudo ethtool -i eno1
  driver: i40e
  version: 1.4.25-k
  firmware-version: 6.00 0x800034e6 18.3.6
  expansion-rom-version: 
  bus-info: :01:00.0
  supports-statistics: yes
  supports-test: yes
  supports-eeprom-access: yes
  supports-register-dump: yes
  supports-priv-flags: yes

  [non-reproducible combination]
  $ uname -a
  Linux comp006 4.13.0-36-generic #40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 
UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  $ sudo ethtool -i eno1
  driver: i40e
  version: 2.1.14-k
  firmware-version: 6.00 0x800034e6 18.3.6
  expansion-rom-version: 
  bus-info: :01:00.0
  supports-statistics: yes
  supports-test: yes
  supports-eeprom-access: yes
  supports-register-dump: yes
  supports-priv-flags: yes

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: linux-image-4.4.0-116-generic 4.4.0-116.140
  ProcVersionSignature: Ubuntu 4.4.0-116.140-generic 4.4.98
  Uname: Linux 4.4.0-116-generic x86_64
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Mar  6 06:37 seq
   crw-rw 1 root audio 116, 33 Mar  6 06:37 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.1-0ubuntu2.15
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  Date: Tue Mar  6 06:46:32 2018
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb:
   Bus 002 Device 002: ID 8087:8002 Intel Corp. 
   Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
   Bus 001 Device 003: ID 413c:a001 Dell Computer Corp. Hub
   Bus 001 Device 002: ID 8087:800a Intel Corp. 
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: Dell Inc. PowerEdge R730
  PciMultimedia:
   
  ProcEnviron:
   TERM=screen
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 EFI VGA
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-116-generic.efi.signed 
root=UUID=0528f88e-cf1a-43e2-813a-e7261b88d460 ro console=tty0 
console=ttyS0,115200n8
  RelatedPackageVersions:
   linux-restricted-modules-4.4.0-116-generic N/A
   linux-backports-modules-4.4.0-116-generic  N/A
   linux-firmware 1.157.17
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 08/16/2017
  dmi.bios.vendor: Dell Inc.
  dmi.bios.version: 2.5.5
  dmi.board.name: 072T6D
  dmi.board.vendor: Dell Inc.
  dmi.board.version: A08
  dmi.chassis.asset.tag: 0018880
  dmi.chassis.type: 23
  dmi.chassis.vendor: Dell Inc.
  dmi.modalias: 
dmi:bvnDellInc.:bvr2.5.5:bd08/16/2017:svnDellInc.:pnPowerEdgeR730:pvr:rvnDellInc.:rn072T6D:rvrA08:cvnDellInc.:ct23:cvr:
  dmi.product.name: PowerEdge R730
  dmi.sys.vendor: Dell Inc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1753662/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

[Kernel-packages] [Bug 1765241] Re: virtio_scsi race can corrupt memory, panic kernel

2018-04-19 Thread Jay Vosburgh

SRU Justification:

Impact:

 This issue can cause system panics of systems using the
virtio_scsi driver with the affected Ubuntu kernels. The issue manifests
irregularly, as it is timing dependent.

Fix:

 The issue is resolved by adding synchronization between the two
code paths that race with one another. The most straightforward fix
is to have the code wait for any outstanding
requests to drain prior to freeing the target structure, e.g.,

--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -762,6 +762,10 @@ static int virtscsi_target_alloc(struct scsi_target *starget)
 static void virtscsi_target_destroy(struct scsi_target *starget)
 {
 	struct virtio_scsi_target_state *tgt = starget->hostdata;
+
+	/* we can race with concurrent virtscsi_complete_cmd */
+	while (atomic_read(&tgt->reqs))
+		cpu_relax();
 	kfree(tgt);
 }


An alternative fix that was considered is to use a synchronize_rcu_expedited
call, as that is the functionality that blocks the race in unaffected kernels.
However, some call paths into virtscsi_target_destroy may hold mutexes that
are not held by the upstream RCU sync calls (which enter via the block layer).
For this reason the more confined fix described above was chosen.

Testcase:

This reproduces on Google Cloud, using the current, unmodified
ubuntu-1404-lts image (with the Ubuntu 4.4 kernel). Using the two attached
scripts, run e.g.

  ./create_shutdown_instance.sh 100

to create 100 instances. If an instance runs its startup script
successfully, it'll shut itself down right away. So instances that are
still running after a few minutes likely demonstrate this problem.

The issue reproduces easily with n1-standard-4.

create_shutdown_instance.sh:

#!/bin/bash -e

ZONE=us-central1-a

for i in $(seq -w $1); do
  gcloud compute instances create shutdown-experiment-$i \
--zone="${ZONE}" \
--image-family=ubuntu-1404-lts \
--image-project=ubuntu-os-cloud \
--machine-type=n1-standard-4 \
--scopes compute-rw \
--metadata-from-file startup-script=immediate_shutdown.sh &
done

wait

immediate_shutdown.sh:

#!/bin/bash -x

function get_metadata_value() {
  curl -H 'Metadata-Flavor: Google' \
    "http://metadata.google.internal/computeMetadata/v1/instance/$1"
}

readonly ZONE="$(get_metadata_value zone | awk -F'/' '{print $NF}')"
gcloud compute instances delete "$(hostname)" --zone="${ZONE}" --quiet
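
Once the instances have been created, the ones that did not delete
themselves can be picked out of the instance list. The following is only a
sketch and is not one of the attached scripts: still_running is a
hypothetical helper name, and the gcloud invocation shown in the trailing
comment is an assumption about how the tab-separated name/status list would
be produced.

```shell
#!/bin/bash
# Hypothetical helper, not part of the attached scripts.
# Reads "NAME<TAB>STATUS" lines on stdin and prints the names of the
# shutdown-experiment instances whose status is still RUNNING.
still_running() {
  awk -F'\t' '$1 ~ /^shutdown-experiment-/ && $2 == "RUNNING" { print $1 }'
}

# Assumed usage, a few minutes after create_shutdown_instance.sh returns:
#   gcloud compute instances list --format='value(name,status)' | still_running
```

Instances it prints a few minutes after creation are the ones that likely
demonstrate the problem.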

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1765241

Title:
  virtio_scsi race can corrupt memory, panic kernel

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  There's a race in the virtio_scsi driver (for all kernels).

  That race is inadvertently avoided on most kernels due to a
  synchronize_rcu call coincidentally placed in one of the racing code paths.

  By happenstance, the set of patches backported to the Ubuntu
  4.4 kernel ended up without a synchronize_rcu in the relevant place. The
  issue first manifests with:

  
  commit be2a20802abbde04ae09846406d7b670731d97d2
  Author: Jan Kara 
  Date:   Wed Feb 8 08:05:56 2017 +0100

  block: Move bdi_unregister() to del_gendisk()
  
  BugLink: http://bugs.launchpad.net/bugs/1659111

  The race can cause a kernel panic due to corruption of a freelist
  pointer in a slab cache.  The panics occur in arbitrary places as
  the failure occurs at an allocation after the corruption of the
  pointer.  However, the most common failure observed has been within
  virtio_scsi itself during probe processing, e.g.:

  [3.111628]  [] kfree_const+0x22/0x30
  [3.112340]  [] kobject_release+0x94/0x190
  [3.113126]  [] kobject_put+0x27/0x50
  [3.113838]  [] put_device+0x17/0x20
  [3.114568]  [] __scsi_remove_device+0x92/0xe0
  [3.115401]  [] scsi_probe_and_add_lun+0x95b/0xe80
  [3.116287]  [] ? kmem_cache_alloc_trace+0x183/0x1f0
  [3.117227]  [] ? __pm_runtime_resume+0x5b/0x80
  [3.118048]  [] __scsi_scan_target+0x10a/0x690
  [3.118879]  [] scsi_scan_channel+0x7e/0xa0
  [3.119653]  [] scsi_scan_host_selected+0xf3/0x160
  [3.120506]  [] do_scsi_scan_host+0x8d/0x90
  [3.121295]  [] do_scan_async+0x1c/0x190
  [3.122048]  [] async_run_entry_fn+0x48/0x150
  [3.122846]  [] process_one_work+0x165/0x480
  [3.123732]  [] worker_thread+0x4b/0x4d0
  [3.124508]  [] ? process_one_work+0x480/0x480

  
  Details on the race:

  CPU A:

  virtscsi_probe
  [...]
  __scsi_scan_target
  scsi_probe_and_add_lun  [on return calls  __scsi_remove_device, below]
  scsi_probe_lun  
  [...]
  blk_execute_rq

  blk_execute_rq waits for the completion event, and then on wakeup
  returns up to scsi_probe_and_add_lun, which calls __scsi_remove_device.
  In order for the race to occur, the wakeup must occur on a CPU other than
  CPU B.

  After being woken up by the completion of the

[Kernel-packages] [Bug 1765241] Re: virtio_scsi race can corrupt memory, panic kernel

2018-04-19 Thread Jay Vosburgh

SRU Justification:

Impact:

This issue can cause system panics of systems using the
virtio_scsi driver with the affected Ubuntu kernels.  The issue manifests
irregularly, as it is timing dependent.

Fix:

The issue is resolved by adding synchronization between the two
code paths that race with one another.  The lowest regression risk is to
use a synchronize_rcu_expedited call, as that is the functionality that
blocks the race in unaffected kernels.

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 03a2aad..c122e68 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -762,6 +762,9 @@ static int virtscsi_target_alloc(struct scsi_target *starget)
 static void virtscsi_target_destroy(struct scsi_target *starget)
 {
 	struct virtio_scsi_target_state *tgt = starget->hostdata;
+
+	/* we can race with concurrent virtscsi_complete_cmd */
+	synchronize_rcu_expedited();
 	kfree(tgt);
 }
 

It is also possible to have the code wait for any outstanding
requests to drain prior to freeing the target structure, e.g.,

--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -762,6 +762,10 @@ static int virtscsi_target_alloc(struct scsi_target *starget)
 static void virtscsi_target_destroy(struct scsi_target *starget)
 {
 	struct virtio_scsi_target_state *tgt = starget->hostdata;
+
+	/* we can race with concurrent virtscsi_complete_cmd */
+	while (atomic_read(&tgt->reqs))
+		cpu_relax();
 	kfree(tgt);
 }

This completes a bit faster for the usual case, but SCSI target
destroy is not a fast path and the above runs the risk of the loop never
terminating.


Testcase:

This reproduces on Google Cloud, using the current, unmodified
ubuntu-1404-lts image (with the Ubuntu 4.4 kernel). Using the two attached
scripts, run e.g.

  ./create_shutdown_instance.sh 100

to create 100 instances. If an instance runs its startup script
successfully, it'll shut itself down right away. So instances that are
still running after a few minutes likely demonstrate this problem.

The issue reproduces easily with n1-standard-4.

create_shutdown_instance.sh:

#!/bin/bash -e

ZONE=us-central1-a

for i in $(seq -w $1); do
  gcloud compute instances create shutdown-experiment-$i \
--zone="${ZONE}" \
--image-family=ubuntu-1404-lts \
--image-project=ubuntu-os-cloud \
--machine-type=n1-standard-4 \
--scopes compute-rw \
--metadata-from-file startup-script=immediate_shutdown.sh &
done

wait

immediate_shutdown.sh:

#!/bin/bash -x

function get_metadata_value() {
  curl -H 'Metadata-Flavor: Google' \
    "http://metadata.google.internal/computeMetadata/v1/instance/$1"
}

readonly ZONE="$(get_metadata_value zone | awk -F'/' '{print $NF}')"
gcloud compute instances delete "$(hostname)" --zone="${ZONE}" --quiet

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1765241

Title:
  virtio_scsi race can corrupt memory, panic kernel

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  There's a race in the virtio_scsi driver (for all kernels).

  That race is inadvertently avoided on most kernels due to a
  synchronize_rcu call coincidentally placed in one of the racing code paths.

  By happenstance, the set of patches backported to the Ubuntu
  4.4 kernel ended up without a synchronize_rcu in the relevant place. The
  issue first manifests with:

  
  commit be2a20802abbde04ae09846406d7b670731d97d2
  Author: Jan Kara 
  Date:   Wed Feb 8 08:05:56 2017 +0100

  block: Move bdi_unregister() to del_gendisk()
  
  BugLink: http://bugs.launchpad.net/bugs/1659111

  The race can cause a kernel panic due to corruption of a freelist
  pointer in a slab cache.  The panics occur in arbitrary places as
  the failure occurs at an allocation after the corruption of the
  pointer.  However, the most common failure observed has been within
  virtio_scsi itself during probe processing, e.g.:

  [3.111628]  [] kfree_const+0x22/0x30
  [3.112340]  [] kobject_release+0x94/0x190
  [3.113126]  [] kobject_put+0x27/0x50
  [3.113838]  [] put_device+0x17/0x20
  [3.114568]  [] __scsi_remove_device+0x92/0xe0
  [3.115401]  [] scsi_probe_and_add_lun+0x95b/0xe80
  [3.116287]  [] ? kmem_cache_alloc_trace+0x183/0x1f0
  [3.117227]  [] ? __pm_runtime_resume+0x5b/0x80
  [3.118048]  [] __scsi_scan_target+0x10a/0x690
  [3.118879]  [] scsi_scan_channel+0x7e/0xa0
  [3.119653]  [] scsi_scan_host_selected+0xf3/0x160
  [3.120506]  [] do_scsi_scan_host+0x8d/0x90
  [3.121295]  [] do_scan_async+0x1c/0x190
  [3.122048]  [] async_run_entry_fn+0x48/0x150
  [3.122846]  [] process_one_work+0x165/0x480
  [3.123732]  [] worker_thread+0x4b/0x4d0
  [3.124508]  [] ? process_one_work+0x480/0x480

  
  Details on

[Kernel-packages] [Bug 1765241] [NEW] virtio_scsi race can corrupt memory, panic kernel

2018-04-18 Thread Jay Vosburgh

s the race
window on the Ubuntu 4.4 kernel.

Resolving the issue can be accomplished by adding an RCU sync
to virtscsi_target_destroy prior to freeing the target.  It is also possible
to use a loop of the format:

+   while (atomic_read(&tgt->reqs))
+           cpu_relax();

but this is higher risk, as the loop never terminates if some other
failure prevents the request count from reaching zero.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Assignee: Jay Vosburgh (jvosburgh)
 Status: Confirmed

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Jay Vosburgh (jvosburgh)

** Changed in: linux (Ubuntu)
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1765241

Title:
  virtio_scsi race can corrupt memory, panic kernel

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  There's a race in the virtio_scsi driver (for all kernels).

  That race is inadvertently avoided on most kernels due to a
  synchronize_rcu call coincidentally placed in one of the racing code paths.

  By happenstance, the set of patches backported to the Ubuntu
  4.4 kernel ended up without a synchronize_rcu in the relevant place. The
  issue first manifests with:

  
  commit be2a20802abbde04ae09846406d7b670731d97d2
  Author: Jan Kara <j...@suse.cz>
  Date:   Wed Feb 8 08:05:56 2017 +0100

  block: Move bdi_unregister() to del_gendisk()
  
  BugLink: http://bugs.launchpad.net/bugs/1659111

  The race can cause a kernel panic due to corruption of a freelist
  pointer in a slab cache.  The panics occur in arbitrary places as
  the failure occurs at an allocation after the corruption of the
  pointer.  However, the most common failure observed has been within
  virtio_scsi itself during probe processing, e.g.:

  [3.111628]  [] kfree_const+0x22/0x30
  [3.112340]  [] kobject_release+0x94/0x190
  [3.113126]  [] kobject_put+0x27/0x50
  [3.113838]  [] put_device+0x17/0x20
  [3.114568]  [] __scsi_remove_device+0x92/0xe0
  [3.115401]  [] scsi_probe_and_add_lun+0x95b/0xe80
  [3.116287]  [] ? kmem_cache_alloc_trace+0x183/0x1f0
  [3.117227]  [] ? __pm_runtime_resume+0x5b/0x80
  [3.118048]  [] __scsi_scan_target+0x10a/0x690
  [3.118879]  [] scsi_scan_channel+0x7e/0xa0
  [3.119653]  [] scsi_scan_host_selected+0xf3/0x160
  [3.120506]  [] do_scsi_scan_host+0x8d/0x90
  [3.121295]  [] do_scan_async+0x1c/0x190
  [3.122048]  [] async_run_entry_fn+0x48/0x150
  [3.122846]  [] process_one_work+0x165/0x480
  [3.123732]  [] worker_thread+0x4b/0x4d0
  [3.124508]  [] ? process_one_work+0x480/0x480

  
  Details on the race:

  CPU A:

  virtscsi_probe
  [...]
  __scsi_scan_target
  scsi_probe_and_add_lun  [on return calls  __scsi_remove_device, below]
  scsi_probe_lun  
  [...]
  blk_execute_rq

  blk_execute_rq waits for the completion event, and then on wakeup
  returns up to scsi_probe_and_add_lun, which calls __scsi_remove_device.
  In order for the race to occur, the wakeup must occur on a CPU other than
  CPU B.

  After being woken up by the completion of the request, the call
  returns up the stack to scsi_probe_and_add_lun, which calls
  __scsi_remove_device:

  __scsi_remove_device
  blk_cleanup_queue
  [ no longer calls bdi_unregister ]
  scsi_target_reap(scsi_target(sdev))
  scsi_target_reap_ref_put
  kref_put
  kref_sub
  scsi_target_reap_ref_release
  scsi_target_destroy
  ->target_destroy() = virtscsi_target_destroy
  kfree(tgt)  <=== FREE TGT

  Note that the removal of the call to bdi_unregister in commit

xenial be2a20802abbde block: Move bdi_unregister() to del_gendisk()

  and annotated above is the change that gates whether the issue
  manifests or not.  The other code change from be2a20802abbde has no effect
  on the race.

  CPU B:

  vring_interrupt
  virtscsi_complete_cmd
  scsi_mq_done (via ->scsi_done())
  scsi_mq_done
  blk_mq_complete_request
  __blk_mq_complete_request
  [...]
  blk_end_sync_rq
  complete
  [ wake up the task from CPU A ]

  After waking the CPU A task, execution returns up the stack, and
  then calls atomic_dec(&tgt->reqs) in virtscsi_complete_cmd immediately
  after returning from the call to ->scsi_done.

  If the activity on CPU A after it is woken up (starting at
  __scsi_remove_device) finishes before CPU B can call atomic_dec() in
  virtscsi_complete_cmd, then the atomic_dec() will modify a free list
  pointer in the freed slab object that contained tgt.  This causes the
  system to panic on a subsequent allocation from the per-cpu slab cache.

  The call path on CPU B is significantly shorter than that on CPU A
  after wakeup, so it is likely that an external event delays CPU B.  This
  could be either an interrupt wit

[Kernel-packages] [Bug 1716747] Re: High system load and mouse delays - pipe A vblank wait timed out

2018-03-05 Thread Jay Vosburgh

Joe,

I didn't try anything in between, I went from 4.13.0-16 to -36 and -36
started wigging out again so I backed down to -16.  I can try some
interim kernels next week when I don't need to do work on the laptop in
question.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1716747

Title:
  High system load and mouse delays - pipe A vblank wait timed out

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Artful:
  In Progress

Bug description:
  This issue has been observed on a Lenovo X220i laptop:

  1. Booting the laptop with artful's 4.12 kernel results in a very sluggish 
system performance (mouse pointer delays) and a high system load.
  2. /var/log/kern.log indicates a problem with the display driver (see below)
  3. The system works without any issues if the zesty kernel (4.10) is used 
instead.

  Ubuntu release: Artful Aardvark (development branch)
  Kernel package version: 4.12.0.13.14

  
  Sep 12 20:22:45 trinity kernel: [  155.491074] pipe A vblank wait timed out
  Sep 12 20:22:45 trinity kernel: [  155.491117] ------------[ cut here ]------------
  Sep 12 20:22:45 trinity kernel: [  155.491171] WARNING: CPU: 0 PID: 203 at 
/build/linux-cK2WUa/linux-4.12.0/drivers/gpu/drm/i915/intel_display.c:12636 
intel_atomic_commit_tail+0x1010/0x1020 [i915]
  Sep 12 20:22:45 trinity kernel: [  155.491172] Modules linked in: 
nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter 
aufs xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack 
nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter 
ebtables ip6table_filter ip6_tables iptable_filter binfmt_misc 
snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel 
intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec 
snd_hda_core snd_hwdep snd_pcm kvm_intel snd_seq_midi kvm snd_seq_midi_event 
irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd intel_cstate 
snd_rawmidi intel_rapl_perf arc4 uvcvideo videobuf2_vmalloc iwldvm 
videobuf2_memops videobuf2_v4l2 mac80211
  Sep 12 20:22:45 trinity kernel: [  155.491215]  videobuf2_core videodev 
joydev input_leds media iwlwifi serio_raw cfg80211 snd_seq snd_seq_device 
snd_timer thinkpad_acpi nvram snd mac_hid mei_me mei soundcore lpc_ich shpchp 
parport_pc ppdev lp parport ip_tables x_tables autofs4 xfs libcrc32c mmc_block 
i915 sdhci_pci uas usb_storage sdhci i2c_algo_bit drm_kms_helper psmouse 
syscopyarea ahci sysfillrect libahci sysimgblt e1000e fb_sys_fops drm ptp 
pps_core wmi video
  Sep 12 20:22:45 trinity kernel: [  155.491251] CPU: 0 PID: 203 Comm: 
kworker/u16:5 Not tainted 4.12.0-13-generic #14-Ubuntu
  Sep 12 20:22:45 trinity kernel: [  155.491253] Hardware name: LENOVO 
4290W1A/4290W1A, BIOS 8DET69WW (1.39 ) 07/18/2013
  Sep 12 20:22:45 trinity kernel: [  155.491292] Workqueue: events_unbound 
intel_atomic_commit_work [i915]
  Sep 12 20:22:45 trinity kernel: [  155.491295] task: 8842c14a8000 
task.stack: ae8f0217c000
  Sep 12 20:22:45 trinity kernel: [  155.491330] RIP: 
0010:intel_atomic_commit_tail+0x1010/0x1020 [i915]
  Sep 12 20:22:45 trinity kernel: [  155.491331] RSP: 0018:ae8f0217fd88 
EFLAGS: 00010282
  Sep 12 20:22:45 trinity kernel: [  155.491333] RAX: 001c RBX: 
 RCX: 
  Sep 12 20:22:45 trinity kernel: [  155.491334] RDX:  RSI: 
8842de20dcc8 RDI: 8842de20dcc8
  Sep 12 20:22:45 trinity kernel: [  155.491336] RBP: ae8f0217fe40 R08: 
0001 R09: 039a
  Sep 12 20:22:45 trinity kernel: [  155.491337] R10: ae8f0217fd88 R11: 
 R12: 2359
  Sep 12 20:22:45 trinity kernel: [  155.491338] R13: 8842c150 R14: 
8842c15e6000 R15: 0001
  Sep 12 20:22:45 trinity kernel: [  155.491340] FS:  () 
GS:8842de20() knlGS:
  Sep 12 20:22:45 trinity kernel: [  155.491342] CS:  0010 DS:  ES:  
CR0: 80050033
  Sep 12 20:22:45 trinity kernel: [  155.491343] CR2: 00c420921000 CR3: 
0003a0409000 CR4: 000406f0
  Sep 12 20:22:45 trinity kernel: [  155.491345] Call Trace:
  Sep 12 20:22:45 trinity kernel: [  155.491352]  ? wake_bit_function+0x60/0x60
  Sep 12 20:22:45 trinity kernel: [  155.491386]  
intel_atomic_commit_work+0x12/0x20 [i915]
  Sep 12 20:22:45 trinity kernel: [  155.491390]  process_one_work+0x1e7/0x410
  Sep 12 20:22:45 trinity kernel: [  155.491393]  worker_thread+0x4a/0x410
  Sep 12 20:22:45 trinity kernel: [  155.491396]  kthread+0x125/0x140
  Sep 12 20:22:45 trinity kernel: [  155.491398]  ? process_one_work+0x410/0x410
  Sep 12 20:22:45 trinity kernel: [  155.491400]  ? 
kthread_create_on_node+0x70/0x70
  Sep 12 20:22:45 trinity kernel:

[Kernel-packages] [Bug 1716747] Re: linux 4.12 - high system load and mouse delays - pipe A vblank wait timed out

2018-03-02 Thread Jay Vosburgh

Joe,

The issue has returned on my X220 tablet; running 4.13.0-36-generic and
the fully updated 17.10 user space.

Every time it happens the laptop display freezes for about 10 or 15
seconds.  A concurrent ssh session is unaffected.

[94261.464884] pipe A vblank wait timed out
[94261.464948] ------------[ cut here ]------------
[94261.465044] WARNING: CPU: 2 PID: 16697 at /build/linux-r9581B/linux-4.13.0/drivers/gpu/drm/i915/intel_display.c:12848 intel_atomic_commit_tail+0xfa7/0xfb0 [i915]
[94261.465046] Modules linked in: ccm rfcomm xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bnep binfmt_misc zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel arc4 aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf snd_seq_midi snd_seq_midi_event snd_hda_codec_hdmi snd_rawmidi iwldvm mac80211 snd_hda_codec_conexant snd_hda_codec_generic uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
[94261.465098]  input_leds thinkpad_acpi snd_seq snd_hda_intel serio_raw wmi_bmof videobuf2_core btusb btrtl btbcm iwlwifi videodev btintel joydev bluetooth nvram media snd_hda_codec snd_seq_device snd_hda_core ecdh_generic cfg80211 lpc_ich snd_hwdep snd_pcm shpchp snd_timer mei_me mei snd soundcore mac_hid nfsd parport_pc ppdev auth_rpcgss nfs_acl lp lockd parport grace sunrpc ip_tables x_tables autofs4 i915 i2c_algo_bit drm_kms_helper syscopyarea e1000e sysfillrect wacom sysimgblt ptp sdhci_pci fb_sys_fops ahci psmouse usbhid sdhci hid drm libahci pps_core wmi video
[94261.465153] CPU: 2 PID: 16697 Comm: Xorg Tainted: P   O 4.13.0-36-generic #40-Ubuntu
[94261.465155] Hardware name: LENOVO 42992UU/42992UU, BIOS 8DET69WW (1.39 ) 07/18/2013
[94261.465157] task: 955d1d3845c0 task.stack: af29821bc000
[94261.465217] RIP: 0010:intel_atomic_commit_tail+0xfa7/0xfb0 [i915]
[94261.465219] RSP: 0018:af29821bf8a8 EFLAGS: 00010286
[94261.465221] RAX: 001c RBX:  RCX: 
[94261.465223] RDX:  RSI: 0002 RDI: 0246
[94261.465225] RBP: af29821bf960 R08: 001c R09: 6177206b6e616c62
[94261.465226] R10: af29821bf8a8 R11: 74756f2064656d69 R12: 003c6b37
[94261.465228] R13: 955d3fa08000 R14: 955d3fbb9000 R15: 0001
[94261.465231] FS:  7fa6fdfd0500() GS:955d5e28() knlGS:0
000
[94261.465233] CS:  0010 DS:  ES:  CR0: 80050033
[94261.465235] CR2: 55ccba6e9ba8 CR3: 000402762004 CR4: 000606e0

[94261.465237] Call Trace:
[94261.465250]  ? wait_woken+0x80/0x80
[94261.465303]  intel_atomic_commit+0x3d5/0x490 [i915]
[94261.465331]  ? drm_atomic_check_only+0x37e/0x540 [drm]
[94261.465352]  drm_atomic_commit+0x51/0x60 [drm]
[94261.465367]  restore_fbdev_mode+0x15e/0x270 [drm_kms_helper]
[94261.465379]  drm_fb_helper_restore_fbdev_mode_unlocked+0x2e/0x80 
[drm_kms_helper]
[94261.465389]  drm_fb_helper_set_par+0x2d/0x60 [drm_kms_helper]
[94261.465447]  intel_fbdev_set_par+0x1a/0x70 [i915]
[94261.465451]  fb_set_var+0x19f/0x440
[94261.465456]  ? __find_get_block+0xb6/0x2b0
[94261.465460]  ? ext4_dirty_inode+0x48/0x70
[94261.465465]  ? __ext4_handle_dirty_metadata+0x87/0x1c0
[94261.465472]  fbcon_blank+0x2b7/0x3a0
[94261.465476]  ? find_get_entry+0x1e/0xd0
[94261.465483]  do_unblank_screen+0xba/0x1b0
[94261.465488]  vt_ioctl+0x4e1/0x11a0
[94261.465493]  ? __slab_free+0x14c/0x2d0
[94261.465497]  ? __slab_free+0x14c/0x2d0
[94261.465502]  tty_ioctl+0xf6/0x8b0
[94261.465507]  ? vga_arb_release+0xd6/0x130
[94261.465511]  ? security_file_free+0x44/0x60
[94261.465515]  ? dput.part.23+0xba/0x1e0
[94261.465521]  do_vfs_ioctl+0xa8/0x630
[94261.465527]  ? entry_SYSCALL_64_after_hwframe+0xe9/0x139
[94261.465530]  ? entry_SYSCALL_64_after_hwframe+0xe2/0x139
[94261.465534]  ? entry_SYSCALL_64_after_hwframe+0xdb/0x139
[94261.465537]  ? entry_SYSCALL_64_after_hwframe+0xd4/0x139
[94261.465541]  ? entry_SYSCALL_64_after_hwframe+0xcd/0x139
[94261.465545]  ? entry_SYSCALL_64_after_hwframe+0xc6/0x139
[94261.465548]  ? entry_SYSCALL_64_after_hwframe+0xbf/0x139
[94261.465552]  ? entry_SYSCALL_64_after_hwframe+0xb8/0x139
[94261.46]  ? entry_SYSCALL_64_after_hwframe+0xb1/0x139
[94261.465560]  SyS_ioctl+0x79/0x90
[94261.465563]  ? entry_SYSCALL_64_after_hwframe+0x72/0x139
[94261.465567]  entry_SYSCALL_64_fastpath+0x24/0xab
[94261.465570] RIP: 0033:0x7fa6fb442ef7
[94261.465572] RSP: 002b:7ffcc51286d8 EFLAGS: 3246 ORIG_RAX: 
0010
[94261.465575] RAX: ffda RBX: 000e RCX: 7fa6fb442ef7
[94261.465576] RDX:

[Kernel-packages] [Bug 1716747] Re: linux 4.12 - high system load and mouse delays - pipe A vblank wait timed out

2017-10-26 Thread Jay Vosburgh

Joe,

No, I'm not seeing the issue now; running 4.13.0-16 for the last 10 days
or so.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1716747

Title:
  linux 4.12 - high system load and mouse delays - pipe A vblank wait
  timed out

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Artful:
  In Progress

Bug description:
  This issue has been observed on a Lenovo X220i laptop:

  1. Booting the laptop with artful's 4.12 kernel results in a very sluggish 
system performance (mouse pointer delays) and a high system load.
  2. /var/log/kern.log indicates a problem with the display driver (see below)
  3. The system works without any issues if the zesty kernel (4.10) is used 
instead.

  Ubuntu release: Artful Aardvark (development branch)
  Kernel package version: 4.12.0.13.14

  
  Sep 12 20:22:45 trinity kernel: [  155.491074] pipe A vblank wait timed out
  Sep 12 20:22:45 trinity kernel: [  155.491117] ------------[ cut here ]------------
  Sep 12 20:22:45 trinity kernel: [  155.491171] WARNING: CPU: 0 PID: 203 at 
/build/linux-cK2WUa/linux-4.12.0/drivers/gpu/drm/i915/intel_display.c:12636 
intel_atomic_commit_tail+0x1010/0x1020 [i915]
  Sep 12 20:22:45 trinity kernel: [  155.491172] Modules linked in: 
nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter 
aufs xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack 
nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter 
ebtables ip6table_filter ip6_tables iptable_filter binfmt_misc 
snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel 
intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec 
snd_hda_core snd_hwdep snd_pcm kvm_intel snd_seq_midi kvm snd_seq_midi_event 
irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd intel_cstate 
snd_rawmidi intel_rapl_perf arc4 uvcvideo videobuf2_vmalloc iwldvm 
videobuf2_memops videobuf2_v4l2 mac80211
  Sep 12 20:22:45 trinity kernel: [  155.491215]  videobuf2_core videodev 
joydev input_leds media iwlwifi serio_raw cfg80211 snd_seq snd_seq_device 
snd_timer thinkpad_acpi nvram snd mac_hid mei_me mei soundcore lpc_ich shpchp 
parport_pc ppdev lp parport ip_tables x_tables autofs4 xfs libcrc32c mmc_block 
i915 sdhci_pci uas usb_storage sdhci i2c_algo_bit drm_kms_helper psmouse 
syscopyarea ahci sysfillrect libahci sysimgblt e1000e fb_sys_fops drm ptp 
pps_core wmi video
  Sep 12 20:22:45 trinity kernel: [  155.491251] CPU: 0 PID: 203 Comm: 
kworker/u16:5 Not tainted 4.12.0-13-generic #14-Ubuntu
  Sep 12 20:22:45 trinity kernel: [  155.491253] Hardware name: LENOVO 
4290W1A/4290W1A, BIOS 8DET69WW (1.39 ) 07/18/2013
  Sep 12 20:22:45 trinity kernel: [  155.491292] Workqueue: events_unbound 
intel_atomic_commit_work [i915]
  Sep 12 20:22:45 trinity kernel: [  155.491295] task: 8842c14a8000 
task.stack: ae8f0217c000
  Sep 12 20:22:45 trinity kernel: [  155.491330] RIP: 
0010:intel_atomic_commit_tail+0x1010/0x1020 [i915]
  Sep 12 20:22:45 trinity kernel: [  155.491331] RSP: 0018:ae8f0217fd88 
EFLAGS: 00010282
  Sep 12 20:22:45 trinity kernel: [  155.491333] RAX: 001c RBX: 
 RCX: 
  Sep 12 20:22:45 trinity kernel: [  155.491334] RDX:  RSI: 
8842de20dcc8 RDI: 8842de20dcc8
  Sep 12 20:22:45 trinity kernel: [  155.491336] RBP: ae8f0217fe40 R08: 
0001 R09: 039a
  Sep 12 20:22:45 trinity kernel: [  155.491337] R10: ae8f0217fd88 R11: 
 R12: 2359
  Sep 12 20:22:45 trinity kernel: [  155.491338] R13: 8842c150 R14: 
8842c15e6000 R15: 0001
  Sep 12 20:22:45 trinity kernel: [  155.491340] FS:  () 
GS:8842de20() knlGS:
  Sep 12 20:22:45 trinity kernel: [  155.491342] CS:  0010 DS:  ES:  
CR0: 80050033
  Sep 12 20:22:45 trinity kernel: [  155.491343] CR2: 00c420921000 CR3: 
0003a0409000 CR4: 000406f0
  Sep 12 20:22:45 trinity kernel: [  155.491345] Call Trace:
  Sep 12 20:22:45 trinity kernel: [  155.491352]  ? wake_bit_function+0x60/0x60
  Sep 12 20:22:45 trinity kernel: [  155.491386]  intel_atomic_commit_work+0x12/0x20 [i915]
  Sep 12 20:22:45 trinity kernel: [  155.491390]  process_one_work+0x1e7/0x410
  Sep 12 20:22:45 trinity kernel: [  155.491393]  worker_thread+0x4a/0x410
  Sep 12 20:22:45 trinity kernel: [  155.491396]  kthread+0x125/0x140
  Sep 12 20:22:45 trinity kernel: [  155.491398]  ? process_one_work+0x410/0x410
  Sep 12 20:22:45 trinity kernel: [  155.491400]  ? kthread_create_on_node+0x70/0x70
  Sep 12 20:22:45 trinity kernel: [  155.491403]  ret_from_fork+0x25/0x30
  Sep 12 20:22:45 trinity kernel: [  155.491405] Code: ff ff ff 48 83 c7 08 e8 
af

[Kernel-packages] [Bug 1716747] Re: linux 4.12 - high system load and mouse delays - pipe A vblank wait timed out

2017-10-04 Thread Jay Vosburgh

Albert,

This is the lspci from my X220 T:

-[:00]-+-00.0  Intel Corporation 2nd Generation Core Processor Family DRAM Controller
       +-02.0  Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller
       +-16.0  Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1
       +-19.0  Intel Corporation 82579LM Gigabit Network Connection
       +-1a.0  Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2
       +-1b.0  Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller
       +-1c.0-[01]--
       +-1c.1-[02]00.0  Intel Corporation Centrino Advanced-N 6205 [Taylor Peak]
       +-1c.3-[03]--
       +-1c.4-[04]00.0  Ricoh Co Ltd PCIe SDXC/MMC Host Controller
       +-1d.0  Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1
       +-1f.0  Intel Corporation QM67 Express Chipset Family LPC Controller
       +-1f.2  Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller
       \-1f.3  Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1716747

Title:
  linux 4.12 - high system load and mouse delays - pipe A vblank wait
  timed out

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Artful:
  In Progress

Bug description:
  This issue has been observed on a Lenovo X220i laptop:

  1. Booting the laptop with artful's 4.12 kernel results in very sluggish
     system performance (mouse pointer delays) and a high system load.
  2. /var/log/kern.log indicates a problem with the display driver (see below)
  3. The system works without any issues if the zesty kernel (4.10) is used
     instead.

  Ubuntu release: Artful Aardvark (development branch)
  Kernel package version: 4.12.0.13.14

  
  Sep 12 20:22:45 trinity kernel: [  155.491074] pipe A vblank wait timed out
  Sep 12 20:22:45 trinity kernel: [  155.491117] ------------[ cut here ]------------
  Sep 12 20:22:45 trinity kernel: [  155.491171] WARNING: CPU: 0 PID: 203 at /build/linux-cK2WUa/linux-4.12.0/drivers/gpu/drm/i915/intel_display.c:12636 intel_atomic_commit_tail+0x1010/0x1020 [i915]
  Sep 12 20:22:45 trinity kernel: [  155.491172] Modules linked in: 
nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter 
aufs xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack 
nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter 
ebtables ip6table_filter ip6_tables iptable_filter binfmt_misc 
snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel 
intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec 
snd_hda_core snd_hwdep snd_pcm kvm_intel snd_seq_midi kvm snd_seq_midi_event 
irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd intel_cstate 
snd_rawmidi intel_rapl_perf arc4 uvcvideo videobuf2_vmalloc iwldvm 
videobuf2_memops videobuf2_v4l2 mac80211
  Sep 12 20:22:45 trinity kernel: [  155.491215]  videobuf2_core videodev 
joydev input_leds media iwlwifi serio_raw cfg80211 snd_seq snd_seq_device 
snd_timer thinkpad_acpi nvram snd mac_hid mei_me mei soundcore lpc_ich shpchp 
parport_pc ppdev lp parport ip_tables x_tables autofs4 xfs libcrc32c mmc_block 
i915 sdhci_pci uas usb_storage sdhci i2c_algo_bit drm_kms_helper psmouse 
syscopyarea ahci sysfillrect libahci sysimgblt e1000e fb_sys_fops drm ptp 
pps_core wmi video
  Sep 12 20:22:45 trinity kernel: [  155.491251] CPU: 0 PID: 203 Comm: 
kworker/u16:5 Not tainted 4.12.0-13-generic #14-Ubuntu
  Sep 12 20:22:45 trinity kernel: [  155.491253] Hardware name: LENOVO 
4290W1A/4290W1A, BIOS 8DET69WW (1.39 ) 07/18/2013
  Sep 12 20:22:45 trinity kernel: [  155.491292] Workqueue: events_unbound 
intel_atomic_commit_work [i915]
  Sep 12 20:22:45 trinity kernel: [  155.491295] task: 8842c14a8000 
task.stack: ae8f0217c000
  Sep 12 20:22:45 trinity kernel: [  155.491330] RIP: 
0010:intel_atomic_commit_tail+0x1010/0x1020 [i915]
  Sep 12 20:22:45 trinity kernel: [  155.491331] RSP: 0018:ae8f0217fd88 
EFLAGS: 00010282
  Sep 12 20:22:45 trinity kernel: [  155.491333] RAX: 001c RBX: 
 RCX: 
  Sep 12 20:22:45 trinity kernel: [  155.491334] RDX:  RSI: 
8842de20dcc8 RDI: 8842de20dcc8
  Sep 12 20:22:45 trinity kernel: [  155.491336] RBP: ae8f0217fe40 R08: 
0001 R09: 039a
  Sep 12 20:22:45 trinity kernel: [  155.491337] R10: ae8f0217fd88 R11: 
 R12: 2359
  Sep 12 20:22:45 trinity kernel: [  155.491338] R13: 8842c150 R14: 
8842c15e6000 R15: 0001

[Kernel-packages] [Bug 1716747] Re: linux 4.12 - high system load and mouse delays - pipe A vblank wait timed out

2017-09-23 Thread Jay Vosburgh

Just a comment that I have observed this bug as well, on an X220 T.  The
test kernel from comment #11 also appears to resolve the problem (so
far).  I do not have any external USB controllers attached, though, so
I'm not sure what the failure path was.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1716747

Title:
  linux 4.12 - high system load and mouse delays - pipe A vblank wait
  timed out

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Artful:
  In Progress

Bug description:
  This issue has been observed on a Lenovo X220i laptop:

  1. Booting the laptop with artful's 4.12 kernel results in very sluggish
     system performance (mouse pointer delays) and a high system load.
  2. /var/log/kern.log indicates a problem with the display driver (see below)
  3. The system works without any issues if the zesty kernel (4.10) is used
     instead.

  Ubuntu release: Artful Aardvark (development branch)
  Kernel package version: 4.12.0.13.14

  
  Sep 12 20:22:45 trinity kernel: [  155.491074] pipe A vblank wait timed out
  Sep 12 20:22:45 trinity kernel: [  155.491117] ------------[ cut here ]------------
  Sep 12 20:22:45 trinity kernel: [  155.491171] WARNING: CPU: 0 PID: 203 at /build/linux-cK2WUa/linux-4.12.0/drivers/gpu/drm/i915/intel_display.c:12636 intel_atomic_commit_tail+0x1010/0x1020 [i915]
  Sep 12 20:22:45 trinity kernel: [  155.491172] Modules linked in: 
nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter 
aufs xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack 
nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter 
ebtables ip6table_filter ip6_tables iptable_filter binfmt_misc 
snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel 
intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec 
snd_hda_core snd_hwdep snd_pcm kvm_intel snd_seq_midi kvm snd_seq_midi_event 
irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd intel_cstate 
snd_rawmidi intel_rapl_perf arc4 uvcvideo videobuf2_vmalloc iwldvm 
videobuf2_memops videobuf2_v4l2 mac80211
  Sep 12 20:22:45 trinity kernel: [  155.491215]  videobuf2_core videodev 
joydev input_leds media iwlwifi serio_raw cfg80211 snd_seq snd_seq_device 
snd_timer thinkpad_acpi nvram snd mac_hid mei_me mei soundcore lpc_ich shpchp 
parport_pc ppdev lp parport ip_tables x_tables autofs4 xfs libcrc32c mmc_block 
i915 sdhci_pci uas usb_storage sdhci i2c_algo_bit drm_kms_helper psmouse 
syscopyarea ahci sysfillrect libahci sysimgblt e1000e fb_sys_fops drm ptp 
pps_core wmi video
  Sep 12 20:22:45 trinity kernel: [  155.491251] CPU: 0 PID: 203 Comm: 
kworker/u16:5 Not tainted 4.12.0-13-generic #14-Ubuntu
  Sep 12 20:22:45 trinity kernel: [  155.491253] Hardware name: LENOVO 
4290W1A/4290W1A, BIOS 8DET69WW (1.39 ) 07/18/2013
  Sep 12 20:22:45 trinity kernel: [  155.491292] Workqueue: events_unbound 
intel_atomic_commit_work [i915]
  Sep 12 20:22:45 trinity kernel: [  155.491295] task: 8842c14a8000 
task.stack: ae8f0217c000
  Sep 12 20:22:45 trinity kernel: [  155.491330] RIP: 
0010:intel_atomic_commit_tail+0x1010/0x1020 [i915]
  Sep 12 20:22:45 trinity kernel: [  155.491331] RSP: 0018:ae8f0217fd88 
EFLAGS: 00010282
  Sep 12 20:22:45 trinity kernel: [  155.491333] RAX: 001c RBX: 
 RCX: 
  Sep 12 20:22:45 trinity kernel: [  155.491334] RDX:  RSI: 
8842de20dcc8 RDI: 8842de20dcc8
  Sep 12 20:22:45 trinity kernel: [  155.491336] RBP: ae8f0217fe40 R08: 
0001 R09: 039a
  Sep 12 20:22:45 trinity kernel: [  155.491337] R10: ae8f0217fd88 R11: 
 R12: 2359
  Sep 12 20:22:45 trinity kernel: [  155.491338] R13: 8842c150 R14: 
8842c15e6000 R15: 0001
  Sep 12 20:22:45 trinity kernel: [  155.491340] FS:  () 
GS:8842de20() knlGS:
  Sep 12 20:22:45 trinity kernel: [  155.491342] CS:  0010 DS:  ES:  
CR0: 80050033
  Sep 12 20:22:45 trinity kernel: [  155.491343] CR2: 00c420921000 CR3: 
0003a0409000 CR4: 000406f0
  Sep 12 20:22:45 trinity kernel: [  155.491345] Call Trace:
  Sep 12 20:22:45 trinity kernel: [  155.491352]  ? wake_bit_function+0x60/0x60
  Sep 12 20:22:45 trinity kernel: [  155.491386]  intel_atomic_commit_work+0x12/0x20 [i915]
  Sep 12 20:22:45 trinity kernel: [  155.491390]  process_one_work+0x1e7/0x410
  Sep 12 20:22:45 trinity kernel: [  155.491393]  worker_thread+0x4a/0x410
  Sep 12 20:22:45 trinity kernel: [  155.491396]  kthread+0x125/0x140
  Sep 12 20:22:45 trinity kernel: [  155.491398]  ? process_one_work+0x410/0x410
  Sep 12 20:22:45 trinity kernel: [  155.491400]  ?

[Kernel-packages] [Bug 1700834] Re: Intel i40e PF reset under load

2017-08-11 Thread Jay Vosburgh

** Tags removed: verification-needed-xenial
** Tags added: verification-done-xenial

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1700834

Title:
  Intel i40e PF reset under load

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  Fix Released

Bug description:
  SRU Justification:

  Impact:

  Using an Intel i40e network device, under heavy traffic load with
  TSO enabled, the device will spontaneously reset itself and issue errors
  similar to the following:

  Jun 14 14:09:51 hostname kernel: [4253913.851053] i40e :05:00.1: TX driver issue detected, PF reset issued
  Jun 14 14:09:53 hostname kernel: [4253915.476283] i40e :05:00.1: TX driver issue detected, PF reset issued
  Jun 14 14:09:54 hostname kernel: [4253917.411264] i40e :05:00.1: TX driver issue detected, PF reset issued

  This causes a full reset of the PF, which causes an interruption
  in traffic flow.

  In this case, these errors arise from a bug in the i40e device
  driver introduced by commit:

  commit 584a837e26408c66e87df87a022faa6a54c2b020
  Author: Alexander Duyck
  Date:   Wed Feb 17 11:02:50 2016 -0800

  i40e/i40evf: Rewrite logic for 8 descriptor per packet check

  This patch was added to the Xenial kernel beginning with version
  4.4.0-8.23.  This bug does not manifest on any other Ubuntu kernel series.

  Fix:

  This error is resolved upstream by:

  commit 3f3f7cb875c0f621485644d4fd7453b0d37f00e4
  Author: Alexander Duyck
  Date:   Wed Mar 30 16:15:37 2016 -0700

  i40e/i40evf: Limit TSO to 7 descriptors for payload instead of 8 per packet

  This fix was never backported into the Xenial 4.4 kernel series.

  Testcase:

  In this case, the issue occurs at a customer site using i40e based
  Intel network cards with SR-IOV enabled.  Under heavy load, the card will
  reset itself as described.  The customer has tested the 3f3f7cb875c patch
  in their environment and confirmed that it resolves the issue.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1700834/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

[Kernel-packages] [Bug 1709032] Re: Creating conntrack entry failure with kernel 4.4.0-89

2017-08-09 Thread Jay Vosburgh

The panic appears to be fixed upstream via:

commit 9c3f3794926a997b1cab6c42480ff300efa2d162
Author: Liping Zhang 
Date:   Sat Mar 25 16:35:29 2017 +0800

netfilter: nf_ct_ext: fix possible panic after nf_ct_extend_unregister

If one cpu is doing nf_ct_extend_unregister while another cpu is doing
__nf_ct_ext_add_length, then we may hit BUG_ON(t == NULL). Moreover,
there's no synchronize_rcu invocation after set nf_ct_ext_types[id] to
NULL, so it's possible that we may access invalid pointer.
[...]

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1709032

Title:
  Creating conntrack entry failure with kernel 4.4.0-89

Status in Linux:
  Confirmed
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  The functional job failure rate is at 100%. Every time some test gets
  stuck and job is killed after timeout.

  logstash query:
  
  http://logstash.openstack.org/#dashboard/file/logstash.json?query=build_name%3A%5C%22gate-neutron-dsvm-functional-ubuntu-xenial%5C%22%20AND%20tags%3Aconsole%20AND%20message%3A%5C%22Killed%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20timeout%20-s%209%5C%22

  2017-08-05 12:36:50.127672 | /home/jenkins/workspace/gate-neutron-dsvm-functional-ubuntu-xenial/devstack-gate/functions.sh: line 1129: 15261 Killed  timeout -s 9 ${REMAINING_TIME}m bash -c "source $WORKSPACE/devstack-gate/functions.sh && $cmd"

  There are a few test executors left, which means there are more tests
  stuck:

  stack 15468 15445 15468  0.0  0.0   328   796 /bin/sh -c OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \ ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./neutron/tests/unit}  --load-list /tmp/tmpDTLPoX
  stack 15469 15468 15469  1.5  1.8 139332 150008 python -m subunit.run discover -t ./ ./neutron/tests/functional --load-list /tmp/tmpDTLPoX
  stack 15470 15445 15470  0.0  0.0   328   700 /bin/sh -c OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \ ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./neutron/tests/unit}  --load-list /tmp/tmpICNqRQ
  stack 15471 15470 15471  1.6  2.0 152056 164812 python -m subunit.run discover -t ./ ./neutron/tests/functional --load-list /tmp/tmpICNqRQ
  stack 15474 15445 15474  0.0  0.0   328   792 /bin/sh -c OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \ ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./neutron/tests/unit}  --load-list /tmp/tmpe646Tl
  stack 15475 15474 15475  1.6  1.9 149972 162516 python -m subunit.run discover -t ./ ./neutron/tests/functional --load-list /tmp/tmpe646Tl
  stack 15476 15445 15476  0.0  0.0   328   804 /bin/sh -c OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \ ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./neutron/tests/unit}  --load-list /tmp/tmpv2ovhz
  stack 15477 15476 15477  1.2  1.8 136760 149160 python -m subunit.run discover -t ./ ./neutron/tests/functional --load-list /tmp/tmpv2ovhz
  stack 15478 15445 15478  0.0  0.0   328   712 /bin/sh -c OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \ ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./neutron/tests/unit}  --load-list /tmp/tmpDqXE8S
  stack 15479 15478 15479  1.5  1.9 148784 161004 python -m subunit.run discover -t ./ ./neutron/tests/functional --load-list /tmp/tmpDqXE8S
  stack 15480 15445 15480  0.0  0.0   328   804 /bin/sh -c OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \ ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./neutron/tests/unit}  --load-list /tmp/tmpTmmShS
  stack 15482 15480 15482  1.6  1.9 148856 161516 python -m subunit.run discover -t ./ ./neutron/tests/functional --load-list /tmp/tmpTmmShS

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1709032/+subscriptions


[Kernel-packages] [Bug 1697053] Re: Missing IOTLB flush causes DMAR errors with SR-IOV

2017-07-13 Thread Jay Vosburgh

The customer has tested the proposed kernel.


** Tags removed: verification-needed-trusty
** Tags added: verification-done-trusty

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1697053

Title:
  Missing IOTLB flush causes DMAR errors with SR-IOV

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Trusty:
  Fix Committed

Bug description:
  SRU Justification:

  Impact:

  Systems using SR-IOV with Intel IOMMUs can exhibit DMAR errors of the
  following type:

  [606483.223009] DMAR:[fault reason 05] PTE Write access is not set
  [606484.071974] dmar: DRHD: handling fault status reg 402
  [606484.077121] dmar: DMAR:[DMA Write] Request device [d8:0a.1] fault addr 35c6e000

  The DMAR error causes, at a minimum, loss of network traffic
  because the request being serviced is lost.  Network cards were also
  observed to experience transmit timeouts after a DMAR fault.

  In this case, these errors arise from a race condition in
  the IOTLB management; this race is described (and fixed) in upstream
  commit:

  commit ea8ea460c9ace60bbb5ac6e5521d637d5c15293d
  Author: David Woodhouse 
  Date:   Wed Mar 5 17:09:32 2014 +0000

  iommu/vt-d: Clean up and fix page table clear/free behaviour

  This commit first appeared in mainline 3.15.  This issue
  affects only the Ubuntu 3.13 kernel series.

  Fix:

  The race avoidance portion of the above was backported to
  3.14-stable, but was never incorporated into the Ubuntu 3.13
  kernel series.

  commit 51d20e1096a711f8cfa9d98a3ac2dd2c7c0fc20c
  Author: David Woodhouse 
  Date:   Mon Jun 9 14:09:53 2014 +0100

  iommu/vt-d: Fix missing IOTLB flush in intel_iommu_unmap()
  
  Based on commit ea8ea460c9ace60bbb5ac6e5521d637d5c15293d upstream

  This 3.14-stable patch was tested by the customer and observed
  to resolve the issue in their environment.

  Testcase:

  In this case, the issue occurs on very recent Intel based
  servers using two different SR-IOV network cards (i40e and bnxt) at a
  customer site.  The customer has tested the patch in their environment
  and confirmed that it resolves the issue.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1697053/+subscriptions


[Kernel-packages] [Bug 1700834] Re: Intel i40e PF reset under load

2017-06-27 Thread Jay Vosburgh

** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1700834

Title:
  Intel i40e PF reset under load

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  SRU Justification:

  Impact:

  Using an Intel i40e network device, under heavy traffic load with
  TSO enabled, the device will spontaneously reset itself and issue errors
  similar to the following:

  Jun 14 14:09:51 hostname kernel: [4253913.851053] i40e :05:00.1: TX driver issue detected, PF reset issued
  Jun 14 14:09:53 hostname kernel: [4253915.476283] i40e :05:00.1: TX driver issue detected, PF reset issued
  Jun 14 14:09:54 hostname kernel: [4253917.411264] i40e :05:00.1: TX driver issue detected, PF reset issued

  This causes a full reset of the PF, which causes an interruption
  in traffic flow.

  In this case, these errors arise from a bug in the i40e device
  driver introduced by commit:

  commit 584a837e26408c66e87df87a022faa6a54c2b020
  Author: Alexander Duyck
  Date:   Wed Feb 17 11:02:50 2016 -0800

  i40e/i40evf: Rewrite logic for 8 descriptor per packet check

  This patch was added to the Xenial kernel beginning with version
  4.4.0-8.23.  This bug does not manifest on any other Ubuntu kernel series.

  Fix:

  This error is resolved upstream by:

  commit 3f3f7cb875c0f621485644d4fd7453b0d37f00e4
  Author: Alexander Duyck
  Date:   Wed Mar 30 16:15:37 2016 -0700

  i40e/i40evf: Limit TSO to 7 descriptors for payload instead of 8 per packet

  This fix was never backported into the Xenial 4.4 kernel series.

  Testcase:

  In this case, the issue occurs at a customer site using i40e based
  Intel network cards with SR-IOV enabled.  Under heavy load, the card will
  reset itself as described.  The customer has tested the 3f3f7cb875c patch
  in their environment and confirmed that it resolves the issue.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1700834/+subscriptions


[Kernel-packages] [Bug 1700834] [NEW] Intel i40e PF reset under load

2017-06-27 Thread Jay Vosburgh

Public bug reported:

SRU Justification:

Impact:

Using an Intel i40e network device, under heavy traffic load with
TSO enabled, the device will spontaneously reset itself and issue errors
similar to the following:

Jun 14 14:09:51 hostname kernel: [4253913.851053] i40e :05:00.1: TX driver issue detected, PF reset issued
Jun 14 14:09:53 hostname kernel: [4253915.476283] i40e :05:00.1: TX driver issue detected, PF reset issued
Jun 14 14:09:54 hostname kernel: [4253917.411264] i40e :05:00.1: TX driver issue detected, PF reset issued

This causes a full reset of the PF, which causes an interruption
in traffic flow.

In this case, these errors arise from a bug in the i40e device
driver introduced by commit:

commit 584a837e26408c66e87df87a022faa6a54c2b020
Author: Alexander Duyck <adu...@mirantis.com>
Date:   Wed Feb 17 11:02:50 2016 -0800

i40e/i40evf: Rewrite logic for 8 descriptor per packet check

This patch was added to the Xenial kernel beginning with version
4.4.0-8.23.  This bug does not manifest on any other Ubuntu kernel series.


Fix:

This error is resolved upstream by:

commit 3f3f7cb875c0f621485644d4fd7453b0d37f00e4
Author: Alexander Duyck <adu...@mirantis.com>
Date:   Wed Mar 30 16:15:37 2016 -0700

i40e/i40evf: Limit TSO to 7 descriptors for payload instead of 8 per packet

This fix was never backported into the Xenial 4.4 kernel series.


Testcase:

In this case, the issue occurs at a customer site using i40e based
Intel network cards with SR-IOV enabled.  Under heavy load, the card will
reset itself as described.  The customer has tested the 3f3f7cb875c patch
in their environment and confirmed that it resolves the issue.
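
As a rough aid for checking whether a host is hitting this, the kernel
log can be scanned for the reset message quoted above.  The Python below
is only a sketch: the function name count_pf_resets and the regular
expression are assumptions built from the three log lines in this
report, not anything provided by the i40e driver, and the PCI address is
matched loosely because its exact format may vary.

```python
import re
from collections import Counter

# Assumption: the message text matches the log lines quoted in this report.
# The device address is captured as any run of non-space characters.
PF_RESET_RE = re.compile(
    r"i40e\s+(?P<dev>\S+):\s+TX driver issue detected, PF reset issued"
)


def count_pf_resets(lines):
    """Return a Counter mapping device address -> number of PF reset messages."""
    counts = Counter()
    for line in lines:
        match = PF_RESET_RE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


# Hypothetical sample input, modelled on the messages above.
SAMPLE = [
    "Jun 14 14:09:51 hostname kernel: [4253913.851053] i40e 0000:05:00.1: "
    "TX driver issue detected, PF reset issued",
    "Jun 14 14:09:52 hostname kernel: [4253914.000000] unrelated message",
]

for dev, n in count_pf_resets(SAMPLE).items():
    print(f"{dev}: {n} PF reset message(s)")
```

Running it over the kernel log lines collected before and after
installing a test kernel gives a simple per-device count to compare.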

** Affects: linux (Ubuntu)
 Importance: Undecided
     Assignee: Jay Vosburgh (jvosburgh)
 Status: New

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Jay Vosburgh (jvosburgh)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1700834

Title:
  Intel i40e PF reset under load

Status in linux package in Ubuntu:
  New

Bug description:
  SRU Justification:

  Impact:

  Using an Intel i40e network device, under heavy traffic load with
  TSO enabled, the device will spontaneously reset itself and issue errors
  similar to the following:

  Jun 14 14:09:51 hostname kernel: [4253913.851053] i40e :05:00.1: TX driver issue detected, PF reset issued
  Jun 14 14:09:53 hostname kernel: [4253915.476283] i40e :05:00.1: TX driver issue detected, PF reset issued
  Jun 14 14:09:54 hostname kernel: [4253917.411264] i40e :05:00.1: TX driver issue detected, PF reset issued

  This causes a full reset of the PF, which causes an interruption
  in traffic flow.

  In this case, these errors arise from a bug in the i40e device
  driver introduced by commit:

  commit 584a837e26408c66e87df87a022faa6a54c2b020
  Author: Alexander Duyck <adu...@mirantis.com>
  Date:   Wed Feb 17 11:02:50 2016 -0800

  i40e/i40evf: Rewrite logic for 8 descriptor per packet check

  This patch was added to the Xenial kernel beginning with version
  4.4.0-8.23.  This bug does not manifest on any other Ubuntu kernel series.

  Fix:

  This error is resolved upstream by:

  commit 3f3f7cb875c0f621485644d4fd7453b0d37f00e4
  Author: Alexander Duyck <adu...@mirantis.com>
  Date:   Wed Mar 30 16:15:37 2016 -0700

  i40e/i40evf: Limit TSO to 7 descriptors for payload instead of 8 per packet

  This fix was never backported into the Xenial 4.4 kernel series.

  Testcase:

  In this case, the issue occurs at a customer site using i40e based
  Intel network cards with SR-IOV enabled.  Under heavy load, the card will
  reset itself as described.  The customer has tested the 3f3f7cb875c patch
  in their environment and confirmed that it resolves the issue.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1700834/+subscriptions


[Kernel-packages] [Bug 1697053] Re: Missing IOTLB flush causes DMAR errors with SR-IOV

2017-06-09 Thread Jay Vosburgh

** Changed in: linux (Ubuntu)
   Status: In Progress => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1697053

Title:
  Missing IOTLB flush causes DMAR errors with SR-IOV

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  SRU Justification:

  Impact:

  Systems using SR-IOV with Intel IOMMUs can exhibit DMAR errors of the
  following type:

  [606483.223009] DMAR:[fault reason 05] PTE Write access is not set
  [606484.071974] dmar: DRHD: handling fault status reg 402
  [606484.077121] dmar: DMAR:[DMA Write] Request device [d8:0a.1] fault addr 35c6e000

  The DMAR error causes, at a minimum, loss of network traffic
  because the request being serviced is lost.  Network cards were also
  observed to experience transmit timeouts after a DMAR fault.

  In this case, these errors arise from a race condition in
  the IOTLB management; this race is described (and fixed) in upstream
  commit:

  commit ea8ea460c9ace60bbb5ac6e5521d637d5c15293d
  Author: David Woodhouse 
  Date:   Wed Mar 5 17:09:32 2014 +0000

  iommu/vt-d: Clean up and fix page table clear/free behaviour

  This commit first appeared in mainline 3.15.  This issue
  affects only the Ubuntu 3.13 kernel series.

  Fix:

  The race avoidance portion of the above was backported to
  3.14-stable, but was never incorporated into the Ubuntu 3.13
  kernel series.

  commit 51d20e1096a711f8cfa9d98a3ac2dd2c7c0fc20c
  Author: David Woodhouse 
  Date:   Mon Jun 9 14:09:53 2014 +0100

  iommu/vt-d: Fix missing IOTLB flush in intel_iommu_unmap()
  
  Based on commit ea8ea460c9ace60bbb5ac6e5521d637d5c15293d upstream

  This 3.14-stable patch was tested by the customer and observed
  to resolve the issue in their environment.

  Testcase:

  In this case, the issue occurs on very recent Intel based
  servers using two different SR-IOV network cards (i40e and bnxt) at a
  customer site.  The customer has tested the patch in their environment
  and confirmed that it resolves the issue.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1697053/+subscriptions


[Kernel-packages] [Bug 1697053] [NEW] Missing IOTLB flush causes DMAR errors with SR-IOV

2017-06-09 Thread Jay Vosburgh

Public bug reported:

SRU Justification:

Impact:

Systems using SR-IOV with Intel IOMMUs can exhibit DMAR errors of the
following type:

[606483.223009] DMAR:[fault reason 05] PTE Write access is not set
[606484.071974] dmar: DRHD: handling fault status reg 402
[606484.077121] dmar: DMAR:[DMA Write] Request device [d8:0a.1] fault addr 35c6e000

The DMAR error causes, at a minimum, loss of network traffic
because the request being serviced is lost.  Network cards were also
observed to experience transmit timeouts after a DMAR fault.

In this case, these errors arise from a race condition in
the IOTLB management; this race is described (and fixed) in upstream
commit:

commit ea8ea460c9ace60bbb5ac6e5521d637d5c15293d
Author: David Woodhouse <david.woodho...@intel.com>
Date:   Wed Mar 5 17:09:32 2014 +0000

iommu/vt-d: Clean up and fix page table clear/free behaviour

This commit first appeared in mainline 3.15.  This issue
affects only the Ubuntu 3.13 kernel series.

Fix:

The race avoidance portion of the above was backported to
3.14-stable, but was never incorporated into the Ubuntu 3.13
kernel series.

commit 51d20e1096a711f8cfa9d98a3ac2dd2c7c0fc20c
Author: David Woodhouse <dw...@infradead.org>
Date:   Mon Jun 9 14:09:53 2014 +0100

iommu/vt-d: Fix missing IOTLB flush in intel_iommu_unmap()

Based on commit ea8ea460c9ace60bbb5ac6e5521d637d5c15293d upstream

This 3.14-stable patch was tested by the customer and observed
to resolve the issue in their environment.

Testcase:

In this case, the issue occurs on very recent Intel based
servers using two different SR-IOV network cards (i40e and bnxt) at a
customer site.  The customer has tested the patch in their environment
and confirmed that it resolves the issue.
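
To help correlate faults with a particular virtual function, the DMA
fault lines can be pulled out of the kernel log and split into their
fields.  The Python below is a sketch only: parse_dmar_faults is a
hypothetical helper, and its regular expression assumes the message
layout shown in the log excerpt in this report; other kernel versions
may print DMAR faults differently.

```python
import re

# Assumption: fault lines look like the "DMAR:[DMA Write] Request device ..."
# line quoted in this report. Whitespace between fields is matched loosely.
DMAR_FAULT_RE = re.compile(
    r"DMAR:\[DMA\s+(?P<op>Read|Write)\]\s+Request device\s+"
    r"\[(?P<dev>[0-9a-fA-F:.]+)\]\s+fault addr\s+(?P<addr>[0-9a-fA-F]+)"
)


def parse_dmar_faults(lines):
    """Return a list of (operation, device, fault_address) tuples.

    fault_address is converted from its hexadecimal text to an int.
    Lines that do not match the assumed layout are skipped.
    """
    faults = []
    for line in lines:
        match = DMAR_FAULT_RE.search(line)
        if match:
            faults.append(
                (match.group("op"), match.group("dev"), int(match.group("addr"), 16))
            )
    return faults


# Sample input taken from the log excerpt in this report.
SAMPLE = [
    "[606484.071974] dmar: DRHD: handling fault status reg 402",
    "[606484.077121] dmar: DMAR:[DMA Write] Request device [d8:0a.1] "
    "fault addr 35c6e000",
]

for op, dev, addr in parse_dmar_faults(SAMPLE):
    print(f"{op} fault from device {dev} at {addr:#x}")
```

Grouping the resulting tuples by device is one way to see whether the
faults are confined to the SR-IOV functions in use.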

** Affects: linux (Ubuntu)
 Importance: Undecided
     Assignee: Jay Vosburgh (jvosburgh)
 Status: New

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Jay Vosburgh (jvosburgh)

https://bugs.launchpad.net/bugs/1697053

Title:
  Missing IOTLB flush causes DMAR errors with SR-IOV

Status in linux package in Ubuntu:
  New


[Kernel-packages] [Bug 1687512] Re: Kernel panics on Xenial when using cgroups and strict CFS limits

2017-05-26 Thread Jay Vosburgh

Customer has verified that 4.4.0-79-generic resolves the issue in their
environment that would previously panic.


** Tags removed: verification-needed-xenial
** Tags added: verification-done-xenial

https://bugs.launchpad.net/bugs/1687512

Title:
  Kernel panics on Xenial when using cgroups and strict CFS limits

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Fix Committed

Bug description:
  SRU Justification
  -

  [Impact]
  Apache Mesos and Kubernetes workloads on Xenial cause a panic
  (NULL pointer dereference) in the completely fair scheduler.

  These panics are in pick_next_entity and include pick_next_task_fair
  in the call stack.

  [Fix]
  Cherry-picking both
  754bd598be9bbc953bc709a9e8ed7f3188bfb9d7
  (http://lkml.kernel.org/r/146608183552.21905.15924473394414832071.stgit@buzz)
  and
  094f469172e00d6ab0a3130b0e01c83b3cf3a98d
  (http://lkml.kernel.org/r/146608182119.21870.8439834428248129633.stgit@buzz)
  fix the crash.
  They appear to be intended as a series - they were posted to LKML at
  the same time.

  [Testcase]
  The fix has been validated by the user who reported the bug

  Bug description
  ---

  We see a number of kernel panics on servers running Apache Mesos using
  cgroups with small (0.1-0.2) cpu limits.
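
For a sense of scale, a fractional CPU limit enforced through CFS bandwidth
control translates into a runtime quota per scheduling period. The sketch
below assumes the default 100 ms period (cpu.cfs_period_us = 100000); the
report does not state the period in use, and the helper name is made up.

```python
def cfs_quota_us(cpus, period_us=100_000):
    """Runtime (in microseconds) allowed per period for a fractional CPU limit.

    Assumption: the limit is applied as quota/period, as with the cgroup
    cpu.cfs_quota_us and cpu.cfs_period_us files.
    """
    return int(round(cpus * period_us))


for limit in (0.1, 0.2):
    print(limit, "cpu ->", cfs_quota_us(limit), "us per 100000 us period")
```

Under that assumption, a 0.1 CPU limit leaves a task group only 10 ms of
runtime in every 100 ms, so such groups are throttled and unthrottled very
frequently.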

  These all appear as NULL pointer dereferences in and around
  pick_next_entity and pick_next_task_fair, for example:

  [24334.493331] BUG: unable to handle kernel NULL pointer dereference at 0050
  [24334.501611] IP: [] pick_next_entity+0x7f/0x160
  [24334.507868] PGD 3eacfa067 PUD 3eacfb067 PMD 0
  [24334.512806] Oops:  [#1] SMP
  [24334.516420] Modules linked in: ipvlan xt_nat xt_tcpudp veth ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs tcp_diag inet_diag nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache dm_crypt ppdev input_leds mac_hid i2c_piix4 8250_fintek parport_pc pvpanic parport serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
  [24334.576359] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.0-66-generic #87~14.04.1-Ubuntu
  [24334.584748] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  [24334.594188] task: 8803ee671c00 ti: 8803ee67c000 task.ti: 8803ee67c000
  [24334.601799] RIP: 0010:[] [] pick_next_entity+0x7f/0x160
  [24334.610490] RSP: 0018:8803ee67fdd8 EFLAGS: 00010086
  [24334.615924] RAX: 8803ebed4c00 RBX: 880036529800 RCX:
  [24334.623190] RDX: 0225341f RSI:  RDI:
  [24334.630479] RBP: 8803ee67fe00 R08: 0004 R09:
  [24334.637758] R10: 8803e7ed7600 R11: 0001 R12:
  [24334.645153] R13:  R14: 0009067729c4 R15: 8803ee672178
  [24334.652512] FS: () GS:8803ffd0() knlGS:
  [24334.660721] CS: 0010 DS:  ES:  CR0: 80050033
  [24334.666587] CR2: 0050 CR3: 0003eacf9000 CR4: 001406e0
  [24334.673851] Stack:
  [24334.675980] 8803ffd16e00 8803ffd16e00 8803e855a200 880036529800
  [24334.683995] 0002 8803ee67fe68 810b98a6 8803ffd16e70
  [24334.692024] 00016e00 8803e7ed7600 8803ee671c00
  [24334.700172] Call Trace:
  [24334.702750] [] pick_next_task_fair+0x66/0x4b0
  [24334.708886] [] __schedule+0x7f4/0x980
  [24334.714349] [] schedule+0x35/0x80
  [24334.719445] [] schedule_preempt_disabled+0xe/0x10
  [24334.725962] [] cpu_startup_entry+0x18a/0x350
  [24334.732012] [] start_secondary+0x149/0x170
  [24334.737895] Code: 8b 70 50 4d 2b 74 24 50 4d 85 f6 7e 59 4c 89 e7 e8 67 ff ff ff 49 39 c6 7f 04 4c 8b 6b 48 48 8b 43 40 48 85 c0 74 1f 4c 8b 70 50 <4d> 2b 74 24 50 4d 85 f6 7e 2c 4c 89 e7 e8 3f ff ff ff 49 39 c6
  [24334.765124] RIP [] pick_next_entity+0x7f/0x160
  [24334.771473] RSP
  [24334.775077] CR2: 0050
  [24334.779121] ---[ end trace 05d941efb97b7bae ]---

  and

  [155852.028575] BUG: unable to handle kernel NULL pointer dereference at 0050
  [155852.036931] IP: [] pick_next_entity+0x7f/0x160
  [155852.043491] PGD 3ebae8067 PUD 3ebae9067 PMD 0
  [155852.048550] Oops:  [#1] SMP
  [155852.052437] Modules linked in: ipvlan veth xt_nat xt_tcpudp ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge

[Kernel-packages] [Bug 1683947] Re: ubuntu 4.8 kernel, virtio_net error causes NAT packets to be lost

2017-04-25 Thread Jay Vosburgh

Jason,

I work for Canonical; the issue came up with one of our customers.

FWIW, I debugged the issue by first using kprobes and ftrace on the
kernel of a running instance to trace the packet path through the
kernel.  Once it seemed that the affected packets were not being dropped
somewhere on the instance and that MASQUERADE appeared to be operating
correctly, I did a git bisect of the kernel to isolate the actual commit
that resolved the problem (as the 4.11 kernel did not suffer from the
issue).
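
Bisecting for the commit that fixed a problem (rather than the one that broke
it) is the same binary search with the roles reversed. A minimal sketch of
the idea follows, assuming a linear history in which every commit after the
fix is also fixed; the function name and the integer "commits" are purely
illustrative, and real git history is a graph rather than a list.

```python
def first_fixed(commits, is_fixed):
    """Return the first commit for which is_fixed() is true.

    Preconditions (assumed, not checked): commits[0] is known broken,
    commits[-1] is known fixed, and once the fix lands every later
    commit is fixed as well.
    """
    lo, hi = 0, len(commits) - 1  # commits[lo] broken, commits[hi] fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):
            hi = mid   # the fix is at mid or earlier
        else:
            lo = mid   # the fix is after mid
    return commits[hi]


# Hypothetical history of 100 commits where the fix landed at index 37.
history = list(range(100))
print(first_fixed(history, lambda commit: commit >= 37))
```

With git itself, the same reversal can be expressed by starting the bisect
with custom terms, for example "git bisect start --term-old=broken
--term-new=fixed", so that the commit reported at the end is the first fixed
one.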

https://bugs.launchpad.net/bugs/1683947

Title:
  ubuntu 4.8 kernel, virtio_net error causes NAT packets to be lost

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Yakkety:
  New


[Kernel-packages] [Bug 1683947] Re: ubuntu 4.8 kernel, virtio_net error causes NAT packets to be lost

2017-04-20 Thread Jay Vosburgh

** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed

https://bugs.launchpad.net/bugs/1683947

Title:
  ubuntu 4.8 kernel, virtio_net error causes NAT packets to be lost

Status in linux package in Ubuntu:
  Confirmed


[Kernel-packages] [Bug 1683947] [NEW] ubuntu 4.8 kernel, virtio_net error causes NAT packets to be lost

2017-04-18 Thread Jay Vosburgh

Public bug reported:


SRU Justification:

Impact:

Configuring the 4.8 kernel with iptables MASQUERADE over virtio_net
causes packets to be dropped by the hypervisor (host) due to improper
flags being set based on the IP checksum state of the packet.  The host
performing MASQUERADE is affected by the bug.

Issue was introduced by

commit fd2a0437dc33b6425cabf74cc7fc7fdba6d5903b
Author: Mike Rapoport <r...@linux.vnet.ibm.com>
Date: Wed Jun 8 16:09:18 2016 +0300

virtio_net: introduce virtio_net_hdr_{from,to}_skb

which first appears in v4.8-rc1

Fix:

Fixed upstream by

3e9e40e74753 virtio_net: Simplify call sites for virtio_net_hdr_{from, to}_skb().
501db511397f virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on xmit
6391a4481ba0 virtio-net: restore VIRTIO_HDR_F_DATA_VALID on receiving

3e9e40e74753 first appears in v4.9-rc5 (and is a prerequisite only), the
others in v4.10-rc4.

Testcase:

Reproduction to date has been on GCE, although in principle it should
manifest on any suitable topology using virtio_net.  There is a
dependency on the forwarded packets having skb->ip_summed ==
CHECKSUM_UNNECESSARY; not all incoming devices will have this property.
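
The dependency on CHECKSUM_UNNECESSARY can be pictured with a simplified
model of how the virtio_net header flags are chosen from the skb checksum
state. This is a sketch based on the titles of the fix commits, not a copy of
the kernel code; the function name hdr_flags and its has_data_valid parameter
are illustrative. The numeric values are the ones used in the Linux headers.

```python
# skb->ip_summed states (linux/skbuff.h)
CHECKSUM_NONE = 0
CHECKSUM_UNNECESSARY = 1
CHECKSUM_COMPLETE = 2
CHECKSUM_PARTIAL = 3

# virtio_net_hdr flags (linux/virtio_net.h)
VIRTIO_NET_HDR_F_NEEDS_CSUM = 1
VIRTIO_NET_HDR_F_DATA_VALID = 2


def hdr_flags(ip_summed, has_data_valid):
    """Simplified flag choice when building a virtio_net header from an skb.

    has_data_valid models the direction: False when the guest transmits,
    True on paths that deliver received packets. Passing True on transmit
    reproduces the behaviour described as the bug.
    """
    if ip_summed == CHECKSUM_PARTIAL:
        return VIRTIO_NET_HDR_F_NEEDS_CSUM
    if has_data_valid and ip_summed == CHECKSUM_UNNECESSARY:
        return VIRTIO_NET_HDR_F_DATA_VALID
    return 0


# Forwarded packet whose checksum was already verified on receive:
print("affected xmit:", hdr_flags(CHECKSUM_UNNECESSARY, has_data_valid=True))
print("fixed xmit:   ", hdr_flags(CHECKSUM_UNNECESSARY, has_data_valid=False))
```

In this model only forwarded packets that arrive marked CHECKSUM_UNNECESSARY
pick up the DATA_VALID flag on transmit, which matches the observation that
not all incoming devices trigger the problem.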

On GCE, the following steps will induce the issue on an affected kernel:

Set up a network:

% gcloud compute networks create nat-network --mode legacy --range 10.240.0.0/16
% gcloud compute firewall-rules create nat-network-allow-ssh --allow tcp:22 \
    --network nat-network
% gcloud compute firewall-rules create nat-network-allow-internal \
    --allow tcp:1-65535,udp:1-65535,icmp --source-ranges 10.240.0.0/16 \
    --network nat-network

Set up an Ubuntu 16.04 NAT VM:

% gcloud compute instances create nat-gateway-16 --zone us-central1-a \
    --network nat-network --can-ip-forward --image-family ubuntu-1604-lts \
    --image-project ubuntu-os-cloud --tags nat \
    --metadata startup-script='sysctl -w net.ipv4.ip_forward=1 ; iptables -t nat -A POSTROUTING -o ens4 -j MASQUERADE'

Set up a route to use the 16.04 NAT:

% gcloud compute routes create no-ip-internet-route --network nat-network \
    --destination-range 0.0.0.0/0 --next-hop-instance nat-gateway-16 \
    --next-hop-instance-zone us-central1-a --tags no-ip --priority 800

Set up a simple test VM without any external network:

% gcloud compute instances create nat-client --zone us-central1-a \
    --network nat-network --no-address --image-family ubuntu-1604-lts \
    --image-project ubuntu-os-cloud --tags no-ip \
    --metadata startup-script='wget --timeout=5 https://github.com/GoogleCloudPlatform/compute-image-packages/archive/20170327.tar.gz'

Wait for it to boot... maybe 30 seconds or so.

Look for serial port output:

% gcloud compute instances get-serial-port-output nat-client \
    --zone us-central1-a | grep startup-script

You will see that the connection to github never succeeds - it just gets
stuck on "Resolving github.com (github.com)... 192.30.253.112,
192.30.253.113" and will time out.  (Ignore the previous attempt from the
successful 14.04-based NAT.)

Repeat the test by resetting the test client instance and watch for
serial output:

% gcloud compute instances reset nat-client --zone us-central1-a

Wait a minute or so for new boot, then check the serial-port-output as
above.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Assignee: Jay Vosburgh (jvosburgh)
 Status: New

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Jay Vosburgh (jvosburgh)

https://bugs.launchpad.net/bugs/1683947

Title:
  ubuntu 4.8 kernel, virtio_net error causes NAT packets to be lost

Status in linux package in Ubuntu:
  New


[Kernel-packages] [Bug 1658491] Re: VLAN SR-IOV regression for IXGBE driver

2017-01-23 Thread Jay Vosburgh

This issue may be fixed by this upstream commit:

commit f60439bc21e3337429838e477903214f5bd8277f
Author: Alexander Duyck 
Date:   Thu Aug 11 14:51:56 2016 -0700

ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths

When I was adding the code for enabling VLAN promiscuous mode with SR-IOV
enabled I had inadvertently left the VLNCTRL.VFE bit unchanged as I has
assumed there was code in another path that was setting it when we enabled
SR-IOV.  This wasn't the case and as a result we were just disabling VLAN
filtering for all the VFs apparently.

Also the previous patches were always clearing CFIEN which was always set
to 0 by the hardware anyway so I am dropping the redundant bit clearing.

Fixes: 16369564915a ("ixgbe: Add support for VLAN promiscuous with SR-IOV")
Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 

https://bugs.launchpad.net/bugs/1658491

Title:
  VLAN SR-IOV regression for IXGBE driver

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  New
Status in linux source package in Yakkety:
  New
Status in linux source package in Zesty:
  In Progress

Bug description:
  
  IXGBE driver, for SR-IOV setups, is misbehaving with VLANs.

  Description from affected user:

  - Create 2 networks (sriov 100 and 102 vlan)

  # neutron net-create --provider:physical_network=PHY0 \
      --provider:network_type=vlan --provider:segmentation_id=100 PHY0_vlan_100
  # neutron net-create --provider:physical_network=PHY0 \
      --provider:network_type=vlan --provider:segmentation_id=102 PHY0_vlan_102

  - Create the subnets:

  # neutron subnet-create PHY0_vlan_100 192.168.50.0/24
  # neutron subnet-create PHY0_vlan_102 192.168.60.0/24

  - Create the neutron ports:

  # neutron port-create e450757f-fec6-466e-bb21-a42a2019fe6b \
      --name vlan_100_port1 --vnic-type direct
  # neutron port-create 32c468ed-7e1e-4267-bbbf-ec72d33e4454 \
      --name vlan_102_port1 --vnic-type direct

  - Boot 2 VMs on 2 different hosts (add only 1 port to each of them +
  ovs dhcp network):

  # nova boot --flavor 789 --image ubuntu \
      --nic net-id=1cf2a512-8963-413d-a745-99e758789c2b \
      --nic port-id=92cf2867-cc0a-4e0d-aa87-14a345cdd708 102_port1_compute6 \
      --key-name mkey --config-drive true \
      --availability-zone nova:compute-0-6.domain.tld --poll
  # nova boot --flavor 789 --image ubuntu \
      --nic net-id=1cf2a512-8963-413d-a745-99e758789c2b \
      --nic port-id=baec6fd6-933d-4c58-94b6-44c50405d409 100_port1_compute5 \
      --key-name mkey --config-drive true \
      --availability-zone nova:compute-0-5.domain.tld --poll

  - After the VMs booted, configure the VFs:

  root@102-port1-compute6:~# ifconfig eth1 192.168.34.6 up
  root@100-port1-compute5:~# ifconfig eth1 192.168.34.5 up

  The two VMs can ping each other.  This should not work, because the
  VMs' interfaces (host VFs) are in different VLANs:

  root@compute-0-5:~# ip link show eth6
  8: eth6:  mtu 2140 qdisc mq state UP mode DEFAULT group default qlen 1000
  link/ether a0:36:9f:3f:1a:64 brd ff:ff:ff:ff:ff:ff
  vf 5 MAC fa:16:3e:f0:2c:e2, vlan 100, spoof checking on, link-state auto

  root@compute-0-6:~# ip link show eth5
  8: eth5:  mtu 2140 qdisc mq state UP mode DEFAULT group default qlen 1000
  link/ether a0:36:9f:3f:20:88 brd ff:ff:ff:ff:ff:ff
  vf 7 MAC fa:16:3e:ce:69:41, vlan 101, spoof checking on, link-state auto

  But the user can still ping between the two VMs.
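
To confirm from the host side that two VFs really are configured with
different VLANs, the "vf" lines of the ip link output can be compared
mechanically. The following is a small sketch that assumes the line format
shown above; the function name vf_vlans is made up, and VF lines that carry
no "vlan" field would simply not match.

```python
import re

# Matches lines such as:
#   vf 5 MAC fa:16:3e:f0:2c:e2, vlan 100, spoof checking on, link-state auto
VF_RE = re.compile(
    r"vf (?P<vf>\d+) MAC (?P<mac>[0-9a-f:]{17}), vlan (?P<vlan>\d+)"
)


def vf_vlans(ip_link_output):
    """Map VF index -> VLAN ID from `ip link show <dev>` text."""
    return {
        int(m.group("vf")): int(m.group("vlan"))
        for m in VF_RE.finditer(ip_link_output)
    }


host5 = "  vf 5 MAC fa:16:3e:f0:2c:e2, vlan 100, spoof checking on, link-state auto"
host6 = "  vf 7 MAC fa:16:3e:ce:69:41, vlan 101, spoof checking on, link-state auto"

vlan_a = vf_vlans(host5)[5]
vlan_b = vf_vlans(host6)[7]
print("VLANs differ:", vlan_a != vlan_b)
```

If the VLAN IDs differ and traffic still passes between the VFs, VLAN
filtering is not being enforced for them.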


[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response

2017-01-10 Thread Jay Vosburgh

I have instrumented ipconfig, and determined that the ultimate source of
the problem is that, for the case of multiple interfaces, ipconfig has a
dependency on the kernel's probe order of the network interfaces.

For whatever reason, the -31 kernel probes the network devices in one
order (e.g., ens3 then ens4), and the -57 kernel in the other order
(ens4 first then ens3).

The probe order of network devices (and PCI devices in general) is
explicitly not defined, and so this is not a bug in the kernel itself;
ipconfig is failing due to its dependency on a specific enumeration
order.

The issue in ipconfig is that it is using a single packet socket to
attempt to multiplex packet traffic on multiple interfaces.  Presuming
that ens3 will answer DHCP and ens4 will not, for the case that works,
the order ends up being something like:

send DHCP request on ens3
send DHCP request on ens4
[ system gets DHCP response via ens3 ]
try to receive DHCP reply sent by peer for ens3; this matches, and all is happy

For the case that it fails, the sequence is roughly:

send DHCP request on ens4
send DHCP request on ens3
[ system gets DHCP response via ens3 ]
try to receive DHCP reply sent by peer for ens4; the reply is actually for
ens3, so ipconfig throws it away (as the XID, et al, don't match what is
expected for the ens4 DHCP request).

This repeats until ipconfig gives up.
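
The two sequences above can be condensed into a toy model of one
request/receive round over a single shared socket. This is only a sketch of
the ordering dependency, not ipconfig's actual code: the XID values and the
function name are invented, and the model assumes the receive step only ever
waits on the first-probed interface, as in the sequences described.

```python
from collections import deque


def run_round(probe_order, responders):
    """Simulate one round; return the configured interface or None.

    probe_order: interface names in kernel probe order.
    responders:  interfaces on which a DHCP server actually answers.
    """
    # One made-up transaction ID per interface.
    xid = {ifname: 0x1000 + i for i, ifname in enumerate(probe_order)}

    # Replies land on the shared socket in arrival order, identifiable
    # only by their XID, not by the interface they came in on.
    shared_queue = deque(
        xid[ifname] for ifname in probe_order if ifname in responders
    )

    expecting = probe_order[0]  # receive is attempted for this interface
    while shared_queue:
        reply_xid = shared_queue.popleft()
        if reply_xid == xid[expecting]:
            return expecting
        # XID mismatch: the reply is discarded, as described above.
    return None


print(run_round(["ens3", "ens4"], {"ens3"}))  # ens3 probed first
print(run_round(["ens4", "ens3"], {"ens3"}))  # ens4 probed first
```

Swapping the probe order is enough to turn a successful round into one where
the only valid reply is thrown away.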

As I said above, the issue is that ipconfig is trying to multiplex
traffic for two interfaces on one packet socket.  This is fine for
sending, but for receiving on an unbound packet socket, there is no way
to receive a packet sent to a specific interface.  Packets are delivered
to recvfrom/recvmsg in the order received.

I note that ipconfig sets sll.sll_ifindex on the msghdr provided to
recvfrom and recvmsg system calls; perhaps the author believed that this
limits received packets to only packets received on that ifindex.  This
is not the case, and the sll_ifindex passed to recvfrom/recvmsg is
ignored.
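
By contrast, binding a packet socket to an interface does restrict which
packets it receives. The sketch below shows the one-socket-per-interface idea
in general terms; it is not ipconfig's code, the function name is made up,
and the choice of SOCK_DGRAM with the IPv4 EtherType is an assumption. It is
Linux only and needs CAP_NET_RAW (typically root) to run.

```python
import socket

ETH_P_IP = 0x0800  # EtherType for IPv4


def open_bound_packet_socket(ifname):
    """Open an AF_PACKET socket that only receives IPv4 frames from ifname."""
    sock = socket.socket(
        socket.AF_PACKET, socket.SOCK_DGRAM, socket.htons(ETH_P_IP)
    )
    # After bind(), only packets arriving on this interface are delivered
    # to the socket, so replies for different interfaces cannot be mixed.
    sock.bind((ifname, ETH_P_IP))
    return sock


def open_sockets(ifnames):
    """One bound socket per interface, keyed by interface name."""
    return {ifname: open_bound_packet_socket(ifname) for ifname in ifnames}
```

A caller would then wait on all of the sockets (for example with select) and
know from the socket that became readable which interface the reply belongs
to.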

I'm looking into whether or not there is a simple fix for this that
will let ipconfig function without major rework to utilize one packet
socket per interface.



** Tags removed: kernel-key

** Package changed: linux (Ubuntu) => klibc (Ubuntu)

** Changed in: klibc (Ubuntu)
   Status: Triaged => Confirmed

** Changed in: klibc (Ubuntu)
 Assignee: (unassigned) => Jay Vosburgh (jvosburgh)

https://bugs.launchpad.net/bugs/1652348

Title:
  initrd dhcp fails / ignores valid response

Status in klibc package in Ubuntu:
  Confirmed
Status in klibc source package in Xenial:
  Triaged


[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response

2017-01-09 Thread Jay Vosburgh

I have reproduced the described issue locally using the instructions
from comment 35; will start looking into the cause.

https://bugs.launchpad.net/bugs/1652348

Title:
  initrd dhcp fails / ignores valid response

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Triaged

Bug description:
  Between kernel versions 4.4.0-53 and 4.4.0-57 a bug has been
  (re?)introduced that is breaking dhcp booting in the initrd
  environment.  This is stopping instances that use iscsi storage from
  being able to connect.

  Over serial console it outputs:

  IP-Config: no response after 2 secs - giving up
  IP-Config: ens2f0 hardware address 90:e2:ba:d1:36:38 mtu 1500 DHCP RARP
  IP-Config: ens2f1 hardware address 90:e2:ba:d1:36:39 mtu 1500 DHCP RARP
  IP-Config: no response after 3 secs - giving up

  with increasing delays until it fails.  At which point a simple
  ipconfig -t dhcp -d "ens2f0"  works.  The console output is slightly
  garbled but should give you an idea:

  (initramfs) ipconfig -t dhcp -[  728.379793] ixgbe :13:00.0 ens2f0: 
changing MTU from 1500 to 9000
  d "ens2f0"
  IP-Config: ens2f0 hardware address 90:e2:ba:d1:36:38 mtu 1500 DHCP RARP
  IP-Config: ens2f0 guessed broadcast address 10.0.1.255
  IP-Config: ens2f0 complete (dhcp from 169.254.169.254):
   addres[  728.980448] ixgbe :13:00.0 ens2f0: detected SFP+: 3
  s: 10.0.1.56broadcast: 10.0.1.255   netmask: 255.255.255.0
   gateway: 10.0.1.1   [  729.148410] ixgbe :13:00.0 ens2f0: NIC Link is Up 
10 Gbps, Flow Control: RX/TX
    dns0 : 169.254.169.254  dns1   : 0.0.0.0
   rootserver: 169.254.169.254 rootpath:
   filename  : /ipxe.efi

  tcpdumps show that dhcp requests are being received from the host, and
  responses sent, but not accepted by the host.  When the ipconfig
  command is issued manually, an identical dhcp request and response
  happens, only this time it is accepted.  It doesn't appear to be that
  the messages are being sent and received incorrectly, just silently
  ignored by ipconfig.

  I was seeing this behaviour earlier this year, which I was able to fix
  by specifying "ip=dhcp" as a kernel parameter.  About a month ago that
  was identified as causing us other problems (long story) and we
  dropped it, at which point we discovered the original bug was no
  longer an issue.

  Putting "ip=dhcp" back on with this kernel no longer fixes the
  problem.

  I've compared the two initrds and effectively the only thing that has
  changed between the two is the kernel components.

  Ubuntu kernel bisect offending commit:
  # first bad commit: [fd4b5fa6e3487d15ede746f92601af008b2abbc0] mnt: Add a per mount namespace limit on the number of mounts

  Ubuntu kernel bisect offending commit submission:
  https://lkml.org/lkml/2016/10/5/308

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1652348/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response

2017-01-09 Thread Jay Vosburgh

Just a note that I'm setting up to try the reproduction instructions
from comment #35.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1652348

Title:
  initrd dhcp fails / ignores valid response

Status in linux package in Ubuntu:
  Incomplete


[Kernel-packages] [Bug 1584092] Re: Docker misconfigured when using non-default overlay/underlay netmask size

2016-05-20 Thread Jay Vosburgh

I haven't tested this patch, but fanctl had the same issue, and I
believe the fix is that the subnet math has to be "overlay_width + ( 32
- underlay_width )", not "overlay_width + underlay_width".

Patch attached.
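
As a worked check of the subnet math above, here is a minimal sketch in
Python.  It is not the fanatic patch itself; the helper names
fan_slice_prefix and is_valid_cidr are made up for illustration, and the
addresses are the ones from the bug description (overlay 250.99.0.0/16,
underlay 192.168.1.0/24).

```python
import ipaddress


def fan_slice_prefix(overlay_width, underlay_width):
    """Prefix length of the overlay slice assigned to one underlay host.

    Hypothetical helper: it only restates the arithmetic from the comment.
    """
    # The underlay host bits (32 - underlay_width) are appended to the
    # overlay prefix, so the two widths are not simply added together.
    return overlay_width + (32 - underlay_width)


def is_valid_cidr(text):
    """Return True if text parses as an IPv4 or IPv6 network."""
    try:
        ipaddress.ip_network(text)
        return True
    except ValueError:
        return False


overlay = ipaddress.ip_network("250.99.0.0/16")
underlay = ipaddress.ip_network("192.168.1.0/24")

good = fan_slice_prefix(overlay.prefixlen, underlay.prefixlen)
bad = overlay.prefixlen + underlay.prefixlen  # the sum reported as wrong

print(good, is_valid_cidr(f"250.99.10.0/{good}"))
print(bad, is_valid_cidr(f"250.99.10.0/{bad}"))
```

With these inputs the corrected formula gives a /24, which parses as a
network, while the plain sum gives the /40 that docker rejects in the
log quoted in the bug description.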

** Patch removed: "fanatic patch"
   https://bugs.launchpad.net/ubuntu/+source/ubuntu-fan/+bug/1584092/+attachment/4667027/+files/fanatic.patch

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to ubuntu-fan in Ubuntu.
https://bugs.launchpad.net/bugs/1584092

Title:
  Docker misconfigured when using non-default overlay/underlay netmask
  size

Status in ubuntu-fan package in Ubuntu:
  New

Bug description:
  Fan allows for variable sized subnet map sizes.  For example, if I
  want to map a /24 to a /16 instead of the default /16 to /8, Fan
  supports this.  However, when configuring this via fanatic, I see that
  docker configuration fails.  In /etc/default/docker, the --fixed-cidr
  flag is defined incorrectly.

  $ sudo fanatic
  Welcome to the fanatic fan networking wizard.  This will help you set
  up an example fan network and optionally configure docker and/or LXD to use
  this network.  See fanatic(1) for more details.

  Configure fan underlay (hit return to accept, or specify alternative) [192.168.0.0/16]: 192.168.1.0/24
  Configure fan overlay (hit return to accept, or specify alternative) [250.0.0.0/8]: 250.99.0.0/16
  Create LXD networking for underlay:192.168.1.0/24 overlay:250.99.0.0/16 [Yn]: Y
  Profile fan-250-99 created
  Create docker networking for underlay:192.168.1.0/24 overlay:250.99.0.0/16 [Yn]: Y
  Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
  Test LXD networking for underlay:192.168.1.10/24 overlay:250.99.0.0/16
  (NOTE: potentially triggers large image downloads) [Yn]: n
  Test docker networking for underlay:192.168.1.10/24 overlay:250.99.0.0/16
  (NOTE: potentially triggers large image downloads) [Yn]: n
  This host IP address: 192.168.1.10
  Remote test host IP address (none to skip): 
  /usr/sbin/fanatic: Testing skipped

  $ grep "DOCKER_OPTS" /etc/default/docker
  # Use DOCKER_OPTS to modify the daemon startup options.
  #DOCKER_OPTS="--dns 8.8.8.8 --dns 8.8.4.4"
  DOCKER_OPTS=" -b fan-250-99 --mtu=1450 --iptables=false --fixed-cidr=250.99.10.0/40"

  May 20 05:15:30 macbook docker[27364]:
  time="2016-05-20T05:15:30.411933688-07:00" level=fatal msg="Error
  starting daemon: Error initializing network controller: invalid CIDR
  address: 250.99.10.0/40"

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ubuntu-fan/+bug/1584092/+subscriptions


[Kernel-packages] [Bug 1584092] Re: Docker misconfigured when using non-default overlay/underlay netmask size

2016-05-20 Thread Jay Vosburgh

I haven't tested this patch, but fanctl had the same issue, and I
believe the fix is that the subnet math has to be "overlay_width + ( 32
- underlay_width )", not "overlay_width + underlay_width".

Patch attached.

** Patch added: "fanatic.patch"
   https://bugs.launchpad.net/ubuntu/+source/ubuntu-fan/+bug/1584092/+attachment/4667033/+files/fanatic.patch

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to ubuntu-fan in Ubuntu.
https://bugs.launchpad.net/bugs/1584092

Title:
  Docker misconfigured when using non-default overlay/underlay netmask
  size

Status in ubuntu-fan package in Ubuntu:
  New


[Kernel-packages] [Bug 1584092] Re: Docker misconfigured when using non-default overlay/underlay netmask size

2016-05-20 Thread Jay Vosburgh

I haven't tested this patch, but fanctl had the same issue, and I
believe the fix is that the subnet math has to be "overlay_width + ( 32
- underlay_width )", not "overlay_width + underlay_width".

Patch attached.


** Patch added: "fanatic patch"
   https://bugs.launchpad.net/ubuntu/+source/ubuntu-fan/+bug/1584092/+attachment/4667027/+files/fanatic.patch

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to ubuntu-fan in Ubuntu.
https://bugs.launchpad.net/bugs/1584092

Title:
  Docker misconfigured when using non-default overlay/underlay netmask
  size

Status in ubuntu-fan package in Ubuntu:
  New


[Kernel-packages] [Bug 1463911] Re: IPV6 fragmentation and mtu issue

2016-03-03 Thread Jay Vosburgh

** Tags removed: verification-needed-trusty verification-needed-vivid
** Tags added: verification-done-trusty verification-done-vivid

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1463911

Title:
  IPV6 fragmentation and mtu issue

Status in neutron:
  Confirmed
Status in OpenStack Compute (nova):
  Confirmed
Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Trusty:
  Fix Committed
Status in linux source package in Vivid:
  Fix Committed

Bug description:
  Fragmented IPv6 packets are REJECTED by ip6tables on compute nodes.
  The traffic is going through an intra-VM network and the packet loss
  is hurting the system.

  There is a patch for this issue:
  http://patchwork.ozlabs.org/patch/434957/

  I would like to know whether there is a bug report or an official
  release date for this issue.

  This is pretty critical for my deployment.

  Thanks in advance,

  BR,

  Gyula

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1463911/+subscriptions


[Kernel-packages] [Bug 1463911] Re: IPV6 fragmentation and mtu issue

2016-03-02 Thread Jay Vosburgh

The Wily kernel (4.2) already contains the fixes for this bug.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1463911

Title:
  IPV6 fragmentation and mtu issue

Status in neutron:
  Confirmed
Status in OpenStack Compute (nova):
  Confirmed
Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Trusty:
  Fix Committed
Status in linux source package in Vivid:
  Fix Committed


[Kernel-packages] [Bug 1463911] Re: IPV6 fragmentation and mtu issue

2016-02-23 Thread Jay Vosburgh

Yes, the patch has been committed for the next Ubuntu kernel releases.

I have no information on a CentOS patch; you would need to file a bug
against CentOS or RHEL.

No patch to Neutron is required.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1463911

Title:
  IPV6 fragmentation and mtu issue

Status in neutron:
  Confirmed
Status in OpenStack Compute (nova):
  Confirmed
Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Trusty:
  Fix Committed
Status in linux source package in Vivid:
  Fix Committed


[Kernel-packages] [Bug 1508706] Re: Networking hangs on azure using hv_netvsc; bisected

2015-11-19 Thread Jay Vosburgh

SRU Justification:

Impact:

Bug causes easily reproducible freeze of networking on affected
systems when under moderate to high network load.  Ordinary benchmark
tools such as iperf induce the problem without difficulty.  Affected
systems are virtual machine instances running on Azure, utilizing the
hv_netvsc network device driver.

Fix:

Fix is to apply patch provided by Microsoft:

http://marc.info/?l=linux-kernel&m=144787522532687&w=2

Testcase:

Tested as described in Bug Description.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1508706

Title:
  Networking hangs on azure using hv_netvsc; bisected

Status in linux package in Ubuntu:
  Triaged

Bug description:
  
  Running Ubuntu instances on azure, testing basic networking between two
  instances.  This involves configuring VXLAN between the two instances and
  running iperf and rsync of the kernel tree between the instances, e.g.,

  ip link add vxlan0 type vxlan id 999 local 10.88.0.12 remote 10.88.0.11 dev eth0
  ip l set vxlan0 up
  ip addr add 242.0.0.12/8 dev vxlan0

  After some time (sometimes instantly, sometimes up to 30 minutes of
  activity), the networking will hang.  This hang takes two forms:  a
  complete loss of connectivity (all network, even the ssh session used
  to log in), or just a loss of connectivity between instances (the ssh
  session remains active).  Sometimes for the latter case, the ssh
  session will then later hang.

  This first appeared when testing with the Ubuntu 3.19 kernel, and I
  subsequently bisected this to:

  commit effa2012d207f78cbc5a8360e62d420a8860b7e9
  Author: KY Srinivasan 
  Date:   Mon May 11 15:39:46 2015 -0700

  hv_netvsc: Use the xmit_more skb flag to optimize signaling the
  host

  BugLink: http://bugs.launchpad.net/bugs/1454892

  Based on the information given to this driver (via the xmit_more skb flag),
  we can defer signaling the host if more packets are on the way. This will help
  make the host more efficient since it can potentially process a larger batch of
  packets. Implement this optimization.

  Signed-off-by: K. Y. Srinivasan 
  Signed-off-by: David S. Miller 
  Acked-by: Tim Gardner 
  Acked-by: Brad Figg 
  Signed-off-by: Brad Figg 

  I also tested the mainline kernel (net-next); it fails with the
  equivalent commit:

  commit 82fa3c776e5abba7ed6e4b4f4983d14731c37d6a
  Author: KY Srinivasan 
  Date:   Mon May 11 15:39:46 2015 -0700

  hv_netvsc: Use the xmit_more skb flag to optimize signaling the
  host

  For both kernel trees, I also tested the prior commit and it did not
  exhibit the failure after many hours.  For ubuntu, this was

  commit a4aeb290bd75af5e16a6144a418291476ac6140c
  Author: K. Y. Srinivasan 
  Date:   Wed Mar 18 12:29:29 2015 -0700

  Drivers: hv: vmbus: Export the vmbus_sendpacket_pagebuffer_ctl()

  and for mainline it was

  commit 9eea92226407e7a117ef1ceef45380ebd000a0e2
  Author: Alexei Starovoitov 
  Date:   Mon May 11 15:19:48 2015 -0700

  pktgen: fix packet generation

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1508706/+subscriptions


[Kernel-packages] [Bug 1508706] Re: Networking hangs on azure using hv_netvsc; bisected

2015-11-18 Thread Jay Vosburgh

We are testing this patch immediately (overnight US time) and will
report our results as soon as they are available.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1508706

Title:
  Networking hangs on azure using hv_netvsc; bisected

Status in linux package in Ubuntu:
  Triaged


[Kernel-packages] [Bug 1463911] Re: IPV6 fragmentation and mtu issue

2015-11-18 Thread Jay Vosburgh

The equivalent testing to comment #20 was also performed on the 3.13 and
3.16 kernels.  Additionally, a customer separately validated the 3.13 and
3.16 patches in their environment.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1463911

Title:
  IPV6 fragmentation and mtu issue

Status in neutron:
  Confirmed
Status in OpenStack Compute (nova):
  Confirmed
Status in linux package in Ubuntu:
  Confirmed


[Kernel-packages] [Bug 1463911] Re: IPV6 fragmentation and mtu issue

2015-11-18 Thread Jay Vosburgh

Test methodology performed on 3.19 kernel with patch applied:

Host A: fd01:::1/64 direct connect to host C

ip addr add fd01:::1/64 dev eth0

Host B: fd01:::2/64 direct connect to host C

ip addr add fd01:::2/64 dev eth0

host C: direct connect interfaces for Hosts A & B bridged together:

brctl addbr testbr0
brctl addif testbr0 eth1
brctl addif testbr0 eth5
ip link set dev eth1 up
ip link set dev eth5 up
ip link set dev testbr0 up
ip addr add fd01:::99/64 dev testbr0

host A:

continuous ping6 to host B's address beyond the bridge, using a size large
enough to generate fragmented IPv6 datagrams at an MTU setting of 1500:

ping6 -s 4000 fd01:::2

host C:

load ip6tables_nat:

ip6tables -t nat -L -n

Observe on host A that ping continues uninterrupted

Inspect eth1 and eth5 interfaces on host C with tcpdump to confirm traffic
passes through the bridge.
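
To show why "ping6 -s 4000" exercises fragmentation at an MTU of 1500,
the fragment sizes can be estimated with a short Python sketch.  This is
an illustration only, not part of the test procedure: the function name
ipv6_fragment_sizes is hypothetical, and it assumes a plain IPv6 header
with no extension headers other than the Fragment header.

```python
IPV6_HEADER = 40      # fixed IPv6 header, in bytes
FRAGMENT_HEADER = 8   # IPv6 Fragment extension header, in bytes
ICMPV6_HEADER = 8     # ICMPv6 echo request header, in bytes


def ipv6_fragment_sizes(payload, mtu=1500):
    """Return the bytes of fragmentable data carried by each fragment.

    payload is the ping data size (the -s value), which excludes the
    ICMPv6 header.
    """
    total = payload + ICMPV6_HEADER
    if IPV6_HEADER + total <= mtu:
        return [total]  # fits in a single packet, no fragmentation
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IPV6_HEADER - FRAGMENT_HEADER) // 8 * 8
    sizes = []
    while total > 0:
        chunk = min(per_fragment, total)
        sizes.append(chunk)
        total -= chunk
    return sizes


sizes = ipv6_fragment_sizes(4000)
print(len(sizes), sizes)
```

Under these assumptions a 4000-byte ping is split into three fragments,
so each echo request and reply crosses the bridge on host C as
fragmented IPv6 traffic.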

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1463911

Title:
  IPV6 fragmentation and mtu issue

Status in neutron:
  Confirmed
Status in OpenStack Compute (nova):
  Confirmed
Status in linux package in Ubuntu:
  Confirmed


[Kernel-packages] [Bug 1508706] Re: Networking hangs on azure using hv_netvsc; bisected

2015-11-18 Thread Jay Vosburgh

I have tested the patch referenced in comment #5 and it appears to
resolve the network hang.

I first built and tested the Ubuntu LTS 3.19.0-31.36~14.04.1 kernel and
reproduced the issue using the methodology described in the original bug
description.  This is commit

commit 15e42c329445b4e0f0aecefc39e205c44755c2ba
Author: Luis Henriques 
Date:   Thu Oct 8 10:26:57 2015 +0100

UBUNTU: Ubuntu-lts-3.19.0-31.36~14.04.1

in the lts-backport-vivid branch of
git://kernel.ubuntu.com/ubuntu/ubuntu-trusty.git

I then applied the referenced patch and tested again and was unable to
reproduce the issue after roughly an hour of testing.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1508706

Title:
  Networking hangs on azure using hv_netvsc; bisected

Status in linux package in Ubuntu:
  Triaged


[Kernel-packages] [Bug 1463911] Re: IPV6 fragmentation and mtu issue

2015-11-17 Thread Jay Vosburgh

** Patch added: "Backport patch for trusty 3.13"
   
https://bugs.launchpad.net/nova/+bug/1463911/+attachment/4520982/+files/ubuntu-trusty-3.13-sru.patch

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1463911

Title:
  IPV6 fragmentation and mtu issue

Status in neutron:
  Confirmed
Status in OpenStack Compute (nova):
  Confirmed
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Fragmented IPv6 packets are REJECTED by ip6tables on compute nodes.
  The traffic is going through an intra-VM network and the packet loss
  is hurting the system.

  There is a patch for this issue:
  http://patchwork.ozlabs.org/patch/434957/

  I would like to know whether there is a bug report or an official
  release date for this issue.

  This is pretty critical for my deployment.

  Thanks in advance,

  BR,

  Gyula

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1463911/+subscriptions

[Kernel-packages] [Bug 1463911] Re: IPV6 fragmentation and mtu issue

2015-11-17 Thread Jay Vosburgh

** Patch added: "Backport patch for trusty 3.16"
   
https://bugs.launchpad.net/nova/+bug/1463911/+attachment/4520983/+files/ubuntu-trusty-3.16-sru.patch

[Kernel-packages] [Bug 1463911] Re: IPV6 fragmentation and mtu issue

2015-11-17 Thread Jay Vosburgh

SRU Justification:

Impact:

This bug causes issues when ip6tables modules are loaded with IPv6
fragmented packets traversing a bridge.  The extant conntrack processing
will reassemble the IPv6 fragments for netfilter processing, but is
incapable of re-fragmenting these datagrams for subsequent forwarding.
This causes the fragmented IPv6 datagrams to be dropped.

Fix:

This is resolved by backporting functionality from mainline that
re-fragments the IPv6 datagrams upon bridge egress.

Testcase:

The patch commit log includes a test case; to summarize:

A bridge is configured with two ports and interfaces are attached
to these ports.  A traffic source beyond one port generates fragmented
IPv6 datagrams, e.g., ping6 -s 2000, destined for a host beyond the
bridge.

With ip6tables modules unloaded, the IPv6 fragments will traverse
the bridge.  Loading ip6tables, e.g., "ip6tables -t nat -L", will cause
IPv6 fragmented datagrams to be dropped on the unpatched kernel.

These datagrams are correctly forwarded with the patch applied.
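
As a rough illustration of why "ping6 -s 2000" produces fragmented
datagrams, the sizes can be worked out as below.  This is a sketch only:
it assumes a 1500-byte link MTU (the MTU is not stated in this report)
and no extension headers other than the Fragment header, and the
function name is hypothetical.

```python
IPV6_HEADER = 40      # fixed IPv6 header, bytes
FRAG_HEADER = 8       # IPv6 Fragment extension header, bytes
ICMPV6_HEADER = 8     # ICMPv6 echo request header, bytes


def ipv6_fragment_sizes(icmp_data_len, mtu=1500):
    """Return the IPv6 packet size of each fragment of an echo request.

    mtu=1500 is an assumption; the bug report does not state the link MTU.
    """
    fragmentable = ICMPV6_HEADER + icmp_data_len
    if IPV6_HEADER + fragmentable <= mtu:
        return [IPV6_HEADER + fragmentable]   # fits, no fragmentation
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IPV6_HEADER - FRAG_HEADER) // 8 * 8
    sizes = []
    while fragmentable > 0:
        chunk = min(per_fragment, fragmentable)
        sizes.append(IPV6_HEADER + FRAG_HEADER + chunk)
        fragmentable -= chunk
    return sizes


print(ipv6_fragment_sizes(2000))
```

Under that MTU assumption the echo request is carried in two fragments,
which is the traffic the bridge must pass after netfilter has
reassembled it.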

[Kernel-packages] [Bug 1463911] Re: IPV6 fragmentation and mtu issue

2015-11-17 Thread Jay Vosburgh

** Patch added: "Backport patch for vivid 3.19"
   
https://bugs.launchpad.net/nova/+bug/1463911/+attachment/4520984/+files/ubuntu-vivid-sru.patch

[Kernel-packages] [Bug 1508706] Re: Networking hangs on azure using hv_netvsc; bisected

2015-11-09 Thread Jay Vosburgh

Yes, it did, although it seemed to be easier to reproduce with vxlan
configured.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1508706

Title:
  Networking hangs on azure using hv_netvsc; bisected

Status in linux package in Ubuntu:
  Triaged

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1508706/+subscriptions

[Kernel-packages] [Bug 1508706] [NEW] Networking hangs on azure using hv_netvsc; bisected

2015-10-21 Thread Jay Vosburgh

Public bug reported:


Running Ubuntu instances on Azure, testing basic networking between two
instances.  This involves configuring VXLAN between the two instances and
running iperf and rsync of the kernel tree between the instances, e.g.,

ip link add vxlan0 type vxlan id 999 local 10.88.0.12 remote 10.88.0.11 dev eth0
ip l set vxlan0 up
ip addr add 242.0.0.12/8 dev vxlan0

After some time (sometimes instantly, sometimes up to 30 minutes of
activity), the networking will hang.  This hang takes two forms:  a
complete loss of connectivity (all network, even the ssh session used to
log in), or just a loss of connectivity between instances (the ssh
session remains active).  Sometimes for the latter case, the ssh session
will then later hang.

This first appeared when testing with the Ubuntu 3.19 kernel, and I
subsequently bisected this to:

commit effa2012d207f78cbc5a8360e62d420a8860b7e9
Author: KY Srinivasan 
Date:   Mon May 11 15:39:46 2015 -0700

hv_netvsc: Use the xmit_more skb flag to optimize signaling the host

BugLink: http://bugs.launchpad.net/bugs/1454892

Based on the information given to this driver (via the xmit_more skb flag),
we can defer signaling the host if more packets are on the way. This will
help make the host more efficient since it can potentially process a larger
batch of packets. Implement this optimization.

Signed-off-by: K. Y. Srinivasan 
Signed-off-by: David S. Miller 
Acked-by: Tim Gardner 
Acked-by: Brad Figg 
Signed-off-by: Brad Figg 

I also tested the mainline kernel (net-next); it fails with the
equivalent commit:

commit 82fa3c776e5abba7ed6e4b4f4983d14731c37d6a
Author: KY Srinivasan 
Date:   Mon May 11 15:39:46 2015 -0700

hv_netvsc: Use the xmit_more skb flag to optimize signaling the host

For both kernel trees, I also tested the prior commit and it did not
exhibit the failure after many hours.  For Ubuntu, this was

commit a4aeb290bd75af5e16a6144a418291476ac6140c
Author: K. Y. Srinivasan 
Date:   Wed Mar 18 12:29:29 2015 -0700

Drivers: hv: vmbus: Export the vmbus_sendpacket_pagebuffer_ctl()

and for mainline it was

commit 9eea92226407e7a117ef1ceef45380ebd000a0e2
Author: Alexei Starovoitov 
Date:   Mon May 11 15:19:48 2015 -0700

pktgen: fix packet generation

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1508706

Title:
  Networking hangs on azure using hv_netvsc; bisected

Status in linux package in Ubuntu:
  New

[Kernel-packages] [Bug 1463911] Re: IPV6 fragmentation and mtu issue

2015-10-09 Thread Jay Vosburgh

The original patch had an error in it; I believe I've found it, and once
I verify that and clean it up a bit I'll attach it to the bug.

[Kernel-packages] [Bug 1502238] Re: bridge does not forward neighbor solicitation packets

2015-10-06 Thread Jay Vosburgh

I set up a similar configuration locally, and I see the bridge correctly
forwarding the IPv6 NS packets.  The ping functions as expected.  I have
different network cards, and used IPv6 ULA addresses (fc00:1234::/64)
but I'm not sure how that would affect the bridge forwarding decision.

I'm also not sure what exactly is meant by your statement "Adding a host
route for the 2001:: IP via the link IP"; I don't see any other
reference to a 2001:: address.  Could you clarify what this refers to?

Also, for completeness, can you ensure that there are no bridge table
rules installed?  This would be in the output of

ebtables -t filter -L
ebtables -t nat -L
ebtables -t broute -L

I would also suggest disabling the bridge callouts to arptables,
ip6tables and iptables to see if that affects the behavior.  This would
be done via

sysctl -w net.bridge.bridge-nf-call-arptables=0
sysctl -w net.bridge.bridge-nf-call-ip6tables=0
sysctl -w net.bridge.bridge-nf-call-iptables=0

(all of the above sysctl and ebtables commands need to be done as root)
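
To confirm the state of the three callouts before and after changing
them, something along these lines could be used.  It is only a sketch:
it assumes the sysctls are exposed under /proc/sys/net/bridge (true only
when the bridge netfilter code is loaded), and the function name is
hypothetical.  Reading the values does not require root.

```python
import os

BRIDGE_SYSCTL_DIR = "/proc/sys/net/bridge"   # assumed location of the sysctls
CALLOUTS = (
    "bridge-nf-call-arptables",
    "bridge-nf-call-ip6tables",
    "bridge-nf-call-iptables",
)


def read_bridge_callouts(base=BRIDGE_SYSCTL_DIR):
    """Return {name: int value, or None if the file is missing or unreadable}."""
    result = {}
    for name in CALLOUTS:
        try:
            with open(os.path.join(base, name)) as f:
                result[name] = int(f.read().strip())
        except (OSError, ValueError):
            result[name] = None
    return result


if __name__ == "__main__":
    for name, value in read_bridge_callouts().items():
        print(name, "=", "unavailable" if value is None else value)
```

A value of 0 for all three means bridged traffic is not being passed to
arptables, ip6tables or iptables.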

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1502238

Title:
  bridge does not forward neighbor solicitation packets

Status in linux package in Ubuntu:
  Triaged

Bug description:
  3 hosts involved here:
  kailan is connected to a cisco switch, which is also connected to kurrat
  (eth3), which is running a bridge with tigernut connected to eth1.

  kurrat's controllers are 06:00.0 Ethernet controller: Intel
  Corporation 82574L Gigabit Network Connection, using the e1000e driver
  (3.13.0-65-generic kernel)

  (while kailan is doing a ping6
  2601:282:8100:3500:82ee:73ff:fe99:368d):

  +kurrat 324 : sudo tcpdump -eni eth3 ip6 and not tcp and not udp
  tcpdump: WARNING: eth3: no IPv4 address assigned
  tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
  listening on eth3, link-type EN10MB (Ethernet), capture size 65535 bytes
  10:39:16.080888 00:1c:c0:83:32:40 > 33:33:ff:99:36:8d, ethertype IPv6 (0x86dd), length 86: 2601:282:8100:3500::1 > ff02::1:ff99:368d: ICMP6, neighbor solicitation, who has 2601:282:8100:3500:82ee:73ff:fe99:368d, length 32
  10:39:16.431484 00:1c:c0:83:32:40 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 110: fe80::21c:c0ff:fe83:3240 > ff02::1: ICMP6, router advertisement, length 56
  10:39:17.077446 00:1c:c0:83:32:40 > 33:33:ff:99:36:8d, ethertype IPv6 (0x86dd), length 86: 2601:282:8100:3500::1 > ff02::1:ff99:368d: ICMP6, neighbor solicitation, who has 2601:282:8100:3500:82ee:73ff:fe99:368d, length 32
  10:39:18.077457 00:1c:c0:83:32:40 > 33:33:ff:99:36:8d, ethertype IPv6 (0x86dd), length 86: 2601:282:8100:3500::1 > ff02::1:ff99:368d: ICMP6, neighbor solicitation, who has 2601:282:8100:3500:82ee:73ff:fe99:368d, length 32
  10:39:19.095034 00:1c:c0:83:32:40 > 33:33:ff:99:36:8d, ethertype IPv6 (0x86dd), length 86: 2601:282:8100:3500::1 > ff02::1:ff99:368d: ICMP6, neighbor solicitation, who has 2601:282:8100:3500:82ee:73ff:fe99:368d, length 32
  10:39:20.093436 00:1c:c0:83:32:40 > 33:33:ff:99:36:8d, ethertype IPv6 (0x86dd), length 86: 2601:282:8100:3500::1 > ff02::1:ff99:368d: ICMP6, neighbor solicitation, who has 2601:282:8100:3500:82ee:73ff:fe99:368d, length 32
  10:39:21.093425 00:1c:c0:83:32:40 > 33:33:ff:99:36:8d, ethertype IPv6 (0x86dd), length 86: 2601:282:8100:3500::1 > ff02::1:ff99:368d: ICMP6, neighbor solicitation, who has 2601:282:8100:3500:82ee:73ff:fe99:368d, length 32
  10:39:21.43 00:1c:c0:83:32:40 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 110: fe80::21c:c0ff:fe83:3240 > ff02::1: ICMP6, router advertisement, length 56
  10:39:22.111042 00:1c:c0:83:32:40 > 33:33:ff:99:36:8d, ethertype IPv6 (0x86dd), length 86: 2601:282:8100:3500::1 > ff02::1:ff99:368d: ICMP6, neighbor solicitation, who has 2601:282:8100:3500:82ee:73ff:fe99:368d, length 32
  ^C
  10 packets captured
  11 packets received by filter
  0 packets dropped by kernel
  +kurrat 325 : sudo tcpdump -eni eth1 ip6 and not tcp and not udp
  tcpdump: WARNING: eth1: no IPv4 address assigned
  tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
  listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
  10:39:28.201110 00:1c:c0:83:32:40 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 110: fe80::21c:c0ff:fe83:3240 > ff02::1: ICMP6, router advertisement, length 56
  10:39:31.552677 00:1c:c0:83:32:40 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 110: fe80::21c:c0ff:fe83:3240 > ff02::1: ICMP6, router advertisement, length 56
  10:39:38.103919 08:10:78:fc:b3:d2 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 90: fe80::a10:78ff:fefc:b3d2 > ff02::1: HBH ICMP6, multicast listener query v2 [gaddr ::], length 28
  10:39:39.663357 00:1c:c0:83:32:40 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 110: fe80::21c:c0ff:fe83:3240 > ff02::1: ICMP6, router advertisement,

[Kernel-packages] [Bug 1497812] Re: i40e bug: non physical MAC outbound frames appear as copied back inbound (mirrored)

2015-09-29 Thread Jay Vosburgh

Just looking at the log, it might be this:

commit fa11cb3d16a9b9b296a2b811a49faf1356240348
Author: Anjali Singhai Jain 
Date:   Wed May 27 12:06:14 2015 -0400

i40e: Make sure to be in VEB mode if SRIOV is enabled at probe

If SRIOV is enabled we need to be in VEB mode not VEPA mode at probe.
This fixes an NPAR bug when SRIOV is enabled in the BIOS.

Change-ID: Ibf006abafd9a0ca3698ec24848cd771cf345cbbc
Signed-off-by: Anjali Singhai Jain 
Tested-by: Jim Young 
Signed-off-by: Jeff Kirsher 

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1497812

Title:
  i40e bug: non physical MAC outbound frames appear as copied back
  inbound  (mirrored)

Status in linux package in Ubuntu:
  Triaged
Status in linux-lts-vivid package in Ubuntu:
  Confirmed

Bug description:
  Using 3.19.0-28-generic #30~14.04.1-Ubuntu with stock i40e
  driver version 2.2.2-k makes every 'non physical' MAC output
  frame appear as copied back at input, as if the switch was
  doing frame 'mirroring' (and/or hair-pinning).

  FYI same setup, with i40e upgraded to 1.2.48 from
  http://downloadmirror.intel.com/25282/eng/i40e-1.2.48.tar.gz
  behaves OK, fyi also we did a port mirroring setup at
  the switch directed to a different physical port for debugging,
  and didn't observe these frames to be physically present.

  See tcpdump -P in/out and more details at
  http://paste.ubuntu.com/12511680/

  ProblemType: Bug
  DistroRelease: Ubuntu 14.04
  Package: linux-image-3.19.0-28-generic 3.19.0-28.30~14.04.1
  ProcVersionSignature: Ubuntu 3.19.0-28.30~14.04.1-generic 3.19.8-ckt5
  Uname: Linux 3.19.0-28-generic x86_64
  ApportVersion: 2.14.1-0ubuntu3.13
  Architecture: amd64
  Date: Mon Sep 21 02:05:28 2015
  ProcEnviron:
   TERM=screen
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  SourcePackage: linux-lts-vivid
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1497812/+subscriptions

[Kernel-packages] [Bug 1463911] Re: IPV6 fragmentation and mtu issue

2015-09-16 Thread Jay Vosburgh

I have done a backport of

commit efb6de9b4ba0092b2c55f6a52d16294a8a698edd
Author: Bernhard Thaler 
Date:   Sat May 30 15:30:16 2015 +0200

netfilter: bridge: forward IPv6 fragmented packets

to the trusty 3.13 kernel.  This necessitated pulling in some bits from
other patches as well. I am currently testing for regressions and will
submit it for SRU if all goes well.

[Kernel-packages] [Bug 1442828] [NEW] change for LP 1425376 breaks systemd After=network-online.target

2015-04-10 Thread Jay Vosburgh

Public bug reported:


The change to ifup@.service done as part of LP 1425376 appears to break the 
ordering of units marked as After=network-online.target.  In my specific 
case, a new service script with After=network-online.target is erroneously 
run concurrently with dhclient.  As the new script depends on networking 
configuration being complete, it fails as the IP addresses and routes from DHCP 
are not configured.  This functioned correctly on vivid daily images from a few 
days ago, and appears to break starting with the vivid daily from approximately 
0409.

Infinity suggested this change as a likely suspect:

diff -Nru systemd-219/debian/extra/units/ifup@.service systemd-219/debian/extra/units/ifup@.service
--- systemd-219/debian/extra/units/ifup@.service  2015-04-02 08:08:56.0 +
+++ systemd-219/debian/extra/units/ifup@.service  2015-04-07 14:38:38.0 +
@@ -6,10 +6,8 @@
 DefaultDependencies=no
 
 [Service]
-Type=oneshot
-ExecStart=/sbin/ifup --allow=hotplug %I
-ExecStartPost=/sbin/ifup --allow=auto %I
 # only fail if ifupdown knows about the iface AND it's not up
-ExecStartPost=/bin/sh -c 'if ifquery %I >/dev/null; then ifquery --state %I >/dev/null; fi'
+ExecStart=/bin/sh -ec 'ifup --allow=hotplug %I; ifup --allow=auto %I; \
+if ifquery %I >/dev/null; then ifquery --state %I >/dev/null; fi'
 ExecStop=/sbin/ifdown %I
 RemainAfterExit=true

and, indeed, reverting this (copying ifup@.service from a few-days old
vivid image to a current image) resolves the problem.

The affected version is  ubuntu-vivid-daily-amd64-server-20150409.2
(installed via AWS).
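
One possible reading of the diff, offered as an assumption rather than a
confirmed diagnosis: the change drops Type=oneshot, and systemd then
falls back to Type=simple for a service that has an ExecStart= line.
The sketch below only illustrates that defaulting rule.  The function
name and the two abbreviated unit texts are hypothetical stand-ins, and
the parser ignores BusName= and drop-in files.

```python
def effective_service_type(unit_text):
    """Return the Type= systemd would assume for a [Service] section.

    Simplified: ignores BusName= and drop-in files.
    """
    in_service = False
    explicit_type = None
    has_exec_start = False
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        if line.startswith("[") and line.endswith("]"):
            in_service = line == "[Service]"
            continue
        if not in_service or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key == "Type":
            explicit_type = value.strip()
        elif key == "ExecStart":
            has_exec_start = True
    if explicit_type:
        return explicit_type
    # Documented defaults: simple when ExecStart= is set, oneshot otherwise.
    return "simple" if has_exec_start else "oneshot"


# Abbreviated stand-ins for the unit before and after the change.
OLD_UNIT = """[Service]
Type=oneshot
ExecStart=/sbin/ifup --allow=hotplug %I
"""
NEW_UNIT = """[Service]
ExecStart=/bin/sh -ec 'ifup --allow=hotplug %I; ifup --allow=auto %I'
"""

print(effective_service_type(OLD_UNIT), effective_service_type(NEW_UNIT))
```

With Type=simple, systemd considers the unit started as soon as the
process has been forked, so units ordered after it need not wait for
ifup to finish.  Whether that is the actual cause here is not
established in this report.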

** Affects: systemd (Ubuntu)
 Importance: Undecided
 Status: New

** Package changed: linux (Ubuntu) => systemd (Ubuntu)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1442828

Title:
  change for LP 1425376 breaks systemd After=network-online.target

Status in systemd package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1442828/+subscriptions

[Kernel-packages] [Bug 1409123] [NEW] hw csum failure in encapsulated network topolgies

2015-01-09 Thread Jay Vosburgh

Public bug reported:

Virtualized network topologies that utilize encapsulation (e.g., VXLAN)
and bridging  may experience kernel errors of the format:

[ 4297.761899] eth0: hw csum failure
[ 4297.765210] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G   OE  3.18.0-rc4-nn+ #22
[ 4297.765212] Hardware name: LENOVO 0829F3U/To be filled by O.E.M., BIOS 90KT15AUS 07/21/2010
[ 4297.765216]   88013fc03ba8 8172f026 0001
[ 4297.765219]  88013870e000 88013fc03bc8 8162ba52 8161c1a0
[ 4297.765221]  8800afdf1000 88013fc03c08 8162325c 88013870e000
[ 4297.765223] Call Trace:
[ 4297.765224]  IRQ  [8172f026] dump_stack+0x46/0x58
[ 4297.765235]  [8162ba52] netdev_rx_csum_fault+0x42/0x50
[ 4297.765238]  [8161c1a0] ? skb_push+0x40/0x40
[ 4297.765240]  [8162325c] __skb_checksum_complete+0xbc/0xd0
[ 4297.765243]  [8168c602] tcp_v4_rcv+0x2e2/0x950
[ 4297.765246]  [81666ca0] ? ip_rcv_finish+0x360/0x360
[ 4297.765248]  [81660224] ? nf_hook_slow+0x74/0x130
[ 4297.765250]  [81666ca0] ? ip_rcv_finish+0x360/0x360
[ 4297.765253]  [81666d4c] ip_local_deliver_finish+0xac/0x220
[ 4297.765255]  [81667058] ip_local_deliver+0x48/0x80
[ 4297.765257]  [816669c1] ip_rcv_finish+0x81/0x360
[ 4297.765259]  [81667332] ip_rcv+0x2a2/0x3f0
[ 4297.765261]  [8162e932] __netif_receive_skb_core+0x562/0x7a0
[ 4297.765263]  [8162eb88] __netif_receive_skb+0x18/0x60
[ 4297.765265]  [8162f8f6] process_backlog+0xa6/0x150

The backtrace may vary; stacks descending into conntrack have also been
observed:

Call Trace:
 IRQ  [8171a324] dump_stack+0x45/0x56
 [8161bfba] netdev_rx_csum_fault+0x3a/0x40
 [81614782] __skb_checksum_complete_head+0x62/0x70
 [816147a1] __skb_checksum_complete+0x11/0x20
 [816a3eac] nf_ip_checksum+0xcc/0x100
 [a04df33b] udp_error+0xdb/0x1f0 [nf_conntrack]
 [a04d926e] nf_conntrack_in+0xee/0xb40 [nf_conntrack]
 [a0307653] ? do_execute_actions+0x2e3/0xab0 [openvswitch]
 [a0307e4b] ? ovs_execute_actions+0x2b/0x30 [openvswitch]
 [81654540] ? inet_del_offload+0x40/0x40
 [a03b52e2] ipv4_conntrack_in+0x22/0x30 [nf_conntrack_ipv4]
 [8164e0aa] nf_iterate+0x9a/0xb0
 [81654540] ? inet_del_offload+0x40/0x40
 [8164e134] nf_hook_slow+0x74/0x130
 [81654540] ? inet_del_offload+0x40/0x40
 [81654f68] ip_rcv+0x2f8/0x3d0

The root cause of this is twofold:

First, the kernel handling of forwarded packets that have been
encapsulated (e.g., from VXLAN) for devices that support
CHECKSUM_COMPLETE checksum offload fails to update the running checksum
when decapsulating the packet.

Second, for the enic device itself, the hardware is not correctly
computing the checksum for some cases.

Both of these issues are patched in mainline:

commit 17e96834fd35997ca7cdfbf15413bcd5a36ad448 
Author: Govindarajulu Varadarajan _gov...@gmx.com 
Date: Thu Dec 18 15:58:42 2014 +0530 

enic: fix rx skb checksum

commit 2c26d34bbcc0b3f30385d5587aa232289e2eed8e 
Author: Jay Vosburgh jay.vosbu...@canonical.com 
Date: Fri Dec 19 15:32:00 2014 -0800 

net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding
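
To illustrate the first issue, the arithmetic can be sketched as below.
This is not the code from either patch; it only shows why a checksum
computed over the whole received frame has to be adjusted once the
outer headers are pulled.  The helper names and the byte strings are
hypothetical stand-ins.  The kernel has helpers of this kind, such as
skb_postpull_rcsum().

```python
def fold(total):
    """Fold carries back in until the value fits in 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum(data):
    """16-bit ones-complement sum of data (not inverted)."""
    if len(data) % 2:
        data += b"\x00"            # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    return fold(total)


def csum_sub(total, part):
    """Remove part from total; valid when part covers an even-length prefix."""
    return fold(total + (~part & 0xFFFF))


outer = bytes(range(50))           # stand-in for the pulled outer headers
inner = b"inner packet bytes"      # stand-in for the decapsulated packet

whole = csum(outer + inner)        # value reported for the full frame
stale = whole                      # left untouched after decapsulation
fixed = csum_sub(whole, csum(outer))

print(stale == csum(inner), fixed == csum(inner))
```

The stale value no longer matches the inner packet, which is the
condition reported as "hw csum failure"; subtracting the contribution
of the pulled bytes restores the match.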

** Affects: linux (Ubuntu)
 Importance: Undecided
 Assignee: Jay Vosburgh (jvosburgh)
 Status: New

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Jay Vosburgh (jvosburgh)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1409123

Title:
  hw csum failure in encapsulated network topologies

Status in linux package in Ubuntu:
  New

Bug description:
  Virtualized network topologies that utilize encapsulation (e.g.,
  VXLAN) and bridging may experience kernel errors of the format:

  [ 4297.761899] eth0: hw csum failure
  [ 4297.765210] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G   OE  3.18.0-rc4-nn+ #22
  [ 4297.765212] Hardware name: LENOVO 0829F3U/To be filled by O.E.M., BIOS 90KT15AUS 07/21/2010
  [ 4297.765216]   88013fc03ba8 8172f026 0001
  [ 4297.765219]  88013870e000 88013fc03bc8 8162ba52 8161c1a0
  [ 4297.765221]  8800afdf1000 88013fc03c08 8162325c 88013870e000
  [ 4297.765223] Call Trace:
  [ 4297.765224]  <IRQ>  [8172f026] dump_stack+0x46/0x58
  [ 4297.765235]  [8162ba52] netdev_rx_csum_fault+0x42/0x50
  [ 4297.765238]  [8161c1a0] ? skb_push+0x40/0x40
  [ 4297.765240]  [8162325c] __skb_checksum_complete+0xbc/0xd0
  [ 4297.765243]  [8168c602] tcp_v4_rcv+0x2e2/0x950
  [ 4297.765246]  [81666ca0] ? ip_rcv_finish+0x360/0x360
  [ 4297.765248]  [81660224] ? nf_hook_slow+0x74/0x130
  [ 4297.765250]  [81666ca0] ? ip_rcv_finish+0x360/0x360

[Kernel-packages] [Bug 1233175] Re: Kernel panic : mempolicy potential use-after-free on server running mongodb

2014-08-05 Thread Jay Vosburgh

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Jay Vosburgh (jvosburgh)

** Changed in: linux (Ubuntu Precise)
 Assignee: (unassigned) => Jay Vosburgh (jvosburgh)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1233175

Title:
  Kernel panic : mempolicy potential use-after-free on server running
  mongodb

Status in “linux” package in Ubuntu:
  In Progress
Status in “linux” source package in Precise:
  In Progress

Bug description:
  PID: 21767 TASK: 8800874bdc00 CPU: 12 COMMAND: mongod
   #0 [880657cc3820] machine_kexec at 810393da
   #1 [880657cc3890] crash_kexec at 810b53f8
   #2 [880657cc3960] oops_end at 8165e528
   #3 [880657cc3990] die at 810178d8
   #4 [880657cc39c0] do_trap at 8165de94
   #5 [880657cc3a20] do_invalid_op at 81014f65
   #6 [880657cc3ac0] invalid_op at 8166796b
  [exception RIP: slab_node+46]
  RIP: 8115a66e RSP: 880657cc3b70 RFLAGS: 00010097
  RAX:  RBX: 880657802c00 RCX: e62f6aef
  RDX:  RSI: 0020 RDI: 880abf18a288
  RBP: 880657cc3b80 R8: 0001 R9: 000100100010
  R10:  R11: 0022 R12: 0002
  R13:  R14:  R15: 0020
  ORIG_RAX:  CS: 0010 SS: 0018
   #7 [880657cc3b88] get_any_partial at 816496a0
   #8 [880657cc3c18] __slab_alloc at 816498cf
   #9 [880657cc3cc8] __kmalloc_node_track_caller at 81166f07
  #10 [880657cc3d38] __alloc_skb at 815364c8
  #11 [880657cc3d88] __netdev_alloc_skb at 81536b14
  #12 [880657cc3da8] enic_rq_alloc_buf at a005484c [enic]
  #13 [880657cc3e08] enic_poll_msix at a00559ff [enic]
  #14 [880657cc3e58] net_rx_action at 81545274
  #15 [880657cc3ec8] __do_softirq at 8106f5f8
  #16 [880657cc3f38] call_softirq at 81667bec
  #17 [880657cc3f50] do_softirq at 81016305
  #18 [880657cc3f70] irq_exit at 8106f9de
  #19 [880657cc3f80] do_IRQ at 816684a3
  --- <IRQ stack> ---
  #20 [880544d8bd48] ret_from_intr at 8165d82e
  [exception RIP: __slab_free+737]
  RIP: 81649467 RSP: 880544d8bdf8 RFLAGS: 0202
  RAX: 0001 RBX: ff0a0210 RCX: 000180aa00a9
  RDX: 000180aa00aa RSI: ea002afc6201 RDI: 880657806200
  RBP: 880544d8bea8 R8: 0001 R9: 
  R10: 8800874be020 R11: 8800874be030 R12: 880544d8be33
  R13: 000d R14: 81191895 R15: 880544d8bdb8
  ORIG_RAX: ff54 CS: 0010 SS: 0018
  #21 [880544d8be30] __change_pid at 81087dca
  #22 [880544d8beb0] kmem_cache_free at 81163634
  #23 [880544d8bef0] __mpol_put at 81159937
  #24 [880544d8bf00] do_exit at 8106c75c
  #25 [880544d8bf70] sys_exit at 8106caf7
  #26 [880544d8bf80] system_call_fastpath at 81665982
  RIP: 7f6f476b8f37 RSP: 7f68cbcfdbb0 RFLAGS: 0202
  RAX: 003c RBX: 81665982 RCX: 
  RDX: 7f68cbcfe700 RSI: 7f6f478c9250 RDI: 
  RBP:  R8: 7f68cbcfe700 R9: 7f68e82a0370
  R10: 7fff R11: 0246 R12: 8106caf7
  R13: 880544d8bf78 R14: 0003 R15: 7f68f8744a10
  ORIG_RAX: ...

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1233175/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

[Kernel-packages] [Bug 1344323] [NEW] Trusty kernel network performance regression

2014-07-18 Thread Jay Vosburgh

Public bug reported:

SRU Justification:

Impact:

Reduced TCP/IP receive performance for network devices that do not split
packet headers into skb linear area (e.g., mlx4).  The trusty kernel has
incorporated

commit eff44f9cc9a02aad53d568d3ae5020b6792ae4f6
Author: Jerry Chu hk...@google.com
Date:   Wed Dec 11 20:53:45 2013 -0800

net-gro: Prepare GRO stack for the upcoming tunneling support

which modifies the GRO frag0 optimization, but unfortunately for some
cases results in calls to __pskb_pull_tail for every packet being
received via the GRO path.  This causes a reduction in TCP receive
performance (or, more accurately, an increase in CPU load for TCP
receive processing, which will cause throughput reduction for CPU
limited workloads).
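
The cost difference can be sketched with a toy model. This is not the kernel's GRO code: it assumes only that a driver such as mlx4 delivers each packet entirely in page fragments with an empty linear area, that the frag0 fast path reads headers in place from the first fragment, and that the slow path must first copy header bytes into the linear area. The class and function names are hypothetical.

```python
class ToySkb:
    """Toy stand-in for a received packet: linear area plus page fragments."""

    def __init__(self, linear, frags):
        self.linear = bytearray(linear)
        self.frags = [bytes(f) for f in frags]
        self.pulls = 0  # number of times data was copied into the linear area

    def pull_tail(self, need):
        """Copy bytes from the fragments until linear holds `need` bytes."""
        self.pulls += 1
        while len(self.linear) < need and self.frags:
            take = need - len(self.linear)
            head = self.frags[0]
            self.linear += head[:take]
            if take >= len(head):
                self.frags.pop(0)
            else:
                self.frags[0] = head[take:]


def header(skb, hlen, use_frag0):
    """Return the first hlen bytes of the packet."""
    if use_frag0 and not skb.linear and skb.frags and len(skb.frags[0]) >= hlen:
        # Fast path: headers are read directly from the first fragment
        # (this models an in-place read; nothing moves into linear).
        return skb.frags[0][:hlen]
    if len(skb.linear) < hlen:
        skb.pull_tail(hlen)  # slow path: copy headers into the linear area
    return bytes(skb.linear[:hlen])


def count_pulls(n_packets, hlen, use_frag0):
    """Total slow-path pulls needed to read headers of n_packets packets."""
    pulls = 0
    for _ in range(n_packets):
        skb = ToySkb(b"", [bytes(1500)])  # all data in one fragment
        header(skb, hlen, use_frag0)
        pulls += skb.pulls
    return pulls


print("pulls with frag0 fast path:   ", count_pulls(1000, 40, True))
print("pulls without frag0 fast path:", count_pulls(1000, 40, False))
```

In the model, losing the fast path turns zero copies into one copy per packet, which mirrors the per-packet pull overhead described above.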

Fix:

This has already been fixed in mainline in

commit a50e233c50dbc881abaa0e4070789064e8d12d70
Author: Eric Dumazet eduma...@google.com
Date:   Sat Mar 29 21:28:21 2014 -0700

net-gro: restore frag0 optimization

The fix has been backported to and verified on the trusty kernel using
mlx4 devices and iperf; an increase from 7.5 to 8.5 Gb/sec was observed
when adding the patch, and the relevant portion of perf captures show
changes in the call paths from:

     7.17%  iperf  [kernel.kallsyms]   [k] __pskb_pull_tail
            |
            --- __pskb_pull_tail
               |
               |--48.03%-- tcp_gro_receive
               |          tcp4_gro_receive
               |          inet_gro_receive
               |          dev_gro_receive
               |          napi_gro_frags
               |          mlx4_en_process_rx_cq
               |          mlx4_en_poll_rx_cq
               |          net_rx_action
               |          __do_softirq
[...]
               |--28.53%-- napi_gro_frags
               |          mlx4_en_process_rx_cq
               |          mlx4_en_poll_rx_cq
               |          net_rx_action
               |          __do_softirq
[...]
               |--13.11%-- inet_gro_receive
               |          dev_gro_receive
               |          napi_gro_frags
               |          mlx4_en_process_rx_cq
               |          mlx4_en_poll_rx_cq
               |          net_rx_action
               |          __do_softirq

to:

     4.87%  iperf  [kernel.kallsyms]   [k] skb_gro_receive
            |
            --- skb_gro_receive
               |
               |--98.13%-- tcp_gro_receive
               |          tcp4_gro_receive
               |          inet_gro_receive
               |          dev_gro_receive
               |          napi_gro_frags
               |          mlx4_en_process_rx_cq
               |          mlx4_en_poll_rx_cq
               |          net_rx_action
               |          __do_softirq

Testcase:

The fix was tested using mlx4 10Gb/sec network devices between two arm64
systems using iperf -s on one end and iperf -c on the other.  The
unmodified kernel reported approximately 7.5 Gb/sec throughput, the
fixed kernel approximately 8.5 Gb/sec.
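
As a quick arithmetic check on the figures above, the relative throughput gain can be computed as follows. This is only a sketch; the function name is illustrative, and the inputs are the approximate iperf numbers quoted in this report.

```python
def relative_change(before, after):
    """Fractional change going from `before` to `after`."""
    return (after - before) / before


unpatched_gbps = 7.5  # approximate iperf throughput, unmodified kernel
patched_gbps = 8.5    # approximate iperf throughput, fixed kernel

gain = relative_change(unpatched_gbps, patched_gbps)
print(f"throughput gain: {gain:.1%}")  # prints "throughput gain: 13.3%"
```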

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1344323

Title:
  Trusty kernel network performance regression

Status in “linux” package in Ubuntu:
  New

Bug description:
  SRU Justification:

  Impact:

  Reduced TCP/IP receive performance for network devices that do not
  split packet headers into skb linear area (e.g., mlx4).  The trusty
  kernel has incorporated

  commit eff44f9cc9a02aad53d568d3ae5020b6792ae4f6
  Author: Jerry Chu hk...@google.com
  Date:   Wed Dec 11 20:53:45 2013 -0800

  net-gro: Prepare GRO stack for the upcoming tunneling support

  which modifies the GRO frag0 optimization, but unfortunately for some
  cases results in calls to __pskb_pull_tail for every packet being
  received via the GRO path.  This causes a reduction in TCP receive
  performance (or, more accurately, an increase in CPU load for TCP
  receive processing, which will cause throughput reduction for CPU
  limited workloads).

  Fix:

  This has already been fixed in mainline in

  commit a50e233c50dbc881abaa0e4070789064e8d12d70
  Author: Eric Dumazet eduma...@google.com
  Date:   Sat Mar 29 21:28:21 2014 -0700

  net-gro: restore frag0 optimization

  The fix has been backported to and verified on the trusty kernel using
  mlx4 devices and iperf; an increase from 7.5 to 8.5 Gb/sec was
  observed when adding the patch, and the relevant portion
