[Bug 1233175] Re: Kernel panic : mempolicy potential use-after-free on server running mongodb

2014-08-05 Thread Jay Vosburgh
** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Jay Vosburgh (jvosburgh)

** Changed in: linux (Ubuntu Precise)
 Assignee: (unassigned) => Jay Vosburgh (jvosburgh)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1233175

Title:
  Kernel panic : mempolicy potential use-after-free on server running
  mongodb

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1233175/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 1327466] [NEW] on login, X goes to blank screen, regression in 14.04

2014-06-06 Thread Jay Vosburgh
Public bug reported:

Note: this occurs when booting the 14.04 desktop installer ISO as well,
not just on the installed system.  Booting any of 13.10, 12.04 or Fedora
20 does not exhibit the same problem on the same machines.

For the installed system, after logging in, the desktop comes up and
works for a few seconds (~10) and then the screen goes blank (and
switches to power save).  Switching (Ctl-Alt-1) to a text console and
back will sometimes bring the graphics display back, and sometimes
won't.  When the display is restored, it exhibits one or more of the
following:

- a popup stating "could not switch the monitor configuration, could not
set the configuration for CRTC 63"

- executing xrandr (no arguments) causes the display resolution to
flip between 1280x1024 and 1440x900

- works for a short while, then goes black again

Examination of the Xorg.0.log shows that during the initial blank screen
time, X is rapidly cycling back and forth between those two resolutions:

[37.124] (II) intel(0): resizing framebuffer to 1280x1024
[37.128] (II) intel(0): switch to mode 1280x1024@76.0 on VGA1 using pipe 0, position (0, 0), rotation normal, reflection none
[37.476] (II) intel(0): resizing framebuffer to 1440x900
[37.476] (II) intel(0): switch to mode 1440x900@59.9 on VGA1 using pipe 0, position (0, 0), rotation normal, reflection none
[37.784] (II) intel(0): resizing framebuffer to 1280x1024
[37.785] (II) intel(0): switch to mode 1280x1024@76.0 on VGA1 using pipe 0, position (0, 0), rotation normal, reflection none
[...]
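
The cycling described above can be counted directly from the log. The following is a minimal sketch, assuming log lines shaped like the excerpt above; the regular expression, the function names mode_switches and count_flaps, and the embedded SAMPLE text are illustrative assumptions and not part of any Xorg tooling.

```python
import re

# Matches lines such as:
# [37.128] (II) intel(0): switch to mode 1280x1024@76.0 on VGA1 using pipe 0, ...
SWITCH_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\(II\)\s+\w+\(\d+\): "
    r"switch to mode (?P<mode>\d+x\d+)@(?P<rate>[\d.]+) on (?P<output>\S+)"
)


def mode_switches(log_text):
    """Return (timestamp, output, mode) for every mode switch found in the log."""
    events = []
    for line in log_text.splitlines():
        m = SWITCH_RE.search(line)
        if m:
            events.append((float(m.group("ts")), m.group("output"), m.group("mode")))
    return events


def count_flaps(events):
    """Count switches whose mode differs from the previous switch on the same output."""
    last_mode = {}
    flaps = 0
    for _ts, output, mode in events:
        if output in last_mode and last_mode[output] != mode:
            flaps += 1
        last_mode[output] = mode
    return flaps


# Sample input copied from the log excerpt in this report.
SAMPLE = """\
[37.124] (II) intel(0): resizing framebuffer to 1280x1024
[37.128] (II) intel(0): switch to mode 1280x1024@76.0 on VGA1 using pipe 0, position (0, 0), rotation normal, reflection none
[37.476] (II) intel(0): resizing framebuffer to 1440x900
[37.476] (II) intel(0): switch to mode 1440x900@59.9 on VGA1 using pipe 0, position (0, 0), rotation normal, reflection none
[37.784] (II) intel(0): resizing framebuffer to 1280x1024
[37.785] (II) intel(0): switch to mode 1280x1024@76.0 on VGA1 using pipe 0, position (0, 0), rotation normal, reflection none
"""

if __name__ == "__main__":
    events = mode_switches(SAMPLE)
    print(len(events), "mode switches,", count_flaps(events), "resolution changes")
```

To run it against a real log, the SAMPLE text could be replaced with the contents of the Xorg.0.log file; a high count over a short timestamp range would match the behavior described here.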

This has occurred with two machines (with similar hardware, Intel Core 2
Duo with embedded graphics) connected to a particular monitor (a Lenovo
L194, 4434-HB6).

% apt-cache policy xorg
xorg:
  Installed: 1:7.7+1ubuntu8
  Candidate: 1:7.7+1ubuntu8
  Version table:
 *** 1:7.7+1ubuntu8 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status


I'm not sure if the EDID is included in the debug stuff; if not, here it is:
% xrandr --prop
Screen 0: minimum 320 x 200, current 1280 x 1024, maximum 32767 x 32767
VGA1 connected primary 1280x1024+0+0 (normal left inverted right x axis y axis) 410mm x 257mm
EDID: 
000030ae521101010101
131201036c291a78eee5b5a355499927
135054afcf10314681c0818a818c8190
9500950f714f9a29a0d0518422305098
36009a01111c00fc004c3139
3420576964650a20202000fd0032
4c1e510e000a20202020202000ff
0036563644323531380a20202020006f
   1440x900       59.9 +   75.0
   1280x1024      76.0*    75.0     72.0     70.0
   1152x864       75.0
   1280x720       60.0
   1024x768       75.1     70.1     60.0
   800x600        72.2     75.0     60.3     56.2
   640x480        75.0     72.8     66.0     60.0
   720x400        70.1
VIRTUAL1 disconnected (normal left inverted right x axis y axis)

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: xorg 1:7.7+1ubuntu8
ProcVersionSignature: Ubuntu 3.13.0-27.50-generic 3.13.11
Uname: Linux 3.13.0-27-generic x86_64
.tmp.unity.support.test.0:
 
ApportVersion: 2.14.1-0ubuntu3.2
Architecture: amd64
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: compiz
CompositorUnredirectDriverBlacklist: '(nouveau|Intel).*Mesa 8.0'
CompositorUnredirectFSW: true
CurrentDesktop: Unity
Date: Fri Jun  6 17:13:33 2014
DistUpgraded: Fresh install
DistroCodename: trusty
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes, including running git bisection searches
GraphicsCard:
 Intel Corporation 4 Series Chipset Integrated Graphics Controller [8086:2e32] (rev 03) (prog-if 00 [VGA controller])
   Subsystem: Lenovo Device [17aa:305d]
InstallationDate: Installed on 2014-05-28 (9 days ago)
InstallationMedia: Ubuntu 14.04 LTS Trusty Tahr - Release amd64 (20140417)
MachineType: LENOVO 0829F3U
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-27-generic root=UUID=97f50aaf-f549-4fcd-8135-3f6fdafd3737 ro quiet splash vt.handoff=7
SourcePackage: xorg
Symptom: display
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 07/21/2010
dmi.bios.vendor: LENOVO
dmi.bios.version: 90KT15AUS
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: To be filled by O.E.M.
dmi.board.vendor: LENOVO
dmi.board.version: To be filled by O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnLENOVO:bvr90KT15AUS:bd07/21/2010:svnLENOVO:pn0829F3U:pvrThinkCentreM70e:rvnLENOVO:rnTobefilledbyO.E.M.:rvrTobefilledbyO.E.M.:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.name: 0829F3U
dmi.product.version: ThinkCentre M70e
dmi.sys.vendor: LENOVO
version.compiz: compiz 1:0.9.11+14.04.20140409-0ubuntu1
version.ia32-libs: ia32-libs N/A
version.libdrm2: libdrm2 2.4.52-1
version.libgl1-mesa-dri: 

[Bug 1344323] [NEW] Trusty kernel network performance regression

2014-07-18 Thread Jay Vosburgh
Public bug reported:

SRU Justification:

Impact:

Reduced TCP/IP receive performance for network devices that do not split
packet headers into skb linear area (e.g., mlx4).  The trusty kernel has
incorporated

commit eff44f9cc9a02aad53d568d3ae5020b6792ae4f6
Author: Jerry Chu hk...@google.com
Date:   Wed Dec 11 20:53:45 2013 -0800

net-gro: Prepare GRO stack for the upcoming tunneling support

which modifies the GRO frag0 optimization, but unfortunately for some
cases results in calls to __pskb_pull_tail for every packet being
received via the GRO path.  This causes a reduction in TCP receive
performance (or, more accurately, an increase in CPU load for TCP
receive processing, which will cause throughput reduction for CPU
limited workloads).

Fix:

This has already been fixed in mainline in

commit a50e233c50dbc881abaa0e4070789064e8d12d70
Author: Eric Dumazet eduma...@google.com
Date:   Sat Mar 29 21:28:21 2014 -0700

net-gro: restore frag0 optimization

The fix has been backported to and verified on the trusty kernel using
mlx4 devices and iperf; an increase from 7.5 to 8.5 Gb/sec was observed
when adding the patch, and the relevant portions of the perf captures
show the call paths changing from:

 7.17%  iperf  [kernel.kallsyms]   [k] __pskb_pull_tail

  |
  --- __pskb_pull_tail
 |  
 |--48.03%-- tcp_gro_receive
 |  tcp4_gro_receive
 |  inet_gro_receive
 |  dev_gro_receive
 |  napi_gro_frags
 |  mlx4_en_process_rx_cq
 |  mlx4_en_poll_rx_cq
 |  net_rx_action
 |  __do_softirq
[...]
 |--28.53%-- napi_gro_frags
 |  mlx4_en_process_rx_cq
 |  mlx4_en_poll_rx_cq
 |  net_rx_action
 |  __do_softirq
[...]
 |--13.11%-- inet_gro_receive
 |  dev_gro_receive
 |  napi_gro_frags
 |  mlx4_en_process_rx_cq
 |  mlx4_en_poll_rx_cq
 |  net_rx_action
 |  __do_softirq

to:

 4.87%  iperf  [kernel.kallsyms]   [k] skb_gro_receive  
  
|
--- skb_gro_receive
   |  
   |--98.13%-- tcp_gro_receive
   |  tcp4_gro_receive
   |  inet_gro_receive
   |  dev_gro_receive
   |  napi_gro_frags
   |  mlx4_en_process_rx_cq
   |  mlx4_en_poll_rx_cq
   |  net_rx_action
   |  __do_softirq

Testcase:

The fix was tested using mlx4 10Gb/sec network devices between two arm64
systems using iperf -s on one end and iperf -c on the other.  The
unmodified kernel reported approximately 7.5 Gb/sec throughput, the
fixed kernel approximately 8.5 Gb/sec.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New



[Bug 1409123] [NEW] hw csum failure in encapsulated network topologies

2015-01-09 Thread Jay Vosburgh
Public bug reported:

Virtualized network topologies that utilize encapsulation (e.g., VXLAN)
and bridging may experience kernel errors of the form:

[ 4297.761899] eth0: hw csum failure
[ 4297.765210] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G   OE  3.18.0-rc4-nn+ #22
[ 4297.765212] Hardware name: LENOVO 0829F3U/To be filled by O.E.M., BIOS 90KT15AUS 07/21/2010
[ 4297.765216]   88013fc03ba8 8172f026 0001
[ 4297.765219]  88013870e000 88013fc03bc8 8162ba52 8161c1a0
[ 4297.765221]  8800afdf1000 88013fc03c08 8162325c 88013870e000
[ 4297.765223] Call Trace:
[ 4297.765224]  IRQ  [8172f026] dump_stack+0x46/0x58
[ 4297.765235]  [8162ba52] netdev_rx_csum_fault+0x42/0x50
[ 4297.765238]  [8161c1a0] ? skb_push+0x40/0x40
[ 4297.765240]  [8162325c] __skb_checksum_complete+0xbc/0xd0
[ 4297.765243]  [8168c602] tcp_v4_rcv+0x2e2/0x950
[ 4297.765246]  [81666ca0] ? ip_rcv_finish+0x360/0x360
[ 4297.765248]  [81660224] ? nf_hook_slow+0x74/0x130
[ 4297.765250]  [81666ca0] ? ip_rcv_finish+0x360/0x360
[ 4297.765253]  [81666d4c] ip_local_deliver_finish+0xac/0x220
[ 4297.765255]  [81667058] ip_local_deliver+0x48/0x80
[ 4297.765257]  [816669c1] ip_rcv_finish+0x81/0x360
[ 4297.765259]  [81667332] ip_rcv+0x2a2/0x3f0
[ 4297.765261]  [8162e932] __netif_receive_skb_core+0x562/0x7a0
[ 4297.765263]  [8162eb88] __netif_receive_skb+0x18/0x60
[ 4297.765265]  [8162f8f6] process_backlog+0xa6/0x150

The backtrace may vary; stacks descending into conntrack have also been
observed:

Call Trace:
 IRQ  [8171a324] dump_stack+0x45/0x56
 [8161bfba] netdev_rx_csum_fault+0x3a/0x40
 [81614782] __skb_checksum_complete_head+0x62/0x70
 [816147a1] __skb_checksum_complete+0x11/0x20
 [816a3eac] nf_ip_checksum+0xcc/0x100
 [a04df33b] udp_error+0xdb/0x1f0 [nf_conntrack]
 [a04d926e] nf_conntrack_in+0xee/0xb40 [nf_conntrack]
 [a0307653] ? do_execute_actions+0x2e3/0xab0 [openvswitch]
 [a0307e4b] ? ovs_execute_actions+0x2b/0x30 [openvswitch]
 [81654540] ? inet_del_offload+0x40/0x40
 [a03b52e2] ipv4_conntrack_in+0x22/0x30 [nf_conntrack_ipv4]
 [8164e0aa] nf_iterate+0x9a/0xb0
 [81654540] ? inet_del_offload+0x40/0x40
 [8164e134] nf_hook_slow+0x74/0x130
 [81654540] ? inet_del_offload+0x40/0x40
 [81654f68] ip_rcv+0x2f8/0x3d0

The root cause of this is twofold:

First, the kernel handling of forwarded packets that have been
encapsulated (e.g., from VXLAN) for devices that support
CHECKSUM_COMPLETE checksum offload fails to update the running checksum
when decapsulating the packet.

Second, for the enic device itself, the hardware is not correctly
computing the checksum for some cases.

Both of these issues are patched in mainline:

commit 17e96834fd35997ca7cdfbf15413bcd5a36ad448 
Author: Govindarajulu Varadarajan _gov...@gmx.com 
Date: Thu Dec 18 15:58:42 2014 +0530 

enic: fix rx skb checksum

commit 2c26d34bbcc0b3f30385d5587aa232289e2eed8e 
Author: Jay Vosburgh jay.vosbu...@canonical.com 
Date: Fri Dec 19 15:32:00 2014 -0800 

net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding
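
The first root cause above concerns keeping a whole-packet checksum valid once the outer headers are removed. The following is a minimal sketch of the underlying ones' complement arithmetic only, not the kernel code from either commit; the helper names csum16, csum_add and csum_sub, the 50-byte stand-in for the outer headers, and the sample payload are all assumptions made for illustration.

```python
def csum16(data: bytes) -> int:
    """16-bit ones' complement sum of data (not inverted)."""
    if len(data) % 2:
        data += b"\x00"              # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:               # fold carries back in (end-around carry)
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum_add(a: int, b: int) -> int:
    """Ones' complement addition of two 16-bit sums."""
    total = a + b
    return (total & 0xFFFF) + (total >> 16)


def csum_sub(a: int, b: int) -> int:
    """Ones' complement subtraction: add the bitwise complement of b."""
    return csum_add(a, ~b & 0xFFFF)


# Hypothetical data: an even-length run of outer header bytes followed by
# the inner frame that remains after decapsulation.
outer_headers = bytes(range(50))
inner_frame = b"inner frame carried inside the tunnel"

# A device reporting a sum over the full packet covers both parts.
full_sum = csum16(outer_headers + inner_frame)

# After the outer headers are pulled, the running sum has to be reduced by
# their contribution; otherwise it no longer describes the remaining data.
inner_sum = csum_sub(full_sum, csum16(outer_headers))

assert inner_sum == csum16(inner_frame)
print(hex(full_sum), hex(inner_sum))
```

If the subtraction step is skipped, the stale full-packet sum is later compared against the inner packet and fails verification, which is consistent with the "hw csum failure" messages shown above.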

** Affects: linux (Ubuntu)
 Importance: Undecided
 Assignee: Jay Vosburgh (jvosburgh)
 Status: New

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Jay Vosburgh (jvosburgh)



[Bug 1442828] Re: ifup-wait-all-auto does not wait for interfaces to be fully up

2015-04-13 Thread Jay Vosburgh
ifupdown 0.7.48.1ubuntu9 resolves the original problem for me on a fresh
vivid install with the daily build for today.

Thanks.



[Bug 1442828] [NEW] change for LP 1425376 breaks systemd After=network-online.target

2015-04-10 Thread Jay Vosburgh
Public bug reported:


The change to ifup@.service made as part of LP 1425376 appears to break the
ordering of units marked After=network-online.target.  In my specific case,
a new service script with After=network-online.target is erroneously run
concurrently with dhclient.  Because the new script depends on networking
configuration being complete, it fails: the IP addresses and routes from
DHCP are not yet configured.  This functioned correctly on vivid daily
images from a few days ago, and appears to break starting with the vivid
daily from approximately 0409 (2015-04-09).

Infinity suggested this change as a likely suspect:

diff -Nru systemd-219/debian/extra/units/ifup@.service systemd-219/debian/extra/units/ifup@.service
--- systemd-219/debian/extra/units/ifup@.service	2015-04-02 08:08:56.0 +
+++ systemd-219/debian/extra/units/ifup@.service	2015-04-07 14:38:38.0 +
@@ -6,10 +6,8 @@
 DefaultDependencies=no
 
 [Service]
-Type=oneshot
-ExecStart=/sbin/ifup --allow=hotplug %I
-ExecStartPost=/sbin/ifup --allow=auto %I
 # only fail if ifupdown knows about the iface AND it's not up
-ExecStartPost=/bin/sh -c 'if ifquery %I >/dev/null; then ifquery --state %I >/dev/null; fi'
+ExecStart=/bin/sh -ec 'ifup --allow=hotplug %I; ifup --allow=auto %I; \
+if ifquery %I >/dev/null; then ifquery --state %I >/dev/null; fi'
 ExecStop=/sbin/ifdown %I
 RemainAfterExit=true

and, indeed, reverting this (copying ifup@.service from a few-days old
vivid image to a current image) resolves the problem.

The affected version is  ubuntu-vivid-daily-amd64-server-20150409.2
(installed via AWS).
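
One possible reading of the diff, which this report does not itself state, is that dropping Type=oneshot changes when systemd treats the unit as started: a oneshot unit is considered started only after its command exits, whereas a unit with ExecStart= and no Type= defaults to Type=simple and is considered started as soon as the process is launched. The following is a minimal sketch that reports the effective type of a unit; the function name service_type and the abridged unit texts are assumptions for illustration.

```python
def service_type(unit_text):
    """Return the effective Type= of a unit's [Service] section.

    Assumption: with ExecStart= set and neither Type= nor BusName= present,
    systemd falls back to Type=simple.
    """
    section = None
    explicit = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue                      # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]          # entering a new section
        elif section == "Service" and line.startswith("Type="):
            explicit = line.split("=", 1)[1].strip()
    return explicit or "simple"


# Abridged versions of the [Service] section before and after the change.
OLD_UNIT = """\
[Service]
Type=oneshot
ExecStart=/sbin/ifup --allow=hotplug %I
ExecStartPost=/sbin/ifup --allow=auto %I
ExecStop=/sbin/ifdown %I
RemainAfterExit=true
"""

NEW_UNIT = """\
[Service]
ExecStart=/bin/sh -ec 'ifup --allow=hotplug %I; ifup --allow=auto %I'
ExecStop=/sbin/ifdown %I
RemainAfterExit=true
"""

if __name__ == "__main__":
    print("before:", service_type(OLD_UNIT))
    print("after: ", service_type(NEW_UNIT))
```

If that reading is right, units ordered after the interface unit could start while ifup and dhclient are still running, which would match the symptom described above.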

** Affects: systemd (Ubuntu)
 Importance: Undecided
 Status: New

** Package changed: linux (Ubuntu) => systemd (Ubuntu)



[Bug 1508706] [NEW] Networking hangs on azure using hv_netvsc; bisected

2015-10-21 Thread Jay Vosburgh
Public bug reported:


Running Ubuntu instances on azure, testing basic networking between two 
instances.  This involves configuring VXLAN between the two instances and 
running iperf and rsync of the kernel tree between the instances, e.g.,

ip link add vxlan0 type vxlan id 999 local 10.88.0.12 remote 10.88.0.11 dev eth0
ip l set vxlan0 up
ip addr add 242.0.0.12/8 dev vxlan0

After some time (sometimes instantly, sometimes up to 30 minutes of
activity), the networking will hang.  This hang takes two forms:  a
complete loss of connectivity (all network, even the ssh session used to
log in), or just a loss of connectivity between instances (the ssh
session remains active).  Sometimes for the latter case, the ssh session
will then later hang.

This first appeared when testing with the Ubuntu 3.19 kernel, and I
subsequently bisected this to:

commit effa2012d207f78cbc5a8360e62d420a8860b7e9
Author: KY Srinivasan 
Date:   Mon May 11 15:39:46 2015 -0700

hv_netvsc: Use the xmit_more skb flag to optimize signaling the host

BugLink: http://bugs.launchpad.net/bugs/1454892

Based on the information given to this driver (via the xmit_more skb flag),
we can defer signaling the host if more packets are on the way. This will help
make the host more efficient since it can potentially process a larger batch of
packets. Implement this optimization.

Signed-off-by: K. Y. Srinivasan 
Signed-off-by: David S. Miller 
Acked-by: Tim Gardner 
Acked-by: Brad Figg 
Signed-off-by: Brad Figg 

I also tested the mainline kernel (net-next); it fails with the
equivalent commit:

commit 82fa3c776e5abba7ed6e4b4f4983d14731c37d6a
Author: KY Srinivasan 
Date:   Mon May 11 15:39:46 2015 -0700

hv_netvsc: Use the xmit_more skb flag to optimize signaling the host

For both kernel trees, I also tested the prior commit and it did not
exhibit the failure after many hours.  For Ubuntu, this was

commit a4aeb290bd75af5e16a6144a418291476ac6140c
Author: K. Y. Srinivasan 
Date:   Wed Mar 18 12:29:29 2015 -0700

Drivers: hv: vmbus: Export the vmbus_sendpacket_pagebuffer_ctl()

and for mainline it was

commit 9eea92226407e7a117ef1ceef45380ebd000a0e2
Author: Alexei Starovoitov 
Date:   Mon May 11 15:19:48 2015 -0700

pktgen: fix packet generation

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New



[Bug 1508706] Re: Networking hangs on azure using hv_netvsc; bisected

2015-11-09 Thread Jay Vosburgh
Yes, it did, although it seemed to be easier to reproduce with vxlan
configured.



[Bug 1502238] Re: bridge does not forward neighbor solicitation packets

2015-10-06 Thread Jay Vosburgh
I set up a similar configuration locally, and I see the bridge correctly
forwarding the IPv6 NS packets.  The ping functions as expected.  I have
different network cards, and used IPv6 ULA addresses (fc00:1234::/64)
but I'm not sure how that would affect the bridge forwarding decision.

I'm also not sure what exactly is meant by your statement "Adding a host
route for the 2001:: IP via the link IP"; I don't see any other
reference to a 2001:: address.  Could you clarify what this refers to?

Also, for completeness, can you ensure that there are no bridge table
rules installed?  This would be in the output of

ebtables -t filter -L
ebtables -t nat -L
ebtables -t broute -L

I would also suggest disabling the bridge callouts to arptables,
ip6tables and iptables to see if that affects the behavior.  This would
be done via

sysctl -w net.bridge.bridge-nf-call-arptables=0
sysctl -w net.bridge.bridge-nf-call-ip6tables=0
sysctl -w net.bridge.bridge-nf-call-iptables=0

(all of the above sysctl and ebtables commands need to be done as root)
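
Before changing the bridge callout settings, it may help to record their current values. The following is a minimal sketch that reads them from /proc/sys, assuming the usual mapping of a sysctl key to a path (dots become directory separators); the function names sysctl_path and read_sysctl are illustrative, and the keys are the three named above.

```python
from pathlib import Path

BRIDGE_NF_KEYS = (
    "net.bridge.bridge-nf-call-arptables",
    "net.bridge.bridge-nf-call-ip6tables",
    "net.bridge.bridge-nf-call-iptables",
)


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file under root."""
    return Path(root, *key.split("."))


def read_sysctl(key, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        return sysctl_path(key, root).read_text().strip()
    except OSError:
        return None   # e.g. the bridge netfilter entries are not present


if __name__ == "__main__":
    for key in BRIDGE_NF_KEYS:
        value = read_sysctl(key)
        print(f"{key} = {value if value is not None else 'unavailable'}")
```

A value of 1 would indicate that bridged traffic is being passed to the corresponding tables; "unavailable" would suggest the entries do not exist on that system.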



[Bug 1463911] Re: IPV6 fragmentation and mtu issue

2015-10-09 Thread Jay Vosburgh
The original patch had an error in it; I believe I've found it, and once
I verify that and clean it up a bit I'll attach it to the bug.



[Bug 1463911] Re: IPV6 fragmentation and mtu issue

2015-11-17 Thread Jay Vosburgh
** Patch added: "Backport patch for vivid 3.19"
   
https://bugs.launchpad.net/nova/+bug/1463911/+attachment/4520984/+files/ubuntu-vivid-sru.patch



[Bug 1463911] Re: IPV6 fragmentation and mtu issue

2015-11-17 Thread Jay Vosburgh
SRU Justification:

Impact:

This bug causes issues when ip6tables modules are loaded with IPv6
fragmented packets traversing a bridge.  The extant conntrack processing
will reassemble the IPv6 fragments for netfilter processing, but is
incapable of re-fragmenting these datagrams for subsequent forwarding.
This causes the fragmented IPv6 datagrams to be dropped.

Fix:

This is resolved by backporting functionality from mainline that
re-fragments the IPv6 datagrams upon bridge egress.

Testcase:

The patch commit log includes a test case; to summarize:

A bridge is configured with two ports and interfaces are attached
to these ports.  A traffic source beyond one port generates fragmented
IPv6 datagrams, e.g., ping6 -s 2000, destined for a host beyond the
bridge.

With ip6tables modules unloaded, the IPv6 fragments will traverse
the bridge.  Loading ip6tables, e.g., "ip6tables -t nat -L", will cause
IPv6 fragmented datagrams to be dropped on the unpatched kernel.

These datagrams are correctly forwarded with the patch applied.



[Bug 1463911] Re: IPV6 fragmentation and mtu issue

2015-11-17 Thread Jay Vosburgh
** Patch added: "Backport patch for trusty 3.13"
   
https://bugs.launchpad.net/nova/+bug/1463911/+attachment/4520982/+files/ubuntu-trusty-3.13-sru.patch



[Bug 1463911] Re: IPV6 fragmentation and mtu issue

2015-11-17 Thread Jay Vosburgh
** Patch added: "Backport patch for trusty 3.16"
   
https://bugs.launchpad.net/nova/+bug/1463911/+attachment/4520983/+files/ubuntu-trusty-3.16-sru.patch



[Bug 1463911] Re: IPV6 fragmentation and mtu issue

2015-09-16 Thread Jay Vosburgh
I have done a backport of

commit efb6de9b4ba0092b2c55f6a52d16294a8a698edd
Author: Bernhard Thaler 
Date:   Sat May 30 15:30:16 2015 +0200

netfilter: bridge: forward IPv6 fragmented packets

to the trusty 3.13 kernel.  This necessitated pulling in some bits from
other patches as well. I am currently testing for regressions and will
submit it for SRU if all goes well.



[Bug 1497812] Re: i40e bug: non physical MAC outbound frames appear as copied back inbound (mirrored)

2015-09-29 Thread Jay Vosburgh
Just looking at the log, it might be this:

commit fa11cb3d16a9b9b296a2b811a49faf1356240348
Author: Anjali Singhai Jain 
Date:   Wed May 27 12:06:14 2015 -0400

i40e: Make sure to be in VEB mode if SRIOV is enabled at probe

If SRIOV is enabled we need to be in VEB mode not VEPA mode at probe.
This fixes an NPAR bug when SRIOV is enabled in the BIOS.

Change-ID: Ibf006abafd9a0ca3698ec24848cd771cf345cbbc
Signed-off-by: Anjali Singhai Jain 
Tested-by: Jim Young 
Signed-off-by: Jeff Kirsher 



[Bug 1508706] Re: Networking hangs on azure using hv_netvsc; bisected

2015-11-18 Thread Jay Vosburgh
We are testing this patch immediately (overnight US time) and will
report our results as soon as they are available.



[Bug 1463911] Re: IPV6 fragmentation and mtu issue

2015-11-18 Thread Jay Vosburgh
Test methodology performed on 3.19 kernel with patch applied:

Host A: fd01:::1/64 direct connect to host C

ip addr add fd01:::1/64 dev eth0

Host B: fd01:::2/64 direct connect to host C

ip addr add fd01:::2/64 dev eth0

host C: direct connect interfaces for Hosts A & B bridged together:

brctl addbr testbr0
brctl addif testbr0 eth1
brctl addif testbr0 eth5
ip link set dev eth1 up
ip link set dev eth5 up
ip link set dev testbr0 up
ip addr add fd01:::99/64 dev testbr0

host A:

continuous ping6 to host B's address beyond the bridge, using a size large
enough to generate fragmented IPv6 datagrams at an MTU setting of 1500:

ping6 -s 4000 fd01:::2

host C:

load ip6tables_nat:

ip6tables -t nat -Ln

Observe on host A that ping continues uninterrupted

Inspect eth1 and eth5 interfaces on host C with tcpdump to confirm traffic
passes through the bridge
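
As a cross-check on the ping size chosen above, the number of fragments per echo request can be estimated. The following is a minimal sketch, assuming a plain IPv6 header with no extension headers other than the Fragment header, and that every fragment except the last carries a multiple of 8 bytes; the function name fragment_count is illustrative.

```python
import math

IPV6_HEADER = 40     # fixed IPv6 header, in bytes
FRAG_HEADER = 8      # IPv6 Fragment extension header, in bytes
ICMPV6_HEADER = 8    # ICMPv6 echo request header, in bytes


def fragment_count(ping_size, mtu=1500):
    """Packets on the wire for one echo request carrying ping_size data bytes."""
    fragmentable = ICMPV6_HEADER + ping_size
    if IPV6_HEADER + fragmentable <= mtu:
        return 1     # fits in a single unfragmented packet
    # Payload per fragment, rounded down to a multiple of 8 bytes.
    per_fragment = (mtu - IPV6_HEADER - FRAG_HEADER) // 8 * 8
    return math.ceil(fragmentable / per_fragment)


if __name__ == "__main__":
    for size in (1000, 2000, 4000):
        print(f"ping6 -s {size}: {fragment_count(size)} packet(s) at MTU 1500")
```

Under these assumptions a size of 4000 should produce three fragments per request and a size of 2000 should produce two, so both sizes exercise the fragment forwarding path through the bridge.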



[Bug 1463911] Re: IPV6 fragmentation and mtu issue

2015-11-18 Thread Jay Vosburgh
The equivalent testing to comment #20 was also performed on the 3.13 and
3.16 kernels; additionally, a customer separately validated the 3.13 and
3.16 patches in their environment.



[Bug 1508706] Re: Networking hangs on azure using hv_netvsc; bisected

2015-11-18 Thread Jay Vosburgh
I have tested the patch referenced in comment #5 and it appears to
resolve the network hang.

I first built and tested the Ubuntu LTS 3.19.0-31.36~14.04.1 kernel and
reproduced the issue using the methodology described in the original bug
description.  This is commit

commit 15e42c329445b4e0f0aecefc39e205c44755c2ba
Author: Luis Henriques 
Date:   Thu Oct 8 10:26:57 2015 +0100

UBUNTU: Ubuntu-lts-3.19.0-31.36~14.04.1

in the lts-backport-vivid branch of
git://kernel.ubuntu.com/ubuntu/ubuntu-trusty.git

I then applied the referenced patch and tested again and was unable to
reproduce the issue after roughly an hour of testing.



[Bug 1508706] Re: Networking hangs on azure using hv_netvsc; bisected

2015-11-19 Thread Jay Vosburgh
SRU Justification:

Impact:

Bug causes easily reproducible freeze of networking on affected
systems when under moderate to high network load.  Ordinary benchmark
tools such as iperf induce the problem without difficulty.  Affected
systems are virtual machine instances running on Azure, utilizing the
hv_netvsc network device driver.

Fix:

Fix is to apply patch provided by Microsoft:

http://marc.info/?l=linux-kernel&m=144787522532687&w=2

Testcase:

Tested as described in Bug Description.



[Bug 1539826] [NEW] initramfs-tools hook-functions error causes failure

2016-01-29 Thread Jay Vosburgh
Public bug reported:

/usr/share/initramfs-tools/hook-functions contains what appears to be an
error from a variable rename (from root to dev_node): one instance of root
was not updated.  This causes mkinitramfs to fail with the error:

mkinitramfs: for device /dev/vda1 missing vda1 /sys/block/ entry
mkinitramfs: workaround is MODULES=most
mkinitramfs: Error please report the bug

A trivial patch that appears to resolve this is attached.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: initramfs-tools 0.120ubuntu7 [modified: usr/share/initramfs-tools/hook-functions]
ProcVersionSignature: Ubuntu 4.3.0-7.18-generic 4.3.3
Uname: Linux 4.3.0-7-generic x86_64
ApportVersion: 2.19.4-0ubuntu1
Architecture: amd64
CurrentDesktop: Unity
Date: Fri Jan 29 17:44:20 2016
InstallationDate: Installed on 2016-01-29 (0 days ago)
InstallationMedia: Ubuntu 16.04 LTS "Xenial Xerus" - Alpha amd64 (20160129)
PackageArchitecture: all
SourcePackage: initramfs-tools
UpgradeStatus: No upgrade log present (probably fresh install)

** Affects: initramfs-tools (Ubuntu)
 Importance: Undecided
 Status: New


** Tags: amd64 apport-bug xenial

** Patch added: "hook-functions variable name fix"
   
https://bugs.launchpad.net/bugs/1539826/+attachment/4559496/+files/hook-functions.patch

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1539826

Title:
  initramfs-tools hook-functions error causes failure

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/1539826/+subscriptions



[Bug 1463911] Re: IPV6 fragmentation and mtu issue

2016-02-23 Thread Jay Vosburgh
Yes, the patch has been committed for the next Ubuntu kernel releases.

I have no information on a CentOS patch; you would need to file a bug
against CentOS or RHEL.

No patch to Neutron is required.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1463911

Title:
  IPV6 fragmentation and mtu issue

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1463911/+subscriptions



[Bug 1463911] Re: IPV6 fragmentation and mtu issue

2016-03-02 Thread Jay Vosburgh
The Wily kernel (4.2) already contains the fixes for this bug.



[Bug 1463911] Re: IPV6 fragmentation and mtu issue

2016-03-03 Thread Jay Vosburgh
** Tags removed: verification-needed-trusty verification-needed-vivid
** Tags added: verification-done-trusty verification-done-vivid



[Bug 1584092] Re: Docker misconfigured when using non-default overlay/underlay netmask size

2016-05-20 Thread Jay Vosburgh
I haven't tested this patch, but fanctl had the same issue, and I
believe the fix is that the subnet math has to be "overlay_width + ( 32
- underlay_width )", not "overlay_width + underlay_width".

Patch attached.
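
As an illustration of the subnet arithmetic described above, here is a minimal sketch. It is not the fanatic patch itself; the function names are hypothetical. It assumes the per-host slice of the overlay is formed by appending the host bits of the underlay address (32 - underlay_width of them) to the overlay prefix. For a /8 overlay and a /16 underlay (taken here as the default case) the two formulas happen to agree, which would explain why only non-default sizes misbehave.

```python
def host_subnet_width(overlay_width: int, underlay_width: int) -> int:
    """Prefix length of the per-host slice of the overlay (corrected math)."""
    host_bits = 32 - underlay_width  # bits identifying the host within the underlay
    return overlay_width + host_bits


def buggy_subnet_width(overlay_width: int, underlay_width: int) -> int:
    """The calculation described above as incorrect."""
    return overlay_width + underlay_width


if __name__ == "__main__":
    for overlay, underlay in [(8, 16), (8, 20), (8, 24)]:
        print(
            f"overlay /{overlay}, underlay /{underlay}: "
            f"correct /{host_subnet_width(overlay, underlay)}, "
            f"buggy /{buggy_subnet_width(overlay, underlay)}"
        )
```

With an underlay of /20, the corrected formula gives a /20 per-host slice (8 + 12), while the buggy one gives /28.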


** Patch added: "fanatic patch"
   
https://bugs.launchpad.net/ubuntu/+source/ubuntu-fan/+bug/1584092/+attachment/4667027/+files/fanatic.patch

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1584092

Title:
  Docker misconfigured when using non-default overlay/underlay netmask
  size

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ubuntu-fan/+bug/1584092/+subscriptions



[Bug 1584092] Re: Docker misconfigured when using non-default overlay/underlay netmask size

2016-05-20 Thread Jay Vosburgh
** Patch added: "fanatic.patch"
   
https://bugs.launchpad.net/ubuntu/+source/ubuntu-fan/+bug/1584092/+attachment/4667033/+files/fanatic.patch



[Bug 1584092] Re: Docker misconfigured when using non-default overlay/underlay netmask size

2016-05-20 Thread Jay Vosburgh
** Patch removed: "fanatic patch"
   
https://bugs.launchpad.net/ubuntu/+source/ubuntu-fan/+bug/1584092/+attachment/4667027/+files/fanatic.patch



[Bug 1658491] Re: VLAN SR-IOV regression for IXGBE driver

2017-01-23 Thread Jay Vosburgh
This issue may be fixed by this upstream commit:

commit f60439bc21e3337429838e477903214f5bd8277f
Author: Alexander Duyck 
Date:   Thu Aug 11 14:51:56 2016 -0700

ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths

When I was adding the code for enabling VLAN promiscuous mode with SR-IOV
enabled I had inadvertently left the VLNCTRL.VFE bit unchanged as I has
assumed there was code in another path that was setting it when we enabled
SR-IOV.  This wasn't the case and as a result we were just disabling VLAN
filtering for all the VFs apparently.

Also the previous patches were always clearing CFIEN which was always set
to 0 by the hardware anyway so I am dropping the redundant bit clearing.

Fixes: 16369564915a ("ixgbe: Add support for VLAN promiscuous with SR-IOV")
Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1658491

Title:
  VLAN SR-IOV regression for IXGBE driver

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1658491/+subscriptions



[Bug 1652348] Re: initrd dhcp fails / ignores valid response

2017-01-20 Thread Jay Vosburgh
Patch proposal to modify ipconfig to use one packet socket per interface


** Patch added: "klibc-fix-1.patch"
   
https://bugs.launchpad.net/ubuntu/+source/klibc/+bug/1652348/+attachment/4806861/+files/klibc-fix-1.patch

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1652348

Title:
  initrd dhcp fails / ignores valid response

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/klibc/+bug/1652348/+subscriptions



[Bug 1327412] Re: Delay during PXE Boot, IP-Config gives up

2016-09-14 Thread Jay Vosburgh
The patch added to nominally fix this issue is incorrect; it is setting
the wrong bit in the BOOTP flags field for broadcast:

+   bootp.flags = htons(0x800);

The correct value should be 0x8000.  This is causing issues with
switches that reject the packet as having bits set in a "must be zero"
flag area.

RFC 1542 defines the flags field as 16 bits, and the broadcast bit is
the most significant bit:

2.2 Definition of the 'flags' Field

   The standard BOOTP message format defined in [1] includes a two-octet
   field located between the 'secs' field and the 'ciaddr' field.  This
   field is merely designated as "unused" and its contents left
   unspecified, although Section 7.1 of [1] does offer the following
   suggestion:

  "Before setting up the packet for the first time, it is a good
  idea to clear the entire packet buffer to all zeros; this will
  place all fields in their default state."

  This memo hereby designates this two-octet field as the 'flags'
  field.

  This memo hereby defines the most significant bit of the 'flags'
  field as the BROADCAST (B) flag.  The semantics of this flag are
  discussed in Sections 3.1.1 and 4.1.2 of this memo.

  The remaining bits of the 'flags' field are reserved for future
  use.  They MUST be set to zero by clients and ignored by servers
  and relay agents.
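
To make the bit positions concrete, the following sketch packs the 16-bit flags field in network byte order for both values. The constant and function names are illustrative only and are not taken from klibc.

```python
import struct

BOOTP_BROADCAST = 1 << 15      # most significant bit of the 16-bit flags field
RESERVED_MASK = 0x7FFF         # remaining 15 bits, which clients must leave zero


def pack_flags(flags: int) -> bytes:
    """Encode the flags field as it appears on the wire (big-endian)."""
    return struct.pack("!H", flags)


def reserved_bits_set(flags: int) -> bool:
    """True if any bit other than the broadcast bit is set."""
    return (flags & RESERVED_MASK) != 0


if __name__ == "__main__":
    for value in (0x8000, 0x0800):
        print(
            f"flags=0x{value:04x} wire={pack_flags(value).hex()} "
            f"broadcast={bool(value & BOOTP_BROADCAST)} "
            f"reserved_bits_set={reserved_bits_set(value)}"
        )
```

The value 0x0800 leaves the broadcast bit clear and sets a reserved bit instead, which matches the rejection by strict switches described above.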

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1327412

Title:
  Delay during PXE Boot, IP-Config gives up

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/klibc/+bug/1327412/+subscriptions



[Bug 1652348] Re: initrd dhcp fails / ignores valid response

2017-01-11 Thread Jay Vosburgh
** Changed in: klibc (Ubuntu)
   Status: Confirmed => In Progress

** Tags removed: kernel-bug-exists-upstream kernel-bug-exists-upstream-4.10-rc1



[Bug 1652348] Re: initrd dhcp fails / ignores valid response

2017-01-09 Thread Jay Vosburgh
Just a note that I'm setting up to try the reproduction instructions
from comment #35.



[Bug 1652348] Re: initrd dhcp fails / ignores valid response

2017-01-10 Thread Jay Vosburgh
I have instrumented ipconfig, and determined that the ultimate source of
the problem is that, for the case of multiple interfaces, ipconfig has a
dependency on the kernel's probe order of the network interfaces.

For whatever reason, the -31 kernel probes the network devices in one
order (e.g., ens3 then ens4), and the -57 kernel in the other order
(ens4 first then ens3).

The probe order of network devices (and PCI devices in general) is
explicitly not defined, and so this is not a bug in the kernel itself;
ipconfig is failing due to its dependency on a specific enumeration
order.

The issue in ipconfig is that it is using a single packet socket to
attempt to multiplex packet traffic on multiple interfaces.  Presuming
that ens3 will answer DHCP and ens4 will not, for the case that works,
the order ends up being something like:

send DHCP request on ens3
send DHCP request on ens4
[ system gets DHCP response via ens3 ]
try to receive DHCP reply sent by peer for ens3; this matches, and all is happy

For the case that it fails, the sequence is roughly:

send DHCP request on ens4
send DHCP request on ens3
[ system gets DHCP response via ens3 ]
try to receive DHCP reply sent by peer for ens4; the reply is actually for ens3,
so ipconfig throws it away (as the XID, et al, don't match what is expected for
the ens4 DHCP request).

This repeats until ipconfig gives up.

As I said above, the issue is that ipconfig is trying to multiplex
traffic for two interfaces on one packet socket.  This is fine for
sending, but for receiving on an unbound packet socket, there is no way
to receive a packet sent to a specific interface.  Packets are delivered
to recvfrom/recvmsg in the order received.

I note that ipconfig sets sll.sll_ifindex on the msghdr provided to
recvfrom and recvmsg system calls; perhaps the author believed that this
limits received packets to only packets received on that ifindex.  This
is not the case, and the sll_ifindex passed to recvfrom/recvmsg is
ignored.

I'm looking into whether or not there is a simple fix for this that
will let ipconfig function without major rework to utilize one packet
socket per interface.
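
The ordering dependency described above can be shown with a small simulation. This is a deliberately simplified model, not ipconfig's code: the function names, the XID values, and the "one receive attempt per interface per pass" structure are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical transaction IDs for the outstanding DHCP request on each interface.
XIDS = {"ens3": 0x1111, "ens4": 0x2222}


def shared_socket_pass(probe_order, arrived_xids, xids):
    """One receive pass over a single shared packet socket.

    Packets come off the socket in arrival order, regardless of which
    interface the caller is currently servicing.
    """
    queue = deque(arrived_xids)
    configured = set()
    for ifname in probe_order:
        if not queue:
            break
        xid = queue.popleft()      # next packet, whichever interface it is for
        if xid == xids[ifname]:
            configured.add(ifname)
        # otherwise the packet is discarded, as in the failing case
    return configured


def per_interface_pass(probe_order, arrived_by_if, xids):
    """Same pass with one (bound) socket per interface."""
    configured = set()
    for ifname in probe_order:
        queue = deque(arrived_by_if.get(ifname, []))
        if queue and queue.popleft() == xids[ifname]:
            configured.add(ifname)
    return configured


if __name__ == "__main__":
    reply_for_ens3 = [XIDS["ens3"]]    # only ens3 has a DHCP server answering
    print(shared_socket_pass(["ens3", "ens4"], reply_for_ens3, XIDS))
    print(shared_socket_pass(["ens4", "ens3"], reply_for_ens3, XIDS))
    print(per_interface_pass(["ens4", "ens3"], {"ens3": reply_for_ens3}, XIDS))
```

In this model the shared socket succeeds only when ens3 is serviced first, while the per-interface variant is independent of probe order.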



** Tags removed: kernel-key

** Package changed: linux (Ubuntu) => klibc (Ubuntu)

** Changed in: klibc (Ubuntu)
   Status: Triaged => Confirmed

** Changed in: klibc (Ubuntu)
 Assignee: (unassigned) => Jay Vosburgh (jvosburgh)



[Bug 1652348] Re: initrd dhcp fails / ignores valid response

2017-01-09 Thread Jay Vosburgh
I have reproduced the described issue locally using the instructions
from comment 35; will start looking into the cause.



[Bug 1683947] Re: ubuntu 4.8 kernel, virtio_net error causes NAT packets to be lost

2017-04-20 Thread Jay Vosburgh
** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1683947

Title:
  ubuntu 4.8 kernel, virtio_net error causes NAT packets to be lost

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1683947/+subscriptions



[Bug 1683947] [NEW] ubuntu 4.8 kernel, virtio_net error causes NAT packets to be lost

2017-04-18 Thread Jay Vosburgh
Public bug reported:


SRU Justification:

Impact:

Configuring the 4.8 kernel with iptables MASQUERADE over virtio_net
causes packets to be dropped by the hypervisor (host) due to improper
flags being set based on the IP checksum state of the packet.  The host
performing MASQUERADE is affected by the bug.

Issue was introduced by

commit fd2a0437dc33b6425cabf74cc7fc7fdba6d5903b
Author: Mike Rapoport <r...@linux.vnet.ibm.com>
Date: Wed Jun 8 16:09:18 2016 +0300

virtio_net: introduce virtio_net_hdr_{from,to}_skb

which first appears in v4.8-rc1

Fix:

Fixed upstream by

3e9e40e74753 virtio_net: Simplify call sites for virtio_net_hdr_{from, to}_skb().
501db511397f virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on xmit
6391a4481ba0 virtio-net: restore VIRTIO_HDR_F_DATA_VALID on receiving

3e9e40e74753 first appears in v4.9-rc5 (and is a prerequisite only), the
others in v4.10-rc4.

Testcase:

Reproduction to date has been on GCE, although in principle it should
manifest on any suitable topology using virtio_net.  There is a
dependency on the forwarded packets having skb->ip_summed ==
CHECKSUM_UNNECESSARY; not all incoming devices will have this property.

On GCE, the following steps will induce the issue on an affected kernel:

Set up a network:

% gcloud compute networks create nat-network --mode legacy --range 10.240.0.0/16
% gcloud compute firewall-rules create nat-network-allow-ssh --allow tcp:22 \
    --network nat-network
% gcloud compute firewall-rules create nat-network-allow-internal \
    --allow tcp:1-65535,udp:1-65535,icmp --source-ranges 10.240.0.0/16 \
    --network nat-network

Set up an Ubuntu 16.04 NAT VM:

% gcloud compute instances create nat-gateway-16 --zone us-central1-a \
    --network nat-network --can-ip-forward --image-family ubuntu-1604-lts \
    --image-project ubuntu-os-cloud --tags nat \
    --metadata startup-script='sysctl -w net.ipv4.ip_forward=1 ; iptables -t nat -A POSTROUTING -o ens4 -j MASQUERADE'

Set up a route to use the 16.04 NAT:

% gcloud compute routes create no-ip-internet-route --network nat-network \
    --destination-range 0.0.0.0/0 --next-hop-instance nat-gateway-16 \
    --next-hop-instance-zone us-central1-a --tags no-ip --priority 800

Set up a simple test VM without any external network:

% gcloud compute instances create nat-client --zone us-central1-a \
    --network nat-network --no-address --image-family ubuntu-1604-lts \
    --image-project ubuntu-os-cloud --tags no-ip \
    --metadata startup-script='wget --timeout=5 https://github.com/GoogleCloudPlatform/compute-image-packages/archive/20170327.tar.gz'

Wait for it to boot... maybe 30 seconds or so.

Look for serial port output:

% gcloud compute instances get-serial-port-output nat-client \
    --zone us-central1-a | grep startup-script

You will see that the connection to github never succeeds - it just gets
stuck on "Resolving github.com (github.com)... 192.30.253.112,
192.30.253.113" and will time out.  (Ignore the previous attempt from the
successful 14.04 based NAT.)

Repeat the test by resetting the test client instance and watch for
serial output:

% gcloud compute instances reset nat-client --zone us-central1-a

Wait a minute or so for new boot, then check the serial-port-output as
above.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Assignee: Jay Vosburgh (jvosburgh)
 Status: New

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Jay Vosburgh (jvosburgh)



[Bug 1700834] Re: Intel i40e PF reset under load

2017-08-11 Thread Jay Vosburgh
** Tags removed: verification-needed-xenial
** Tags added: verification-done-xenial

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1700834

Title:
  Intel i40e PF reset under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1700834/+subscriptions



[Bug 1697053] Re: Missing IOTLB flush causes DMAR errors with SR-IOV

2017-07-13 Thread Jay Vosburgh
proposed kernel tested by customer


** Tags removed: verification-needed-trusty
** Tags added: verification-done-trusty

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1697053

Title:
  Missing IOTLB flush causes DMAR errors with SR-IOV

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1697053/+subscriptions



[Bug 1700834] [NEW] Intel i40e PF reset under load

2017-06-27 Thread Jay Vosburgh
Public bug reported:

SRU Justification:

Impact:

Using an Intel i40e network device, under heavy traffic load with
TSO enabled, the device will spontaneously reset itself and issue errors
similar to the following:

Jun 14 14:09:51 hostname kernel: [4253913.851053] i40e :05:00.1: TX driver issue detected, PF reset issued
Jun 14 14:09:53 hostname kernel: [4253915.476283] i40e :05:00.1: TX driver issue detected, PF reset issued
Jun 14 14:09:54 hostname kernel: [4253917.411264] i40e :05:00.1: TX driver issue detected, PF reset issued

This causes a full reset of the PF, which causes an interruption
in traffic flow.

In this case, these errors arise from a bug in the i40e device
driver introduced by commit:

commit 584a837e26408c66e87df87a022faa6a54c2b020
Author: Alexander Duyck <adu...@mirantis.com>
Date:   Wed Feb 17 11:02:50 2016 -0800

i40e/i40evf: Rewrite logic for 8 descriptor per packet check

This patch was added to the Xenial kernel beginning with version
4.4.0-8.23.  This bug does not manifest on any other Ubuntu kernel series.


Fix:

This error is resolved upstream by:

commit 3f3f7cb875c0f621485644d4fd7453b0d37f00e4
Author: Alexander Duyck <adu...@mirantis.com>
Date:   Wed Mar 30 16:15:37 2016 -0700

i40e/i40evf: Limit TSO to 7 descriptors for payload instead of 8 per packet

This fix was never backported into the Xenial 4.4 kernel series.


Testcase:

In this case, the issue occurs at a customer site using i40e based
Intel network cards with SR-IOV enabled.  Under heavy load, the card will
reset itself as described.  The customer has tested the 3f3f7cb875c patch
in their environment and confirmed that it resolves the issue.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Assignee: Jay Vosburgh (jvosburgh)
 Status: New

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Jay Vosburgh (jvosburgh)



[Bug 1700834] Re: Intel i40e PF reset under load

2017-06-27 Thread Jay Vosburgh
** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed



[Bug 1683947] Re: ubuntu 4.8 kernel, virtio_net error causes NAT packets to be lost

2017-04-25 Thread Jay Vosburgh
Jason,

I work for Canonical; the issue came up with one of our customers.

FWIW, I debugged the issue by first using kprobes and ftrace on the
kernel of a running instance to trace the packet path through the
kernel.  Once it seemed that the affected packets were not being dropped
somewhere on the instance and that MASQUERADE appeared to be operating
correctly, I did a git bisect of the kernel to isolate the actual commit
that resolved the problem (as the 4.11 kernel did not suffer from the
issue).



[Bug 1709032] Re: Creating conntrack entry failure with kernel 4.4.0-89

2017-08-09 Thread Jay Vosburgh
The panic appears to be fixed upstream via:

commit 9c3f3794926a997b1cab6c42480ff300efa2d162
Author: Liping Zhang 
Date:   Sat Mar 25 16:35:29 2017 +0800

netfilter: nf_ct_ext: fix possible panic after nf_ct_extend_unregister

If one cpu is doing nf_ct_extend_unregister while another cpu is doing
__nf_ct_ext_add_length, then we may hit BUG_ON(t == NULL). Moreover,
there's no synchronize_rcu invocation after set nf_ct_ext_types[id] to
NULL, so it's possible that we may access invalid pointer.
[...]

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1709032

Title:
  Creating conntrack entry failure with kernel 4.4.0-89

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1709032/+subscriptions



[Bug 1687512] Re: Kernel panics on Xenial when using cgroups and strict CFS limits

2017-05-26 Thread Jay Vosburgh
Customer has verified that 4.4.0-79-generic resolves the issue in their
environment that would previously panic.


** Tags removed: verification-needed-xenial
** Tags added: verification-done-xenial

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1687512

Title:
  Kernel panics on Xenial when using cgroups and strict CFS limits

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1687512/+subscriptions



[Bug 1677668] Re: no GARPs during ephemeral boot

2017-05-22 Thread Jay Vosburgh
Sam / Christian,

This sort of issue is not unheard of in cases where an IP address moves
from interface to interface, or between hosts.  Most situations that
expect this type of issue (e.g., bonding link failover) already issue
gratuitous ARPs in order to update L2 peers.

I think the bottom line here is that dhclient (the DHCP client typically
used on Ubuntu, and presumably the one in use here) does not implement
RFC 5227, "IPv4 Address Conflict Detection," which describes how
gratuitous ARPs must be done if the host provides that functionality
(5227 2.3, "Announcing an Address").

Most network configuration tools (and the kernel itself) on Linux do not
issue gratuitous ARPs by default at address assignment time, so this
lack isn't especially unusual.  E.g., there is no option in
/etc/network/interfaces to instruct ifup to issue a GARP.

I'll note that 5227 is a proposed standard, and, as such, hosts are not
required to implement it, so dhclient is not violating any standards by
not issuing gratuitous ARPs.


Now, none of the above actually resolves the problem here, it just
explains that you've landed in a corner case that doesn't come up very
often.

As far as resolving this, one obvious possibility is to add RFC 5227
functionality to dhclient through its dhclient-script facility (and in
fact the man page for that is close to suggesting that: for the BOUND
case, the script should "somehow" perform duplicate address detection
via ARP).

I'm not too familiar with cloud-init's internals, but for 5227
compliance, the GARP would be issued on every boot, and cloud-init only
runs on first boot, so an implementation within cloud-init would likely
be setting up some persistent configuration.
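
For reference, the frame that an RFC 5227 announcement puts on the wire can be sketched as follows. This only builds the bytes; it is not dhclient or cloud-init code, the function name is hypothetical, and the MAC and IP address are placeholder values. The layout follows RFC 5227 section 2.3: an ARP Request, sent to the Ethernet broadcast address, with sender and target IP both set to the announced address and the target hardware address zeroed.

```python
import socket
import struct

ETH_P_ARP = 0x0806       # EtherType for ARP
ARP_HTYPE_ETHERNET = 1
ARP_PTYPE_IPV4 = 0x0800
ARP_OP_REQUEST = 1


def build_arp_announcement(mac: bytes, ipv4: str) -> bytes:
    """Return an Ethernet frame carrying a gratuitous ARP for ipv4."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    ip = socket.inet_aton(ipv4)

    # Ethernet header: broadcast destination, our MAC as source.
    eth_header = b"\xff" * 6 + mac + struct.pack("!H", ETH_P_ARP)

    # Fixed part of the ARP header: htype, ptype, hlen, plen, opcode.
    arp = struct.pack(
        "!HHBBH", ARP_HTYPE_ETHERNET, ARP_PTYPE_IPV4, 6, 4, ARP_OP_REQUEST
    )
    arp += mac + ip            # sender hardware address, sender IP
    arp += b"\x00" * 6 + ip    # target hardware address (zero), target IP
    return eth_header + arp


if __name__ == "__main__":
    frame = build_arp_announcement(bytes.fromhex("525400123456"), "192.0.2.10")
    print(len(frame), frame.hex())
```

Actually transmitting the frame would need a raw packet socket and the appropriate privileges, which this sketch leaves out.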

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1677668

Title:
  no GARPs during ephemeral boot

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1677668/+subscriptions



[Bug 1697053] [NEW] Missing IOTLB flush causes DMAR errors with SR-IOV

2017-06-09 Thread Jay Vosburgh
Public bug reported:

SRU Justification:

Impact:

Systems using SR-IOV with Intel IOMMUs can exhibit DMAR errors of the
following type:

[606483.223009] DMAR:[fault reason 05] PTE Write access is not set
[606484.071974] dmar: DRHD: handling fault status reg 402
[606484.077121] dmar: DMAR:[DMA Write] Request device [d8:0a.1] fault addr 35c6e000

The DMAR error causes, at a minimum, loss of network traffic
because the request being serviced is lost.  Network cards were also
observed to experience transmit timeouts after a DMAR fault.

In this case, these errors arise from a race condition in
the IOTLB management; this race is described (and fixed) in upstream
commit:

commit ea8ea460c9ace60bbb5ac6e5521d637d5c15293d
Author: David Woodhouse <david.woodho...@intel.com>
Date:   Wed Mar 5 17:09:32 2014 +

iommu/vt-d: Clean up and fix page table clear/free behaviour

This commit first appeared in mainline 3.15.  This issue
affects only the Ubuntu 3.13 kernel series.

Fix:

The race avoidance portion of the above was backported to
3.14-stable, but was never incorporated into the Ubuntu 3.13
kernel series.

commit 51d20e1096a711f8cfa9d98a3ac2dd2c7c0fc20c
Author: David Woodhouse <dw...@infradead.org>
Date:   Mon Jun 9 14:09:53 2014 +0100

iommu/vt-d: Fix missing IOTLB flush in intel_iommu_unmap()

Based on commit ea8ea460c9ace60bbb5ac6e5521d637d5c15293d upstream

This 3.14-stable patch was tested by the customer and observed
to resolve the issue in their environment.

Testcase:

In this case, the issue occurs on very recent Intel based
servers using two different SR-IOV network cards (i40e and bnxt) at a
customer site.  The customer has tested the patch in their environment
and confirmed that it resolves the issue.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Assignee: Jay Vosburgh (jvosburgh)
 Status: New

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Jay Vosburgh (jvosburgh)



[Bug 1697053] Re: Missing IOTLB flush causes DMAR errors with SR-IOV

2017-06-09 Thread Jay Vosburgh
** Changed in: linux (Ubuntu)
   Status: In Progress => Confirmed



[Bug 1716747] Re: linux 4.12 - high system load and mouse delays - pipe A vblank wait timed out

2017-10-04 Thread Jay Vosburgh
Albert,

This is the lspci from my X220 T:

-[:00]-+-00.0  Intel Corporation 2nd Generation Core Processor Family DRAM Controller
       +-02.0  Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller
       +-16.0  Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1
       +-19.0  Intel Corporation 82579LM Gigabit Network Connection
       +-1a.0  Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2
       +-1b.0  Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller
       +-1c.0-[01]--
       +-1c.1-[02]00.0  Intel Corporation Centrino Advanced-N 6205 [Taylor Peak]
       +-1c.3-[03]--
       +-1c.4-[04]00.0  Ricoh Co Ltd PCIe SDXC/MMC Host Controller
       +-1d.0  Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1
       +-1f.0  Intel Corporation QM67 Express Chipset Family LPC Controller
       +-1f.2  Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller
       \-1f.3  Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller


[Bug 1716747] Re: linux 4.12 - high system load and mouse delays - pipe A vblank wait timed out

2017-09-23 Thread Jay Vosburgh
Just a comment that I have observed this bug as well, on an X220 T.  The
test kernel from comment #11 also appears to resolve the problem (so
far).  I do not have any external USB controllers attached, though, so
I'm not sure what the failure path was.


[Bug 1771480] Re: WARNING: CPU: 28 PID: 34085 at /build/linux-90Gc2C/linux-3.13.0/net/core/dev.c:1433 dev_disable_lro+0x87/0x90()

2018-05-16 Thread Jay Vosburgh
The dev_disable_lro warning is happening due to some logic issues in the
features code.  The LRO on the VLAN (bond0.200, e.g.) that's being
warned about does end up being disabled by a NETDEV_FEAT_CHANGE callback
when the underlying bond0's features are updated, so the warning is
spurious.

Tracing the dev_disable_lro -> netdev_update_features for the bond0.2004
VLAN, I see:

name="bond0" feat=219db89 hw_feat=20219cbe9 want_feat=20219cbe9
vlan_feat=198069

NETIF_F_LRO = 0x8000

dev_disable_lro
    wanted_features &= ~NETIF_F_LRO
    bond0.2004 wanted_features = 0x200194869    # no LRO

__netdev_update_features
    features = netdev_get_wanted_features
        return (dev->features & ~dev->hw_features) | dev->wanted_features;
               (0x19d809 & ~0x23839487b) | 0x200194869
                ^LRO        ^no LRO        ^no LRO
               0x9000 | 0x200194869
        $2 = 0x20019d869
                    ^ LRO

vlan_dev_fix_features(dev, 0x20019d869)    # has LRO

    struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
    netdev_features_t old_features = features;

    features &= real_dev->vlan_features;    # 0x198069 has LRO
    features |= NETIF_F_RXCSUM;             # 0x100198069 has LRO
    features &= real_dev->features;         # 0x198009 has LRO

    features |= old_features & NETIF_F_SOFT_FEATURES;  # save GSO / GRO
    features |= NETIF_F_LLTX;

    return features;    # will have LRO

So, basically, LRO is set in the underlying bond0's features, so it ends
up being kept in the VLAN device's features even though it wasn't in
wanted_features.  Later, dev_disable_lro will call dev_disable_lro on
all the lower devices (the bond0 in this case), and the update of
features for the bond0 will issue a NETDEV_FEAT_CHANGE callback to the
bond0.2004 VLAN, which will then set the features correctly.
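
The arithmetic in the trace above can be replayed with shell arithmetic.
This is only a sketch using the values quoted in the trace; the function
names are illustrative and are not kernel code.

```shell
#!/bin/bash
# Replays the netdev_get_wanted_features arithmetic from the trace above.
# All masks are the values quoted in the trace; names are illustrative only.

NETIF_F_LRO=$(( 0x8000 ))

# (dev->features & ~dev->hw_features) | dev->wanted_features
wanted_features() {
  local features=$1 hw_features=$2 wanted=$3
  printf '0x%x\n' $(( (features & ~hw_features) | wanted ))
}

# Prints "LRO" if the LRO bit is set in the given mask, "no LRO" otherwise.
has_lro() {
  if (( $1 & NETIF_F_LRO )); then echo "LRO"; else echo "no LRO"; fi
}

result=$(wanted_features 0x19d809 0x23839487b 0x200194869)
echo "wanted_features: 0x200194869 ($(has_lro 0x200194869))"
echo "computed:        $result ($(has_lro "$result"))"
```

The computed value is 0x20019d869, which has the LRO bit set even though
wanted_features does not, matching the trace.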

The Ubuntu 3.13  __netdev_update_features (called by dev_disable_lro via
netdev_update_features) lacks additional logic found in later kernels to
sync the features to lower devices.  That presumably triggers the
NETDEV_FEAT_CHANGE within the call to __netdev_update_features so that
the bond0.2004 VLAN is updated before we return back to dev_disable_lro
(but I haven't verified this).

I suspect the fix to eliminate the warning is to apply the "sync_lower:"
block from a later kernel __netdev_update_features to 3.13, along with
the netdev_sync_lower_features function it uses.


[Bug 1765241] Re: virtio_scsi race can corrupt memory, panic kernel

2018-05-01 Thread Jay Vosburgh
** Tags removed: verification-needed-xenial
** Tags added: verification-done-xenial


[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-04-26 Thread Jay Vosburgh
I would suggest testing

commit de77ecd4ef02ca783f7762e04e92b3d0964be66b
Author: Mahesh Bandewar 
Date:   Mon Mar 27 11:37:33 2017 -0700

bonding: improve link-status update in mii-monitoring

and

commit d94708a553022bf012fa95af10532a134eeb5a52
Author: WANG Cong 
Date:   Tue Jul 25 09:44:25 2017 -0700

bonding: commit link status change after propose


backported to 4.4.0-120 (in the order above; the second is a fix to the first).

The first patch initially appears in 4.12-rc1, the second in 4.13.


[Bug 1753662] Re: [i40e] LACP bonding start up race conditions

2018-04-30 Thread Jay Vosburgh
We've seen a similar-sounding issue in the past, but couldn't get it
tracked down to the root cause.

Is it possible to enable some instrumentation in the /etc/network/interfaces and
obtain some data on a failing occurrence?

What we've used in the past is adding something like

pre-up echo 'file bond_3ad.c +p' > /sys/kernel/debug/dynamic_debug/control 
pre-up echo 'file bond_main.c +p' > /sys/kernel/debug/dynamic_debug/control

to the /etc/network/interfaces section for the bond itself, and

post-up tcpdump -U -p -w /tmp/eth4.td -i eth4 ether proto 0x8809 &

to the sections for each slave in the bond (adjusting the "eth4" above
to the actual interface name).

The bond debug output will appear in the kernel log, and the tcpdump
data will have to be copied from the output file specified on the
tcpdump command line (and the tcpdump process terminated, if need be).


[Bug 1716747] Re: linux 4.12 - high system load and mouse delays - pipe A vblank wait timed out

2017-10-26 Thread Jay Vosburgh
Joe,

No, I'm not seeing the issue now; running 4.13.0-16 for the last 10 days
or so.


[Bug 1765241] Re: virtio_scsi race can corrupt memory, panic kernel

2018-04-19 Thread Jay Vosburgh
SRU Justification:

Impact:

This issue can cause system panics of systems using the
virtio_scsi driver with the affected Ubuntu kernels.  The issue manifests
irregularly, as it is timing dependent.

Fix:

The issue is resolved by adding synchronization between the two
code paths that race with one another.  The lowest regression risk is to
use a synchronize_rcu_expedited call, as that is the functionality that
blocks the race in unaffected kernels.

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 03a2aad..c122e68 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -762,6 +762,9 @@ static int virtscsi_target_alloc(struct scsi_target *starget)
 static void virtscsi_target_destroy(struct scsi_target *starget)
 {
 	struct virtio_scsi_target_state *tgt = starget->hostdata;
+
+	/* we can race with concurrent virtscsi_complete_cmd */
+	synchronize_rcu_expedited();
 	kfree(tgt);
 }
 

It is also possible to have the code wait for any outstanding
requests to drain prior to freeing the target structure, e.g.,

--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -762,6 +762,10 @@ static int virtscsi_target_alloc(struct scsi_target *starget)
 static void virtscsi_target_destroy(struct scsi_target *starget)
 {
 	struct virtio_scsi_target_state *tgt = starget->hostdata;
+
+	/* we can race with concurrent virtscsi_complete_cmd */
+	while (atomic_read(&tgt->reqs))
+		cpu_relax();
 	kfree(tgt);
 }

This completes a bit faster for the usual case, but SCSI target
destroy is not a fast path and the above runs the risk of the loop never
terminating.


Testcase:

This reproduces on Google Cloud, using the current, unmodified
ubuntu-1404-lts image (with the Ubuntu 4.4 kernel). Using the two attached
scripts, run e.g.

  ./create_shutdown_instance.sh 100

to create 100 instances. If an instance runs its startup script
successfully, it'll shut itself down right away. So instances that are
still running after a few minutes likely demonstrate this problem.

The issue reproduces easily with n1-standard-4.

create_shutdown_instance.sh:

#!/bin/bash -e

ZONE=us-central1-a

for i in $(seq -w $1); do
  gcloud compute instances create shutdown-experiment-$i \
--zone="${ZONE}" \
--image-family=ubuntu-1404-lts \
--image-project=ubuntu-os-cloud \
--machine-type=n1-standard-4 \
--scopes compute-rw \
--metadata-from-file startup-script=immediate_shutdown.sh &
done

wait

immediate_shutdown.sh:

#!/bin/bash -x

function get_metadata_value() {
  curl -H 'Metadata-Flavor: Google' \
    "http://metadata.google.internal/computeMetadata/v1/instance/$1"
}

readonly ZONE="$(get_metadata_value zone | awk -F'/' '{print $NF}')"
gcloud compute instances delete "$(hostname)" --zone="${ZONE}" --quiet


[Bug 1765241] Re: virtio_scsi race can corrupt memory, panic kernel

2018-04-19 Thread Jay Vosburgh
SRU Justification:

Impact:

 This issue can cause system panics of systems using the
virtio_scsi driver with the affected Ubuntu kernels. The issue manifests
irregularly, as it is timing dependent.

Fix:

 The issue is resolved by adding synchronization between the two
code paths that race with one another. The most straightforward fix
is to have the code wait for any outstanding
requests to drain prior to freeing the target structure, e.g.,

--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -762,6 +762,10 @@ static int virtscsi_target_alloc(struct scsi_target *starget)
 static void virtscsi_target_destroy(struct scsi_target *starget)
 {
 	struct virtio_scsi_target_state *tgt = starget->hostdata;
+
+	/* we can race with concurrent virtscsi_complete_cmd */
+	while (atomic_read(&tgt->reqs))
+		cpu_relax();
 	kfree(tgt);
 }


An alternative fix that was considered is to use a synchronize_rcu_expedited
call, as that is the functionality that blocks the race in unaffected kernels.
However, some call paths into virtscsi_target_destroy may hold mutexes that
are not held by the upstream RCU sync calls (which enter via the block layer).
For this reason the more confined fix described above was chosen.

Testcase:

This reproduces on Google Cloud, using the current, unmodified
ubuntu-1404-lts image (with the Ubuntu 4.4 kernel). Using the two attached
scripts, run e.g.

  ./create_shutdown_instance.sh 100

to create 100 instances. If an instance runs its startup script
successfully, it'll shut itself down right away. So instances that are
still running after a few minutes likely demonstrate this problem.

The issue reproduces easily with n1-standard-4.

create_shutdown_instance.sh:

#!/bin/bash -e

ZONE=us-central1-a

for i in $(seq -w $1); do
  gcloud compute instances create shutdown-experiment-$i \
--zone="${ZONE}" \
--image-family=ubuntu-1404-lts \
--image-project=ubuntu-os-cloud \
--machine-type=n1-standard-4 \
--scopes compute-rw \
--metadata-from-file startup-script=immediate_shutdown.sh &
done

wait

immediate_shutdown.sh:

#!/bin/bash -x

function get_metadata_value() {
  curl -H 'Metadata-Flavor: Google' \
    "http://metadata.google.internal/computeMetadata/v1/instance/$1"
}

readonly ZONE="$(get_metadata_value zone | awk -F'/' '{print $NF}')"
gcloud compute instances delete "$(hostname)" --zone="${ZONE}" --quiet
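
To find the instances that are still running after a few minutes, a
filtered listing along the following lines may help.  This is a sketch,
not part of the attached scripts: the function name is illustrative, and
the filter assumes the shutdown-experiment- name prefix used by
create_shutdown_instance.sh.

```shell
#!/bin/bash
# Lists test instances that are still in the RUNNING state; per the
# testcase above, these likely demonstrate the problem.
# Assumption: instance names start with "shutdown-experiment-".

list_stuck_instances() {
  gcloud compute instances list \
    --filter="name~^shutdown-experiment- AND status=RUNNING" \
    --format="value(name)"
}

# Only query when gcloud is actually installed.
if command -v gcloud >/dev/null 2>&1; then
  list_stuck_instances || echo "gcloud query failed" >&2
fi
```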


[Bug 1765241] [NEW] virtio_scsi race can corrupt memory, panic kernel

2018-04-18 Thread Jay Vosburgh
s the race
window on the Ubuntu 4.4 kernel.

Resolving the issue can be accomplished by adding an RCU sync
to virtscsi_target_destroy prior to freeing the target.  It is also possible
to use a loop of the format:

+	while (atomic_read(&tgt->reqs))
+		cpu_relax();

but this is higher risk as the loop is non-terminating in the case
of other failure.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Assignee: Jay Vosburgh (jvosburgh)
 Status: Confirmed

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Jay Vosburgh (jvosburgh)

** Changed in: linux (Ubuntu)
   Status: New => Confirmed


[Bug 1716747] Re: High system load and mouse delays - pipe A vblank wait timed out

2018-03-05 Thread Jay Vosburgh
Joe,

I didn't try anything in between, I went from 4.13.0-16 to -36 and -36
started wigging out again so I backed down to -16.  I can try some
interim kernels next week when I don't need to do work on the laptop in
question.


[Bug 1716747] Re: linux 4.12 - high system load and mouse delays - pipe A vblank wait timed out

2018-03-02 Thread Jay Vosburgh
Joe,

The issue has returned on my X220 tablet; running 4.13.0-36-generic and
the fully updated 17.10 user space.

Every time it happens the laptop display freezes for about 10 or 15
seconds.  A concurrent ssh session is unaffected.

[94261.464884] pipe A vblank wait timed out
[94261.464948] ------------[ cut here ]------------
[94261.465044] WARNING: CPU: 2 PID: 16697 at /build/linux-r9581B/linux-4.13.0/drivers/gpu/drm/i915/intel_display.c:12848 intel_atomic_commit_tail+0xfa7/0xfb0 [i915]
[94261.465046] Modules linked in: ccm rfcomm xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bnep binfmt_misc zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel arc4 aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf snd_seq_midi snd_seq_midi_event snd_hda_codec_hdmi snd_rawmidi iwldvm mac80211 snd_hda_codec_conexant snd_hda_codec_generic uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
[94261.465098]  input_leds thinkpad_acpi snd_seq snd_hda_intel serio_raw wmi_bmof videobuf2_core btusb btrtl btbcm iwlwifi videodev btintel joydev bluetooth nvram media snd_hda_codec snd_seq_device snd_hda_core ecdh_generic cfg80211 lpc_ich snd_hwdep snd_pcm shpchp snd_timer mei_me mei snd soundcore mac_hid nfsd parport_pc ppdev auth_rpcgss nfs_acl lp lockd parport grace sunrpc ip_tables x_tables autofs4 i915 i2c_algo_bit drm_kms_helper syscopyarea e1000e sysfillrect wacom sysimgblt ptp sdhci_pci fb_sys_fops ahci psmouse usbhid sdhci hid drm libahci pps_core wmi video
[94261.465153] CPU: 2 PID: 16697 Comm: Xorg Tainted: P   O4.13.0-36-generic #40-Ubuntu
[94261.465155] Hardware name: LENOVO 42992UU/42992UU, BIOS 8DET69WW (1.39 ) 07/18/2013
[94261.465157] task: 955d1d3845c0 task.stack: af29821bc000
[94261.465217] RIP: 0010:intel_atomic_commit_tail+0xfa7/0xfb0 [i915]
[94261.465219] RSP: 0018:af29821bf8a8 EFLAGS: 00010286
[94261.465221] RAX: 001c RBX:  RCX: 
[94261.465223] RDX:  RSI: 0002 RDI: 0246
[94261.465225] RBP: af29821bf960 R08: 001c R09: 6177206b6e616c62
[94261.465226] R10: af29821bf8a8 R11: 74756f2064656d69 R12: 003c6b37
[94261.465228] R13: 955d3fa08000 R14: 955d3fbb9000 R15: 0001
[94261.465231] FS:  7fa6fdfd0500() GS:955d5e28() knlGS:0000
[94261.465233] CS:  0010 DS:  ES:  CR0: 80050033
[94261.465235] CR2: 55ccba6e9ba8 CR3: 000402762004 CR4: 000606e0

[94261.465237] Call Trace:
[94261.465250]  ? wait_woken+0x80/0x80
[94261.465303]  intel_atomic_commit+0x3d5/0x490 [i915]
[94261.465331]  ? drm_atomic_check_only+0x37e/0x540 [drm]
[94261.465352]  drm_atomic_commit+0x51/0x60 [drm]
[94261.465367]  restore_fbdev_mode+0x15e/0x270 [drm_kms_helper]
[94261.465379]  drm_fb_helper_restore_fbdev_mode_unlocked+0x2e/0x80 [drm_kms_helper]
[94261.465389]  drm_fb_helper_set_par+0x2d/0x60 [drm_kms_helper]
[94261.465447]  intel_fbdev_set_par+0x1a/0x70 [i915]
[94261.465451]  fb_set_var+0x19f/0x440
[94261.465456]  ? __find_get_block+0xb6/0x2b0
[94261.465460]  ? ext4_dirty_inode+0x48/0x70
[94261.465465]  ? __ext4_handle_dirty_metadata+0x87/0x1c0
[94261.465472]  fbcon_blank+0x2b7/0x3a0
[94261.465476]  ? find_get_entry+0x1e/0xd0
[94261.465483]  do_unblank_screen+0xba/0x1b0
[94261.465488]  vt_ioctl+0x4e1/0x11a0
[94261.465493]  ? __slab_free+0x14c/0x2d0
[94261.465497]  ? __slab_free+0x14c/0x2d0
[94261.465502]  tty_ioctl+0xf6/0x8b0
[94261.465507]  ? vga_arb_release+0xd6/0x130
[94261.465511]  ? security_file_free+0x44/0x60
[94261.465515]  ? dput.part.23+0xba/0x1e0
[94261.465521]  do_vfs_ioctl+0xa8/0x630
[94261.465527]  ? entry_SYSCALL_64_after_hwframe+0xe9/0x139
[94261.465530]  ? entry_SYSCALL_64_after_hwframe+0xe2/0x139
[94261.465534]  ? entry_SYSCALL_64_after_hwframe+0xdb/0x139
[94261.465537]  ? entry_SYSCALL_64_after_hwframe+0xd4/0x139
[94261.465541]  ? entry_SYSCALL_64_after_hwframe+0xcd/0x139
[94261.465545]  ? entry_SYSCALL_64_after_hwframe+0xc6/0x139
[94261.465548]  ? entry_SYSCALL_64_after_hwframe+0xbf/0x139
[94261.465552]  ? entry_SYSCALL_64_after_hwframe+0xb8/0x139
[94261.46]  ? entry_SYSCALL_64_after_hwframe+0xb1/0x139
[94261.465560]  SyS_ioctl+0x79/0x90
[94261.465563]  ? entry_SYSCALL_64_after_hwframe+0x72/0x139
[94261.465567]  entry_SYSCALL_64_fastpath+0x24/0xab
[94261.465570] RIP: 0033:0x7fa6fb442ef7
[94261.465572] RSP: 002b:7ffcc51286d8 EFLAGS: 3246 ORIG_RAX: 0010
[94261.465575] RAX: ffda RBX: 000e RCX: 7fa6fb442ef7
[94261.465576] RDX: 

[Bug 1800254] Re: packet socket panic in Trusty 3.13.0-157 and later

2018-10-26 Thread Jay Vosburgh
Reproducer for ptype_all corruption.  Pass ifindex of an
administratively down interface on the command line.


** Attachment added: "packet-fry.c"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1800254/+attachment/5206100/+files/packet-fry.c


[Bug 1800254] [NEW] packet socket panic in Trusty 3.13.0-157 and later

2018-10-26 Thread Jay Vosburgh
Public bug reported:

SRU Justification:

Due to changes added as part of c108ac876c02 ("packet: hold bind lock when
rebinding to fanout hook"), it is possible for fanout_add to add a
packet_type handler via dev_add_pack and then kfree the memory backing the
packet_type.  This corrupts the ptype_all list, causing the system to
panic when network packet processing next traverses ptype_all.  The
erroneous path is taken when a PACKET_FANOUT setsockopt is performed on a
packet socket that is bound to an interface that is administratively down.

This is not due to any flaw in c108ac876c02, but rather that the packet
socket code base differs subtly in 3.13 as compared to 4.4.

This affects only the Trusty 3.13 kernel series, starting with
3.13.0-157.

Fix:

The remedy for this is to backport additional changes in the management of
the dev_add_pack calls from 4.4.  This moves the dev_add_pack and
dev_remove_pack calls from fanout_add and _release into __fanout_link and
_unlink.

Testcase:

The issue can be reproduced reliably by (a) creating an AF_PACKET socket,
(b) binding it to an interface that is administratively down, and then (c)
attempting to set the PACKET_FANOUT sockopt.  The setsockopt call will
fail, but will corrupt ptype_all in the kernel.  Subsequent network traffic
will induce a panic when evaluating the corrupted ptype_all entry.  A
test program is attached.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New


[Bug 1800254] Re: packet socket panic in Trusty 3.13.0-157 and later

2018-11-07 Thread Jay Vosburgh
** Tags removed: verification-needed-trusty


[Bug 1671951] Re: networkd should allow configuring IPV6 MTU

2018-11-26 Thread Jay Vosburgh
Regarding #2 from comment #19:

As the defined range for the ipv6.mtu is from IPV6_MIN_MTU to the
device's MTU, and the existing API returns an error if the ipv6.mtu is
out of range, I think it's reasonable for a configuration with the
ipv6.mtu > device MTU to fail.
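
As an illustration of that range check, here is a small sketch.  The
function name is hypothetical; IPV6_MIN_MTU is 1280 in the kernel, and
the 1500-byte device MTU is just an example value.

```shell
#!/bin/bash
# Range check described above: IPV6_MIN_MTU <= ipv6.mtu <= device MTU.
# IPV6_MIN_MTU is 1280 in the kernel; the function name is illustrative.

IPV6_MIN_MTU=1280

# Returns 0 if the IPv6 MTU is acceptable for a device with the given MTU.
ipv6_mtu_in_range() {
  local ipv6_mtu=$1 dev_mtu=$2
  (( ipv6_mtu >= IPV6_MIN_MTU && ipv6_mtu <= dev_mtu ))
}

for mtu in 1279 1280 1500 9000; do
  if ipv6_mtu_in_range "$mtu" 1500; then
    echo "ipv6.mtu $mtu with device MTU 1500: ok"
  else
    echo "ipv6.mtu $mtu with device MTU 1500: out of range"
  fi
done
```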


[Bug 1805693] [NEW] User reports a hang on 18.04 LTS(4.15.18) under a heavy I/O load

2018-11-28 Thread Jay Vosburgh
Public bug reported:

User reports a hang under heavy I/O:

The I/O hang on our cloud is caused by tasks hanging in wbt_wait() in the
block writeback throttling (blk-wbt) code.  The fix is commit
2887e41b910bb14fd847cf01ab7a5993db989d88, which addresses queue lock
contention and a thundering herd issue in wbt_wait().

We can recreate the problem easily by running concurrent sequential-write
I/O from multiple VMs.  We can provide an fio workload to reproduce it
if needed.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New


[Bug 1815237] Re: stop shipping "update-pciids" in /usr/sbin

2019-02-20 Thread Jay Vosburgh
** Also affects: pciutils (Ubuntu Precise)
   Importance: Undecided
   Status: New


[Bug 1873537] [NEW] PCIe AER device recovery failed due to logic flaw

2020-04-17 Thread Jay Vosburgh
Public bug reported:

SRU Justification

Impact:

During PCI Express Downstream Port Containment (DPC) recovery,
certain types of failures do not recover due to a logic flaw
in pcie_do_recovery().

The upstream git commit log explains the change:

PCI/ERR: Update error status after reset_link()
Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") uses
reset_link() to recover from fatal errors.  But during fatal error
recovery, if the initial value of error status is PCI_ERS_RESULT_DISCONNECT
or PCI_ERS_RESULT_NO_AER_DRIVER then even after successful recovery (using
reset_link()) pcie_do_recovery() will report the recovery result as
failure.  Update the status of error after reset_link().

You can reproduce this issue by triggering a SW DPC using "DPC Software
Trigger" bit in "DPC Control Register".  You should see recovery failed
dmesg log as below:

  pcieport :00:16.0: DPC: containment event, status:0x1f27 source:0x
  pcieport :00:16.0: DPC: software trigger detected
  pci :04:00.0: AER: can't recover (no error_detected callback)
  pcieport :00:16.0: AER: device recovery failed

Fixes: bdb5ac85777d ("PCI/ERR: Handle fatal error recovery")
Link: 
https://lore.kernel.org/r/a255fcb3a3fdebcd90f84e08b555f1786eb8eba2.158584.git.sathyanarayanan.kuppusw...@linux.intel.com
[bhelgaas: split pci_channel_io_frozen simplification to separate patch]
Signed-off-by: Kuppuswamy Sathyanarayanan 

Signed-off-by: Bjorn Helgaas 
Acked-by: Keith Busch 
Cc: Ashok Raj 

Note that a second prerequisite patch is necessary as well.  This patch,

commit b5dfbeacf74865a8d62a4f70f501cdc61510f8e0
Author: Kuppuswamy Sathyanarayanan 
Date:   Fri Mar 27 17:33:24 2020 -0500

PCI/ERR: Combine pci_channel_io_frozen cases

is a code readability change, and makes no functional changes.


Testcase:

On a system with DPC enabled, setpci may be used to set the DPC Software
Trigger bit (bit 6, value 0x40) in the DPC Control register of a suitable
PCIe device (a PCIe bridge, for example).

On a system lacking the fix, the output will be as shown above (i.e.,
culminating in the "device recovery failed" message).  With the fix
applied, the device successfully recovers, resulting in a message of the
form

pcieport :d9:01.0: AER: Device recovery successful
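
A quick way to tell the two outcomes apart is to look at the last AER
recovery message in the kernel log.  The following is a sketch only; the
function name and the result words are illustrative, and the matching
relies solely on the two messages quoted above.

```shell
#!/bin/bash
# Classifies kernel log text read from stdin by its last
# "AER: ... recovery" message. Function name and result words are
# illustrative only.

aer_recovery_result() {
  local last
  last=$(grep -i 'AER: device recovery' | tail -n 1)
  case "$last" in
    *[Ss]uccessful*) echo "recovered" ;;
    *[Ff]ailed*)     echo "failed" ;;
    *)               echo "no recovery message" ;;
  esac
}

# Example using the message from a kernel with the fix applied.
echo 'pcieport :d9:01.0: AER: Device recovery successful' | aer_recovery_result
```

On a live system, the kernel log can be piped in after setting the
trigger bit, e.g. "dmesg | aer_recovery_result".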


Regression Potential:

The risk of regression is low, as (a) the path in question currently does
not work, and (b) the changes are minimal, comprising only a housekeeping
change and the logically correct updating of a status variable that did
not previously occur.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New


[Bug 1869423] [NEW] Restore kernel control of PCIe DPC via option

2020-03-27 Thread Jay Vosburgh
Public bug reported:


SRU Justification:

Impact:

Since upstream commit eed85ff4c0da7 (4.16), control of PCIe DPC
(Downstream Port Containment) is coupled with control of AER (Advanced
Error Reporting), eliminating the option for the kernel to separately
manage DPC (which was previously the default behavior).

Fix:

The upstream commit log explains the change:

commit 35a0b2378c199d4f26e458b2ca38ea56aaf2d9b8
Author: Olof Johansson 
Date:   Wed Oct 23 12:22:05 2019 -0700

PCI/DPC: Add "pcie_ports=dpc-native" to allow DPC without AER control

Prior to eed85ff4c0da7 ("PCI/DPC: Enable DPC only if AER is available"),
Linux handled DPC events regardless of whether firmware had granted it
ownership of AER or DPC, e.g., via _OSC.

PCIe r5.0, sec 6.2.10, recommends that the OS link control of DPC to
control of AER, so after eed85ff4c0da7, Linux handles DPC events only if it
has control of AER.

On platforms that do not grant OS control of AER via _OSC, Linux DPC
handling worked before eed85ff4c0da7 but not after.

To make Linux DPC handling work on those platforms the same way they did
before, add a "pcie_ports=dpc-native" kernel parameter that makes Linux
handle DPC events regardless of whether it has control of AER.

[bhelgaas: commit log, move pcie_ports_dpc_native to drivers/pci/]
Link: https://lore.kernel.org/r/20191023192205.97024-1-o...@lixom.net
Signed-off-by: Olof Johansson 
Signed-off-by: Bjorn Helgaas 

Testcase:

Control of DPC can be determined from kernel boot messages when
pciehp probes a capable slot; when the kernel controls DPC, messages
of the format:

pcieport :2d:00.0: pciehp: Slot #0
pcieport :2d:00.0: DPC: error containment capabilities:

will appear; if the kernel does not control DPC, the DPC line will
not be present (only the "pciehp: Slot" message).

Additionally, devices bound to the kernel DPC PCIe port service
driver will be found in the /sys/bus/pci_express/drivers/dpc/ sysfs
directory; this will be empty of devices if the kernel does not control
DPC.
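Both checks in this testcase can be sketched as shell helpers. The function names are hypothetical, and the sketch assumes that bound devices appear as symlinks in the driver's sysfs directory (the usual sysfs layout), with any "module" link skipped:

```shell
# List devices bound to the DPC port service driver. Bound devices are
# assumed to appear as symlinks in the driver directory; regular files
# such as bind/unbind and a possible "module" link are skipped.
dpc_bound_devices() {
    dir="${1:-/sys/bus/pci_express/drivers/dpc}"
    for entry in "$dir"/*; do
        name="$(basename "$entry")"
        if [ -L "$entry" ] && [ "$name" != "module" ]; then
            printf '%s\n' "$name"
        fi
    done
    return 0
}

# Succeed if the log text on stdin contains the DPC probe message.
dpc_probe_seen() {
    grep -q 'DPC: error containment capabilities'
}
```

Empty output from dpc_bound_devices, together with "dmesg | dpc_probe_seen" failing, would indicate that the kernel does not control DPC.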

Regression Potential:

The risk of regression is low as (a) by default, the patch has no
effect (the default setting is to not enable the option), and (b) when
enabled, the patch restores functionality that previously worked, and was,
in fact, the default behavior.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1869423

Title:
  Restore kernel control of PCIe DPC via option

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1869423/+subscriptions


[Bug 1894780] Re: Oops and hang when starting LVM snapshots on 5.4.0-47

2020-09-09 Thread Jay Vosburgh
wgrant, you said:

That :a-152 is meant to be /sys/kernel/slab/:a-152. Even a
working kernel shows some trouble there:

  $ uname -a
  Linux  5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  $ ls -l /sys/kernel/slab | grep a-152
  lrwxrwxrwx 1 root root 0 Sep 8 03:20 dm_bufio_buffer -> :a-152

Are you saying that the symlink is the "some trouble" here?  That part
isn't an error; it is the effect of slab merging (the kernel normally
treats all slabs of the same size as one big slab with multiple
references, more or less).

Slab merging can be disabled via "slab_nomerge" on the kernel command
line.
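As a rough illustration of how to tell a merged alias from a real cache, the sketch below resolves an entry under /sys/kernel/slab. The function name slab_alias is hypothetical, and it relies only on the symlink layout shown in the listing above:

```shell
# Print the merged cache that a slab name points at, "unmerged" if the
# entry is a real directory, or "absent" if there is no such entry.
slab_alias() {
    name="$1"
    dir="${2:-/sys/kernel/slab}"
    if [ -L "$dir/$name" ]; then
        readlink "$dir/$name"
    elif [ -d "$dir/$name" ]; then
        echo "unmerged"
    else
        echo "absent"
    fi
}
```

On the system quoted above, "slab_alias dm_bufio_buffer" would be expected to print the ":a-152" target; with slab_nomerge in effect it should report "unmerged" instead.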

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1894780

Title:
  Oops and hang when starting LVM snapshots on 5.4.0-47

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1894780/+subscriptions


[Bug 1820929] Re: netplan should consider adding more udev attribute for exact matching of failover 3-netdev interfaces

2020-10-06 Thread Jay Vosburgh
Si-Wei,

In the test environment I'm using, the only change needed was to
initramfs-tools.  I suspect the udevd change you're thinking of was an
alternate implementation that we did not proceed with due to the
regression it introduced (that network interface names would change).

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1820929

Title:
  netplan should consider adding more udev attribute for exact matching
  of failover 3-netdev interfaces

To manage notifications about this bug go to:
https://bugs.launchpad.net/netplan/+bug/1820929/+subscriptions


[Bug 1834322] Re: Losing port aggregate with 802.3ad port-channel/bonding aggregation on reboot

2020-07-14 Thread Jay Vosburgh
** Changed in: linux (Ubuntu)
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1834322

Title:
  Losing port aggregate with 802.3ad port-channel/bonding aggregation on
  reboot

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1834322/+subscriptions


[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-09 Thread Jay Vosburgh
Thimo,

Thanks for the update; just to clarify, for your "procedure to recover,"
are you saying that that procedure will always resolve the damage, or
that even after that procedure, there may be corruption?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1907262

Title:
  raid10: discard leads to corrupted file system

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/+subscriptions


[Bug 1820929] Re: netplan should consider adding more udev attribute for exact matching of failover 3-netdev interfaces

2020-11-10 Thread Jay Vosburgh
Si-Wei,

What environment and methodology are you testing with?  I do not see the
same results you are reporting.  I am using the instructions you
previously provided, and with an 18.04.5 Ubuntu image, I see the
expected network interface naming (ens3, ens3nsby), and do not see
/run/systemd/network or /run/udev/rules.d as you describe.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1820929

Title:
  netplan should consider adding more udev attribute for exact matching
  of failover 3-netdev interfaces

To manage notifications about this bug go to:
https://bugs.launchpad.net/netplan/+bug/1820929/+subscriptions


[Bug 1959702] Re: Regression: ip6 ndp broken, host bridge doesn't add vlan guest entry to mdb

2022-02-09 Thread Jay Vosburgh
Harry,

I'm still working to reproduce this, without success.  I have set
the .autoconf sysctl to 0 (which controls creation of local addresses in
response to received Router Advertisements), as well as setting
.addr_gen_mode to 1 (to disable SLAAC (fe80::) addresses).

In any event, setting .autoconf=0 and .addr_gen_mode=1 still fails to
reproduce the issue on my test system.
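For reference, a dry-run sketch of those two settings. It assumes the sysctls live under net.ipv6.conf.<interface>, which is where the kernel normally exposes them; the function name and the choice of interface are illustrative only:

```shell
# Print (do not run) the sysctl commands for the two settings described
# above on one interface. The net.ipv6.conf.<interface> prefix is an
# assumption; only the .autoconf and .addr_gen_mode names are from the
# message.
ipv6_quiet_cmds() {
    ifname="$1"
    printf 'sysctl -w net.ipv6.conf.%s.autoconf=0\n' "$ifname"
    printf 'sysctl -w net.ipv6.conf.%s.addr_gen_mode=1\n' "$ifname"
}
```

For example, "ipv6_quiet_cmds vnet1" prints the two commands for a bridge port named vnet1.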

I find that if I disable mcast_flood on the relevant bridge ports
(i.e., bridge link set dev vnet1 mcast_flood off) I do see the behavior
you describe, but in that case no variant that I've tried (no vid, and all
vids in use) of "bridge mdb add ... grp ff02::1:ff00:2" appears to permit
ND traffic to pass to the VM destination.

Can you provide more specifics of how exactly the bridge and ports
are configured?  Ideally, I would like both the method used to set it up
and the configuration details when failing (i.e., "ip -s -d link show"
for the bridge and relevant bridge ports, "bridge vlan show", "bridge mdb
show", "bridge fdb show br [bridgename]").

Also, to answer a question from your original report, the default
setting in the kernel for multicast_snooping (enabled, i.e., 1) hasn't
changed recently (and quite possibly ever).
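The configuration details requested above could be gathered along these lines. This is only a sketch: the function name is hypothetical, it prints the commands rather than running them, and the "dev" keyword is added so that each "ip link show" is limited to one interface:

```shell
# Print the diagnostic commands for a bridge and its relevant ports.
# First argument: bridge name; remaining arguments: bridge ports.
bridge_diag_cmds() {
    br="$1"
    shift
    for dev in "$br" "$@"; do
        printf 'ip -s -d link show dev %s\n' "$dev"
    done
    printf 'bridge vlan show\n'
    printf 'bridge mdb show\n'
    printf 'bridge fdb show br %s\n' "$br"
}
```

Something like "bridge_diag_cmds br0 vnet1 | sh > bridge-state.txt" (with the real bridge and port names) would capture the output in one file for attaching to the bug.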

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1959702

Title:
  Regression: ip6 ndp broken, host bridge doesn't add vlan guest entry
  to mdb

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1959702/+subscriptions



[Bug 1959702] Re: Regression: ip6 ndp broken, host bridge doesn't add vlan guest entry to mdb

2022-02-05 Thread Jay Vosburgh
Harry,

I am attempting to reproduce the behavior you describe, but have been
unable to do so.  Could you clarify some of the configuration specifics,
as follows:

Starting with step 2,

"2. On the host, create a bridge and vlan with two ports, each with the
chosen vlan as PVID and egress untagged. Assign those ports one each to
the guests as the interface, use e1000. Be sure to NOT autoconfigure the
host side of the bridge ports with any ip4 or ip6 address (including
fe80::), [...]"

I have configured testbr0 with no addresses, i.e., "ip addr show":

15: testbr0:  mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 8c:dc:d4:b3:cb:f1 brd ff:ff:ff:ff:ff:ff

and added vnet1 and vnet3 (the interfaces that connect to the VMs) to
testbr0, removing their respective fe80:: addresses.  I set the bridge
vlan behavior via

bridge vlan add dev vnet1 vid 1234 pvid untagged
bridge vlan add dev vnet3 vid 1234 pvid untagged
bridge vlan del dev vnet1 vid 1
bridge vlan del dev vnet3 vid 1

then added a separate Ethernet device to testbr0, removed its fe80::
address, and set its bridge vlan as

bridge vlan add dev eno50 vid 1234
bridge vlan del dev eno50 vid 1
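For anyone repeating this setup, the vlan commands above can be generated for arbitrary ports with a sketch like the following. The function name bridge_vlan_cmds and its argument order are hypothetical; the printed commands are exactly the forms listed above:

```shell
# Print the bridge vlan commands for one VLAN ID.
# Arguments: VLAN ID, uplink device, then one or more guest ports.
# Guest ports get the VLAN as PVID, egress untagged; the uplink carries
# it tagged. The default vid 1 is removed from every port.
bridge_vlan_cmds() {
    vid="$1"
    uplink="$2"
    shift 2
    for port in "$@"; do
        printf 'bridge vlan add dev %s vid %s pvid untagged\n' "$port" "$vid"
    done
    for port in "$@"; do
        printf 'bridge vlan del dev %s vid 1\n' "$port"
    done
    printf 'bridge vlan add dev %s vid %s\n' "$uplink" "$vid"
    printf 'bridge vlan del dev %s vid 1\n' "$uplink"
}
```

Running "bridge_vlan_cmds 1234 eno50 vnet1 vnet3" prints the same six commands used in this test.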

After adding addresses to the interfaces within the VMs, ping between the
two works (even after "ip neigh flush dev enp7s0").

At no time does "bridge mdb" affect the behavior (it lists no entries),
and it is unnecessary to add ff02:ffxx entries as you describe in step 6
(I presume that your mention of "fe02::ff..." is a typo for "ff02").

I am testing with the Ubuntu 5.11.0-46 kernel, which differs slightly from
your 5.11.0-49.  From which version did you upgrade (i.e., what was the
last known working version)?  I'm preparing to test with 5.11.0-50 (I am
unable to locate -49), but would also like to know if the above
description matches your configuration.

Thanks.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1959702

Title:
  Regression: ip6 ndp broken, host bridge doesn't add vlan guest entry
  to mdb

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1959702/+subscriptions

