[Bug 1840619] Re: skb_warn_bad_offload kernel splat due to CHECKSUM target not compatible with GSO skbs

Matthew Ruffell Tue, 20 Aug 2019 20:26:24 -0700

** Description changed:

  BugLink: https://bugs.launchpad.net/bugs/1840619
  
  [Impact]
  
  In environments which have CHECKSUM iptables rules set, the following
  kernel call trace will be created when a GSO skb is processed by the
  CHECKSUM target:
  
  WARNING: CPU: 34 PID: 806048 at /build/linux-zdslHp/linux-4.4.0/net/core/dev.c:2456 skb_warn_bad_offload+0xcf/0x110()
  qr-f78bfdf7-fe: caps=(0x000000000fdb58e9, 0x000000000fdb58e9) len=1955 data_len=479 gso_size=1448 gso_type=1 ip_summed=3
  CPU: 34 PID: 806048 Comm: haproxy Tainted: G        W  OE   4.4.0-138-generic #164-Ubuntu
  Call Trace:
   dump_stack+0x63/0x90
   warn_slowpath_common+0x82/0xc0
   warn_slowpath_fmt+0x5c/0x80
   ? ___ratelimit+0xa2/0xe0
   skb_warn_bad_offload+0xcf/0x110
   skb_checksum_help+0x185/0x1a0
   checksum_tg+0x22/0x29 [xt_CHECKSUM]
   ipt_do_table+0x301/0x730 [ip_tables]
   ? ipt_do_table+0x349/0x730 [ip_tables]
   iptable_mangle_hook+0x39/0x107 [iptable_mangle]
   nf_iterate+0x68/0x80
   nf_hook_slow+0x73/0xd0
   ip_output+0xcf/0xe0
   ? __ip_flush_pending_frames.isra.43+0x90/0x90
   ip_local_out+0x3b/0x50
   ip_queue_xmit+0x154/0x390
   __tcp_transmit_skb+0x52b/0x9b0
   tcp_write_xmit+0x1dd/0xf50
   __tcp_push_pending_frames+0x31/0xd0
   tcp_push+0xec/0x110
   tcp_sendmsg+0x749/0xba0
   inet_sendmsg+0x6b/0xa0
   sock_sendmsg+0x3e/0x50
   SYSC_sendto+0x101/0x190
   ? __sys_sendmsg+0x51/0x90
   SyS_sendto+0xe/0x10
   entry_SYSCALL_64_fastpath+0x22/0xc1
  
  The CHECKSUM target does not support GSO skbs, and when a GSO skb is
  passed to skb_checksum_help(), it errors out and skb_warn_bad_offload()
  is called.
  
  The above call trace was found in a customer environment which has an
  Openstack deployment, with the following sorts of iptables rules set:
  
  -A neutron-l3-agent-POSTROUTING -o qr-+ -p tcp -m tcp --sport 9697 -j CHECKSUM --checksum-fill
  -A neutron-dhcp-age-POSTROUTING -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
  
  This was causing haproxy running on the node to crash and restart every
  time a GSO skb was processed by the CHECKSUM target.
  
  I recommend reading the netdev mailing list thread for more details:
  https://www.spinics.net/lists/netdev/msg517366.html
  
  [Fix]
  
  This was fixed in 4.19 upstream with the below commit:
  
  commit 10568f6c5761db24249c610c94d6e44d5505a0ba
  Author: Florian Westphal <[email protected]>
  Date:   Wed Aug 22 11:33:27 2018 +0200
  Subject: netfilter: xt_checksum: ignore gso skbs
  
  This commit adds a check to see if the current skb is a gso skb, and if
  it is, skips skb_checksum_help(). It then continues on to check if the
  packet uses udp, and if it does, exits early. Otherwise it prints a
  single warning that CHECKSUM should be avoided, and if really needed,
  only for use with outbound udp.
  
  Note, 10568f6c5761db24249c610c94d6e44d5505a0ba was included in upstream
  stable version 4.18.13, and was backported to bionic in 4.15.0-58.64 by
  LP #1836426.
  
  This patch required minor backporting for 4.4, by slightly adjusting the
  context in the final patch hunk.
  
  [Testcase]
  
  You can reproduce this by adding the following iptables rule to the
  mangle table:
  
  -A POSTROUTING -p tcp -m tcp --sport 80 -j CHECKSUM --checksum-fill
  
  and running traffic over port 80 with incorrect checksums in the ip
  header.
  
  I built a test kernel, which is available here:
  
  https://launchpad.net/~mruffell/+archive/ubuntu/sf216537-test
  
  For unpatched kernels, this causes the process which was handeling the
  socket to crash, as seen by haproxy crashing on a node in production
  which hits this issue.
  
- On patched kernels you see the warning printed to dmesg and no crashes
- occur.
+ On patched kernels you see the below warning printed to dmesg and no
+ crashes occur.
+ 
+ xt_CHECKSUM: CHECKSUM should be avoided.  If really needed, restrict
+ with "-p udp" and only use in OUTPUT
  
  [Regression Potential]
  
  The changes are limited only to users which have CHECKSUM rules enabled
  in their iptables configs. Openstack commonly configures such rules on
  deployment, even though they are not necessary, as almost all packets
  have their checksum calculated by NICs these days, and CHECKSUM is only
  around to service old dhcp clients which would discard UDP packets with
  empty checksums.
  
  This commit was selected for upstream -stable 4.18.13, and has made its
  way into bionic 4.15.0-58.64 by LP #1836426. There have been no reported
  problems and those kernels would have had sufficient testing with
  Openstack and its configured iptables rules.
  
  If any users are affected by regression, then they can simply delete any
  CHECKSUM entries in their iptables configs.


** Description changed:

  BugLink: https://bugs.launchpad.net/bugs/1840619
  
  [Impact]
  
  In environments which have CHECKSUM iptables rules set, the following
  kernel call trace will be created when a GSO skb is processed by the
  CHECKSUM target:
  
  WARNING: CPU: 34 PID: 806048 at /build/linux-zdslHp/linux-4.4.0/net/core/dev.c:2456 skb_warn_bad_offload+0xcf/0x110()
  qr-f78bfdf7-fe: caps=(0x000000000fdb58e9, 0x000000000fdb58e9) len=1955 data_len=479 gso_size=1448 gso_type=1 ip_summed=3
  CPU: 34 PID: 806048 Comm: haproxy Tainted: G        W  OE   4.4.0-138-generic #164-Ubuntu
  Call Trace:
   dump_stack+0x63/0x90
   warn_slowpath_common+0x82/0xc0
   warn_slowpath_fmt+0x5c/0x80
   ? ___ratelimit+0xa2/0xe0
   skb_warn_bad_offload+0xcf/0x110
   skb_checksum_help+0x185/0x1a0
   checksum_tg+0x22/0x29 [xt_CHECKSUM]
   ipt_do_table+0x301/0x730 [ip_tables]
   ? ipt_do_table+0x349/0x730 [ip_tables]
   iptable_mangle_hook+0x39/0x107 [iptable_mangle]
   nf_iterate+0x68/0x80
   nf_hook_slow+0x73/0xd0
   ip_output+0xcf/0xe0
   ? __ip_flush_pending_frames.isra.43+0x90/0x90
   ip_local_out+0x3b/0x50
   ip_queue_xmit+0x154/0x390
   __tcp_transmit_skb+0x52b/0x9b0
   tcp_write_xmit+0x1dd/0xf50
   __tcp_push_pending_frames+0x31/0xd0
   tcp_push+0xec/0x110
   tcp_sendmsg+0x749/0xba0
   inet_sendmsg+0x6b/0xa0
   sock_sendmsg+0x3e/0x50
   SYSC_sendto+0x101/0x190
   ? __sys_sendmsg+0x51/0x90
   SyS_sendto+0xe/0x10
   entry_SYSCALL_64_fastpath+0x22/0xc1
  
  The CHECKSUM target does not support GSO skbs, and when a GSO skb is
  passed to skb_checksum_help(), it errors out and skb_warn_bad_offload()
  is called.
  
  The above call trace was found in a customer environment which has an
  Openstack deployment, with the following sorts of iptables rules set:
  
  -A neutron-l3-agent-POSTROUTING -o qr-+ -p tcp -m tcp --sport 9697 -j CHECKSUM --checksum-fill
  -A neutron-dhcp-age-POSTROUTING -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
  
  This was causing haproxy running on the node to crash and restart every
  time a GSO skb was processed by the CHECKSUM target.
  
  I recommend reading the netdev mailing list thread for more details:
  https://www.spinics.net/lists/netdev/msg517366.html
  
  [Fix]
  
  This was fixed in 4.19 upstream with the below commit:
  
  commit 10568f6c5761db24249c610c94d6e44d5505a0ba
  Author: Florian Westphal <[email protected]>
  Date:   Wed Aug 22 11:33:27 2018 +0200
  Subject: netfilter: xt_checksum: ignore gso skbs
  
  This commit adds a check to see if the current skb is a gso skb, and if
  it is, skips skb_checksum_help(). It then continues on to check if the
  packet uses udp, and if it does, exits early. Otherwise it prints a
  single warning that CHECKSUM should be avoided, and if really needed,
  only for use with outbound udp.
  
  Note, 10568f6c5761db24249c610c94d6e44d5505a0ba was included in upstream
  stable version 4.18.13, and was backported to bionic in 4.15.0-58.64 by
  LP #1836426.
  
  This patch required minor backporting for 4.4, by slightly adjusting the
  context in the final patch hunk.
  
  [Testcase]
  
  You can reproduce this by adding the following iptables rule to the
  mangle table:
  
- -A POSTROUTING -p tcp -m tcp --sport 80 -j CHECKSUM --checksum-fill
+ -t mangle -A POSTROUTING -p tcp -m tcp --sport 80 -j CHECKSUM
+ --checksum-fill
  
  and running traffic over port 80 with incorrect checksums in the ip
  header.
  
  I built a test kernel, which is available here:
  
  https://launchpad.net/~mruffell/+archive/ubuntu/sf216537-test
  
  For unpatched kernels, this causes the process which was handeling the
  socket to crash, as seen by haproxy crashing on a node in production
  which hits this issue.
  
  On patched kernels you see the below warning printed to dmesg and no
  crashes occur.
  
  xt_CHECKSUM: CHECKSUM should be avoided.  If really needed, restrict
  with "-p udp" and only use in OUTPUT
  
  [Regression Potential]
  
  The changes are limited only to users which have CHECKSUM rules enabled
  in their iptables configs. Openstack commonly configures such rules on
  deployment, even though they are not necessary, as almost all packets
  have their checksum calculated by NICs these days, and CHECKSUM is only
  around to service old dhcp clients which would discard UDP packets with
  empty checksums.
  
  This commit was selected for upstream -stable 4.18.13, and has made its
  way into bionic 4.15.0-58.64 by LP #1836426. There have been no reported
  problems and those kernels would have had sufficient testing with
  Openstack and its configured iptables rules.
  
  If any users are affected by regression, then they can simply delete any
  CHECKSUM entries in their iptables configs.

** Description changed:

  BugLink: https://bugs.launchpad.net/bugs/1840619
  
  [Impact]
  
  In environments which have CHECKSUM iptables rules set, the following
  kernel call trace will be created when a GSO skb is processed by the
  CHECKSUM target:
  
  WARNING: CPU: 34 PID: 806048 at /build/linux-zdslHp/linux-4.4.0/net/core/dev.c:2456 skb_warn_bad_offload+0xcf/0x110()
  qr-f78bfdf7-fe: caps=(0x000000000fdb58e9, 0x000000000fdb58e9) len=1955 data_len=479 gso_size=1448 gso_type=1 ip_summed=3
  CPU: 34 PID: 806048 Comm: haproxy Tainted: G        W  OE   4.4.0-138-generic #164-Ubuntu
  Call Trace:
   dump_stack+0x63/0x90
   warn_slowpath_common+0x82/0xc0
   warn_slowpath_fmt+0x5c/0x80
   ? ___ratelimit+0xa2/0xe0
   skb_warn_bad_offload+0xcf/0x110
   skb_checksum_help+0x185/0x1a0
   checksum_tg+0x22/0x29 [xt_CHECKSUM]
   ipt_do_table+0x301/0x730 [ip_tables]
   ? ipt_do_table+0x349/0x730 [ip_tables]
   iptable_mangle_hook+0x39/0x107 [iptable_mangle]
   nf_iterate+0x68/0x80
   nf_hook_slow+0x73/0xd0
   ip_output+0xcf/0xe0
   ? __ip_flush_pending_frames.isra.43+0x90/0x90
   ip_local_out+0x3b/0x50
   ip_queue_xmit+0x154/0x390
   __tcp_transmit_skb+0x52b/0x9b0
   tcp_write_xmit+0x1dd/0xf50
   __tcp_push_pending_frames+0x31/0xd0
   tcp_push+0xec/0x110
   tcp_sendmsg+0x749/0xba0
   inet_sendmsg+0x6b/0xa0
   sock_sendmsg+0x3e/0x50
   SYSC_sendto+0x101/0x190
   ? __sys_sendmsg+0x51/0x90
   SyS_sendto+0xe/0x10
   entry_SYSCALL_64_fastpath+0x22/0xc1
  
  The CHECKSUM target does not support GSO skbs, and when a GSO skb is
  passed to skb_checksum_help(), it errors out and skb_warn_bad_offload()
  is called.
  
  The above call trace was found in a customer environment which has an
  Openstack deployment, with the following sorts of iptables rules set:
  
  -A neutron-l3-agent-POSTROUTING -o qr-+ -p tcp -m tcp --sport 9697 -j CHECKSUM --checksum-fill
  -A neutron-dhcp-age-POSTROUTING -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
  
  This was causing haproxy running on the node to crash and restart every
  time a GSO skb was processed by the CHECKSUM target.
  
  I recommend reading the netdev mailing list thread for more details:
  https://www.spinics.net/lists/netdev/msg517366.html
  
  [Fix]
  
  This was fixed in 4.19 upstream with the below commit:
  
  commit 10568f6c5761db24249c610c94d6e44d5505a0ba
  Author: Florian Westphal <[email protected]>
  Date:   Wed Aug 22 11:33:27 2018 +0200
  Subject: netfilter: xt_checksum: ignore gso skbs
  
  This commit adds a check to see if the current skb is a gso skb, and if
  it is, skips skb_checksum_help(). It then continues on to check if the
  packet uses udp, and if it does, exits early. Otherwise it prints a
  single warning that CHECKSUM should be avoided, and if really needed,
  only for use with outbound udp.
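
The behaviour change described above can be modelled in a few lines of
userspace Python. This is only a sketch of the decision logic, not the
kernel code: the Skb class and the function names are hypothetical, the
only sk_buff fields modelled are the two printed in the splat
(ip_summed and gso_size), and it assumes the UDP test is applied to the
rule's protocol match rather than to each packet.

```python
from dataclasses import dataclass

# Numeric value of CHECKSUM_PARTIAL in the kernel's skbuff.h; it matches
# the ip_summed=3 field printed in the splat.
CHECKSUM_PARTIAL = 3


@dataclass
class Skb:
    """Hypothetical stand-in for the two sk_buff fields the check reads."""
    ip_summed: int
    gso_size: int


def is_gso(skb: Skb) -> bool:
    # Same idea as the kernel's skb_is_gso(): non-zero gso_size means GSO.
    return skb.gso_size != 0


def unpatched_calls_checksum_help(skb: Skb) -> bool:
    # Before the fix, every CHECKSUM_PARTIAL skb reached skb_checksum_help(),
    # including GSO skbs, which is what triggered skb_warn_bad_offload().
    return skb.ip_summed == CHECKSUM_PARTIAL


def patched_calls_checksum_help(skb: Skb) -> bool:
    # After the fix, GSO skbs are skipped.
    return skb.ip_summed == CHECKSUM_PARTIAL and not is_gso(skb)


def rule_triggers_warning(proto: str, inverted: bool = False) -> bool:
    # A rule restricted to plain "-p udp" exits early without a warning;
    # anything else gets the one-time "CHECKSUM should be avoided" message.
    return not (proto == "udp" and not inverted)


if __name__ == "__main__":
    # Values taken from the splat: gso_size=1448, ip_summed=3.
    splat_skb = Skb(ip_summed=3, gso_size=1448)
    print("unpatched:", unpatched_calls_checksum_help(splat_skb))
    print("patched:  ", patched_calls_checksum_help(splat_skb))
```

With the values from the splat, the unpatched predicate sends the skb to
skb_checksum_help() and the patched one does not, while a non-GSO
CHECKSUM_PARTIAL skb is still filled in by both.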
  
  Note, 10568f6c5761db24249c610c94d6e44d5505a0ba was included in upstream
  stable version 4.18.13, and was backported to bionic in 4.15.0-58.64 by
  LP #1836426.
  
  This patch required minor backporting for 4.4, by slightly adjusting the
  context in the final patch hunk.
  
  [Testcase]
  
  You can reproduce this by adding the following iptables rule to the
  mangle table:
  
  -t mangle -A POSTROUTING -p tcp -m tcp --sport 80 -j CHECKSUM
  --checksum-fill
  
  and running traffic over port 80 with incorrect checksums in the ip
  header.
  
  I built a test kernel, which is available here:
  
  https://launchpad.net/~mruffell/+archive/ubuntu/sf216537-test
  
- For unpatched kernels, this causes the process which was handeling the
+ For unpatched kernels, this causes the process which was handling the
  socket to crash, as seen by haproxy crashing on a node in production
  which hits this issue.
  
  On patched kernels you see the below warning printed to dmesg and no
  crashes occur.
  
  xt_CHECKSUM: CHECKSUM should be avoided.  If really needed, restrict
  with "-p udp" and only use in OUTPUT
  
  [Regression Potential]
  
  The changes are limited only to users which have CHECKSUM rules enabled
  in their iptables configs. Openstack commonly configures such rules on
  deployment, even though they are not necessary, as almost all packets
  have their checksum calculated by NICs these days, and CHECKSUM is only
  around to service old dhcp clients which would discard UDP packets with
  empty checksums.
  
  This commit was selected for upstream -stable 4.18.13, and has made its
  way into bionic 4.15.0-58.64 by LP #1836426. There have been no reported
  problems and those kernels would have had sufficient testing with
  Openstack and its configured iptables rules.
  
  If any users are affected by regression, then they can simply delete any
  CHECKSUM entries in their iptables configs.
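
To find the entries in question, a small script along these lines could
be run over saved rules. It is a rough sketch, assuming the rules are in
the same "-A chain ..." form quoted above (as printed by iptables-save)
and that the protocol is spelled by name; the function names are
hypothetical. It lists CHECKSUM rules that are not restricted to a plain
"-p udp" match, which are the ones the warning on patched kernels is
about.

```python
import shlex


def is_checksum_rule(tokens):
    # True when the rule jumps to the CHECKSUM target.
    return any(a in ("-j", "--jump") and b == "CHECKSUM"
               for a, b in zip(tokens, tokens[1:]))


def is_plain_udp(tokens):
    # True only for a non-inverted "-p udp" / "--protocol udp" match.
    for i, tok in enumerate(tokens[:-1]):
        if tok in ("-p", "--protocol"):
            inverted = i > 0 and tokens[i - 1] == "!"
            return tokens[i + 1].lower() == "udp" and not inverted
    return False


def risky_checksum_rules(rules_text):
    """Return CHECKSUM rules that are not limited to plain '-p udp'."""
    risky = []
    for line in rules_text.splitlines():
        tokens = shlex.split(line)
        if is_checksum_rule(tokens) and not is_plain_udp(tokens):
            risky.append(line.strip())
    return risky


if __name__ == "__main__":
    # The two rules quoted in the [Impact] section.
    sample = """
    -A neutron-l3-agent-POSTROUTING -o qr-+ -p tcp -m tcp --sport 9697 -j CHECKSUM --checksum-fill
    -A neutron-dhcp-age-POSTROUTING -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
    """
    for rule in risky_checksum_rules(sample):
        print(rule)
```

Run against the two rules from the [Impact] section, only the tcp rule on
the qr-+ interfaces is reported; the dhcp rule is already limited to udp.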
