[Kernel-packages] [Bug 1819786] Re: 4.15 kernel ip_vs --ops causes performance and hang problem

2019-05-14, Launchpad Bug Tracker
This bug was fixed in the package linux - 4.15.0-50.54

---
linux (4.15.0-50.54) bionic; urgency=medium

  * CVE-2018-12126 // CVE-2018-12127 // CVE-2018-12130
- Documentation/l1tf: Fix small spelling typo
- x86/cpu: Sanitize FAM6_ATOM naming
- kvm: x86: Report STIBP on GET_SUPPORTED_CPUID
- locking/atomics, asm-generic: Move some macros from  to a
  new  file
- tools include: Adopt linux/bits.h
- x86/msr-index: Cleanup bit defines
- x86/speculation: Consolidate CPU whitelists
- x86/speculation/mds: Add basic bug infrastructure for MDS
- x86/speculation/mds: Add BUG_MSBDS_ONLY
- x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests
- x86/speculation/mds: Add mds_clear_cpu_buffers()
- x86/speculation/mds: Clear CPU buffers on exit to user
- x86/kvm/vmx: Add MDS protection when L1D Flush is not active
- x86/speculation/mds: Conditionally clear CPU buffers on idle entry
- x86/speculation/mds: Add mitigation control for MDS
- x86/speculation/mds: Add sysfs reporting for MDS
- x86/speculation/mds: Add mitigation mode VMWERV
- Documentation: Move L1TF to separate directory
- Documentation: Add MDS vulnerability documentation
- x86/speculation/mds: Add mds=full,nosmt cmdline option
- x86/speculation: Move arch_smt_update() call to after mitigation decisions
- x86/speculation/mds: Add SMT warning message
- x86/speculation/mds: Fix comment
- x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off
- x86/speculation/mds: Add 'mitigations=' support for MDS

  * CVE-2017-5715 // CVE-2017-5753
- s390/speculation: Support 'mitigations=' cmdline option

  * CVE-2017-5715 // CVE-2017-5753 // CVE-2017-5754 // CVE-2018-3639
- powerpc/speculation: Support 'mitigations=' cmdline option

  * CVE-2017-5715 // CVE-2017-5754 // CVE-2018-3620 // CVE-2018-3639 //
    CVE-2018-3646
- cpu/speculation: Add 'mitigations=' cmdline option
- x86/speculation: Support 'mitigations=' cmdline option

  * Packaging resync (LP: #1786013)
- [Packaging] resync git-ubuntu-log

linux (4.15.0-49.53) bionic; urgency=medium

  * linux: 4.15.0-49.53 -proposed tracker (LP: #1826358)

  * Backport support for software count cache flush Spectre v2 mitigation. (CVE)
    (required for POWER9 DD2.3) (LP: #1822870)
- powerpc/64s: Add support for ori barrier_nospec patching
- powerpc/64s: Patch barrier_nospec in modules
- powerpc/64s: Enable barrier_nospec based on firmware settings
- powerpc: Use barrier_nospec in copy_from_user()
- powerpc/64: Use barrier_nospec in syscall entry
- powerpc/64s: Enhance the information in cpu_show_spectre_v1()
- powerpc/64: Disable the speculation barrier from the command line
- powerpc/64: Make stf barrier PPC_BOOK3S_64 specific.
- powerpc/64: Add CONFIG_PPC_BARRIER_NOSPEC
- powerpc/64: Call setup_barrier_nospec() from setup_arch()
- powerpc/64: Make meltdown reporting Book3S 64 specific
- powerpc/lib/code-patching: refactor patch_instruction()
- powerpc/lib/feature-fixups: use raw_patch_instruction()
- powerpc/asm: Add a patch_site macro & helpers for patching instructions
- powerpc/64s: Add new security feature flags for count cache flush
- powerpc/64s: Add support for software count cache flush
- powerpc/pseries: Query hypervisor for count cache flush settings
- powerpc/powernv: Query firmware for count cache flush settings
- powerpc/fsl: Add nospectre_v2 command line argument
- KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char()
- [Config] Add CONFIG_PPC_BARRIER_NOSPEC

  * Packaging resync (LP: #1786013)
- [Packaging] resync git-ubuntu-log

  * autopkgtests run too often, too much and don't skip enough (LP: #1823056)
- [Debian] Set +x on rebuild testcase.
- [Debian] Skip rebuild test, for regression-suite deps.
- [Debian] Make ubuntu-regression-suite skippable on unbootable kernels.
- [Debian] make rebuild use skippable error codes when skipping.
- [Debian] Only run regression-suite, if requested to.

  * bionic: fork out linux-snapdragon into its own topic kernel (LP: #1820868)
- [Packaging] remove arm64 snapdragon from getabis
- [Config] config changes for snapdragon split
- packaging: arm64: disable building the snapdragon flavour
- [Packaging] arm64: Drop snapdragon from kernel-versions

  * CVE-2017-5753
- KVM: arm/arm64: vgic: fix possible spectre-v1 in vgic_get_irq()
- media: dvb_ca_en50221: prevent using slot_info for Spectre attacs
- sysvipc/sem: mitigate semnum index against spectre v1
- libahci: Fix possible Spectre-v1 pmp indexing in ahci_led_store()
- s390/keyboard: sanitize array index in do_kdsk_ioctl
- arm64: fix possible spectre-v1 write in ptrace_hbp_set_event()
- KVM: arm/arm64: vgic: Fix possible spectre-v1 write in vgic_mmio_write_apr()
- pktcdvd: Fix possible Spectre-v1 for 

[Kernel-packages] [Bug 1819786] Re: 4.15 kernel ip_vs --ops causes performance and hang problem

2019-04-29, Marc Hasson
I'd like to note that I tested and verified BOTH of these kernel versions,
in their respective -proposed states, with the ipvs fix. We most need the
16.04 4.15 HWE kernel released; that appears to be in progress, but since
this is a bionic bug it's unclear whether another step is required.

Both kernels passed my ipvs tests; the fix worked perfectly:

4.15.0-49.52~16.04.1 (xenial-proposed)
Linux director-16-04 4.15.0-49-generic #52~16.04.1-Ubuntu SMP Thu Apr 25 18:54:26 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

4.15.0-49.53 (bionic-proposed)
Linux direct-18-04 4.15.0-49-generic #53-Ubuntu SMP Fri Apr 26 06:45:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux


** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1819786

Title:
  4.15 kernel ip_vs --ops causes performance and hang problem

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Committed

Bug description:
  === SRU Justification ===
  [Impact]
  From the commit message:
  "Connections in One-packet scheduling mode (-o, --ops) are
  removed with refcnt=0 because they are not hashed in conn table."

  [Fix]
  From the commit message:
  "To avoid refcount_dec reporting this as error, change them to be
  removed with refcount_dec_if_one as all other connections."
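The contract the fix relies on can be illustrated with a minimal user-space sketch. It assumes C11 <stdatomic.h>. refcount_dec_if_one() succeeds only when the caller holds the last reference, dropping the count from 1 to 0 in one atomic step. The names model_refcount_t and model_refcount_dec_if_one() are hypothetical stand-ins, not kernel code. The kernel's own implementation adds architecture- and config-specific checks.

```c
/* Hypothetical user-space model of refcount_dec_if_one(); not kernel code.
 * Build the optional demo with: cc -std=c11 -DMODEL_DEMO model.c */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the kernel's refcount_t (assumption: a bare atomic int). */
typedef struct {
    atomic_int refs;
} model_refcount_t;

/* If the count is exactly 1, atomically set it to 0 and return true
 * (the caller held the last reference and may remove the object).
 * Otherwise leave the count unchanged and return false. Because the
 * check for 1 and the drop to 0 are a single compare-and-swap, a
 * concurrent increment cannot slip in between them. */
static bool model_refcount_dec_if_one(model_refcount_t *r)
{
    int expected = 1;
    return atomic_compare_exchange_strong(&r->refs, &expected, 0);
}

#ifdef MODEL_DEMO
int main(void)
{
    model_refcount_t last, busy;
    atomic_init(&last.refs, 1);  /* only the remover holds a reference */
    atomic_init(&busy.refs, 2);  /* someone else still uses the object */

    assert(model_refcount_dec_if_one(&last));   /* 1 -> 0: safe to remove */
    assert(!model_refcount_dec_if_one(&busy));  /* stays 2: not removed */

    printf("last=%d busy=%d\n", atomic_load(&last.refs),
           atomic_load(&busy.refs));
    return 0;
}
#endif
```

Per the commit message, --ops connections now take this same removal path as all other connections, instead of being removed with a count of 0.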

  [Test]
  The bug reporter has a reproducer and confirmed this commit fixes the
  issue.

  [Regression Potential]
  Low. Fix for a specific use case, and it has been upstream for a while.

  === Original Bug Report ===
  On our 16.04LTS (and earlier) systems we used the ipvsadm --ops UDP
  support (one-packet scheduling) to get a better distribution amongst
  our real servers behind the load-balancer for some small subset of
  applications.

  This has worked fine through the 4.4.0-xxx kernels. But when we started
  a program to upgrade systems to use the 4.15 series of kernels to take
  advantage of new facilities, the subset of systems which used the --ops
  option ran into problems. Everything else with the 4.15 kernels appeared
  to work well.

  This issue was reported weeks ago in #1817247 against 16.04LTS with the
  HWE 4.15 kernel, but has not received any acknowledgement. So we have
  moved on to confirm that a stock 18.04LTS system with the latest
  standard 4.15 kernel has the same issue, and are reporting that here.
  Perhaps this will get more attention.

  The issue appears to stem from the ip_vs module's switch from the
  "atomic_*()" increment/decrement functions used in the 4.4 kernel to the
  "refcount_*()" functions used in later kernels, including the 4.15 one
  we switched to. Unfortunately, the plain refcount_dec() function is
  inadequate here: when the refcount drops to zero, which is expected with
  --ops since no state is retained after packet delivery, it puts out a
  time-consuming error message and call backtrace. I will upload an
  attachment with sample messages; they are emitted at the packet arrival
  rate, which of course destroys performance. This test VM reports the
  same errors we see on our production servers, although throwing only a
  couple of test --ops packets at it doesn't crash/hang the 18.04 system
  as it did the 16.04 VM reported earlier. In production, with far greater
  packet rates, our systems fail because the attached call backtrace
  *** appears on every packet!! ***
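The behavior change described above can be illustrated with a simplified user-space model. It uses C11 atomics; every name below is hypothetical, and none of it is the actual ip_vs code. model_atomic_dec() stands in for the 4.4-era atomic decrement, which treats reaching zero as unremarkable. model_refcount_dec() stands in for refcount_dec(), which treats a drop to zero as an error and reports it. model_ops_packets() assumes, as the report describes, that each --ops packet gets a connection whose only reference is dropped right after delivery.

```c
/* Hypothetical model of atomic_dec() vs refcount_dec() on --ops
 * connections; not kernel code. Demo: cc -std=c11 -DMODEL_DEMO model.c */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Counts reports; the kernel instead logs a message with a backtrace. */
static int model_reports;

/* 4.4-style decrement: hitting 0 is not special, nothing is logged. */
static void model_atomic_dec(atomic_int *v)
{
    atomic_fetch_sub(v, 1);
}

/* refcount_dec()-style decrement: it is meant for drops that can never
 * be the last one, so landing on 0 is reported as a bug. */
static void model_refcount_dec(atomic_int *v)
{
    if (atomic_fetch_sub(v, 1) == 1) {  /* old value 1 => new value 0 */
        model_reports++;
        fprintf(stderr, "model: refcount decrement hit 0\n");
    }
}

/* Simulates n --ops packets. Each gets a connection with one reference
 * that is dropped right after delivery, so every packet reaches 0.
 * Returns how many reports were generated. */
static int model_ops_packets(int n, bool use_refcount)
{
    model_reports = 0;
    for (int i = 0; i < n; i++) {
        atomic_int refs;
        atomic_init(&refs, 1);
        if (use_refcount)
            model_refcount_dec(&refs);
        else
            model_atomic_dec(&refs);
    }
    return model_reports;
}

#ifdef MODEL_DEMO
int main(void)
{
    assert(model_ops_packets(3, false) == 0);  /* old kernels: silent */
    assert(model_ops_packets(3, true) == 3);   /* one report per packet */
    puts("model: done");
    return 0;
}
#endif
```

The point of the model is how the cost scales: the report is paid once per packet, so at production packet rates it dominates. That is consistent with the per-packet backtraces described above.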

  This issue was apparently already recognized as an error and has appeared
  as a fix in upstream kernels. This is a reference to the 4.17 version
  of the fix that we'd like to see incorporated into the next possible
  kernel maintenance release:

  
  https://github.com/torvalds/linux/commit/a050d345cef0dc6249263540da1e902bba617e43#diff-75923493f6e3f314b196a8223b0d6342

  We have used the livepatch facility to build a livepatch .ko with the
  above diffs on our 4.15.0-36 system, and demonstrated the contrast
  between good and bad behavior with and without the livepatch module
  loaded. But we'd rather not have to build a livepatch.ko for each kernel
  maintenance release, such as the 4.15.0-46 kernel used here to
  demonstrate that the issue persists in the Ubuntu mainline distro.

  The problem is easy to generate, with only a couple of packets
  and a simple configuration. Here's a very basic test (addresses
  rewritten/obscured) version of an example configuration for 2 servers
  that worked on my test VM:

  ipvsadm -A -f 100 -s rr --ops
  ipvsadm -a -f 100 -r 10.129.131.227:0 -g -w 
  ipvsadm -a -f 100 -r 10.129.131.228:0 -g -w 
  iptables -t mangle -A PREROUTING -d 172.16.5.1/32 -j MARK --set-xmark 0x64/0x
  ifconfig lo:0 172.16.5.1/32 up

  Routing and addressing to achieve the above, or adaptation for one's
  own test environment, is 

[Kernel-packages] [Bug 1819786] Re: 4.15 kernel ip_vs --ops causes performance and hang problem

2019-04-29, Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-bionic' to 'verification-done-bionic'. If the
problem still exists, change the tag 'verification-needed-bionic' to
'verification-failed-bionic'.

If verification is not done within 5 working days from today, this fix
will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-bionic

[Kernel-packages] [Bug 1819786] Re: 4.15 kernel ip_vs --ops causes performance and hang problem

2019-04-14, Khaled El Mously
** Changed in: linux (Ubuntu Bionic)
   Status: New => Fix Committed

[Kernel-packages] [Bug 1819786] Re: 4.15 kernel ip_vs --ops causes performance and hang problem

2019-04-08, Kai-Heng Feng
** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Description changed:

+ === SRU Justification ===
+ [Impact]
+ From the commit message:
+ "Connections in One-packet scheduling mode (-o, --ops) are
+ removed with refcnt=0 because they are not hashed in conn table."
+ 
+ [Fix]
+ From the commit message:
+ "To avoid refcount_dec reporting this as error, change them to be
+ removed with refcount_dec_if_one as all other connections."
+ 
+ [Test]
+ The bug reporter has a reproducer and confirmed this commit fixes the
+ issue.
+ 
+ [Regression Potential]
+ Low. Fix for a specific use case, and it has been upstream for a while.
+ 
+ === Original Bug Report ===
  On our 16.04LTS (and earlier) systems we used the ipvsadm --ops UDP
  support (one-packet scheduling) to get a better distribution amongst
  our real servers behind the load-balancer for some small subset of
  applications.
  
  This has worked fine through the 4.4.0-xxx kernels. But when we started
  a program to upgrade systems to use the 4.15 series of kernels to take
  advantage of new facilities, the subset of systems which used the --ops
  option ran into problems. Everything else with the 4.15 kernels appeared
  to work well.
  
  This issue was reported weeks ago in #1817247 against 16.04LTS with the
  HWE 4.15 kernel, but has not received any acknowledgement. So we have
  moved on to confirm that a stock 18.04LTS system with the latest
  standard 4.15 kernel has the same issue, and are reporting that here.
  Perhaps this will get more attention.
  
  The issue appears to stem from the ip_vs module's switch from the
  "atomic_*()" increment/decrement functions used in the 4.4 kernel to the
  "refcount_*()" functions used in later kernels, including the 4.15 one
  we switched to. Unfortunately, the plain refcount_dec() function is
  inadequate here: when the refcount drops to zero, which is expected with
  --ops since no state is retained after packet delivery, it puts out a
  time-consuming error message and call backtrace. I will upload an
  attachment with sample messages; they are emitted at the packet arrival
  rate, which of course destroys performance. This test VM reports the
  same errors we see on our production servers, although throwing only a
  couple of test --ops packets at it doesn't crash/hang the 18.04 system
  as it did the 16.04 VM reported earlier. In production, with far greater
  packet rates, our systems fail because the attached call backtrace
  *** appears on every packet!! ***
  
  This issue was apparently already recognized as an error and has appeared
  as a fix in upstream kernels. This is a reference to the 4.17 version
  of the fix that we'd like to see incorporated into the next possible
  kernel maintenance release:
  
  
  https://github.com/torvalds/linux/commit/a050d345cef0dc6249263540da1e902bba617e43#diff-75923493f6e3f314b196a8223b0d6342
  
  We have used the livepatch facility to build a livepatch .ko with the
  above diffs on our 4.15.0-36 system, and demonstrated the contrast
  between good and bad behavior with and without the livepatch module
  loaded. But we'd rather not have to build a livepatch.ko for each kernel
  maintenance release, such as the 4.15.0-46 kernel used here to
  demonstrate that the issue persists in the Ubuntu mainline distro.
  
  The problem is easy to generate, with only a couple of packets
  and a simple configuration. Here's a very basic test (addresses
  rewritten/obscured) version of an example configuration for 2 servers
  that worked on my test VM:
  
  ipvsadm -A -f 100 -s rr --ops
  ipvsadm -a -f 100 -r 10.129.131.227:0 -g -w 
  ipvsadm -a -f 100 -r 10.129.131.228:0 -g -w 
  iptables -t mangle -A PREROUTING -d 172.16.5.1/32 -j MARK --set-xmark 0x64/0x
  ifconfig lo:0 172.16.5.1/32 up
  
  Routing and addressing to achieve the above, or adaptation for one's
  own test environment, is left to the tester. I just added alias
  10.129.131.x addresses on my "outbound" interface, and a static route for
  172.16.5.1 on my client system, so the test packets arrived on the
  "inbound" interface.
  
  I set up routing and addresses on my two-NIC test system such that
  packets arrived on its eth1 NIC and were directed by ip_vs out of eth2.
  With the test system's eth1 interface as the traceroute client's default
  gateway, all I did to test was throw a few UDP packets via traceroute at
  the address in the iptables firewall-mark rule:
  
-   traceroute -m 2 172.16.5.1
+   traceroute -m 2 172.16.5.1
  
  Without the fix, my test ip_vs system either hangs or puts out messages
  like those in the attachment. With our livepatch module using the above
  commit's contents, all is well: both test servers ("real" as opposed to
  "virtual") configured above via ipvsadm get packets, and no errors are
  reported in the logs.
  
- Let me know of anything 

[Kernel-packages] [Bug 1819786] Re: 4.15 kernel ip_vs --ops causes performance and hang problem

2019-04-08, Marc Hasson
Well, my apologies. I retract my skepticism! The -47 kernel you referenced
above appears to have fixed the problem, while the stock -47 kernel,
which I tested first this evening, showed the failure. (I already had the
distributed 4.15.0-47 kernel installed; I removed it and installed the
packages you referenced.)

So we look forward to the official rollout of this fix. But even more
important to us is the Xenial 4.15 kernel, since it is more widely
distributed, as referenced in:

https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe/+bug/1817247

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1819786

Title:
  4.15 kernel ip_vs --ops causes performance and hang problem

Status in linux package in Ubuntu:
  Confirmed

[Kernel-packages] [Bug 1819786] Re: 4.15 kernel ip_vs --ops causes performance and hang problem

2019-04-08, Marc Hasson
Thanks for getting back to me on this. Sorry for the slow response; I did
not get (or see?) an email notification about an update.

I'll try the -47 kernel you referenced, but I'm skeptical: the failure
occurs on -46, and the -47 changelog shows no entry for the ipvs refcount
issue. Nor does the -47 source I downloaded from Ubuntu contain the fix.

But on the off-chance that you just want me to confirm by testing the
latest, or that you included the fix below in the kernel you referenced
even though it's named identically to the kernel we received last month,
I'll give it a try and let you know.

Again, for reference, the upstream fix/patch we are requesting for both
this launchpad report as well as
https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe/+bug/1817247
is as follows:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a050d345cef0dc6249263540da1e902bba617e43


I'll let you know the results of testing this latest -47. I presume I'll
have the -48 shortly as well; its changelog also gives no indication of
having the fix yet.

Thanks.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1819786

Title:
  4.15 kernel ip_vs --ops causes performance and hang problem

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  On our 16.04LTS (and earlier) systems we used the ipvsadm --ops UDP
  support (one-packet scheduling) to get a better distribution amongst
  our real servers behind the load-balancer for some small subset of
  applications.

  This has worked fine through the 4.4.0-xxx kernels. But when we started
  a program to upgrade systems to use the 4.15 series of kernels to take
  advantage of new facilities, the subset of systems which used the --ops
  option ran into problems. Everything else with the 4.15 kernels appeared
  to work well.

  This issue was reported in #1817247 against 16.04 LTS with the HWE 4.15
  kernel, but it has not been acknowledged in the weeks since it was
  reported. So we went on to confirm that a stock 18.04 LTS system with the
  latest standard 4.15 kernel has the same issue, and are reporting it here
  in the hope that it gets more attention.

  The issue appears to stem from the change in the ip_vs module from the
  "atomic_*()" increment/decrement functions used in the 4.4 kernel to the
  "refcount_*()" functions used in later kernels, including the 4.15 kernel
  we switched to. Unfortunately, a plain refcount_dec() is inadequate here:
  when the refcount drops to zero, which is expected with --ops because no
  state is retained after packet delivery, it emits a time-consuming
  warning and goes through its error handling. I will upload an attachment
  with sample messages; they are emitted at the packet arrival rate, which
  of course destroys performance. This test VM reports the same errors we
  see on our production servers, although throwing only a couple of test
  --ops packets at it doesn't crash/hang the 18.04 system as it did the
  16.04 VM reported earlier. In production, with far greater packet rates,
  our systems fail because the attached call backtrace
  *** appears on every packet!! ***

  This issue has apparently already been recognized as an error and fixed
  in upstream kernels. Here is a reference to the 4.17 version of the fix,
  which we'd like to see incorporated into the next possible kernel
  maintenance release:

  https://github.com/torvalds/linux/commit/a050d345cef0dc6249263540da1e902bba617e43#diff-75923493f6e3f314b196a8223b0d6342

  We used the livepatch facility to build a livepatch .ko with the above
  diffs on our 4.15.0-36 system and successfully demonstrated the contrast
  between good and bad behavior with and without the livepatch module
  loaded. But we'd rather not have to build a version of livepatch.ko for
  each kernel maintenance release, such as the 4.15.0-46 kernel used here
  to demonstrate that the issue persists in the Ubuntu mainline distro.

  The problem is easy to reproduce with only a couple of packets and a
  simple configuration. Here is a very basic example configuration for 2
  servers (addresses rewritten/obscured) that worked on my test VM:

  ipvsadm -A -f 100 -s rr --ops
  ipvsadm -a -f 100 -r 10.129.131.227:0 -g -w 
  ipvsadm -a -f 100 -r 10.129.131.228:0 -g -w 
  iptables -t mangle -A PREROUTING -d 172.16.5.1/32 -j MARK --set-xmark 0x64/0x
  ifconfig lo:0 172.16.5.1/32 up

  Routing and addressing to achieve the above, or adaptation for one's
  own test environment, is left to the tester.  I just added alias 10.129.131.x
  addresses on my "outbound" interface and a static route for 172.16.5.1 to my
  client system so the test packets arrived on the "inbound" interface.

[Kernel-packages] [Bug 1819786] Re: 4.15 kernel ip_vs --ops causes performance and hang problem

2019-04-02 Thread Kai-Heng Feng
Please test this kernel:
https://people.canonical.com/~khfeng/lp1819786/

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1819786

Title:
  4.15 kernel ip_vs --ops causes performance and hang problem

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  On our 16.04LTS (and earlier) systems we used the ipvsadm --ops UDP
  support (one-packet scheduling) to get a better distribution amongst
  our real servers behind the load-balancer for some small subset of
  applications.
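  As a rough illustration of why --ops helps distribution, the hypothetical
  C sketch below models a round-robin service with two real servers.
  Without --ops, a UDP flow (identified here by a made-up 32-bit flow key)
  is remembered and pinned to the server chosen for its first packet; with
  --ops, nothing is remembered, so every packet is scheduled afresh. The
  names rr_service and schedule_packet() are illustrative only and do not
  correspond to ip_vs internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REAL 2   /* two real servers, as in the test configuration */
#define MAX_FLOWS 16 /* hypothetical fixed-size flow table */

/* Hypothetical model of a round-robin virtual service. */
struct rr_service {
    bool ops;                     /* one-packet scheduling enabled? */
    size_t next;                  /* round-robin cursor */
    uint32_t flow_key[MAX_FLOWS]; /* remembered flows (non-ops only) */
    size_t flow_real[MAX_FLOWS];  /* real server chosen for each flow */
    size_t nflows;
};

/* Plain round robin: return the cursor, then advance it. */
static size_t rr_pick(struct rr_service *s)
{
    size_t r = s->next;
    s->next = (s->next + 1) % NUM_REAL;
    return r;
}

/* Returns the index of the real server that receives this packet. */
static size_t schedule_packet(struct rr_service *s, uint32_t flow)
{
    assert(s != NULL);
    if (s->ops)
        return rr_pick(s); /* no state kept: schedule every packet afresh */

    for (size_t i = 0; i < s->nflows; i++)
        if (s->flow_key[i] == flow)
            return s->flow_real[i]; /* existing "connection" stays pinned */

    size_t r = rr_pick(s);
    if (s->nflows < MAX_FLOWS) { /* remember the new flow if there is room */
        s->flow_key[s->nflows] = flow;
        s->flow_real[s->nflows] = r;
        s->nflows++;
    }
    return r;
}
```

  Calling schedule_packet() three times with the same flow key returns 0,
  0, 0 when ops is false and 0, 1, 0 when ops is true. The fixed-size flow
  table is a simplification; real ip_vs connection entries also expire on
  timers.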

  This has worked fine through the 4.4.0-xxx kernels. But when we started
  a program to upgrade systems to use the 4.15 series of kernels to take
  advantage of new facilities, the subset of systems which used the --ops
  option ran into problems. Everything else with the 4.15 kernels appeared
  to work well.

  This issue was reported in #1817247 against 16.04 LTS with the HWE 4.15
  kernel, but it has not been acknowledged in the weeks since it was
  reported. So we went on to confirm that a stock 18.04 LTS system with the
  latest standard 4.15 kernel has the same issue, and are reporting it here
  in the hope that it gets more attention.

  The issue appears to stem from the change in the ip_vs module from the
  "atomic_*()" increment/decrement functions used in the 4.4 kernel to the
  "refcount_*()" functions used in later kernels, including the 4.15 kernel
  we switched to. Unfortunately, a plain refcount_dec() is inadequate here:
  when the refcount drops to zero, which is expected with --ops because no
  state is retained after packet delivery, it emits a time-consuming
  warning and goes through its error handling. I will upload an attachment
  with sample messages; they are emitted at the packet arrival rate, which
  of course destroys performance. This test VM reports the same errors we
  see on our production servers, although throwing only a couple of test
  --ops packets at it doesn't crash/hang the 18.04 system as it did the
  16.04 VM reported earlier. In production, with far greater packet rates,
  our systems fail because the attached call backtrace
  *** appears on every packet!! ***
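  To make the difference concrete, here is a minimal userspace sketch,
  assuming C11 <stdatomic.h>. It is a hypothetical model, not the kernel's
  refcount implementation: model_atomic_dec(), model_refcount_dec() and
  model_refcount_dec_and_test() are illustrative names. It mirrors the
  contract described above: an atomic_dec()-style decrement reaches zero
  silently, a refcount_dec()-style decrement treats reaching zero as a bug
  and reports it, and a dec-and-test style call returns "last reference
  gone" as a normal result.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; NOT the kernel's refcount implementation. */
typedef struct {
    atomic_int refs;
} model_refcount_t;

/* Counts how often the model reported a forbidden drop to zero. */
static int warnings_emitted = 0;

static void model_refcount_set(model_refcount_t *r, int n)
{
    assert(r != NULL);
    atomic_init(&r->refs, n);
}

/* Like the old atomic_dec(): no checks, reaching zero is silent. */
static void model_atomic_dec(atomic_int *v)
{
    atomic_fetch_sub(v, 1);
}

/* Like refcount_dec(): the caller promises this is NOT the last reference,
 * so reaching zero is treated as a bug and reported (the slow path). */
static void model_refcount_dec(model_refcount_t *r)
{
    assert(r != NULL);
    if (atomic_fetch_sub(&r->refs, 1) == 1) {
        warnings_emitted++;
        fprintf(stderr, "model: refcount decrement hit 0\n");
    }
}

/* Like refcount_dec_and_test(): reaching zero is a normal outcome that is
 * returned to the caller, who may then free the object. */
static bool model_refcount_dec_and_test(model_refcount_t *r)
{
    assert(r != NULL);
    return atomic_fetch_sub(&r->refs, 1) == 1;
}
```

  In this model, an --ops entry whose last reference is dropped through
  model_refcount_dec() triggers the report on every packet, whereas
  releasing it through a dec-and-test style call does not. How the upstream
  commit actually restructures the ip_vs code path is not reproduced here;
  see the commit itself.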

  This issue has apparently already been recognized as an error and fixed
  in upstream kernels. Here is a reference to the 4.17 version of the fix,
  which we'd like to see incorporated into the next possible kernel
  maintenance release:

  https://github.com/torvalds/linux/commit/a050d345cef0dc6249263540da1e902bba617e43#diff-75923493f6e3f314b196a8223b0d6342

  We used the livepatch facility to build a livepatch .ko with the above
  diffs on our 4.15.0-36 system and successfully demonstrated the contrast
  between good and bad behavior with and without the livepatch module
  loaded. But we'd rather not have to build a version of livepatch.ko for
  each kernel maintenance release, such as the 4.15.0-46 kernel used here
  to demonstrate that the issue persists in the Ubuntu mainline distro.

  The problem is easy to reproduce with only a couple of packets and a
  simple configuration. Here is a very basic example configuration for 2
  servers (addresses rewritten/obscured) that worked on my test VM:

  ipvsadm -A -f 100 -s rr --ops
  ipvsadm -a -f 100 -r 10.129.131.227:0 -g -w 
  ipvsadm -a -f 100 -r 10.129.131.228:0 -g -w 
  iptables -t mangle -A PREROUTING -d 172.16.5.1/32 -j MARK --set-xmark 0x64/0x
  ifconfig lo:0 172.16.5.1/32 up

  Routing and addressing to achieve the above, or adaptation for one's
  own test environment, is left to the tester.  I just added alias 10.129.131.x
  addresses on my "outbound" interface and a static route for 172.16.5.1 to my
  client system so the test packets arrived on the "inbound" interface.

  I set up routing and addresses on my 2-NIC test system so that packets
  arrived on its eth1 NIC and were directed by ip_vs out of eth2. To test,
  all I did was send a few UDP packets via traceroute to the address in the
  iptables firewall-mark rule, with the test system's eth1 interface set as
  the traceroute system's default gateway:

traceroute -m 2 172.16.5.1
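  If traceroute is not available, or more control over the probes is
  wanted, a few test datagrams can also be sent with a small program. The
  sketch below is an assumption-laden alternative, not what was used in
  this report: send_udp_probes() is a hypothetical helper, and the
  destination port 33434 (traceroute's traditional default base port) and
  the payload are arbitrary choices. Unlike traceroute -m 2, it does not
  limit the TTL.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: sends `count` small UDP datagrams to addr:port.
 * Returns the number of successful sendto() calls, or -1 on setup error. */
static int send_udp_probes(const char *addr, unsigned short port, int count)
{
    assert(addr != NULL && count >= 0);

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, addr, &dst.sin_addr) != 1)
        return -1; /* not a dotted-quad IPv4 address */

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    static const char payload[] = "ops-test"; /* arbitrary probe contents */
    int sent = 0;
    for (int i = 0; i < count; i++) {
        /* The socket is left unconnected, so sendto() succeeds even if
         * nothing listens on the destination port. */
        if (sendto(fd, payload, sizeof payload - 1, 0,
                   (const struct sockaddr *)&dst, sizeof dst) >= 0)
            sent++;
    }
    close(fd);
    return sent;
}
```

  For example, send_udp_probes("172.16.5.1", 33434, 3) sends three
  datagrams toward the virtual address used in the configuration above and
  returns 3 if every sendto() call succeeded.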

  Without the fix, my test ip_vs system either hangs or emits messages like
  those in the attachment. With our livepatch module containing the above
  commit's changes, all is well: both test ("real" as opposed to "virtual")
  servers configured above via ipvsadm receive packets, and no errors are
  reported in the logs.

  Let me know if there is anything I can do to help accelerate
  understanding or resolution of this issue. Incorporating the fix seems
  fairly straightforward, and without it the --ops facility is a
  performance disaster for anyone using it to any significant degree.

  Thanks!

  $ lsb_release -rd
  Description:	Ubuntu 18.04.2 LTS
  Release:	18.04
  $ uname -a
  Linux direct-18-04