[Sts-sponsors] [Bug 1663280] Re: Serious performance degradation of math functions

2019-02-13 Thread Dan Streetman
$ dpkg -l | grep libc6:amd64
ii  libc6:amd64    2.23-0ubuntu10    amd64    GNU C Library: Shared libraries

$ time ./exp 
127781126.100057

real    0m3.334s
user    0m3.336s
sys     0m0.000s
$ time LD_BIND_NOW=1 ./exp
127781126.100057

real    0m0.710s
user    0m0.708s
sys     0m0.000s


$ dpkg -l | grep libc6:amd64
ii  libc6:amd64    2.23-0ubuntu11    amd64    GNU C Library: Shared libraries

$ time ./exp 
127781126.100057

real    0m0.709s
user    0m0.708s
sys     0m0.000s
$ time LD_BIND_NOW=1 ./exp
127781126.100057

real    0m0.714s
user    0m0.712s
sys     0m0.004s


** Tags removed: verification-needed verification-needed-xenial yakkety zesty
** Tags added: verification-done verification-done-xenial

-- 
You received this bug notification because you are a member of STS
Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1663280

Title:
  Serious performance degradation of math functions

Status in GLibC:
  Fix Released
Status in glibc package in Ubuntu:
  Fix Released
Status in glibc source package in Xenial:
  Fix Committed
Status in glibc source package in Zesty:
  Won't Fix
Status in glibc package in Fedora:
  Fix Released

Bug description:
  SRU Justification
  =================

  [Impact]

   * Severe performance hit on many maths-heavy workloads. For example,
  a user reports linpack performance of 13 Gflops on Trusty and Bionic
  and 3.9 Gflops on Xenial.

   * Because the impact is so large (>3x) and Xenial is supported until
  2021, the fix should be backported.

   * The fix avoids an AVX-SSE transition penalty. It stops
  _dl_runtime_resolve() from using AVX-256 instructions which touch the
  upper halves of various registers. This change means that the
  processor does not need to save and restore them.

  [Test Case]

  Firstly, you need a suitable Intel machine. Users report that Sandy
  Bridge, Ivy Bridge, Haswell, and Broadwell CPUs are affected, and I
  have been able to reproduce it on a Skylake CPU using a suitable Azure
  VM.

  Create the following C file, exp.c:

  #include <stdio.h>
  #include <math.h>

  int main () {
    double a, b;
    for (a = b = 0.0; b < 2.0; b += 0.0005) a += exp(b);
    printf("%f\n", a);
    return 0;
  }

  $ gcc -O3 -march=x86-64 -o exp exp.c -lm

  With the current version of glibc:

  $ time ./exp
  ...
  real    0m1.349s
  user    0m1.349s

  
  $ time LD_BIND_NOW=1 ./exp
  ...
  real    0m0.625s
  user    0m0.621s

  Observe that LD_BIND_NOW makes a big difference as it avoids the call
  to _dl_runtime_resolve.

  With the proposed update:

  $ time ./exp
  ...
  real    0m0.625s
  user    0m0.621s

  
  $ time LD_BIND_NOW=1 ./exp
  ...

  real    0m0.631s
  user    0m0.631s

  Observe that the normal case is faster, and LD_BIND_NOW makes a
  negligible difference.

  [Regression Potential]

  glibc is the nightmare case for regressions, as it could affect pretty
  much anything, and this patch touches a key part (dynamic libraries).

  We can be fairly confident in the fix generally: it is already in the
  glibc shipped in Bionic, Debian, and some RPM-based distros. The
  backport is based on the patches in the release/2.23/master branch of
  the upstream glibc repository, and the backport was straightforward.

  Obviously that doesn't remove all risk. There is also a fair bit of
  Ubuntu-specific patching in glibc, so other distros are of limited
  value for ruling out bugs. So I have done the following testing, and
  I'm happy to do more as required. All testing has been done:
   - on an Azure VM (affected by the change), with proposed package
   - on a local VM (not affected by the change), with proposed package

   * Boot with the upgraded libc6.

   * Watch a YouTube video in Firefox over VNC.

   * Build some C code (debuild of zlib).

   * Test Java by installing and running Eclipse.

  Autopkgtest also passes.

  [Original Description]

  Bug [0] was introduced in Glibc 2.23 [1] and fixed in Glibc 2.25
  [2]. All Ubuntu versions starting from 16.04 are affected because they
  use either Glibc 2.23 or 2.24. The bug introduces serious (2x-4x)
  performance degradation of the math functions (pow, exp/exp2/exp10,
  log/log2/log10, sin/cos/sincos/tan, asin/acos/atan/atan2,
  sinh/cosh/tanh, asinh/acosh/atanh) provided by libm, and it can be
  reproduced on any AVX-capable x86-64 machine.

  @strikov: According to a quite reliable source [5], all AMD CPUs and
  the latest Intel CPUs (Skylake and Knights Landing) don't suffer from
  the AVX/SSE transition penalty. This means the scope of this bug is
  smaller, covering only the following generations of Intel CPUs:
  Sandy Bridge, Ivy Bridge, Haswell, and Broadwell. The scope still
  remains quite large, though.

  @strikov: Ubuntu 16.10/17.04, which use Glibc 2.24, may receive the
  fix from the upstream 2.24 branch (as Marcel pointed out, the fix has
  been backported to the 2.24 branch, from which Fedora took it
  successfully) if such synchronization takes place. Ubuntu 16.04 (the
  main target of this bug) uses Glibc 2.23, which hasn't been patched
  upstream and will suffer from the performance degradation until we fix
  it manually.

  This bug is all about the AVX-SSE transition penalty [3]. The 256-bit
  YMM registers used by AVX-256 instructions extend the 128-bit
  registers used by SSE (XMM0 is the low half of YMM0, and so on). Every
  time the CPU executes an SSE instruction after an AVX-256 instruction,
  it has to store the upper halves of the YMM registers to an internal
  buffer and then restore them when execution returns to AVX
  instructions. The store/restore is required because old-fashioned SSE
  knows nothing about the upper halves of its registers and may damage
  them.
[Sts-sponsors] [Bug 1663280] Re: Serious performance degradation of math functions

2019-02-13 Thread Dan Streetman
pdns-recursor (s390x/i386): test fails the same way on only these 2
archs, on yakkety; failure unrelated to this SRU, ignore.

[Sts-sponsors] [Bug 1663280] Re: Serious performance degradation of math functions

2019-02-13 Thread Dan Streetman
mercurial (all archs): fails since the security update; verified it
fails in a local test run, so the failure was introduced by the security
update, not this SRU. Ignore.

[Sts-sponsors] [Bug 1663280] Re: Serious performance degradation of math functions

2019-02-13 Thread Dan Streetman
All autopkgtest failures should be ignored, based on the above comments.

[Sts-sponsors] [Bug 1663280] Re: Serious performance degradation of math functions

2019-02-13 Thread Dan Streetman
ruby2.3/s390x: the test fails on all archs, but is hinted as
always-failing on the other archs; it should be hinted as always-failing
on s390x as well. Ignore.

[Sts-sponsors] [Bug 1663280] Re: Serious performance degradation of math functions

2019-02-13 Thread Dan Streetman
autopkgtest regressions that should be ignored:

node-srs (all archs): test has not been run in 3 years, and fails in
local build with existing glibc; failure unrelated to this sru.  ignore.

bzr (all archs): test failed for last 2 years, since last bzr pkg
update; fails on local system with current glibc; ignore.

systemd/amd64: 2 failures:
  subprocess.CalledProcessError: Command '['modprobe', 'scsi_debug']' returned non-zero exit status 1
  logind   FAIL stderr: grep: /sys/power/state: No such file or directory

both unrelated to glibc (likely a change in the kernel API and/or
packaging introduced this test regression); ignore.

systemd/s390x: test always failed.  ignore.

linux-oracle/amd64: test fails consistently, ignore.

node-ws (all archs): test has not been run in 3 years, and fails in
local build with existing glibc; failure unrelated to this sru.  ignore.

fpc (i386/armhf): test always failed, ignore.

libreoffice/i386: test always failed, ignore.

apt (all archs): existing apt test bug, ignore:
  https://bugs.launchpad.net/ubuntu/+source/apt/+bug/1815750

nvidia-graphics-drivers-340/armhf: test always failed, ignore.

ruby-xmlparser (all archs): test failed for 3 years, fails locally with
current glibc; ignore.

linux (ppc64el/i386): tests flaky, fail more than succeed; ignore.

gearmand/armhf: test failed for last year, ignore.

node-groove (all archs): test has not been run in 3 years, and fails in
local build with existing glibc; failure unrelated to this sru.  ignore.

iscsitarget/armhf: test always failed, ignore.

libnih/armhf: test blacklisted, ignore.

nplan (all archs): test flaky, almost always fails; ignore.

node-leveldown (all archs): test has not been run in 3 years, and fails
in local build with existing glibc; failure unrelated to this sru.
ignore.

snapcraft: failed since pkg last updated, unrelated to this sru, ignore.

r-bioc-genomicalignments/s390x: test not run in 3 years, fails locally
with current glibc; ignore.

ruby-nokogiri (all archs): test not run for 3 years, fails locally with
current glibc; ignore.

ipset (all archs): test not run for 3 years, fails locally with current
glibc; ignore.

snapd (all archs): test flaky, almost always fails, ignore.

dahdi-linux/s390x: test always failed, ignore.


still reviewing:

mercurial (all archs)

ruby2.3/s390x

pdns-recursor (s390x/i386)

[Sts-sponsors] [Bug 1663280] Re: Serious performance degradation of math functions

2019-02-05 Thread Brian Murray
Hello Oleg, or anyone else affected,

Accepted glibc into xenial-proposed. The package will build now and be
available at https://launchpad.net/ubuntu/+source/glibc/2.23-0ubuntu11
in a few hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Your feedback will aid us in getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested and change the tag from
verification-needed-xenial to verification-done-xenial. If it does not
fix the bug for you, please add a comment stating that, and change the
tag to verification-failed-xenial. In either case, without details of
your testing we will not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Changed in: glibc (Ubuntu Xenial)
   Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-xenial

-- 
You received this bug notification because you are a member of STS
Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1663280

Title:
  Serious performance degradation of math functions

Status in GLibC:
  Fix Released
Status in glibc package in Ubuntu:
  Fix Released
Status in glibc source package in Xenial:
  Fix Committed
Status in glibc source package in Zesty:
  Won't Fix
Status in glibc package in Fedora:
  Fix Released

Bug description:
  SRU Justification
  =================

  [Impact]

   * Severe performance hit on many maths-heavy workloads. For example,
  a user reports linpack performance of 13 Gflops on Trusty and Bionic
  and 3.9 Gflops on Xenial.

   * Because the impact is so large (>3x) and Xenial is supported until
  2021, the fix should be backported.

   * The fix avoids an AVX-SSE transition penalty. It stops
  _dl_runtime_resolve() from using AVX-256 instructions which touch the
  upper halves of various registers. This change means that the
  processor does not need to save and restore them.

  [Test Case]

  Firstly, you need a suitable Intel machine. Users report that Sandy
  Bridge, Ivy Bridge, Haswell, and Broadwell CPUs are affected, and I
  have been able to reproduce it on a Skylake CPU using a suitable Azure
  VM.

  Create the following C file, exp.c:

  #include <stdio.h>
  #include <math.h>

  int main () {
double a, b;
for (a = b = 0.0; b < 2.0; b += 0.0005) a += exp(b);
printf("%f\n", a);
return 0;
  }

  $ gcc -O3 -march=x86-64 -o exp exp.c -lm

  With the current version of glibc:

  $ time ./exp
  ...
  real    0m1.349s
  user    0m1.349s

  
  $ time LD_BIND_NOW=1 ./exp
  ...
  real    0m0.625s
  user    0m0.621s

  Observe that LD_BIND_NOW makes a big difference as it avoids the call
  to _dl_runtime_resolve.

  With the proposed update:

  $ time ./exp
  ...
  real    0m0.625s
  user    0m0.621s

  
  $ time LD_BIND_NOW=1 ./exp
  ...

  real    0m0.631s
  user    0m0.631s

  Observe that the normal case is faster, and LD_BIND_NOW makes a
  negligible difference.

  [Regression Potential]

  glibc is the nightmare case for regressions, as it could affect
  pretty much anything, and this patch touches a key part (dynamic
  libraries).

  We can be fairly confident in the fix generally - it's in the glibc
  shipped in Bionic, Debian, and some RPM-based distros. The backport
  is based on the patches in the release/2.23/master branch of the
  upstream glibc repository, and the backport was straightforward.

  Obviously that doesn't remove all risk. There is also a fair bit of
  Ubuntu-specific patching in glibc so other distros are of limited
  value for ruling out bugs. So I have done the following testing, and
  I'm happy to do more as required. All testing has been done:
   - on an Azure VM (affected by the change), with proposed package
   - on a local VM (not affected by the change), with proposed package

   * Boot with the upgraded libc6.

   * Watch a youtube video in Firefox over VNC.

   * Build some C code (debuild of zlib).

   * Test Java by installing and running Eclipse.

  Autopkgtest also passes.

  [Original Description]

  Bug [0] was introduced in Glibc 2.23 [1] and fixed in Glibc 2.25
  [2]. All Ubuntu versions starting from 16.04 are affected because
  they use either Glibc 2.23 or 2.24. The bug introduces a serious
  (2x-4x) performance degradation of the math functions (pow,
  exp/exp2/exp10, log/log2/log10, sin/cos/sincos/tan,
  asin/acos/atan/atan2, sinh/cosh/tanh, asinh/acosh/atanh) provided by
  libm. The bug can be reproduced on any AVX-capable x86-64 machine.

  @strikov: According to a quite reliable source [5], all AMD CPUs and
  the latest Intel CPUs (Skylake and Knights Landing) don't suffer from
  the AVX/SSE transition penalty. That narrows the scope of this bug to
  the following generations of Intel CPUs: Sandy Bridge, Ivy Bridge,
  Haswell, and Broadwell. The scope still remains quite large, though.

  @strikov: Ubuntu 16.10/17.04, which use Glibc 2.24, may receive the
  fix from the upstream 2.24 branch (as Marcel pointed out, the fix has
  been backported to the 2.24 branch, from which Fedora took it
  successfully) if such a synchronization takes place. Ubuntu 16.04
  (the main target of this bug) uses Glibc 2.23, which hasn't been
  patched upstream and will suffer from the performance degradation
  until we fix it manually.

  This bug is all about the AVX-SSE transition penalty [3]. The 256-bit
  YMM registers used by AVX-256 instructions extend the 128-bit
  registers used by SSE (XMM0 is the low half of YMM0, and so on).
  Every time the CPU executes an SSE instruction after an AVX-256
  instruction, it has to store the upper halves of the YMM registers to
  an internal buffer and then restore them when execution returns to
  AVX instructions. The store/restore is required because old-fashioned
  SSE knows nothing about the upper halves of its registers and may
  damage them.
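
  The description notes the bug reproduces only on AVX-capable x86-64
  machines. As a quick capability check (my sketch, Linux-specific, not
  from the bug report): the CPU must advertise the avx flag before the
  dirty-upper-YMM state, and hence the penalty, is possible at all.

```shell
#!/bin/sh
# Only AVX-capable CPUs can enter the dirty-upper-YMM state that makes
# subsequent SSE instructions pay the save/restore penalty, so check
# the cpuinfo flags first (Linux-specific sketch).
if grep -qw avx /proc/cpuinfo; then
    echo "AVX present: this CPU can hit the AVX-SSE transition penalty"
else
    echo "no AVX: this machine cannot reproduce the bug"
fi
```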

[Sts-sponsors] [Bug 1663280] Re: Serious performance degradation of math functions

2019-02-04 Thread Dan Streetman
** Changed in: glibc (Ubuntu Xenial)
   Importance: Low => High


[Sts-sponsors] [Bug 1663280] Re: Serious performance degradation of math functions

2018-12-14 Thread Fabio Augusto Miranda Martins
** Changed in: glibc (Ubuntu Xenial)
   Importance: Medium => Low


[Sts-sponsors] [Bug 1663280] Re: Serious performance degradation of math functions

2018-11-03 Thread Mathew Hodson
** Changed in: glibc (Ubuntu Xenial)
   Importance: Undecided => Medium
