Bug#863672: performance critical libyuv built with Os

2017-06-02 Thread Laurent Bigonville

tag 863672 + patch fixed-upstream
thanks

On Mon, 29 May 2017 23:14:38 +0200 Julian Taylor wrote:
>
> libyuv, which is a performance-critical library for firefox, is built with
> -Os, which is horrible for its performance.
> In particular row_common.cc which contains the generic parts of the
> color transformation code:
>
> See:
> https://buildd.debian.org/status/fetch.php?pkg=firefox=amd64=53.0.is.52.0.2-1=1492644908=0
>
> /usr/bin/g++ -std=gnu++11 -o row_common.o -c ... -fPIC
> -DMOZILLA_CLIENT -include
> /PKGBUILDDIR/build-browser/mozilla-config.h -MD -MP -MF
> .deps/row_common.o.pp -Wdate-time -D_FORTIFY_SOURCE=2 -Wall
> -Wc++11-compat -Wempty-body -Wignored-qualifiers -Woverloaded-virtual
> -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code
> -Wwrite-strings -Wno-invalid-offsetof -Wc++14-compat
> -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations
> -Wno-error=array-bounds -fno-lifetime-dse -fstack-protector-strong
> -Wformat -Werror=format-security -fno-schedule-insns2 -fno-lifetime-dse
> -fno-delete-null-pointer-checks -fno-exceptions -fno-strict-aliasing
> -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions
> -fno-math-errno -pthread -pipe -g -freorder-blocks -Os
> -fomit-frame-pointer
> /PKGBUILDDIR/media/libyuv/source/row_common.cc
>
>
> The problematic part is the YuvPixel function, which is called in loops
> and in turn calls tiny clamp functions.
> -Os heavily restricts inlining, so these calls are not inlined, which
> causes massive overhead.
> This is the top of the CPU profile on sites that, e.g., display videos:
> 17.25% libxul.so [.] YuvPixel
> 6.58% libxul.so [.] Clamp
> 6.46% libxul.so [.] clamp255
>
> The problem is not as bad as it looks, as this generic code is only
> executed on machines that do not have SSSE3, AVX2 or NEON (see
> convert_argb.cc).
> But there are still plenty of useful CPUs that do not have these
> instruction sets and are crippled by the compiler flags used.
>
> Is it possible to compile this library with -O3 to allow the compiler to
> vectorize it with the best available generic instruction set (e.g. SSE2
> on x64)?

FTR, this is fixed upstream now; -O2 is used by default on desktop builds:

https://hg.mozilla.org/integration/autoland/rev/8fdb9e30b6a7



Bug#863672: performance critical libyuv built with Os

2017-05-29 Thread Julian Taylor
Package: firefox
Version:  53.0.is.52.0.2-1
Severity: normal


libyuv, which is a performance-critical library for firefox, is built with
-Os, which is horrible for its performance.
In particular row_common.cc which contains the generic parts of the
color transformation code:

See:
https://buildd.debian.org/status/fetch.php?pkg=firefox=amd64=53.0.is.52.0.2-1=1492644908=0

/usr/bin/g++ -std=gnu++11 -o row_common.o -c  ...   -fPIC
-DMOZILLA_CLIENT -include
/PKGBUILDDIR/build-browser/mozilla-config.h -MD -MP -MF
.deps/row_common.o.pp -Wdate-time -D_FORTIFY_SOURCE=2 -Wall
-Wc++11-compat -Wempty-body -Wignored-qualifiers -Woverloaded-virtual
-Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code
-Wwrite-strings -Wno-invalid-offsetof -Wc++14-compat
-Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations
-Wno-error=array-bounds -fno-lifetime-dse -fstack-protector-strong
-Wformat -Werror=format-security -fno-schedule-insns2 -fno-lifetime-dse
-fno-delete-null-pointer-checks -fno-exceptions -fno-strict-aliasing
-fno-rtti -ffunction-sections -fdata-sections -fno-exceptions
-fno-math-errno -pthread -pipe  -g -freorder-blocks -Os
-fomit-frame-pointer
/PKGBUILDDIR/media/libyuv/source/row_common.cc


The problematic part is the YuvPixel function, which is called in loops
and in turn calls tiny clamp functions.
-Os heavily restricts inlining, so these calls are not inlined, which
causes massive overhead.
This is the top of the CPU profile on sites that, e.g., display videos:
  17.25%  libxul.so   [.] YuvPixel
   6.58%  libxul.so   [.] Clamp
   6.46%  libxul.so   [.] clamp255
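
To make the call structure concrete, here is a minimal C++ sketch of a
per-pixel conversion that goes through tiny clamp helpers. It is not
libyuv's code: the names ClampLow, ClampHigh, ClampToByte, YuvToRgbPixel
and YuvRowToRgb are made up for illustration, and the coefficients are the
common fixed-point BT.601 approximation, which is an assumption and not
necessarily what row_common.cc uses.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not copied from libyuv.
static inline int32_t ClampLow(int32_t v) { return v < 0 ? 0 : v; }
static inline int32_t ClampHigh(int32_t v) { return v > 255 ? 255 : v; }
static inline uint8_t ClampToByte(int32_t v) {
  return static_cast<uint8_t>(ClampHigh(ClampLow(v)));
}

// Converts one limited-range YUV pixel to RGB.
// Assumed coefficients: BT.601 fixed-point approximation scaled by 256.
static inline void YuvToRgbPixel(uint8_t y, uint8_t u, uint8_t v,
                                 uint8_t* r, uint8_t* g, uint8_t* b) {
  const int32_t c = (static_cast<int32_t>(y) - 16) * 298;
  const int32_t d = static_cast<int32_t>(u) - 128;
  const int32_t e = static_cast<int32_t>(v) - 128;
  // Three clamp chains per pixel: cheap when inlined, costly as real calls.
  *r = ClampToByte((c + 409 * e + 128) / 256);
  *g = ClampToByte((c - 100 * d - 208 * e + 128) / 256);
  *b = ClampToByte((c + 516 * d + 128) / 256);
}

// Converts a row of pixels (one U and V sample per pixel, for simplicity)
// into packed RGB, calling the per-pixel function once per iteration.
void YuvRowToRgb(const uint8_t* y, const uint8_t* u, const uint8_t* v,
                 uint8_t* rgb, int width) {
  for (int x = 0; x < width; ++x) {
    YuvToRgbPixel(y[x], u[x], v[x],
                  &rgb[3 * x], &rgb[3 * x + 1], &rgb[3 * x + 2]);
  }
}
```

If the helpers end up as real calls, every pixel pays for several
call/return pairs around a couple of compares; once they are inlined, the
loop body is straight-line arithmetic. Whether a given compiler inlines
them at -Os depends on its version and size heuristics, so comparing the
output of g++ -Os -S and g++ -O2 -S on a file like this is one way to
check.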

The problem is not as bad as it looks, as this generic code is only
executed on machines that do not have SSSE3, AVX2 or NEON (see
convert_argb.cc).
But there are still plenty of useful CPUs that do not have these
instruction sets and are crippled by the compiler flags used.
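
As a rough sketch of why only some machines hit the generic path, the
selection logic can be thought of as below. This is an assumption about
the general shape of such a dispatcher, not the code in convert_argb.cc;
CpuFeatures, RowKernel and SelectRowKernel are hypothetical names, and
real code would fill the flags from CPU detection.

```cpp
// Hypothetical feature flags; real code would query the CPU at runtime.
struct CpuFeatures {
  bool ssse3 = false;
  bool avx2 = false;
  bool neon = false;
};

// Which row-conversion variant to run (illustrative labels only).
enum class RowKernel { kGeneric, kSsse3, kAvx2, kNeon };

// Prefer the widest SIMD variant the CPU supports. If none of the listed
// extensions is present, fall back to the generic C++ code, which is the
// path whose speed depends entirely on the compiler flags.
RowKernel SelectRowKernel(const CpuFeatures& cpu) {
  if (cpu.avx2) return RowKernel::kAvx2;
  if (cpu.ssse3) return RowKernel::kSsse3;
  if (cpu.neon) return RowKernel::kNeon;
  return RowKernel::kGeneric;
}
```

A CPU that only offers SSE2 sets none of these flags, so it lands on
kGeneric and runs the code built with -Os.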

Is it possible to compile this library with -O3 to allow the compiler to
vectorize it with the best available generic instruction set (e.g. SSE2
on x64)?

cheers,
Julian Taylor


