On Sun, Nov 25, 2018 at 1:19 AM Waldek Kozaczuk <[email protected]>
wrote:

> I manually disabled -ffast-math option in x265 CMakefile and the problem
> went away. I still wonder why the distribution version was not built with
> -ffast-math option. I did read somewhere that fast-math may make some math
> operations faster but not necessarily correct and it is not IEEE floating
> point compliant.
>

I'll try to answer the questions you raised across your several mails in
this thread; tell me if I forgot something.

libmvec is a new (3 year old) part of glibc (see
https://sourceware.org/glibc/wiki/libmvec, https://lwn.net/Articles/654605/).
As it is part of glibc, we're indeed supposed to implement it in OSv, and
unless we're lucky, we cannot just copy the host's libmvec.so into the
image. It might be worth investigating, though: maybe with minimal changes
to OSv to include some missing glibc-internal stuff, it may end up working.
Certainly re-implementing this library from scratch would be a pain.
Perhaps Musl would consider taking on this challenge?

As to what libmvec *does*, it is a library of vector math functions
(various arithmetic operations on a whole vector of floating point numbers)
using the new SIMD (i.e., SSE / AVX) hardware in modern processors. But there
is a snag: Using these operations may yield slightly different results from
the traditional floating point operations. For example, there is no
instruction for calculating log() for a vector of numbers. But a library
could implement log() in terms of a bunch of other vectorized operations
(multiplication, addition, etc.). This is what the _ZGVdN4v_log() which
you encountered does. The result is a good and fast implementation for
log(), but one which is *different* from glibc's classic log()
implementation so applications may suddenly see (slightly) different
results.

This is why gcc never uses these functions unless you use the "-ffast-math"
option, which tells gcc you want a faster implementation even at the cost of
slightly different results (these are not "wrong" results, of course, just
different). We can encounter a need for libmvec in code compiled with
-ffast-math, or in code which calls it directly (I've never seen such code,
but it's possible).

As to why the pre-compiled code you saw was *not* compiled with -ffast-math,
I am guessing your distribution wanted to create an executable that can run
on older x86 without new SIMD hardware. Or maybe they considered -ffast-math
not "safe" enough and didn't think the performance benefit was high enough.


> At the same time I also disabled numa support in x265 and transcoding same
> video is almost twice faster now. I am guessing the code relying on libnuma
> (if compiled with numa enabled) when failing due to some limited numa
> support in OSv also disabled proper threading which made it slower possibly.
>

It would be nice to check why this happens. This code shouldn't need
complete NUMA support (i.e., support for multiple NUMA nodes). All it
should need is the ability to query the current configuration, and
especially the number of nodes. We probably missed one or more of these
query functions, and I think it is very likely that this can be fixed with
a very small patch to OSv. You just need to look at that x265 code and see
which NUMA-related functions it calls, what they do (e.g., call some
system call), and what goes wrong on OSv.
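
To illustrate what such a "query" can boil down to, here is a rough sketch in C that counts NUMA nodes by listing a sysfs-style directory. This is not x265's or libnuma's actual code. It assumes the Linux layout /sys/devices/system/node/nodeN, and the function names is_node_entry and count_numa_nodes are made up. I have not checked whether OSv exposes that directory, so it is only a guess at one place where a query could fail.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <string.h>

/* Return 1 if name looks like "node<digits>" (e.g. "node0"), else 0. */
int is_node_entry(const char *name)
{
    assert(name != NULL);
    if (strncmp(name, "node", 4) != 0 || name[4] == '\0')
        return 0;
    for (const char *p = name + 4; *p != '\0'; p++)
        if (!isdigit((unsigned char)*p))
            return 0;
    return 1;
}

/*
 * Count the "nodeN" entries in sysfs_dir (on Linux, typically
 * "/sys/devices/system/node"). If the directory is missing or contains
 * no node entries, report a single node, which is the sane answer for
 * a system with no NUMA information.
 */
int count_numa_nodes(const char *sysfs_dir)
{
    DIR *d = opendir(sysfs_dir);
    if (d == NULL)
        return 1;

    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL)
        if (is_node_entry(e->d_name))
            count++;
    closedir(d);

    return count > 0 ? count : 1;
}
```

The point of the fallback is that a caller which only wants to size its thread pools keeps working with one node when the information is missing, instead of silently disabling threading.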

Nadav.
