On Sun, Nov 25, 2018 at 1:19 AM Waldek Kozaczuk <[email protected]> wrote:
> I manually disabled -ffast-math option in x265 CMakefile and the problem
> went away. I still wonder why the distribution version was not built with
> -ffast-math option. I did read somewhere that fast-math may make some math
> operations faster but not necessarily correct and it is not IEEE floating
> point compliant.

I'll try to answer the questions you raised in your several mails in this thread; tell me if I forgot something.

libmvec is a new (3-year-old) part of glibc (see https://sourceware.org/glibc/wiki/libmvec, https://lwn.net/Articles/654605/). As it is part of glibc, we're indeed supposed to implement it in OSv, and unless we're lucky, we cannot just copy the host's libmvec.so into the image (although it might be worth investigating: maybe with minimal changes to OSv to include some missing glibc-internal stuff, it may end up working. Certainly re-implementing this library from scratch would be a pain. Perhaps Musl would consider taking on this challenge?).

As to what libmvec *does*, it is a library of vector math functions (various arithmetic operations on a whole vector of floating point numbers) using the new SIMD (i.e., SSE / AVX) hardware in modern processors. But there is a snag: using these operations may yield slightly different results from the traditional floating point operations. For example, there is no instruction for calculating log() for a vector of numbers. But a library could implement log() in terms of a bunch of other vectorized operations (multiplication, addition, etc.). This is what the _ZGVdN4v_log() function you encountered does. The result is a good and fast implementation of log(), but one which is *different* from glibc's classic log() implementation, so applications may suddenly see (slightly) different results.
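To make the idea of building log() out of simpler operations concrete, here is a minimal scalar sketch in C. It is not libmvec's actual algorithm (glibc's implementation is considerably more refined), and approx_log() and approx_log_array() are hypothetical names made up for this example. It reduces the argument to a small range and then evaluates a short polynomial using only multiplication, division and addition, which are the kinds of operations SIMD hardware can apply to several lanes at once.

```c
#include <math.h>    /* only for the NAN macro; no libm function is called */
#include <float.h>   /* DBL_MAX */
#include <stddef.h>  /* size_t */

/* ln(2), used to fold the power-of-two exponent back into the result. */
static const double LN2 = 0.6931471805599453;

/*
 * Hypothetical helper (not a glibc or libmvec function): approximate ln(x)
 * using only comparisons, multiplication, division and addition.
 * This sketch returns NaN for anything that is not a finite x > 0.
 */
static double approx_log(double x)
{
    int e = 0;

    if (!(x > 0.0) || x > DBL_MAX)
        return NAN;

    /* Range reduction: scale x by powers of two into [sqrt(1/2), sqrt(2)). */
    while (x >= 1.4142135623730951) { x *= 0.5; e++; }
    while (x <  0.7071067811865476) { x *= 2.0; e--; }

    /* ln(x) = 2 * atanh(s) with s = (x - 1) / (x + 1), so |s| < 0.172 here. */
    double s  = (x - 1.0) / (x + 1.0);
    double s2 = s * s;

    /* Odd series s + s^3/3 + ... + s^13/13, evaluated with Horner's rule. */
    double p = 1.0 / 13.0;
    p = p * s2 + 1.0 / 11.0;
    p = p * s2 + 1.0 / 9.0;
    p = p * s2 + 1.0 / 7.0;
    p = p * s2 + 1.0 / 5.0;
    p = p * s2 + 1.0 / 3.0;
    p = p * s2 + 1.0;

    return e * LN2 + 2.0 * s * p;
}

/* Element-wise wrapper over an array, mirroring the one-call-many-inputs
 * shape of a vector math routine. */
void approx_log_array(const double *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = approx_log(in[i]);
}
```

A routine like this can differ from the classic scalar log() in the last few bits, which is exactly the kind of discrepancy at issue. For reference, under the x86-64 vector function ABI naming scheme, _ZGVdN4v_log should decode as the AVX2 ("d"), unmasked ("N"), 4-lane, vector-argument ("v") variant of log().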
Because of these differences, gcc never uses these functions unless you pass the "-ffast-math" option, which tells gcc you want a faster implementation even at the cost of slightly different results (these are not "wrong" results, of course, just different). We can encounter a need for libmvec in code compiled with -ffast-math, or in code which calls it directly (I've never seen any, but it's possible).

As to why the pre-compiled code you saw was *not* compiled with -ffast-math, I am guessing your distribution wanted to create an executable that can run on older x86 processors without the new SIMD hardware. Or maybe they did not consider -ffast-math "safe" enough, or the performance benefit high enough.

> At the same time I also disabled numa support in x265 and transcoding same
> video is almost twice faster now. I am guessing the code relying on libnuma
> (if compiled with numa enabled) when failing due to some limited numa
> support in OSv also disabled proper threading which made it slower possibly.

It would be nice to check why this happens. This code shouldn't need complete NUMA support (i.e., support for multiple NUMA nodes); all it should need is the ability to query the current configuration, and especially the number of nodes. We probably missed one or more of these query functions, and I think it is very likely that this can be fixed with a very small patch to OSv. You just need to look at the x265 code and see which NUMA-related functions it calls, what they do (e.g., make some system call), and what goes wrong on OSv.

Nadav.
