Sent on the go...
> On 15 Jan 2016, at 21:09, Peter Pearson <peter.pear...@gmail.com> wrote:
>
> Replies inline...
>
>> On
>
> Also, reading and writing of values in OpenEXR goes through ImfXdr.h's
> conversion routines doing bitshifting for I assume endianness conversion? - I
> guess the x86 port for OpenEXR had to convert this, whereas the SGI versions
> didn't, and we're stuck with it now?

There are certainly some cases where even the non-XDR paths are potentially slower than needed: sometimes the code calls stdlib memory routines, other times it is implemented as a basic loop.

> On top of that, in the multi-threading scenario, while using a LUT for
> half->float conversion is faster than not using it, it causes absolute havoc
> in terms of L1/L2 cache thrashing - from disk I've sometimes found reading
> full float EXRs faster than half EXRs due to this, but that's probably only
> when the OS disk cache has them, so in general it's not a huge issue given
> the IO saving that'll happen in most real-world usage for big facilities...

It would be nice if the copypixels and similar calls supported CPU-specific implementations, and if there were an F16C implementation for the half conversions. Once I patched the AVX detection code in configure for gcc 4.1, I got the big win mentioned in the DWA white paper: at least in Nuke, the conversion function inside the DWA parts of the library dropped to the bottom of the profiler hot spots and performance jumped.

Karl, would that explain some of your differences with compiler versions?

I did think about rewriting the code myself, but my assembler experience dates from around the 80386 time frame (or ARM2/3, PDP8), so I put that near the bottom of my pile.

Kevin

_______________________________________________
Oiio-dev mailing list
Oiio-dev@lists.openimageio.org
http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org
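
For anyone following the half->float discussion, here is a minimal C++ sketch of the two software approaches being compared: converting a 16-bit half bit pattern to float with integer operations, and precomputing every result into a lookup table. This is not OpenEXR's code. The names `halfBitsToFloat` and `buildHalfToFloatTable` are hypothetical, and the sketch assumes a 32-bit IEEE 754 `float`.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: convert one IEEE 754 binary16 bit pattern to a
// 32-bit float using integer operations only (no table, no intrinsics).
float halfBitsToFloat(std::uint16_t h)
{
    const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    const std::uint32_t exp  = (h >> 10) & 0x1Fu;   // 5-bit exponent
    std::uint32_t man        = h & 0x3FFu;          // 10-bit mantissa
    std::uint32_t bits;

    if (exp == 0) {
        if (man == 0) {
            bits = sign;                            // signed zero
        } else {
            // Subnormal half: shift until the implicit leading 1 appears,
            // then rebias. A half subnormal is man * 2^-24.
            std::uint32_t shift = 0;
            while ((man & 0x400u) == 0) {
                man <<= 1;
                ++shift;
            }
            man &= 0x3FFu;                          // drop the leading 1
            bits = sign | ((113u - shift) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);    // infinity or NaN
    } else {
        // Normal number: rebias exponent from 15 to 127 (difference 112).
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    }

    float f;
    std::memcpy(&f, &bits, sizeof f);               // bit-cast without UB
    return f;
}

// Hypothetical helper: precompute all 65536 conversions so that a
// conversion becomes a single indexed load.
std::vector<float> buildHalfToFloatTable()
{
    std::vector<float> table(1u << 16);
    for (std::uint32_t i = 0; i < table.size(); ++i)
        table[i] = halfBitsToFloat(static_cast<std::uint16_t>(i));
    return table;
}
```

With 4-byte floats the table is 65536 x 4 bytes = 256 KiB, which is larger than a typical L1 data cache, so scattered lookups from several threads can plausibly evict each other's lines in the way described above. An F16C path would sidestep the table by doing the conversion in hardware, at the cost of needing CPU detection and a fallback.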