Sent on the go...

> On 15 Jan 2016, at 21:09, Peter Pearson <peter.pear...@gmail.com> wrote:
> 
> Replies inline...
> 
> Also, reading and writing of values in OpenEXR goes through ImfXdr.h's 
> conversion routines doing bitshifting for I assume endianness conversion? - I 
> guess the x86 port for OpenEXR had to convert this, whereas the SGI versions 
> didn't, and we're stuck with it now?

There are certainly some cases where even the non-XDR paths are potentially 
slower than needed: sometimes the code calls stdlib memory routines, other 
times it is implemented as a basic loop.
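
To make the shift-versus-memory-routine distinction concrete, here is a rough sketch of the two ways a 32-bit value can be pulled out of a byte buffer, assuming the file data is little-endian. The function names are made up for illustration and are not the ImfXdr.h API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not an OpenEXR function): assemble a little-endian
// 32-bit value one byte at a time. Gives the same result on any host.
inline std::uint32_t read_u32_le_shift(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: copy the bytes as they are. Only matches the shift
// version when the host itself is little-endian.
inline std::uint32_t read_u32_native(const unsigned char* p)
{
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Runtime probe for host byte order.
inline bool host_is_little_endian()
{
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Small self-check tying the two paths together.
inline void self_check_read_u32()
{
    const unsigned char bytes[4] = {0x78, 0x56, 0x34, 0x12};
    assert(read_u32_le_shift(bytes) == 0x12345678u);
    if (host_is_little_endian())
        assert(read_u32_native(bytes) == read_u32_le_shift(bytes));
}
```

On a little-endian host both paths return the same value, so the per-byte shifting only buys portability. Whether a given compiler collapses the shift version into a single load is something to check in the generated code rather than assume.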
> 
> On top of that, in the multi-threading scenario, while using a LUT for 
> half->float conversion is faster than not using it, it causes absolute havoc 
> in terms of L1/L2 cache thrashing - from disk I've sometimes found reading 
> full float EXRs faster than half EXRs due to this, but that's probably only 
> when the OS disk cache has them, so in general it's not a huge issue given 
> the IO saving that'll happen in most real-world usage for big facilities...

It would be nice if copypixels and similar calls supported CPU-specific 
implementations and there was an F16C implementation for the half conversions. 
Once I patched the AVX detection code in configure for gcc 4.1, I got the big 
win mentioned in the DWA white paper. At least in Nuke, the conversion function 
inside the DWA parts of the library dropped to the bottom of the profiler hot 
spots and the performance jumped.
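
For illustration only, this is roughly the shape a CPU-specific half conversion could take: an F16C path that is compiled in only when the compiler defines `__F16C__` (for example with `-mf16c` on gcc/clang), and a portable bit-manipulation fallback that uses no lookup table. `half_bits_to_float` and `convert_row` are names invented for this sketch and are not taken from OpenEXR or the DWA code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__F16C__)
#include <immintrin.h>
#endif

// Hypothetical scalar fallback: expand IEEE 754 binary16 bits to a float
// without a lookup table.
inline float half_bits_to_float(std::uint16_t h)
{
    const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    const std::uint32_t exp  = (h >> 10) & 0x1Fu;
    std::uint32_t man        = h & 0x3FFu;
    std::uint32_t bits;

    if (exp == 0) {
        if (man == 0) {
            bits = sign;                          // signed zero
        } else {
            // Subnormal half: shift until the implicit leading bit appears.
            std::uint32_t shift = 0;
            while ((man & 0x400u) == 0) { man <<= 1; ++shift; }
            man &= 0x3FFu;
            bits = sign | ((113u - shift) << 23) | (man << 13);
        }
    } else if (exp == 31) {
        bits = sign | 0x7F800000u | (man << 13);  // infinity or NaN
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13);  // rebias 15 -> 127
    }

    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical row converter: eight halves per step when F16C is available
// at compile time, scalar code for the remainder (or for everything).
inline void convert_row(const std::uint16_t* src, float* dst, std::size_t n)
{
    std::size_t i = 0;
#if defined(__F16C__)
    for (; i + 8 <= n; i += 8) {
        const __m128i h =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm256_storeu_ps(dst + i, _mm256_cvtph_ps(h));
    }
#endif
    for (; i < n; ++i)
        dst[i] = half_bits_to_float(src[i]);
}
```

This picks the path at compile time, which is the situation where a broken configure check silently drops you onto the slow path. A real implementation would more likely want a runtime CPU check so one binary can run on machines with and without F16C.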

Karl, would that explain some of your differences with compiler versions?

I did think about rewriting the code myself, but my assembler experience is 
from somewhere around the 80386 time frame (or ARM2/3, PDP8), so I put that 
near the bottom of my pile.

Kevin
_______________________________________________
Oiio-dev mailing list
Oiio-dev@lists.openimageio.org
http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org
