another data point.

When I first experimented with adding DWA to Nuke using OpenEXR 2.2.0, I had
to patch the configure script so I could enable F16C instructions for gcc
4.1.2. After doing so, VTune pointed to the copyFromFrameBuffer function,
converting from half to float, as taking roughly 30% or more of the CPU time
when reading files from local SSD. (As an aside, a number of other
namespace-related fixes were needed too; all of these are in the latest
OpenEXR versions.) I came to the conclusion that making the performance any
better would need an F16C-based half-to-float conversion function rather
than going via the LUT, at least for CPUs supporting those instructions. I
also have some notes about testing memory-mapped reading, but no
conclusions.
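
To make the idea concrete, here is a rough sketch of what such a conversion
routine could look like. It is not OpenEXR code: the names halfBitsToFloat
and convertHalfToFloat are made up for illustration, and it assumes the
half data is available as raw 16-bit patterns. The vector path uses the
_mm256_cvtph_ps intrinsic and is only compiled when the compiler defines
__F16C__ (e.g. gcc/clang with -mf16c); otherwise everything goes through a
plain bit-manipulation fallback instead of a LUT.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__F16C__)
#include <immintrin.h>
#endif

// Hypothetical helper: portable scalar expansion of one IEEE 754 binary16
// bit pattern to a 32-bit float (used for the tail and as the fallback).
static float halfBitsToFloat(std::uint16_t h)
{
    const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    std::uint32_t exp = (h >> 10) & 0x1Fu;
    std::uint32_t man = h & 0x3FFu;
    std::uint32_t bits;

    if (exp == 0x1Fu) {             // infinity or NaN
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {          // normal number: rebias 15 -> 127
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {          // signed zero
        bits = sign;
    } else {                        // subnormal half: renormalise
        exp = 113u;
        while ((man & 0x400u) == 0) { man <<= 1; --exp; }
        bits = sign | (exp << 23) | ((man & 0x3FFu) << 13);
    }

    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical batch routine: converts n half values to float, eight at a
// time with F16C when available, and one at a time otherwise.
static void convertHalfToFloat(const std::uint16_t* src, float* dst,
                               std::size_t n)
{
    std::size_t i = 0;
#if defined(__F16C__)
    for (; i + 8 <= n; i += 8) {
        const __m128i h =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm256_storeu_ps(dst + i, _mm256_cvtph_ps(h));
    }
#endif
    for (; i < n; ++i)              // tail, or everything without F16C
        dst[i] = halfBitsToFloat(src[i]);
}
```

A compile-time check like this is the simplest option; a real library
would more likely want a runtime CPUID check so that one binary can run on
CPUs with and without F16C.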

This was not the case when F16C was disabled, as other functions appeared
higher in the profile. Total performance was lower without F16C (no
surprise); copyFromFrameBuffer only bubbled to the top because F16C reduced
the cost of the other functions.

I didn't try RLE compression.
Kevin
