Hi,

Following up on this: we tried the suggestion of defining ILMBASE_FORCE_CXX03, but it made no difference in our measurements. So I went on and investigated the issue more closely, and eliminated a few differences before I found something that might be the cause of the problem.
Our original OpenEXR 2.2.0 build was made with the standard compiler at the time, gcc 4.1.2 if I remember correctly, and I had been comparing it against an OpenEXR 2.3.0 build made with gcc 4.8.5. To eliminate that difference, I rebuilt both OpenEXR 2.2.0 and 2.3.0 with gcc 4.8.5 (vfx17) and with gcc 6.3.1 (vfx19). The results indicate that OpenEXR 2.2.0 suffers the same slowdown as 2.3.0 when built with these newer compilers, compared with the build made with gcc 4.1.2.

When profiling with VTune, the function whose execution time grows most significantly is Imf2_X::copyFromFrameBuffer(), and this is true for both 2.2.0 and 2.3.0. So it seems the slowdown is tied to the compiler used to build the library rather than to the OpenEXR version. We would normally expect speed-ups from newer compilers, but of course it's never a linear path. There is also some slowdown in the rleCompress function, though with less impact. (At the moment I'm focusing on RLE compression, since in our experience it suffered the most significant slowdown, but I now suspect that is simply where the underlying slowdown shows up most clearly.) I have also read threads where people saw significant DWA performance gains when switching from gcc 4.1.x to 4.8, which is consistent with our findings.

Does anyone have experience with different compilers here? copyFromFrameBuffer() is quite complex, and it may be that gcc 4.1.2 applied an aggressive optimisation to it that later compiler versions no longer perform.
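In case it helps anyone reproduce this outside of Nuke, below is a rough sketch of the kind of stand-alone timing loop I have in mind. It is not our actual test code: the file path, image size, thread count and synthetic pixel contents are arbitrary placeholders. It just writes an RLE-compressed scanline file repeatedly, so the time should be dominated by the same write path, i.e. copyFromFrameBuffer() plus rleCompress().

// Sketch of a stand-alone write benchmark (placeholders only, not our real test).
// Link the same source against each OpenEXR/IlmBase build under comparison.

#include <ImfOutputFile.h>
#include <ImfHeader.h>
#include <ImfChannelList.h>
#include <ImfFrameBuffer.h>
#include <ImfCompression.h>
#include <ImfThreading.h>
#include <half.h>

#include <chrono>
#include <cstdio>
#include <vector>

using namespace Imf;

int main ()
{
    const int width = 2048, height = 1556, frames = 100;   // arbitrary sizes

    // Synthetic half-float RGB image, interleaved RGBRGB...
    std::vector<half> pixels (size_t (width) * height * 3);
    for (size_t i = 0; i < pixels.size (); ++i)
        pixels[i] = half (float ((i / 3) % 256) / 255.0f);

    setGlobalThreadCount (0);   // no thread pool; raise this to exercise IlmThread

    Header header (width, height);
    header.compression () = RLE_COMPRESSION;   // swap for ZIPS_COMPRESSION etc.
    header.channels ().insert ("R", Channel (HALF));
    header.channels ().insert ("G", Channel (HALF));
    header.channels ().insert ("B", Channel (HALF));

    const size_t xs = 3 * sizeof (half);   // stride between pixels
    const size_t ys = xs * width;          // stride between scanlines

    auto start = std::chrono::steady_clock::now ();

    for (int f = 0; f < frames; ++f)
    {
        OutputFile file ("/tmp/bench.exr", header);   // placeholder path

        FrameBuffer fb;
        fb.insert ("R", Slice (HALF, (char *) &pixels[0],                     xs, ys));
        fb.insert ("G", Slice (HALF, (char *) &pixels[0] + sizeof (half),     xs, ys));
        fb.insert ("B", Slice (HALF, (char *) &pixels[0] + 2 * sizeof (half), xs, ys));

        file.setFrameBuffer (fb);
        file.writePixels (height);   // scanline writes go through copyFromFrameBuffer()
    }

    auto end = std::chrono::steady_clock::now ();
    std::printf ("%d frames in %.3f s\n", frames,
                 std::chrono::duration<double> (end - start).count ());
    return 0;
}

Compiling that once per compiler and linking it against each library build should show whether the regression follows the compiler used to build OpenEXR rather than the application, which is what our VTune results suggest.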
On Thu, 4 Apr 2019 at 13:22, Tony Micilotta <tony.micilo...@foundry.com> wrote:

> Hi Peter and Nick,
>
> Thank you for the suggestions and performance findings. We rebuilt with ILMBASE_FORCE_CXX03 defined, however the write speeds remain unchanged. We'll look at your suggestions too, Peter, and will also take a peek with VTune.
>
> Regards,
> Tony
>
> ------------------------------
> From: Peter Hillman <pet...@wetafx.co.nz>
> Date: Wednesday, 3 April 2019 at 07:30
> To: Nick Porcino <mesh...@hotmail.com>, Tony Micilotta <tony.micilo...@foundry.com>, "openexr-devel@nongnu.org" <openexr-devel@nongnu.org>
> Subject: Re: [Openexr-devel] OpenExr 2.3 - slower write speeds for Uncompressed and Zip1
>
> I've just been running tests myself along the same lines.
> I tried comparing writes with various schemes using EXR-2.2.1, EXR-2.3.0, and EXR-2.3.0 with an IlmBase build using ILMBASE_FORCE_CXX03 (passing --enable-stdcxx=03 to ./configure).
> This attempts to use the API in a similar way to the Nuke exrWriter example (my code attached).
>
> Conclusions:
>
> I don't think ILMBASE_FORCE_CXX03 makes a difference in my test.
>
> I find no reliable speed difference between 2.2.1 and 2.3.0 when no threadpool is used (running the attached test as testEXR zips none 1 /tmp/test.exr) or when the number of threads is close to the number of CPU cores, regardless of compression scheme (e.g. with 4 cores try testEXR zips 4 20 /tmp/test.exr).
>
> I *did* manage to make 2.3.0 go slower than 2.2.1 by running testEXR zips 100 10 /tmp/test.exr, which means 100 threads are allocated but only a (random?) 10 of them have anything to do each time, since only 10 scanlines are written at once, and those 10 are fighting for (in my case) 4 cores. Both 2.2.1 and 2.3.0 run slower with excessive threadpools, but the effect is significantly greater in 2.3.0.
>
> I wonder if somehow Nuke's other threads are interfering with EXR's threadpool operation in 2.3.0 in a way they aren't in 2.2.1? It might be interesting to implement a test like the attached as a self-contained method triggered from a button press in a Nuke node, so it runs within Nuke's existing framework but doesn't use any Nuke functionality at all, and see whether there's a difference between that and running the function as a standalone binary outside of Nuke.
>
> I'm intrigued that, according to the graph, the compression schemes that write single scanlines at once have slowed down, while the ones that write many scanlines at once have sped up. Schemes that write multiple scanlines per block have lower overhead through IlmThread. In my case I didn't see any significant speedups with 2.3.0 in any scheme.
>
> Peter
>
> ------------------------------
> From: Openexr-devel <openexr-devel-bounces+peterh=wetafx.co...@nongnu.org> on behalf of Nick Porcino <mesh...@hotmail.com>
> Sent: Wednesday, 3 April 2019 5:05 PM
> To: Tony Micilotta; openexr-devel@nongnu.org
> Subject: Re: [Openexr-devel] OpenExr 2.3 - slower write speeds for Uncompressed and Zip1
>
> One of the big differences between 2.2 and 2.3 is that we moved from pthread to std::thread, and made a number of corrections for thread safety, e.g.
>
> https://github.com/openexr/openexr/commit/eea1e607177e339e05daa1a2ec969a9dd12f2497
> https://github.com/openexr/openexr/commit/bf0cb8cdce32fce36017107c9982e1e5db2fb3fa
>
> The old pthread/Windows native implementation can be re-enabled by setting the preprocessor define ILMBASE_FORCE_CXX03. I am wondering if you might be able to benchmark against that version of the code as an experiment to get more information?
>
> ------------------------------
> From: Openexr-devel <openexr-devel-bounces+meshula=hotmail....@nongnu.org> on behalf of Tony Micilotta <tony.micilo...@foundry.com>
> Sent: Monday, April 1, 2019 8:27:29 AM
> To: openexr-devel@nongnu.org
> Subject: [Openexr-devel] OpenExr 2.3 - slower write speeds for Uncompressed and Zip1
>
> Hi,
>
> At Foundry we've started testing the read performance of OpenEXR 2.3 with Boost 1.66 as we move Nuke towards conforming to VFX Reference Platform 2019. Although the release notes don't mention any changes to file writing, we have charted (see graphic below) OpenEXR 2.3 write performance against OpenEXR 2.2, as we noticed a slowdown in our playback cache generation, which uses OpenEXR.
>
> From Nuke, a checkerboard/colourwheel combination is generated and animated over 100 frames in order to eliminate file reading from the measurements. These 100 frames are written to disk using each compression type, and the timings are averaged over 10 test runs. Certain compression types show an improvement, which is great; however, we're concerned that Uncompressed and Zip1 are 13% and 9% slower respectively:
>
> [image: chart of OpenEXR 2.3 vs 2.2 write times per compression type]
>
> Have you benchmarked this on your side too, and have you noticed similar metrics?
>
> Regards,
> Tony
>
> Tony Micilotta
> Senior Technical Product Manager - Nuke Family
> Tel: +44 (0)20 7479 4350
> Web: www.foundry.com