—— 
This PR (141) addresses the scan-line part only because Nuke only uses 
scan-lines internally, but we may extend it to tiled files as well; the 
point still stands.
Overall I wonder why I see so many untiled EXR files in the wild (and what 
applications are producing them), when tiled files seem much more efficient 
for any application that is not scan-line based.

> On 16 Sep 2016, at 20:16, Larry Gritz <l...@larrygritz.com> wrote:
> 
> Underscoring once again how critical it is that we start processing the long 
> list of pending PRs. There are a lot of good ideas, bug fixes, and 
> performance improvements just rotting in people's private repos, waiting for 
> somebody to fold them into official OpenEXR releases.
> 
> Thanks, Alexandre. I haven't read your patch in detail, but I'm definitely 
> interested in additions to IlmImf internals that could be employed for my use 
> case in order to cut down on unnecessary copies and redundant buffer 
> allocations. Though in my quick scan, it looks scanline-specific, whereas for 
> my use case we're dealing primarily with tiled files, so a second patch will 
> be necessary to do the equivalent for tiles.
> 
> 
>> On Sep 16, 2016, at 10:51 AM, Alexandre <alexandre.gauthier-foic...@inria.fr> wrote:
>> 
>> Forget my point; the use case is entirely different from my experience.
>> This is a separate issue and has nothing to do with the original request.
>> —— 
>> 
>> In our use case, not being able to control OpenEXR's threads (assuming the 
>> thread pool is used), and not being able to know how much memory is in 
>> use, is enough to slow our application down, because it doesn't know what 
>> is going on. We cannot block other tasks from running just because some 
>> part of the application has started decoding an OpenEXR file.
>> Being able to use the host application's own threads to do the 
>> decompression, and optionally to provide the buffers so that the 
>> application knows which resources are taken, is enough to fix the matter.
>> The issue is only visible when reading untiled, multi-layered files, 
>> because they are large enough to take up most of the resources the 
>> application offers.
>> 
>> On the OpenEXR side this is implemented by 
>> https://github.com/openexr/openexr/pull/141
>> 
>> On the OIIO side I don't think there's much to do to implement this: 
>> extract the decompression part of LineBufferTask::execute 
>> <https://github.com/openexr/openexr/blob/develop/OpenEXR/IlmImf/ImfScanLineInputFile.cpp#L515> 
>> and do it in OIIO's read_native_scanlines function as a replacement for 
>> the setFrameBuffer/readPixels pair, combined with the right call from the 
>> application using OIIO so that internally it can use the 
>> application-provided buffers.
>> 
>> I am going to work on it when I have time, and I will notify you, Larry, 
>> if we see significant performance gains.
>> 
>>> On 16 Sep 2016, at 19:12, Larry Gritz <l...@larrygritz.com> wrote:
>>> 
>>> It is true that using OIIO's ImageCache to read a single file sequentially 
>>> can have wasteful memory consequences -- right after you've read the image, 
>>> you have a copy in the app's buffer that you requested, you still have a 
>>> copy in the ImageCache waiting around for the next time you need it, and 
>>> you may have a third copy of some or all of the pixels within libIlmImf's 
>>> internal data structures (if the file is still open). That's not really 
>>> what ImageCache is designed for, and I'm confident that's not how Soren is 
>>> trying to use it.
>>> 
>>> Soren is dealing with a texture system within a renderer. So that waste I 
>>> described above will disappear -- as the app requests additional texture 
>>> data, what's filling the cache will be paged out, and new pixels will come 
>>> in. The cache has a fixed maximum size. Also, in the context of an OIIO 
>>> TextureCache, there is no "app buffer", the IC's tile data itself is where 
>>> the texture is directly accessed from when doing texture filtering 
>>> operations.
>>> 
>>> It's clear that Soren's case is already dealing with tiled and MIP-mapped 
>>> files (right, Soren?). And if you're going to make tiles for use with 
>>> ImageCache, it's much better to use OIIO's "maketx" rather than OpenEXR's 
>>> "exrmaketiled". maketx does a number of additional things besides just 
>>> tiling, including computing an SHA-1 hash of the pixels and storing it in 
>>> the header, so that the TextureSystem can automatically notice duplicate 
>>> textures and avoid reading the redundant files. That won't happen 
>>> if you use exrmaketiled.
>>> 
>>> We routinely use OIIO's texture cache to render frames that reference 1-2 
>>> TB of texture, spread over 10,000 or more files, using a maximum of 1GB 
>>> tile memory and 1000 max files open at once. Works smooth as can be. If 
>>> your use of ImageCache is resulting in "blowing up computer's RAM + swap" 
>>> and the kernel has to kill the app, either you're setting something wrong, 
>>> or there is a bug (or use case I haven't considered) that I desperately 
>>> want to examine and make better. I would love a detailed description of how 
>>> to reproduce this, so I can fix it.
>>> 
>>> All that is a red herring. What Soren is describing is a very real effect, 
>>> which is two-fold and completely independent of OIIO:
>>> 
>>> 1. The amount of memory that libIlmImf holds *per open file* as overhead or 
>>> internal buffers or whatever (I haven't tracked down exactly what it is) is 
>>> much larger than what libtiff holds as overhead per open file.
>>> 
>>> 2. libIlmImf seems to have a substantial amount of memory overhead *per 
>>> thread*, and that can really add up if you have a large thread pool. In 
>>> contrast, libtiff doesn't have a thread pool (for better or for worse), so 
>>> there isn't a per-thread component to its memory overhead.
>>> 
>>> 
>>> 
>>>> On Sep 16, 2016, at 6:13 AM, Alexandre <alexandre.gauthier-foic...@inria.fr> wrote:
>>>> 
>>>> I think the bottleneck is in OpenImageIO's ImageCache rather than OpenEXR 
>>>> by itself. 
>>>> 
>>>> I've spent quite some time debugging OpenImageIO in this regard. The 
>>>> worst case scenario you can give OpenImageIO is reading untiled 
>>>> multi-layered EXR files.
>>>> Most people seem to work only with zip scanlines, because this suits 
>>>> Nuke's scan-line architecture perfectly, but in reality it is a 
>>>> nightmare for all other applications that don't work with scan-lines.
>>>> 
>>>> The OpenImageIO cache can be set to auto-tile mode, in which case it 
>>>> will open/close the file multiple times while decoding (so it is slower) 
>>>> but can use less memory, because it doesn't need to allocate such big 
>>>> chunks of memory.
>>>> When not set to auto-tile it will just decode the full image, meaning 
>>>> that OpenEXR allocates a big chunk of memory to decompress into, and 
>>>> OpenImageIO allocates another big chunk to convert to the user-requested 
>>>> data format. And here is the worst part: OpenImageIO will leave the file 
>>>> open in the cache, in the thread-local storage of the calling thread.
>>>> 
>>>> And it can get even worse than that: if you have multiple threads trying 
>>>> to decode different untiled EXR files concurrently, OpenImageIO will 
>>>> just blow through your computer's RAM + swap, and the kernel will kill 
>>>> your app very quickly.
>>>> 
>>>> 
>>>> There are a couple of workarounds:
>>>> 
>>>> - Make all your files go through an initial pass of converting them to 
>>>> tiled EXR files (with exrmaketiled)
>>>> - Don’t use OpenImageIO cache at all
>>>> 
>>>> The Foundry has come up with an extension (in a pull request) that gives 
>>>> the application calling OpenEXR a chance to pass its own buffers 
>>>> (instead of the ones used internally), so that the decompression of EXR 
>>>> files can happen outside of OpenEXR itself, in memory and on threads 
>>>> controlled by the calling application.
>>>> 
>>>> This is very important if your application is going to do other work 
>>>> concurrently, rather than just reading a single EXR file.
>>>> 
>>>> On our side we are going to try to implement that in OpenImageIO, so 
>>>> that in the same way you could pass your own buffers and threads to 
>>>> OpenImageIO, which would in turn pass them to OpenEXR.
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On 16 Sep 2016, at 12:34, Søren Ragsdale <sor...@gmail.com> wrote:
>>>>> 
>>>>> Hello, OpenEXR devs. I've been doing some comparative rendering tests 
>>>>> and I've found something a bit surprising.
>>>>> 
>>>>> TIFF and EXR texture access *times* seem more or less the same, which 
>>>>> is fine because the underlying data is equivalent (same data type, 
>>>>> compression, tile size, etc.). But the RAM overhead seems much higher 
>>>>> for EXRs: we've got a 9GB render using TIFFs and a 13GB render using 
>>>>> EXRs.
>>>>> 
>>>>> Does anyone have some theories why EXR texture access is requiring 4GB 
>>>>> more memory?
>>>>> 
>>>>> 
>>>>> Prman-20.11, OSL shaders, OIIO/TIFF textures:
>>>>> real 00:21:46
>>>>> VmRSS 9,063.45 MB
>>>>> OpenImageIO ImageCache statistics (shared) ver 1.7.3dev
>>>>> Options:  max_memory_MB=4000.0 max_open_files=100 autotile=64
>>>>>           autoscanline=0 automip=1 forcefloat=0 accept_untiled=1
>>>>>           accept_unmipped=1 read_before_insert=0 deduplicate=1
>>>>>           unassociatedalpha=0 failure_retries=0 
>>>>> Images : 1957 unique
>>>>>   ImageInputs : 136432 created, 100 current, 796 peak
>>>>>   Total size of all images referenced : 166.0 GB
>>>>>   Read from disk : 55.5 GB
>>>>>   File I/O time : 7h 2m 33.9s (16m 54.2s average per thread)
>>>>>   File open time only : 27m 44.0s
>>>>> 
>>>>> 
>>>>> Prman-20.11, OSL shaders, OIIO/EXR textures:
>>>>> real 00:21:14
>>>>> VmRSS 12,938.83 MB
>>>>> OpenImageIO ImageCache statistics (shared) ver 1.7.3dev
>>>>> Options:  max_memory_MB=4000.0 max_open_files=100 autotile=64
>>>>>           autoscanline=0 automip=1 forcefloat=0 accept_untiled=1
>>>>>           accept_unmipped=1 read_before_insert=0 deduplicate=1
>>>>>           unassociatedalpha=0 failure_retries=0 
>>>>> Images : 1957 unique
>>>>>   ImageInputs : 133168 created, 100 current, 771 peak
>>>>>   Total size of all images referenced : 166.0 GB
>>>>>   Read from disk : 55.5 GB
>>>>>   File I/O time : 6h 15m 42.1s (15m 1.7s average per thread)
>>>>>   File open time only : 1m 22.5s
>>>>> 
>>>>> _______________________________________________
>>>>> Openexr-devel mailing list
>>>>> Openexr-devel@nongnu.org
>>>>> https://lists.nongnu.org/mailman/listinfo/openexr-devel
>>>> 
>>>> 
>>> 
>>> --
>>> Larry Gritz
>>> l...@larrygritz.com
>>> 
>>> 
>>> 
>> 
> 
> --
> Larry Gritz
> l...@larrygritz.com
> 
> 
