Re: [Openexr-devel] EXR texture memory overhead

2016-09-16 Thread Ed Hanway
Regarding OpenEXR pull requests in general, we've been making some slow
progress on the backlog, handling the trivial ones first.  The develop
branch has the latest.

Regarding PR 141 specifically, looking back at my notes at the time, there
was some concern that the change originally proposed didn't catch all
unsupported cases, such as memory-mapped streams, but that doesn't appear
to be the case with the PR.  Being limited to scan-line files, it won't
help in the current case, but it can establish the pattern for another
change.  I'll merge it now.


On Fri, Sep 16, 2016 at 11:55 AM, Larry Gritz  wrote:

> So it's holding two tiles, per thread, per open tiled input file!
>
> 2 x RGBA half 64^2 tiles -> 64k per thread per file
> x 1000 files x 16 threads -> 1 GB, just for this source of overhead, not
> counting anything else like header data or other allocations
>
> For 64k (two reasonably sized tiles), maybe it would be better to do a
> stack allocation just when the extra decode buffer is needed, so there
> would be no call to malloc/free and no retained memory. Switch back to a
> true malloc only for the rare case of huge tiles where it doesn't seem safe
> to do a stack allocation?
>
>
> On Sep 16, 2016, at 11:45 AM, Karl Rasche  wrote:
>
>
>
> But it's not optimal for a use pattern like TextureSystem where the
>> typical request is ONE tile, and the next tile it wants may not even be
>> adjacent.
>>
>
> Whoops. What I pointed at looks like it's only the case if you read through
> Imf::InputFile.  If you use Imf::TiledInputFile (like in exrinput.cpp), I
> don't think you hit that buffering.
>
>
>
> Wait, I'm not quite sure how threads play into this. Is this allocated
>> framebuffer part of the ImageInput itself? Do threads lock to use it? Or
>> is this per thread, per file?
>>
>
> I think the per-thread part is around ImfTiledInputFile.cpp::267.
> Each TileBuffer has an uncompressedData ptr which is what the compressor
> fills during decode.
>
> This *should* just be a tile per thread, but it does look like it's held
> over the lifetime of the ImfTiledInputFile.
>
>
>
> --
> Larry Gritz
> l...@larrygritz.com
>
>
>
> ___
> Openexr-devel mailing list
> Openexr-devel@nongnu.org
> https://lists.nongnu.org/mailman/listinfo/openexr-devel
>
>


-- 
Ed Hanway
R Supervisor // ILM // SF


Re: [Openexr-devel] EXR texture memory overhead

2016-09-16 Thread Larry Gritz
So it's holding two tiles, per thread, per open tiled input file!

2 x RGBA half 64^2 tiles -> 64k per thread per file
x 1000 files x 16 threads -> 1 GB, just for this source of overhead, not 
counting anything else like header data or other allocations

For 64k (two reasonably sized tiles), maybe it would be better to do a stack 
allocation just when the extra decode buffer is needed, so there would be no 
call to malloc/free and no retained memory. Switch back to a true malloc only 
for the rare case of huge tiles where it doesn't seem safe to do a stack 
allocation?
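A minimal sketch of that idea (hypothetical names, none of them from IlmImf's API): keep a fixed-size buffer on the stack for typical tile sizes and fall back to the heap only when the tile is too large, so nothing is retained between calls.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical sketch of the stack-first scratch buffer described above.
constexpr std::size_t kStackScratch = 64 * 1024;  // two 64x64 RGBA half tiles

void decodeTile(const char* packed, char* userBuffer, std::size_t tileBytes)
{
    char stackBuf[kStackScratch];
    std::unique_ptr<char[]> heapBuf;              // engaged only for huge tiles
    char* scratch = stackBuf;
    if (tileBytes > kStackScratch) {
        heapBuf.reset(new char[tileBytes]);       // rare case: a true allocation
        scratch = heapBuf.get();
    }
    // Stand-in for the real decompressor: pass the pixels through scratch.
    std::memcpy(scratch, packed, tileBytes);
    std::memcpy(userBuffer, scratch, tileBytes);
    // stackBuf vanishes on return; heapBuf, if any, is freed immediately.
}
```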


> On Sep 16, 2016, at 11:45 AM, Karl Rasche  wrote:
> 
> 
> 
> But it's not optimal for a use pattern like TextureSystem where the typical 
> request is ONE tile, and the next tile it wants may not even be adjacent.
> 
> Whoops. What I pointed at looks like it's only the case if you read through 
> Imf::InputFile.  If you use Imf::TiledInputFile (like in exrinput.cpp), I 
> don't think you hit that buffering.
> 
> 
> 
> Wait, I'm not quite sure how threads play into this. Is this allocated 
> framebuffer part of the ImageInput itself? Do threads lock to use it? Or is 
> this per thread, per file?
> 
> I think the per-thread part is around ImfTiledInputFile.cpp::267.
>  Each TileBuffer has an uncompressedData ptr which is what the compressor 
> fills during decode. 
> 
> This *should* just be a tile per thread, but it does look like it's held over 
> the lifetime of the ImfTiledInputFile. 
> 
> 

--
Larry Gritz
l...@larrygritz.com




Re: [Openexr-devel] EXR texture memory overhead

2016-09-16 Thread Karl Rasche
But it's not optimal for a use pattern like TextureSystem where the typical
> request is ONE tile, and the next tile it wants may not even be adjacent.
>

Whoops. What I pointed at looks like it's only the case if you read through
Imf::InputFile.  If you use Imf::TiledInputFile (like in exrinput.cpp), I
don't think you hit that buffering.



Wait, I'm not quite sure how threads play into this. Is this allocated
> framebuffer part of the ImageInput itself? Do threads lock to use it? Or
> is this per thread, per file?
>

I think the per-thread part is around ImfTiledInputFile.cpp::267.
Each TileBuffer has an uncompressedData ptr which is what the compressor
fills during decode.

This *should* just be a tile per thread, but it does look like it's held
over the lifetime of the ImfTiledInputFile.


Re: [Openexr-devel] EXR texture memory overhead

2016-09-16 Thread Larry Gritz
Underscoring once again how critical it is that we start processing the long 
list of pending PRs. There are a lot of good ideas, bug fixes, and performance 
improvements just rotting in people's private repos, waiting for somebody to 
fold them into official OpenEXR releases.

Thanks, Alexandre. I haven't read your patch in detail, but I'm definitely 
interested in additions to IlmImf internals that could be employed for my use 
case in order to cut down on unnecessary copies and redundant buffer 
allocations. Though in my quick scan, it looks scanline-specific, whereas for 
my use case we're dealing primarily with tiled files, so a second patch will be 
necessary to do the equivalent for tiles.


> On Sep 16, 2016, at 10:51 AM, Alexandre  
> wrote:
> 
> Forget my point; the use case is entirely different from my experience.
> This is a separate issue and has nothing to do with the original request.
> —— 
> 
> In our use case the fact of not being able to control OpenEXR threads 
> (assuming the thread pool is used) and not being able to know how much memory 
> is used is enough to make our application slow because it doesn’t know what is 
> going on. We cannot block other tasks from happening because some part of the 
> application has started decoding an OpenEXR file. 
> To fix our issue, being able to use the original application’s threads to do 
> the decompressing, and optionally to provide the buffers so that it can know 
> which resources are taken, is enough to fix the matter.
> The issue is only visible when reading untiled and multi-layered files 
> because their size is significant enough to take up most of the resources 
> the application offers.
> 
> On OpenEXR side this is implemented by 
> https://github.com/openexr/openexr/pull/141 
> 
> 
> On the OIIO side I don’t think there’s much to do to implement this:
> Extract the decompressing part of LineBufferTask::execute and do it in the 
> OIIO read_native_scanlines function as a replacement of the 
> setFramebuffer/read_pixels pair. Combined with the right call from the 
> application using OIIO so that internally it can use the provided application 
> buffers;
> 
> I am going to work on it when I get time and will notify you Larry if we get 
> significant performance gains. 
> 
>> On 16 Sep 2016, at 19:12, Larry Gritz > > wrote:
>> 
>> It is true that using OIIO's ImageCache to read a single file sequentially 
>> can have wasteful memory consequences -- right after you've read the image, 
>> you have a copy in the app's buffer that you requested, you still have a 
>> copy in the ImageCache waiting around for the next time you need it, and you 
>> may have a third copy of some or all of the pixels within libIlmImf's 
>> internal data structures (if the file is still open). That's not really what 
>> ImageCache is designed for, and I'm confident that's not how Soren is trying 
>> to use it.
>> 
>> Soren is dealing with a texture system within a renderer. So that waste I 
>> described above will disappear -- as the app requests additional texture 
>> data, what's filling the cache will be paged out, and new pixels will come 
>> in. The cache has a fixed maximum size. Also, in the context of an OIIO 
>> TextureCache, there is no "app buffer", the IC's tile data itself is where 
>> the texture is directly accessed from when doing texture filtering 
>> operations.
>> 
>> It's clear that Soren's case is already dealing with tiled and MIP-mapped 
>> files (right, Soren?). And if you're going to make tiles for use with 
>> ImageCache, it's much better to use OIIO's "maketx" rather than OpenEXR's 
>> "exrmaketiled". The maketx does a number of additional things besides just 
>> tiling, including computing SHA-1 hashes on the file and storing that in the 
>> header, so that the TextureSystem can automatically notice duplicate 
>> textures and not read from the redundant files. That won't happen properly 
>> if you use exrmaketiled.
>> 
>> We routinely use OIIO's texture cache to render frames that reference 1-2 TB 
>> of texture, spread over 10,000 or more files, using a maximum of 1GB tile 
>> memory and 1000 max files open at once. Works smooth as can be. If your use 
>> of ImageCache is resulting in "blowing up computer's RAM + swap" and the 
>> kernel has to kill the app, either you're setting something wrong, or there 
>> is a bug (or use case I haven't considered) that I desperately want to 
>> examine and make better. I would love a detailed description of how to 
>> reproduce this, so I can fix it.
>> 
>> All that is a red herring. What Soren is describing is a very real effect, 
>> which is two-fold and completely independent of OIIO:
>> 
>> 1. The amount of memory that libIlmImf holds *per open file* 

Re: [Openexr-devel] EXR texture memory overhead

2016-09-16 Thread Alexandre
Forget my point; the use case is entirely different from my experience.
This is a separate issue and has nothing to do with the original request.
—— 

In our use case the fact of not being able to control OpenEXR threads (assuming 
the thread pool is used) and not being able to know how much memory is used is 
enough to make our application slow because it doesn’t know what is going on. We 
cannot block other tasks from happening because some part of the application 
has started decoding an OpenEXR file. 
To fix our issue, being able to use the original application’s threads to do 
the decompressing, and optionally to provide the buffers so that it can know 
which resources are taken, is enough to fix the matter.
The issue is only visible when reading untiled and multi-layered files because 
their size is significant enough to take up most of the resources the 
application offers.

On OpenEXR side this is implemented by 
https://github.com/openexr/openexr/pull/141

On the OIIO side I don’t think there’s much to do to implement this:
Extract the decompressing part of LineBufferTask::execute and do it in the 
OIIO read_native_scanlines function as a replacement of the 
setFramebuffer/read_pixels pair. Combined with the right call from the 
application using OIIO so that internally it can use the provided application 
buffers;

I am going to work on it when I get time and will notify you Larry if we get 
significant performance gains. 
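A rough sketch of the shape of that idea (a hypothetical interface, not the actual PR 141 API): the library exposes a per-chunk decode with no hidden allocations, and the application supplies both the output buffer and the threads that run the decompression, so it can account for every resource.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Stand-in for the real per-chunk decompression; no hidden allocations.
void decodeChunk(const char* src, char* dst, std::size_t n)
{
    std::memcpy(dst, src, n);
}

// The application owns the output buffer and the worker threads, so it can
// schedule decoding alongside its other work. Assumes nThreads >= 1.
void appDrivenDecode(const std::vector<char>& file, std::vector<char>& out,
                     unsigned nThreads)
{
    if (nThreads == 0)
        nThreads = 1;
    out.resize(file.size());                       // app-owned, app-accounted
    const std::size_t chunk = (file.size() + nThreads - 1) / nThreads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nThreads; ++t) {
        const std::size_t begin = t * chunk;
        if (begin >= file.size())
            break;
        const std::size_t n = std::min(chunk, file.size() - begin);
        workers.emplace_back(decodeChunk, file.data() + begin,
                             out.data() + begin, n);
    }
    for (auto& w : workers)
        w.join();
}
```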

> On 16 Sep 2016, at 19:12, Larry Gritz  wrote:
> 
> It is true that using OIIO's ImageCache to read a single file sequentially 
> can have wasteful memory consequences -- right after you've read the image, 
> you have a copy in the app's buffer that you requested, you still have a copy 
> in the ImageCache waiting around for the next time you need it, and you may 
> have a third copy of some or all of the pixels within libIlmImf's internal 
> data structures (if the file is still open). That's not really what 
> ImageCache is designed for, and I'm confident that's not how Soren is trying 
> to use it.
> 
> Soren is dealing with a texture system within a renderer. So that waste I 
> described above will disappear -- as the app requests additional texture 
> data, what's filling the cache will be paged out, and new pixels will come 
> in. The cache has a fixed maximum size. Also, in the context of an OIIO 
> TextureCache, there is no "app buffer", the IC's tile data itself is where 
> the texture is directly accessed from when doing texture filtering operations.
> 
> It's clear that Soren's case is already dealing with tiled and MIP-mapped 
> files (right, Soren?). And if you're going to make tiles for use with 
> ImageCache, it's much better to use OIIO's "maketx" rather than OpenEXR's 
> "exrmaketiled". The maketx does a number of additional things besides just 
> tiling, including computing SHA-1 hashes on the file and storing that in the 
> header, so that the TextureSystem can automatically notice duplicate textures 
> and not read from the redundant files. That won't happen properly if you use 
> exrmaketiled.
> 
> We routinely use OIIO's texture cache to render frames that reference 1-2 TB 
> of texture, spread over 10,000 or more files, using a maximum of 1GB tile 
> memory and 1000 max files open at once. Works smooth as can be. If your use 
> of ImageCache is resulting in "blowing up computer's RAM + swap" and the 
> kernel has to kill the app, either you're setting something wrong, or there 
> is a bug (or use case I haven't considered) that I desperately want to 
> examine and make better. I would love a detailed description of how to 
> reproduce this, so I can fix it.
> 
> All that is a red herring. What Soren is describing is a very real effect, 
> which is two-fold and completely independent of OIIO:
> 
> 1. The amount of memory that libIlmImf holds *per open file* as overhead or 
> internal buffers or whatever (I haven't tracked down exactly what it is) is 
> much larger than what libtiff holds as overhead per open file.
> 
> 2. libIlmImf seems to have a substantial amount of memory overhead *per 
> thread*, and that can really add up if you have a large thread pool. In 
> contrast, libtiff doesn't have a thread pool (for better or for worse), so 
> there isn't a per-thread component to its memory overhead.
> 
> 
> 
>> On Sep 16, 2016, at 6:13 AM, Alexandre  
>> wrote:
>> 
>> I think the bottleneck is in OpenImageIO's ImageCache rather than OpenEXR by 
>> itself. 
>> 
>> I’ve spent quite some time debugging OpenImageIO in this regard. The worst 
>> case scenario you can give to OpenImageIO is when trying to read untiled 
>> multi-layered EXR files.
>> Most people seem to only be working with zip scanlines because this suits 
>> Nuke scan-line architecture perfectly but it is a nightmare in 

Re: [Openexr-devel] EXR texture memory overhead

2016-09-16 Thread Larry Gritz
Aha, thanks, Karl.

So it's allocating an internal (retained, as long as the file is kept open) 
framebuffer that's a whole *row* of tiles. This is probably fine when you have 
a small number of files open at once and they are mostly reading whole images 
(so, sure, the app is probably asking for whole rows, or more, of tiles at a 
time).

But it's not optimal for a use pattern like TextureSystem where the typical 
request is ONE tile, and the next tile it wants may not even be adjacent.

Back-of-envelope: let's say our textures are RGBA, half data, 4k x 4k, with 
64x64 tiles. So a row of tiles is 2MB of this internal framebuffer overhead, 
per open texture file (which, as we've said, could be hundreds or thousands at 
once, so could easily be multiple GB of unaccounted overhead).

Perhaps, in light of this, it might be a good idea for this kind of access 
pattern if IlmImf had a way to communicate that a particular file was going to 
tend to read individual tiles independently, in which case the internal 
framebuffer scratch space could be just a single tile's worth, not a whole 
"tile row" of scratch space?

Wait, I'm not quite sure how threads play into this. Is this allocated 
framebuffer part of the ImageInput itself? Do threads lock to use it? Or is 
this per thread, per file?


> On Sep 16, 2016, at 10:31 AM, Karl Rasche  wrote:
> 
> 1. The amount of memory that libIlmImf holds *per open file* as overhead or 
> internal buffers or whatever (I haven't tracked down exactly what it is) is 
> much larger than what libtiff holds as overhead per open file.
> 
> I *think* this is related to what you're seeing, at least in part
> 
> ImfInputFile.cpp line 678
> 
> 2. libIlmImf seems to have a substantial amount of memory overhead *per 
> thread*, and that can really add up if you have a large thread pool. In 
> contrast, libtiff doesn't have a thread pool (for better or for worse), so 
> there isn't a per-thread component to its memory overhead.
> 
> Some of that probably stems from the framebuffer model -- You don't decode 
> directly into the user-provided buffer, but instead into a temp buffer which 
> is copied into the user-provided buffer and reformatted as requested.
> 
> That avoids things like tons of extra decodes of a scanline strip if you are 
> walking it scanline by scanline. But in the context of texturing, where 
> you're always reading a full tile into cache, it's just overhead. There might 
> be a sneaky way to flush that backing data, like assigning a null framebuffer 
> or something. 
> 
> 
> 
> 

--
Larry Gritz
l...@larrygritz.com




Re: [Openexr-devel] EXR texture memory overhead

2016-09-16 Thread Larry Gritz
IIRC from last time I tested this (please don't trust my memory 100%), it was a 
substantial per-thread overhead issue. 

I have not combed through the IlmImf code to verify this, but what it smelled 
like to me is that each thread in the pool has a retained buffer that it uses as 
scratch space for the decoding/decompression, data format conversions, and so 
on. (You know, you have to read the compressed data into some buffer, and then 
perform the decompression into another buffer... that sort of thing.) I get the 
feeling that each thread in the pool has its own area for this, and it's 
probably quite large so that it can do these decode/conversions on a big batch 
of pixels at once.
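The pattern being described would look something like this (illustrative only, not actual IlmImf code):

```cpp
#include <cstddef>
#include <vector>

// Each pool thread lazily grows a retained scratch buffer; it is never
// returned to the allocator, so total overhead scales with thread count
// times the largest batch each thread has ever decoded.
std::vector<char>& threadScratch(std::size_t needed)
{
    thread_local std::vector<char> scratch;   // lives as long as the thread
    if (scratch.size() < needed)
        scratch.resize(needed);               // grows, never shrinks
    return scratch;
}
```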

Soren, I think it would be worth repeating your experiment with the renderer 
and OpenEXR set to use just one thread (or maybe 1, 2, 4, ...). Obviously, it's 
not quite apples-to-apples because the access pattern will be totally different 
as you change thread counts. But if you see this "unaccounted overhead" in the 
process size growing steadily with the number of threads (while the number of 
textures you access and the maximum open at once stays the same), then that 
tells you something.

Do you know how many threads you were using?

Remember, also, that Soren is not comparing ImageCache to *nothing*, he's 
comparing ImageCache with exr files to the same ImageCache with tiff files, 
rendering the same frame with the same texture access patterns. This really is 
a direct test of speed and overhead of the two file format libraries.


> On Sep 16, 2016, at 10:12 AM, Søren Ragsdale  wrote:
> 
> Thanks, Larry! My EXR textures are already mipmapped and tiled so I don't 
> think we're using any extra memory on auto-tile, but it certainly was true 
> that we were using OIIO with the default 100 file handle cache size. We'll 
> give it a go with a much higher value.
> 
> I'm still surprised that libIlmImf requires that much more memory. If our 
> handle cache size was limited to 100 files at a time, that's an extra 38.75 
> MB per texture. Even if it's caching all 1957 texture handles at once that's 
> still an extra 1.98 MB per texture.
> 
> On Fri, Sep 16, 2016 at 5:56 PM, Larry Gritz  > wrote:
> See how there were 1957 unique textures, but you created 144168 ImageInputs? 
> That means your file (handle) cache was too small, you were accessing the 
> files in an incoherent order and the "working set" was much larger than the 
> file cache size (100), so you ended up closing and reopening files a massive 
> number of times (73 times each, on average).
> 
> The solution to this is to raise the TextureSystem's (that is, the underlying 
> ImageCache's) "max_open_files" significantly. The default of 100 is extremely 
> conservative -- trying to keep you out of trouble on operating systems with a 
> low handle count per process, or with ImageCache's file operations being 
> potentially only a small portion of your process's file needs.
> 
> On a modern Linux system, you should be able to have several thousand files 
> open simultaneously. I would do that and see what it does to all your 
> figures. At the very least, it should substantially lower the "file open 
> time" and maybe the file I/O time overall.
> 
> But, it may make the memory issues worse -- if libIlmImf is internally 
> holding more memory per open file than libtiff does (my tests indicate this 
> as well), then having more files open at a time may exacerbate the problem. I 
> haven't tested this for a while, though, but I think that Karl is correct 
> that part of this is that each of the threads in OpenEXR's pool is holding a 
> bit of working memory, and that adds up, so the higher your thread count, the 
> more memory use overhead there will be for libIlmImf, and it's very hard for 
> the app to account for that or control it.
> 
> It is simply a fact that on a per-thread, per-file basis, libIlmImf uses more 
> overhead memory per open file than libtiff does. I have not looked into it 
> deeply enough to know if that's a good design tradeoff or not (for example, 
> maybe that memory is put to good use, and helps speed up some operations).
> 
> This is the kind of thing you would never notice reading OpenEXR files 
> sequentially, but in something like a texture system where you may have 
> thousands of files open simultaneously, an extra 256KB of overhead per file 
> adds up fast.
> 
> 
> > On Sep 16, 2016, at 3:34 AM, Søren Ragsdale  > > wrote:
> >
> > Hello, OpenEXR devs. I've been doing some comparative rendering tests and I've 
> > found something a bit surprising.
> >
> > TIFF and EXR texture access *times* seem more or less the same, which is 
> > fine because the underlying data is equivalent. (Same data type, 
> > compression, tile size, etc.) But the RAM overhead seems much higher for 
> > EXRs. We've got a 9GB render using TIFFs and a 13GB render 

Re: [Openexr-devel] EXR texture memory overhead

2016-09-16 Thread Søren Ragsdale
Thanks, Larry! My EXR textures are already mipmapped and tiled so I don't
think we're using any extra memory on auto-tile, but it certainly was true
that we were using OIIO with the default 100 file handle cache size. We'll
give it a go with a much higher value.

I'm still surprised that libIlmImf requires that much more memory. If our
handle cache size was limited to 100 files at a time, that's an extra 38.75
MB per texture. Even if it's caching all 1957 texture handles at once
that's still an extra 1.98 MB per texture.

On Fri, Sep 16, 2016 at 5:56 PM, Larry Gritz  wrote:

> See how there were 1957 unique textures, but you created 144168
> ImageInputs? That means your file (handle) cache was too small, you were
> accessing the files in an incoherent order and the "working set" was much
> larger than the file cache size (100), so you ended up closing and
> reopening files a massive number of times (73 times each, on average).
>
> The solution to this is to raise the TextureSystem's (that is, the
> underlying ImageCache's) "max_open_files" significantly. The default of 100
> is extremely conservative -- trying to keep you out of trouble on operating
> systems with a low handle count per process, or with ImageCache's file
> operations being potentially only a small portion of your process's file
> needs.
>
> On a modern Linux system, you should be able to have several thousand
> files open simultaneously. I would do that and see what it does to all your
> figures. At the very least, it should substantially lower the "file open
> time" and maybe the file I/O time overall.
>
> But, it may make the memory issues worse -- if libIlmImf is internally
> holding more memory per open file than libtiff does (my tests indicate this
> as well), then having more files open at a time may exacerbate the problem.
> I haven't tested this for a while, though, but I think that Karl is correct
> that part of this is that each of the threads in OpenEXR's pool is holding
> a bit of working memory, and that adds up, so the higher your thread count,
> the more memory use overhead there will be for libIlmImf, and it's very
> hard for the app to account for that or control it.
>
> It is simply a fact that on a per-thread, per-file basis, libIlmImf uses
> more overhead memory per open file than libtiff does. I have not looked
> into it deeply enough to know if that's a good design tradeoff or not (for
> example, maybe that memory is put to good use, and helps speed up some
> operations).
>
> This is the kind of thing you would never notice reading OpenEXR files
> sequentially, but in something like a texture system where you may have
> thousands of files open simultaneously, an extra 256KB of overhead per file
> adds up fast.
>
>
> > On Sep 16, 2016, at 3:34 AM, Søren Ragsdale  wrote:
> >
> > Hello, OpenEXR devs. I've been doing some comparative rendering tests
> and I've found something a bit surprising.
> >
> > TIFF and EXR texture access *times* seem more or less the same, which
> is fine because the underlying data is equivalent. (Same data type,
> compression, tile size, etc.) But the RAM overhead seems much higher for
> EXRs. We've got a 9GB render using TIFFs and a 13GB render using EXRs.
> >
> > Does anyone have some theories why EXR texture access is requiring 4GB
> more memory?
> >
> >
> > Prman-20.11, OSL shaders, OIIO/TIFF textures:
> > real 00:21:46
> > VmRSS 9,063.45 MB
> > OpenImageIO ImageCache statistics (shared) ver 1.7.3dev
> >   Options:  max_memory_MB=4000.0 max_open_files=100 autotile=64
> > autoscanline=0 automip=1 forcefloat=0 accept_untiled=1
> > accept_unmipped=1 read_before_insert=0 deduplicate=1
> > unassociatedalpha=0 failure_retries=0
> >   Images : 1957 unique
> > ImageInputs : 136432 created, 100 current, 796 peak
> > Total size of all images referenced : 166.0 GB
> > Read from disk : 55.5 GB
> > File I/O time : 7h 2m 33.9s (16m 54.2s average per thread)
> > File open time only : 27m 44.0s
> >
> >
> > Prman-20.11, OSL shaders, OIIO/EXR textures:
> > real 00:21:14
> > VmRSS 12,938.83 MB
> > OpenImageIO ImageCache statistics (shared) ver 1.7.3dev
> >   Options:  max_memory_MB=4000.0 max_open_files=100 autotile=64
> > autoscanline=0 automip=1 forcefloat=0 accept_untiled=1
> > accept_unmipped=1 read_before_insert=0 deduplicate=1
> > unassociatedalpha=0 failure_retries=0
> >   Images : 1957 unique
> > ImageInputs : 133168 created, 100 current, 771 peak
> > Total size of all images referenced : 166.0 GB
> > Read from disk : 55.5 GB
> > File I/O time : 6h 15m 42.1s (15m 1.7s average per thread)
> > File open time only : 1m 22.5s
> >
>
> --
> Larry Gritz
> l...@larrygritz.com
>
>
>

Re: [Openexr-devel] EXR texture memory overhead

2016-09-16 Thread Larry Gritz
It is true that using OIIO's ImageCache to read a single file sequentially can 
have wasteful memory consequences -- right after you've read the image, you 
have a copy in the app's buffer that you requested, you still have a copy in 
the ImageCache waiting around for the next time you need it, and you may have a 
third copy of some or all of the pixels within libIlmImf's internal data 
structures (if the file is still open). That's not really what ImageCache is 
designed for, and I'm confident that's not how Soren is trying to use it.

Soren is dealing with a texture system within a renderer. So that waste I 
described above will disappear -- as the app requests additional texture data, 
what's filling the cache will be paged out, and new pixels will come in. The 
cache has a fixed maximum size. Also, in the context of an OIIO TextureCache, 
there is no "app buffer", the IC's tile data itself is where the texture is 
directly accessed from when doing texture filtering operations.

It's clear that Soren's case is already dealing with tiled and MIP-mapped files 
(right, Soren?). And if you're going to make tiles for use with ImageCache, 
it's much better to use OIIO's "maketx" rather than OpenEXR's "exrmaketiled". 
The maketx does a number of additional things besides just tiling, including 
computing SHA-1 hashes on the file and storing that in the header, so that the 
TextureSystem can automatically notice duplicate textures and not read from the 
redundant files. That won't happen properly if you use exrmaketiled.

We routinely use OIIO's texture cache to render frames that reference 1-2 TB of 
texture, spread over 10,000 or more files, using a maximum of 1GB tile memory 
and 1000 max files open at once. Works smooth as can be. If your use of 
ImageCache is resulting in "blowing up computer's RAM + swap" and the kernel 
has to kill the app, either you're setting something wrong, or there is a bug 
(or use case I haven't considered) that I desperately want to examine and make 
better. I would love a detailed description of how to reproduce this, so I can 
fix it.

All that is a red herring. What Soren is describing is a very real effect, 
which is two-fold and completely independent of OIIO:

1. The amount of memory that libIlmImf holds *per open file* as overhead or 
internal buffers or whatever (I haven't tracked down exactly what it is) is 
much larger than what libtiff holds as overhead per open file.

2. libIlmImf seems to have a substantial amount of memory overhead *per 
thread*, and that can really add up if you have a large thread pool. In 
contrast, libtiff doesn't have a thread pool (for better or for worse), so 
there isn't a per-thread component to its memory overhead.



> On Sep 16, 2016, at 6:13 AM, Alexandre  
> wrote:
> 
> I think the bottleneck is in OpenImageIO's ImageCache rather than OpenEXR by 
> itself. 
> 
> I’ve spent quite some time debugging OpenImageIO in this regard. The worst 
> case scenario you can give to OpenImageIO is when trying to read untiled 
> multi-layered EXR files.
> Most people seem to only be working with zip scanlines because this suits 
> Nuke scan-line architecture perfectly but it is a nightmare in reality for 
> all other applications that don’t work with scan-lines. 
> 
> The OpenImageIO cache can be set in auto-tile mode, in which case it will 
> open/close the file multiple times to decode (so it is slower) but can use 
> less memory because it doesn’t require to allocate as much big chunks of 
> memory. 
> When not set to auto-tile it will just decode the full image, meaning that 
> OpenEXR will allocate a big chunk of memory to decompress, OpenImageIO will 
> allocate a big chunk of memory to convert to the user requested data format. 
> And here is the worst part: OpenImageIO will leave the file open in the 
> cache, in the thread-local storage of the calling thread.
> 
> And it might even get worse than that if you’ve got multiple threads trying 
> to decode different untiled EXR files concurrently, then OpenImageIO will 
> just blow up your computer’s RAM + swap and the kernel should kill your app 
> very quickly.
> 
> 
> There are a couple of workarounds:
> 
> - Make all your files go through an initial pass of converting them to tiled 
> EXR files (with exrmaketiled)
> - Don’t use OpenImageIO cache at all
> 
> The Foundry has come up with an extension (in a pull request) that gives 
> the application calling OpenEXR a chance to pass its own buffers (instead 
> of the ones used internally), so that decompression of EXR files can happen 
> outside of OpenEXR itself, in memory and threads controlled by the calling 
> application. 
> 
> This is very important if your application is going to do other work 
> concurrently, beyond just reading a single EXR file. 
> 
> On our side we are going to try and implement that in OpenImageIO so that in 
> the same way you could pass your own buffers and threads to 

Re: [Openexr-devel] EXR texture memory overhead

2016-09-16 Thread Larry Gritz
See how there were 1957 unique textures, but you created 144168 ImageInputs? 
That means your file (handle) cache was too small: you were accessing the files 
in an incoherent order, and the "working set" was much larger than the file 
cache size (100), so you ended up closing and reopening files a massive number 
of times (73 times each, on average). 
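
That ratio can be read straight off the cache statistics. A small helper (hypothetical, using the figures from the TIFF run quoted below) makes the arithmetic explicit:

```python
def avg_reopens(inputs_created, unique_files):
    """Average number of times each file was (re)opened,
    taken from the ImageCache statistics dump."""
    return inputs_created / unique_files

# 136432 ImageInputs created over 1957 unique textures (TIFF run below):
print(round(avg_reopens(136432, 1957)))  # ≈ 70 opens per file
```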

The solution to this is to raise the TextureSystem's (that is, the underlying 
ImageCache's) "max_open_files" significantly. The default of 100 is extremely 
conservative, intended to keep you out of trouble on operating systems with a 
low per-process handle limit, or in cases where ImageCache's file operations 
are only a small portion of your process's file needs.

On a modern Linux system, you should be able to have several thousand files 
open simultaneously. I would do that and see what it does to all your figures. 
At the very least, it should substantially lower the "file open time" and maybe 
the file I/O time overall.
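
As a sanity check of this reasoning, here is a toy simulation of a file-handle cache (a plain LRU sketch, which may not match OIIO's actual eviction policy). A cache smaller than the working set forces constant reopens; a cache larger than the working set opens each file only once:

```python
import random
from collections import OrderedDict

def count_opens(accesses, cache_size):
    """Simulate an LRU file-handle cache and count how many opens it performs."""
    cache, opens = OrderedDict(), 0
    for f in accesses:
        if f in cache:
            cache.move_to_end(f)           # cache hit: mark most recently used
        else:
            opens += 1                     # cache miss: must (re)open the file
            if len(cache) >= cache_size:
                cache.popitem(last=False)  # evict least recently used handle
            cache[f] = True
    return opens

random.seed(0)
files = list(range(1957))                           # 1957 unique textures
accesses = random.choices(files, k=100_000)         # incoherent access order

print(count_opens(accesses, 100))    # small cache: massive reopening
print(count_opens(accesses, 4000))   # cache exceeds working set: ~1957 opens
```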

But it may make the memory issues worse: if libIlmImf internally holds more 
memory per open file than libtiff does (my tests indicate this as well), then 
having more files open at a time may exacerbate the problem. I haven't tested 
this for a while, but I think Karl is correct that part of it is that each 
thread in OpenEXR's pool holds a bit of working memory, and that adds up: the 
higher your thread count, the more memory overhead there will be for libIlmImf, 
and it's very hard for the app to account for or control that.

It is simply a fact that on a per-thread, per-file basis, libIlmImf uses more 
overhead memory per open file than libtiff does. I have not looked into it 
deeply enough to know if that's a good design tradeoff or not (for example, 
maybe that memory is put to good use, and helps speed up some operations).

This is the kind of thing you would never notice reading OpenEXR files 
sequentially, but in something like a texture system where you may have 
thousands of files open simultaneously, an extra 256KB of overhead per file 
adds up fast.
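
Using that 256 KB figure, a quick sketch of how the fixed per-file cost scales with the number of simultaneously open files:

```python
PER_FILE = 256 * 1024  # extra overhead per open file, in bytes (figure from the text above)

for n_files in (100, 1000, 5000):
    print(f"{n_files} open files -> {PER_FILE * n_files / 2**20:.0f} MiB")
```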


> On Sep 16, 2016, at 3:34 AM, Søren Ragsdale  wrote:
> 
> Hello, OpenEXR devs. I've been doing some comparative rendering tests and 
> I've found something a bit surprising. 
> 
> TIFF and EXR texture access *times* seem more or less the same, which is 
> fine because the underlying data is equivalent. (Same data type, compression, 
> tile size, etc.) But the RAM overhead seems much higher for EXRs. We've got a 
> 9GB render using TIFFs and a 13GB render using EXRs.
> 
> Does anyone have some theories why EXR texture access is requiring 4GB more 
> memory?
> 
> 
> Prman-20.11, OSL shaders, OIIO/TIFF textures:
> real 00:21:46
> VmRSS 9,063.45 MB
> OpenImageIO ImageCache statistics (shared) ver 1.7.3dev
>   Options:  max_memory_MB=4000.0 max_open_files=100 autotile=64
> autoscanline=0 automip=1 forcefloat=0 accept_untiled=1
> accept_unmipped=1 read_before_insert=0 deduplicate=1
> unassociatedalpha=0 failure_retries=0 
>   Images : 1957 unique
> ImageInputs : 136432 created, 100 current, 796 peak
> Total size of all images referenced : 166.0 GB
> Read from disk : 55.5 GB
> File I/O time : 7h 2m 33.9s (16m 54.2s average per thread)
> File open time only : 27m 44.0s
> 
> 
> Prman-20.11, OSL shaders, OIIO/EXR textures:
> real 00:21:14
> VmRSS 12,938.83 MB
> OpenImageIO ImageCache statistics (shared) ver 1.7.3dev
>   Options:  max_memory_MB=4000.0 max_open_files=100 autotile=64
> autoscanline=0 automip=1 forcefloat=0 accept_untiled=1
> accept_unmipped=1 read_before_insert=0 deduplicate=1
> unassociatedalpha=0 failure_retries=0 
>   Images : 1957 unique
> ImageInputs : 133168 created, 100 current, 771 peak
> Total size of all images referenced : 166.0 GB
> Read from disk : 55.5 GB
> File I/O time : 6h 15m 42.1s (15m 1.7s average per thread)
> File open time only : 1m 22.5s
> 

--
Larry Gritz
l...@larrygritz.com



___
Openexr-devel mailing list
Openexr-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/openexr-devel


Re: [Openexr-devel] EXR texture memory overhead

2016-09-16 Thread Karl Rasche
Do you have any control over the decode thread pool size?

If so, you might try twisting that knob and see if it makes a difference. I
suppose you could be seeing differences due to the EXR concurrency model.

(Is it disconcerting to be spending nearly a half-hour opening files?)

Karl

On Friday, September 16, 2016, Søren Ragsdale  wrote:

> Hello, OpenEXR devs. I've been doing some comparative rendering tests and
> I've found something a bit surprising.
>
> TIFF and EXR texture access *times* seem more or less the same, which is
> fine because the underlying data is equivalent. (Same data type,
> compression, tile size, etc.) But the RAM overhead seems much higher for
> EXRs. We've got a 9GB render using TIFFs and a 13GB render using EXRs.
>
> Does anyone have some theories why EXR texture access is requiring 4GB
> more memory?
>
>
> Prman-20.11, OSL shaders, OIIO/TIFF textures:
> real 00:21:46
> VmRSS 9,063.45 MB
> OpenImageIO ImageCache statistics (shared) ver 1.7.3dev
>   Options:  max_memory_MB=4000.0 max_open_files=100 autotile=64
> autoscanline=0 automip=1 forcefloat=0 accept_untiled=1
> accept_unmipped=1 read_before_insert=0 deduplicate=1
> unassociatedalpha=0 failure_retries=0
>   Images : 1957 unique
> ImageInputs : 136432 created, 100 current, 796 peak
> Total size of all images referenced : 166.0 GB
> Read from disk : 55.5 GB
> File I/O time : 7h 2m 33.9s (16m 54.2s average per thread)
> File open time only : 27m 44.0s
>
>
> Prman-20.11, OSL shaders, OIIO/EXR textures:
> real 00:21:14
> VmRSS 12,938.83 MB
> OpenImageIO ImageCache statistics (shared) ver 1.7.3dev
>   Options:  max_memory_MB=4000.0 max_open_files=100 autotile=64
> autoscanline=0 automip=1 forcefloat=0 accept_untiled=1
> accept_unmipped=1 read_before_insert=0 deduplicate=1
> unassociatedalpha=0 failure_retries=0
>   Images : 1957 unique
> ImageInputs : 133168 created, 100 current, 771 peak
> Total size of all images referenced : 166.0 GB
> Read from disk : 55.5 GB
> File I/O time : 6h 15m 42.1s (15m 1.7s average per thread)
> File open time only : 1m 22.5s
>
>