Aha, thanks, Karl.
So it's allocating an internal (retained, as long as the file is kept open)
framebuffer that's a whole *row* of tiles. This is probably fine when you have
a small number of files open at once and they are mostly reading whole images
(so, sure, the app is probably asking for whole rows, or more, of tiles at a
time).
But it's not optimal for a use pattern like TextureSystem where the typical
request is ONE tile, and the next tile it wants may not even be adjacent.
Back-of-envelope: let's say our textures are RGBA, half data, 4k x 4k, with
64x64 tiles. So a row of tiles is 2MB of this internal framebuffer overhead,
per open texture file (which, as we've said, could be hundreds or thousands at
once, so could easily be multiple GB of unaccounted overhead).
Perhaps, in light of this, it would help this kind of access pattern if IlmImf
had a way for the caller to communicate that a particular file will tend to
read individual tiles independently; in that case the internal framebuffer
scratch space could be just a single tile's worth, not a whole "tile row" of
scratch space?
Wait, I'm not quite sure how threads play into this. Is this allocated
framebuffer part of the ImageInput itself? Do threads lock to use it? Or is
this per thread, per file?
> On Sep 16, 2016, at 10:31 AM, Karl Rasche <karlras...@gmail.com> wrote:
> 1. The amount of memory that libIlmImf holds *per open file* as overhead or
> internal buffers or whatever (I haven't tracked down exactly what it is) is
> much larger than what libtiff holds as overhead per open file.
> I *think* this is related to what you're seeing, at least in part:
> ImfInputFile.cpp line 678
> 2. libIlmImf seems to have a substantial amount of memory overhead *per
> thread*, and that can really add up if you have a large thread pool. In
> contrast, libtiff doesn't have a thread pool (for better or for worse), so
> there isn't a per-thread component to its memory overhead.
> Some of that probably stems from the framebuffer model -- You don't decode
> directly into the user-provided buffer, but instead into a temp buffer which
> is copied into the user-provided buffer and reformatted as requested.
> That avoids things like tons of extra decodes of a scanline strip if you are
> walking it scanline by scanline. But in the context of texturing, where
> you're always reading a full tile into cache, it's just overhead. There might
> be a sneaky way to flush that backing data, like assigning a null framebuffer
> or something.