Thank you so much for this super detailed answer!
Makes a lot of sense, and I can confirm that your solutions work like a charm.

Thanks again!

On Fri, Nov 12, 2021 at 9:02 AM Larry Gritz <l...@larrygritz.com> wrote:

> OK, I have a definitive answer for you, as well as a solution. It requires
> a bit of a deep dive into OIIO behavior and design.
>
> So, as background, you need to know a couple of things that are the key to
> it all:
>
> An ImageBuf reading a file from disk is, by default and under most
> circumstances, backed by ImageCache. That is, it usually does not read
> in the whole image, but accesses pixels through the cache. This helps in a
> number of ways: an ImageBuf holding a huge image (or, for that matter, a
> large number of reading ImageBufs simultaneously active in your program)
> does not consume memory equal to the full size of every image; rather, the
> total memory is capped at the cache size (which is adjustable, but defaults
> to 256 MB!). Also, because it only reads parts of the image as needed,
> access patterns that touch just a sub-area of the image do a lot less I/O
> than if they had to read the whole image in.
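>
> (The knob for that cache size, for the record -- we'll come back to it in
> the fix at the end. A minimal sketch in Python:)
>
> import OpenImageIO as oiio
> oiio.ImageCache().attribute("max_memory_MB", 1000)   # raise the 256 MB default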
>
> (Aside: we do pay a bit of a performance penalty for going through the
> ImageCache all the time, but it's not very large for most access patterns,
> and it's a fair tradeoff for the ability to have literally thousands of
> IBs that appear to hold TBs of pixels, while all together taking as little
> as a few GB of RAM.)
>
> It just so happens that this large image (2372 x 1250, 71 channels of
> float) totals 803 MB. Remember I said that the cache size defaults to
> 256 MB? Now, the fact that this one image is larger than the cache size
> would not be a problem... for most use cases where you access the pixels
> coherently. For example, if you access pixel (0,0), do everything you need
> with that, then pixel (1,0), and so on, never needing to return once you
> move on to the next pixel, you will see basically no penalty at all from
> the caching, because for that access pattern each tile is read only once.
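>
> (To double-check that 803 MB figure -- quick arithmetic, using just the
> numbers above:)
>
> w, h, nchans, bytes_per = 2372, 1250, 71, 4
> print(w * h * nchans * bytes_per / (1024.0 ** 2))   # ~803.1 MB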
>
> Unfortunately, the way the ImageBufAlgo::channels() function works is that
> it copies one listed channel at a time. When the image can't fully reside
> in the cache, this is the worst case for the caching: you first copy R,
> starting at (0,0) and making your way to (2371,1249), and by the time you
> finish, the early tiles have already been thrown out. Then you copy G, and
> when it wants the tile containing (0,0), that tile is gone, so it has to
> be read from disk again, and so on.
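>
> (A sketch of that access order in Python -- an illustration of the idea,
> not the actual ImageBufAlgo implementation, and per-pixel getpixel/setpixel
> is far slower than what the real code does:)
>
> import OpenImageIO as oiio
>
> def copy_channels_naive(src, chans):
>     # One full pass over the source per output channel: with an undersized
>     # cache, each pass re-reads the tiles that the previous pass evicted.
>     spec = src.spec()
>     out = oiio.ImageBuf(oiio.ImageSpec(spec.width, spec.height,
>                                        len(chans), oiio.FLOAT))
>     for ci, cname in enumerate(chans):
>         sci = spec.channelnames.index(cname)   # source channel index
>         for y in range(spec.height):
>             for x in range(spec.width):
>                 px = list(out.getpixel(x, y))
>                 px[ci] = src.getpixel(x, y)[sci]
>                 out.setpixel(x, y, px)
>     return out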
>
> So a single image that's bigger than the cache size, plus this particular
> operation ("channels") that causes you to have to traverse the whole image
> multiple times, is the "perfect storm of badness" for the cache.
>
> Why didn't this affect oiiotool? Basically, because oiiotool initializes
> the cache to 4 GB by default. The image fits, so it's super fast! In fact,
> we can ask oiiotool to print some stats for us (I'm reproducing only the
> relevant parts here):
>
> $ oiiotool -runstats test_multichannel_black.exr --ch R,G,B,A -o out.exr
>
> oiiotool runtime statistics:
>   Total time: 3.35s
>   Peak memory:    933.0 MB
>
> OpenImageIO ImageCache statistics (shared) ver 2.4.0dev
>     Total pixel data size of all images referenced : 803.1 MB
>     Pixel data read : 843.1 MB
>     File I/O time : 30.0s (1.9s average per thread, for 16 threads)
>   Tiles: 861 created, 760 current, 760 peak
>     total tile requests : 47500
>     main cache misses : 861 (1.81263%)
>     redundant reads: 0 tiles, 0 B
>     Peak cache memory : 843.1 MB
>
>
> A few things to note: the total pixel memory of the image was 803 MB, as I
> noted earlier, and it did 843 MB of I/O. (I suspect the discrepancy is that
> the resolution 2372x1250 is not a multiple of the 64x64 tile size, so my
> statistics are probably over-counting the right-most column and bottom-most
> row of partially-filled tiles as if they were whole tiles.) Also notice
> that the "main cache misses" are pretty low at 1.8%, and it reports 0 tiles
> / 0 bytes of redundant reads -- meaning no tile was read from disk more
> than once. There's 30s of I/O, but that's spread over 16 threads and
> largely happening in parallel, so total runtime on my laptop was only
> 3.35s.
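>
> (A quick check of that suspicion: rounding the resolution up to whole
> 64x64 tiles reproduces the I/O figure almost exactly:)
>
> import math
> tw = int(math.ceil(2372 / 64.0)) * 64   # 2432
> th = int(math.ceil(1250 / 64.0)) * 64   # 1280
> print(tw * th * 71 * 4 / (1024.0 ** 2))  # ~843.2 MB, matching "Pixel data read"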
>
> Just to test this theory, we can set oiiotool's cache size lower, to match
> the ImageCache default:
>
> $ oiiotool -runstats --cache 256 test_multichannel_black.exr --ch R,G,B,A -o out.exr
>
> oiiotool runtime statistics:
>   Total time: 1m 51.85s
>   Peak memory:    593.1 MB
>
> OpenImageIO ImageCache statistics (shared) ver 2.4.0dev
>     Total pixel data size of all images referenced : 803.1 MB
>     Pixel data read : 31.7 GB
>     File I/O time : 18m 28.6s (1m 9.3s average per thread, for 16 threads)
>   Tiles: 35247 created, 230 current, 243 peak
>     total tile requests : 47500
>     micro-cache misses : 47500 (100%)
>     main cache misses : 35247 (74.2042%)
>     redundant reads: 28491 tiles, 30.9 GB
>     Peak cache memory : 255.2 MB
>
>
> Yes, this is the slow version, just like your python script. And we can
> see exactly what's happening in the stats: instead of reading in the
> neighbourhood of 800 MB, we read 31.7 GB, and 30.9 GB of that is
> redundant! Wow, crazy, and so the total I/O time of 18.5 minutes (across
> all threads) and wall-clock runtime of almost 2 minutes should not be
> surprising.
>
> With me so far?
>
> Now we just need to figure out how to fix the problem for your python
> script. There are two ways, and they are both easy!
>
> Method #1: increase the cache size
>
> import OpenImageIO as oiio
>
> oiio.ImageCache().attribute("max_memory_MB", 1000)
> buf = oiio.ImageBuf('test_multichannel_black.exr')
> obuf = oiio.ImageBufAlgo.channels(buf, ('R', 'G', 'B', 'A'))
> obuf.write('out.exr')
>
>
> Just grab the "default" global cache, and set the "max_memory_MB"
> attribute to 1000 (instead of 256). Boom, runs in 3.7s.
>
> Method #2: force an immediate read
>
> import OpenImageIO as oiio
>
> buf = oiio.ImageBuf('test_multichannel_black.exr')
> buf.read(force=True)
> obuf = oiio.ImageBufAlgo.channels(buf, ('R', 'G', 'B', 'A'))
> obuf.write('out.exr')
>
>
> In your original script, you declare the ImageBuf and point it to the
> file... but that doesn't read it. The read happens implicitly in the
> channels() call -- as it tries to access pixels, they are requested from
> the cache, and actual disk reads happen lazily as they are needed. That's
> the behavior we usually want.
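>
> (You can watch the laziness happen -- a tiny illustration using the same
> test file:)
>
> import OpenImageIO as oiio
>
> buf = oiio.ImageBuf('test_multichannel_black.exr')  # no pixel I/O yet
> print(buf.spec().nchannels)    # metadata is read on demand
> print(buf.getpixel(0, 0)[:4])  # first pixel access pulls tiles through the cache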
>
> But there is an explicit ImageBuf.read() call with which you can declare
> that you really do want to read the whole image now... more or less --
> depending on a number of conditions, it may still fall back on the
> ImageCache. But it does have an optional 'force' parameter, and if you pass
> force=True, it means "bypass the ImageCache and read the whole image into
> local ImageBuf-owned memory NOW, even if it's large." (The obvious possible
> downside: if the image is truly huge, or you have a lot of ImageBufs
> around, it may not be a good idea to force them all to keep their images
> fully resident in RAM, since you lose the memory-management benefits of the
> ImageCache. But you're an adult, you can make that choice if you want. In
> fact, oiiotool also cheats a little: for images smaller than 4k x 2k x 4
> channels, it forces immediate reads, just for speed's sake.)
>
> So when we say buf.read(force=True), we're bypassing the cache and just
> reading the whole thing, and again we end up with a really fast runtime --
> about 2 seconds on my laptop. (When you completely bypass the cache, a
> number of important shortcuts and other optimizations kick in.)
>
>
> That was a little long-winded, but I figured that there was a lot to gain
> from really explaining all the gory details.
>
> Oh, one more thing. I also tried the oiiotool version with --cache 512,
> i.e. a 512 MB cache (note: smaller than the image!), and although a little
> redundant I/O is visible in the stats, it's still super fast -- a little
> over 3 seconds, basically the same speed as with the 4 GB cache. So even
> in this pathological case, with the poor access pattern of channels(), a
> cache larger than the image isn't critical! A 400 MB cache is also fine,
> though the extra I/O pushes the runtime up to 4 seconds; at 350 MB it
> takes 7 seconds; at 300 MB, 11 seconds. The cache tends to work pretty
> well even when stressed. But it has its limits, and by the time we lower
> it all the way down to 256 MB, this case crosses some threshold where it's
> just thrashing, performance is ruined, and the operation takes a couple of
> minutes to complete.
>
> -- lg
>
>
> On Nov 10, 2021, at 2:08 PM, Daniel Flehner Heen <flehnerhee...@gmail.com>
> wrote:
>
> Brilliant! Thanks!
> It's still slow.
> I'll send you a file off list.
>
> On Wed, Nov 10, 2021 at 10:46 PM Larry Gritz <l...@larrygritz.com> wrote:
>
>> oiiotool original.exr --mulc 0.0 --eraseattrib ".*" -o black.exr
>>
>> But double-check the timing -- just in case it's somehow the metadata
>> that is messing with the performance!
>>
>>
>> On Nov 10, 2021, at 1:26 PM, Daniel Flehner Heen <flehnerhee...@gmail.com>
>> wrote:
>>
>> I created the black exr and it is indeed small. Still takes a long time
>> extracting the channels, so it's good for testing.
>> My only concern is that some of the metadata contains confidential
>> information. Is there an easy way of wiping the custom metadata with
>> oiiotool?
>>
>> In parallel with writing this email, I'm checking with my supervisors if
>> I can send you the image (preferably off list.)
>>
>> On Wed, Nov 10, 2021 at 9:55 PM Larry Gritz <l...@larrygritz.com> wrote:
>>
>>> Interesting! As long as there's some easily measurable perf difference
>>> (like 2x is fine), I should be able to figure out what's going on. Though
>>> it's weird that your original image is more like 30x.
>>>
>>> Let's try something else. Assuming your original image is not something
>>> you can share, what about this:
>>>
>>>     oiiotool original.exr --mulc 0.0 -o black.exr
>>>
>>> That should make the pixels black, but retain all the metadata, channel
>>> names, and data types of the original that showed the large performance
>>> difference. It should be a small file (black compresses very well) and
>>> will obviously contain no secret imagery.
>>>
>>> In one "part" of an exr file, all channels are either deep or flat, so
>>> "deep" is all or none of the channels in that part/subimage.
>>>
>>>
>>> On Nov 10, 2021, at 12:30 PM, Daniel Flehner Heen <
>>> flehnerhee...@gmail.com> wrote:
>>>
>>> Hi, Larry!
>>>
>>> Thanks for looking into this. I created an image with IBA.noise() with 70
>>> channels, and there is still a performance difference.
>>> Not as big, though: 2.3 seconds (python) vs 0.9 (oiiotool).
>>>
>>> Not sure if it makes a difference, but the slow image has a mix of float
>>> and half channels and is tiled 64x64.
>>> I don't know whether any of the channels are deep. Will ImageBuf.deep
>>> return True if any channel is deep?
>>> A buf.deep call returns False.
>>>
>>>
>>> On Wed, Nov 10, 2021 at 6:16 PM Larry Gritz <l...@larrygritz.com> wrote:
>>>
>>>> Well, that sure is weird!  Usually, both the python bindings and the
>>>> oiiotool operation are pretty thin wrappers around the corresponding
>>>> ImageBufAlgo function, which does almost all the work. A big perf
>>>> difference is not expected.
>>>>
>>>> Can I assume that this is reproducible with any 70-channel image? Like,
>>>> can I just use oiiotool to generate a black image and see a big speed
>>>> disparity like this? Or is it in any way specific to this particular
>>>> image?
>>>>
>>>> Wednesdays are pretty packed, but I'll try to take a look at this today
>>>> if I can.
>>>>
>>>>
>>>> On Nov 10, 2021, at 4:08 AM, Daniel Flehner Heen <
>>>> flehnerhee...@gmail.com> wrote:
>>>>
>>>> Hi!
>>>>
>>>> I'm attempting to extract only the RGBA channels of an EXR with 70+
>>>> channels.
>>>>
>>>> Using:
>>>> oiiotool -i:ch=R,G,B,A /path/to/gigantic.exr -o manageable.exr
>>>> takes about 8 seconds.
>>>>
>>>> In Python (2.7, cough...):
>>>> import OpenImageIO as oiio
>>>> buf = oiio.ImageBuf('/path/to/gigantic.exr')
>>>> obuf = oiio.ImageBufAlgo.channels(buf, ('R', 'G', 'B', 'A'))
>>>> obuf.write('manageable.exr')
>>>> takes 4+ minutes.
>>>>
>>>> I tried extracting only one channel and it took the exact same amount of
>>>> time. I expect a performance hit using python, but this seems a bit off. I
>>>> suspect the python bindings are reading all the channels even though I'm
>>>> asking for a specific few.
>>>> I might of course be going about this completely wrong, in which case a
>>>> pointer in the right direction would be highly appreciated :)
>>>>
>>>> A straight read and write of the same file is quick.
>>>>
>>>> Thanks!
>>>>
>>>> --
>>>> -Daniel
>>>>
>>>> --
>>>> Larry Gritz
>>>> l...@larrygritz.com
>>>>
>>>
>>>
>>> --
>>> -Daniel
>>>
>>>
>>> --
>>> Larry Gritz
>>> l...@larrygritz.com
>>>
>>>
>>
>>
>> --
>> -Daniel
>>
>>
>> --
>> Larry Gritz
>> l...@larrygritz.com
>>
>>
>
>
> --
> -Daniel
>
>
> --
> Larry Gritz
> l...@larrygritz.com
>
>


-- 
-Daniel