OK, I have a definitive answer for you, as well as a solution. It requires a 
bit of a deep dive into OIIO behavior and design.

So, as background, you need to know a couple of things that are the key to it 
all:

ImageBuf, when reading files from disk, is by default (under most
circumstances) backed by ImageCache. That is, it usually does not read the
whole image into memory, but accesses pixels through the cache. This helps in
a number of ways: an ImageBuf holding a huge image (or, for that matter, a
large number of reading ImageBufs simultaneously active in your program) does
not consume memory equal to the full size of every image; rather, the total
memory is bounded by the cache size (which is adjustable but defaults to 256
MB!). Also, because it only reads parts of the image as needed, access
patterns that touch only a sub-area of the image do a lot less I/O than if
the whole image had to be read in.

(Aside: we do pay a bit of a performance penalty for going through the
ImageCache all the time, but it's not very large for most access patterns,
and it's a fair tradeoff for the ability to have literally thousands of IBs
that appear to hold TBs of pixels while together occupying as little as a few
GB of RAM.)
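
To make that concrete, here is a tiny sketch of the lazy, cache-backed
behavior (my own illustration, not from your script; the file names are
placeholders):

import glob
import OpenImageIO as oiio

# Constructing an ImageBuf from a filename does no pixel I/O yet; pixels are
# pulled through the shared ImageCache only when something asks for them.
bufs = [oiio.ImageBuf(f) for f in glob.glob('renders/*.exr')]   # cheap
firsts = [b.getpixel(0, 0) for b in bufs]   # reads only the tiles it needs
# Total pixel memory stays bounded by the cache size, no matter how many
# ImageBufs there are or how large the files behind them.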

It just so happens that this large image (2372 x 1250, 71 channels of float)
totals 803 MB of pixel data (a quick back-of-the-envelope check is below).
Remember I said that the cache size defaults to 256 MB? Now, the fact that
this one image is larger than the cache would not be a problem... for most
use cases, where you access the pixels coherently. For example, if you access
pixel (0,0), do everything you need with it, then pixel (1,0), and so on,
never needing to return once you move on to the next pixel, you will see
basically no penalty at all from the caching, because for that access pattern
each tile is read only once.
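
Here's the arithmetic behind that 803 MB figure (my own check, assuming all
71 channels are stored as 32-bit float, as noted above):

w, h, nchannels, bytes_per_sample = 2372, 1250, 71, 4   # float32
print(w * h * nchannels * bytes_per_sample / float(2**20))   # ~803.1 MB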

Unfortunately, the way the ImageBufAlgo::channels() function works is that it
copies one listed channel at a time. If the image can't fully reside in the
cache, that is the worst case for this caching scheme: you first copy red,
starting at (0,0) and working your way to (2371,1249), and by the time you
finish, the earliest tiles have already been evicted. Then you copy green,
and when it needs green at (0,0), that tile is gone, so it has to be read
from disk again, and so on (see the sketch below).
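
To visualize the access pattern, here is a rough illustration (just the
traversal order made explicit, not the actual library implementation):

import OpenImageIO as oiio

src = oiio.ImageBuf('test_multichannel_black.exr')
# One full-image pass per requested channel. If the whole image can't fit in
# the cache, the tiles touched early in one pass have been evicted by the
# time the next pass starts, so they must be read from disk all over again.
for name in ('R', 'G', 'B', 'A'):
    one_channel = oiio.ImageBufAlgo.channels(src, (name,))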

So a single image that's bigger than the cache size, plus this particular 
operation ("channels") that causes you to have to traverse the whole image 
multiple times, is the "perfect storm of badness" for the cache.

Why didn't this affect oiiotool? Basically, because oiiotool initializes the 
cache to 4 GB by default. The image fits, so it's super fast! In fact, we can 
ask oiiotool to print some stats for us (I'm reproducing only the relevant 
parts here):

$ oiiotool -runstats test_multichannel_black.exr --ch R,G,B,A -o out.exr

oiiotool runtime statistics:
  Total time: 3.35s
  Peak memory:    933.0 MB

OpenImageIO ImageCache statistics (shared) ver 2.4.0dev
    Total pixel data size of all images referenced : 803.1 MB
    Pixel data read : 843.1 MB
    File I/O time : 30.0s (1.9s average per thread, for 16 threads)
  Tiles: 861 created, 760 current, 760 peak
    total tile requests : 47500
    main cache misses : 861 (1.81263%)
    redundant reads: 0 tiles, 0 B
    Peak cache memory : 843.1 MB

A few things to note: The total pixel data in the image was 803 MB as I noted
earlier, and it did 843 MB of I/O (I suspect the reason for the discrepancy
is that the dimensions 2372x1250 are not multiples of the tile size 64x64, so
my statistics are probably counting the right-most column and bottom-most row
of partially-filled tiles as if they were whole tiles; a quick check of that
guess is below). Also notice that the "main cache misses" are pretty low at
1.8%, and it says there are 0 tiles / 0 bytes of redundant reads -- meaning
no tile was read from disk more than once. There is 30s of I/O, but that's
spread over 16 threads and largely happening in parallel, so total runtime on
my laptop was only 3.35s.
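
Sanity-checking that over-counting guess with my own back-of-the-envelope
math (rounding the image up to whole 64x64 tiles):

import math
tiles_x = math.ceil(2372 / 64.0)              # 38 tile columns
tiles_y = math.ceil(1250 / 64.0)              # 20 tile rows
tile_bytes = 64 * 64 * 71 * 4                 # 71 float channels per tile
print(tiles_x * tiles_y * tile_bytes / float(2**20))   # ~843.1 MB

That matches the "Pixel data read" figure above, and 38 x 20 = 760 matches
the peak tile count in the stats.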

Just to test this theory, we can set oiiotool's cache size lower, to match the 
ImageCache default:

$ oiiotool -runstats --cache 256 test_multichannel_black.exr --ch R,G,B,A -o 
out.exr

oiiotool runtime statistics:
  Total time: 1m 51.85s
  Peak memory:    593.1 MB

OpenImageIO ImageCache statistics (shared) ver 2.4.0dev
    Total pixel data size of all images referenced : 803.1 MB
    Pixel data read : 31.7 GB
    File I/O time : 18m 28.6s (1m 9.3s average per thread, for 16 threads)
  Tiles: 35247 created, 230 current, 243 peak
    total tile requests : 47500
    micro-cache misses : 47500 (100%)
    main cache misses : 35247 (74.2042%)
    redundant reads: 28491 tiles, 30.9 GB
    Peak cache memory : 255.2 MB

Yes, this is the slow version, just like your python script. And we can see
exactly what's happening in the stats: instead of reading in the
neighbourhood of 800 MB, we read 31.7 GB, and 30.9 GB of that is redundant!
Wow, crazy, and so the total I/O time of 18.5 minutes (summed across all
threads) and wall-clock runtime of almost 2 minutes should not be surprising.

With me so far?

Now we just need to figure out how to fix the problem for your python script. 
There are two ways, and they are both easy!

Method #1: increase the cache size

import OpenImageIO as oiio

# Bump the shared ImageCache from its 256 MB default so the whole image fits.
oiio.ImageCache().attribute("max_memory_MB", 1000.0)
buf = oiio.ImageBuf('test_multichannel_black.exr')
obuf = oiio.ImageBufAlgo.channels(buf, ('R', 'G', 'B', 'A'))
obuf.write('out.exr')

Just grab the "default" global cache, and set the "max_memory_MB" attribute to 
1000 (instead of 256). Boom, runs in 3.7s.

Method #2: force an immediate read

import OpenImageIO as oiio

buf = oiio.ImageBuf('test_multichannel_black.exr')
buf.read(force=True)   # bypass the cache: read the whole image into memory now
obuf = oiio.ImageBufAlgo.channels(buf, ('R', 'G', 'B', 'A'))
obuf.write('out.exr')

In your original script, you declare the ImageBuf and point it to the file... 
but that doesn't read it. The read happens implicitly in the channels() call -- 
as it tries to access pixels, they are requested from the cache, and actual 
disk reads happen lazily as they are needed. That's the behavior we usually 
want.

But there is an explicit ImageBuf.read() call with which you can declare that
you really do want to read the whole image now... more or less -- depending
on a number of conditions, it may still fall back on the ImageCache. But it
has an optional 'force' parameter, and if you pass force=True, it means
"bypass the ImageCache and read the whole image into local, ImageBuf-owned
memory NOW, even if it's large." (The possible down-side is that if the image
is truly huge, or you have a lot of ImageBufs around, it may not be a good
idea to force them all to have their images fully resident in RAM, since you
give up the memory-management benefits of the ImageCache. But you're an
adult; you can make that choice if you want. In fact, oiiotool also cheats a
little: for images smaller than 4k x 2k x 4 channels, it forces immediate
reads just for speed's sake.)

So when we say buf.read(force=True), we're bypassing the cache, just reading 
the whole thing, and so again, we end up with a really fast runtime, in fact 
about 2 seconds on my laptop. (When you completely bypass the cache, there are 
a number of important shortcuts and other optimizations.)


That was a little long-winded, but I figured that there was a lot to gain from 
really explaining all the gory details.

Oh, one more thing. I also tried the oiiotool version with --cache 512, i.e.
a 512 MB cache (note: smaller than the image!), and although a little
redundant I/O shows up in the stats, it's still super fast -- a little over 3
seconds, basically the same speed as with the 4 GB cache. So even in this
pathological case, with the poor access pattern of channels(), it's not
critical to use a cache larger than the images! A 400 MB cache is also fine,
though it goes up to about 4 seconds. At 350 MB, it takes 7 seconds; at 300
MB, 11 seconds. The cache tends to work pretty well even when stressed. But
it has its limits, and by the time we lower it all the way down to 256 MB,
for this case it crosses a threshold where it's just thrashing, performance
is ruined, and the operation takes a couple of minutes to complete.

        -- lg


> On Nov 10, 2021, at 2:08 PM, Daniel Flehner Heen <flehnerhee...@gmail.com> 
> wrote:
> 
> Brilliant! Thanks!
> It's still slow.
> I'll send you a file off list. 
> 
> On Wed, Nov 10, 2021 at 10:46 PM Larry Gritz <l...@larrygritz.com> wrote:
> oiiotool original.exr --mulc 0.0 --eraseattrib ".*" -o black.exr
> 
> But double check the timing -- just in case it is somehow the metadata that 
> is messing with the performance!
> 
> 
>> On Nov 10, 2021, at 1:26 PM, Daniel Flehner Heen <flehnerhee...@gmail.com> wrote:
>> 
>> I created the black exr and it is indeed small. Still takes a long time 
>> extracting the channels, so it's good for testing. 
>> My only concern is that some of the metadata contains confidential 
>> information. Is there an easy way of wiping the custom metadata with 
>> oiiotool?
>> 
>> In parallel with writing this email, I'm checking with my supervisors if I 
>> can send you the image (preferably off list.)  
>> 
>> On Wed, Nov 10, 2021 at 9:55 PM Larry Gritz <l...@larrygritz.com> wrote:
>> Interesting! As long as there's some easily measurable perf difference (like 
>> 2x is fine), I should be able to figure out what's going on. Though it's 
>> weird that your original image is more like 30x.
>> 
>> Let's try something else. Assuming your original image is not something you 
>> can share, what about this:
>> 
>>     oiiotool original.exr --mulc 0.0 -o black.exr
>> 
>> That should make the pixels black, but retain all the metadata, channel 
>> names, and data types of the original that had a large performance 
>> difference. It should be a small file (black compresses very well) and will 
>> obviously contain no imagery that is secret.
>> 
>> In one "part" of an exr file, all channels are either deep or flat, so 
>> "deep" is all or none of the channels in that part/subimage.
>> 
>> 
>>> On Nov 10, 2021, at 12:30 PM, Daniel Flehner Heen <flehnerhee...@gmail.com> wrote:
>>> 
>>> Hi, Larry!
>>> 
>>> Thanks for looking into this. I created an image with IBA.noise() in 70 
>>> channels and there still is a performance difference. 
>>> Not that big though: 2.3 seconds (python) vs 0.9 (oiiotool)
>>> 
>>> Not sure if it makes a difference, but the slow image has a mix of float 
>>> and half channels and is tiled 64x64.
>>> Don't know exactly if any of the channels are deep. Will ImageBuf.deep 
>>> return True if any channel is deep? 
>>> a buf.deep call returns False.
>>>  
>>> 
>>> On Wed, Nov 10, 2021 at 6:16 PM Larry Gritz <l...@larrygritz.com> wrote:
>>> Well that sure is weird!  Usually, both the python bindings and the 
>>> oiiotool operation are pretty thin wrappers around the corresponding 
>>> ImageBufAlgo function which is doing almost all the work. A big perf 
>>> difference is not expected.
>>> 
>>> Can I assume that this is reproducible by any 70 channel image? Like I can 
>>> just use oiiotool to generate a black image and see a big speed disparity 
>>> like this? Or is it in any way specific to the particular image?
>>> 
>>> Wednesdays are pretty packed, but I'll try to take a look at this today if I 
>>> can.
>>> 
>>> 
>>>> On Nov 10, 2021, at 4:08 AM, Daniel Flehner Heen <flehnerhee...@gmail.com> wrote:
>>>> 
>>>> Hi!
>>>> 
>>>> I'm attempting to extract only the RGBA channels of an EXR with 70+ 
>>>> channels. 
>>>> 
>>>> Using:
>>>> oiiotool -i:ch=R,G,B,A /path/to/gigantic.exr -o manageable.exr 
>>>> takes about 8 seconds.
>>>> 
>>>> In Python (2.7, cough...):
>>>> buf = oiio.ImageBuf('/path/to/gigantic.exr')
>>>> obuf = oiio.ImageBufAlgo.channels(buf, ('R', 'G', 'B', 'A'))
>>>> obuf.write('manageable.exr')
>>>> takes 4+ minutes
>>>> 
>>>> I tried only extracting one channel and it took the exact same amount of time. 
>>>> I expect a performance hit using python but this seems a bit off. I 
>>>> suspect the python bindings are reading all the channels even though I'm 
>>>> asking for a specific few.
>>>> I might of course be going about this completely wrong, in which case a 
>>>> pointer in the right direction would be highly appreciated :)
>>>>  
>>>> Straight read and write of the same file is done quickly.
>>>> 
>>>> Thanks!
>>>> 
>>>> -- 
>>>> -Daniel

--
Larry Gritz
l...@larrygritz.com



