Thank you so much for this super detailed answer! It makes a lot of sense, and I can confirm that your solutions work like a charm.
Thanks again!

On Fri, Nov 12, 2021 at 9:02 AM Larry Gritz <l...@larrygritz.com> wrote:

> OK, I have a definitive answer for you, as well as a solution. It requires
> a bit of a deep dive into OIIO behavior and design.
>
> So, as background, you need to know a couple of things that are the key to
> it all:
>
> An ImageBuf reading files from disk is, by default and under most
> circumstances, backed by the ImageCache. That is, it usually does not read
> in the whole image, but accesses pixels through the cache. This helps in a
> number of ways: an ImageBuf holding a huge image (or, for that matter, a
> large number of reading ImageBufs simultaneously active in your program)
> does not consume memory equal to the full size of every image; the total
> memory is instead bounded by the cache size (which is adjustable, but
> defaults to 256 MB!). Also, because it reads only the parts of the image
> that are needed, access patterns that touch only a sub-area of the image
> do a lot less I/O than if they had to read the whole image in.
>
> (Aside: we do pay a bit of a performance penalty for going through the
> ImageCache all the time, but it's not very large for most access patterns,
> and it's a fair tradeoff for the ability to have literally thousands of
> ImageBufs that appear to hold TBs of pixels while actually occupying as
> little as a few GB of RAM all together.)
>
> It just so happens that this large image (2372 x 1250, 71 channels of
> float) totals 803 MB. Remember I said that the cache size defaults to
> 256 MB? Now, the fact that this one image is larger than the cache would
> not be a problem... for most use cases where you access the pixels
> coherently. For example, if you access pixel (0,0), do everything you need
> with it, then pixel (1,0), and so on, never needing to return once you
> move on to the next pixel, you will see basically no penalty at all from
> the caching, because with that access pattern each tile is read only once.
>
> Unfortunately, ImageBufAlgo::channels() works by copying one listed
> channel at a time. If the image can't fully reside in the cache, this is
> the worst case: you first copy Red, starting at (0,0) and making your way
> to (2371,1249), and by the time you finish, the early tiles have already
> been thrown out. Then you copy Green, and when it wants Green at (0,0),
> that tile is gone, so it has to be read again, and so on.
>
> So a single image that's bigger than the cache, plus this particular
> operation ("channels") that forces you to traverse the whole image
> multiple times, is the "perfect storm of badness" for the cache.
>
> Why didn't this affect oiiotool? Basically, because oiiotool initializes
> the cache to 4 GB by default. The image fits, so it's super fast!
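To make those numbers concrete, here is the footprint arithmetic from the explanation above as a minimal Python sketch; the only assumption is that a float channel takes 4 bytes per pixel:

    # 2372 x 1250 pixels, 71 float channels, 4 bytes per float:
    w, h, nchans, bytes_per_chan = 2372, 1250, 71, 4
    total_mb = w * h * nchans * bytes_per_chan / (1024.0 * 1024.0)
    print(total_mb)  # ~803 MB -- more than three times the 256 MB default cache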
> In fact, we can ask oiiotool to print some stats for us (I'm reproducing
> only the relevant parts here):
>
>     $ oiiotool -runstats test_multichannel_black.exr --ch R,G,B,A -o out.exr
>
>     oiiotool runtime statistics:
>       Total time: 3.35s
>       Peak memory: 933.0 MB
>
>     OpenImageIO ImageCache statistics (shared) ver 2.4.0dev
>       Total pixel data size of all images referenced : 803.1 MB
>       Pixel data read : 843.1 MB
>       File I/O time : 30.0s (1.9s average per thread, for 16 threads)
>       Tiles: 861 created, 760 current, 760 peak
>         total tile requests : 47500
>         main cache misses : 861 (1.81263%)
>         redundant reads: 0 tiles, 0 B
>       Peak cache memory : 843.1 MB
>
> A few things to note: the total memory in the image was 803 MB, as I noted
> earlier, and it did 843 MB of I/O. (I suspect the reason for the
> discrepancy is that the resolution 2372x1250 is not a multiple of the
> 64x64 tile size, so my statistics are probably over-counting the
> right-most column and bottom-most row of partially-filled tiles as if they
> were whole tiles.) Also notice that the "main cache misses" are pretty low
> at 1.8%, and it says there are 0 tiles / 0 bytes of redundant reads --
> meaning no tile was read from disk more than once. There's 30s of I/O, but
> that's spread over 16 threads and largely happening in parallel, so total
> runtime on my laptop was only 3.35s.
>
> Just to test this theory, we can set oiiotool's cache size lower, to match
> the ImageCache default:
>
>     $ oiiotool -runstats --cache 256 test_multichannel_black.exr --ch R,G,B,A -o out.exr
>
>     oiiotool runtime statistics:
>       Total time: 1m 51.85s
>       Peak memory: 593.1 MB
>
>     OpenImageIO ImageCache statistics (shared) ver 2.4.0dev
>       Total pixel data size of all images referenced : 803.1 MB
>       Pixel data read : 31.7 GB
>       File I/O time : 18m 28.6s (1m 9.3s average per thread, for 16 threads)
>       Tiles: 35247 created, 230 current, 243 peak
>         total tile requests : 47500
>         micro-cache misses : 47500 (100%)
>         main cache misses : 35247 (74.2042%)
>         redundant reads: 28491 tiles, 30.9 GB
>       Peak cache memory : 255.2 MB
>
> Yes, this is the slow version, like when we use your python script. And we
> can see exactly what's happening in the stats: instead of reading in the
> neighbourhood of 800 MB, we read 31.7 GB, and 30.9 GB of that is
> redundant! Wow, crazy -- and so the total I/O time of 18.5 minutes (across
> all threads) and wall-clock runtime of almost 2 minutes should not be
> surprising.
>
> With me so far?
>
> Now we just need to figure out how to fix the problem for your python
> script. There are two ways, and they are both easy!
>
> Method #1: increase the cache size
>
>     oiio.ImageCache().attribute("max_memory_MB", 1000)
>     buf = oiio.ImageBuf('test_multichannel_black.exr')
>     obuf = oiio.ImageBufAlgo.channels(buf, ('R', 'G', 'B', 'A'))
>     obuf.write('out.exr')
>
> Just grab the "default" global cache and set its "max_memory_MB" attribute
> to 1000 (instead of 256). Boom, runs in 3.7s.
>
> Method #2: force an immediate read
>
>     buf = oiio.ImageBuf('test_multichannel_black.exr')
>     buf.read(force=True)
>     obuf = oiio.ImageBufAlgo.channels(buf, ('R', 'G', 'B', 'A'))
>     obuf.write('out.exr')
>
> In your original script, you declare the ImageBuf and point it to the
> file... but that doesn't read it. The read happens implicitly in the
> channels() call -- as it tries to access pixels, they are requested from
> the cache, and the actual disk reads happen lazily as they are needed.
> That's the behavior we usually want.
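Taking the two methods together, a script that has to handle arbitrary inputs could pick between them based on the image's footprint. This is only a sketch, not an official recipe: the 1 GB cutoff is an arbitrary assumption (though Larry notes below that oiiotool itself uses a size-based heuristic), and the footprint math assumes every channel is a 4-byte float.

    import OpenImageIO as oiio

    buf = oiio.ImageBuf('test_multichannel_black.exr')
    spec = buf.spec()
    # Rough footprint in MB, assuming 4-byte float channels:
    footprint_mb = spec.width * spec.height * spec.nchannels * 4 / (1024.0 * 1024.0)
    if footprint_mb <= 1024:
        # Method #2: modest image, bypass the cache and read it all now.
        buf.read(force=True)
    else:
        # Method #1: huge image, grow the shared cache enough to hold it.
        oiio.ImageCache().attribute('max_memory_MB', footprint_mb + 256.0)
    obuf = oiio.ImageBufAlgo.channels(buf, ('R', 'G', 'B', 'A'))
    obuf.write('out.exr')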
> But there is an explicit ImageBuf.read() call with which you can declare
> that you really do want to read the whole image now... more or less --
> depending on a number of conditions, it may still fall back on the
> ImageCache. But it does have an optional 'force' parameter, and if you
> pass force=True, it means "bypass the ImageCache and read the whole image
> into local ImageBuf-owned memory NOW, even if it's large." (The obvious
> possible downside is that if the image is truly huge, or you have a lot of
> ImageBufs around, it may not be a good idea to force them all to keep
> their images fully resident in RAM -- you give up the memory-management
> benefits of the ImageCache. But you're an adult; you can make that choice
> if you want. In fact, oiiotool also cheats a little: for images smaller
> than 4k x 2k x 4 channels, it forces immediate reads just for speed's
> sake.)
>
> So when we say buf.read(force=True), we're bypassing the cache and just
> reading the whole thing, and again we end up with a really fast runtime --
> in fact, about 2 seconds on my laptop. (When you completely bypass the
> cache, a number of important shortcuts and other optimizations kick in.)
>
> That was a little long-winded, but I figured there was a lot to gain from
> really explaining all the gory details.
>
> Oh, one more thing. I also tried the oiiotool version with --cache 512,
> i.e. a 512 MB cache (note: smaller than the image!), and although a little
> redundant I/O is visible in the stats, it's still super fast -- a little
> more than 3 seconds, basically the same speed as with the 4 GB cache. So
> even in this pathological case, with the poor access pattern of
> channels(), it's not critical to use a cache larger than the images! A
> 400 MB cache is also fine, though I/O rises to 4 seconds. At 350 MB, it
> takes 7 seconds; at 300 MB, 11 seconds. The cache tends to work pretty
> well even when stressed. But it has its limits, and by the time we lower
> it all the way down to 256 MB, this case crosses some threshold where it's
> just thrashing, performance is ruined, and the operation takes a couple of
> minutes to complete.
>
> -- lg
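Larry's cache-size sweep is easy to replay. Here is a sketch that loops oiiotool over the sizes he quotes, using only the flags shown above (driving it through subprocess is just one assumed way to script it):

    import subprocess

    # Replay the cache-size sweep from the experiments above:
    for size_mb in (4096, 512, 400, 350, 300, 256):
        subprocess.call(['oiiotool', '-runstats', '--cache', str(size_mb),
                         'test_multichannel_black.exr',
                         '--ch', 'R,G,B,A', '-o', 'out.exr'])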
> On Nov 10, 2021, at 2:08 PM, Daniel Flehner Heen <flehnerhee...@gmail.com> wrote:
>
> Brilliant! Thanks!
> It's still slow.
> I'll send you a file off list.
>
> On Wed, Nov 10, 2021 at 10:46 PM Larry Gritz <l...@larrygritz.com> wrote:
>
>> oiiotool original.exr --mulc 0.0 --eraseattrib ".*" -o black.exr
>>
>> But double-check the timing -- just in case it is somehow the metadata
>> that is messing with the performance!
>>
>> On Nov 10, 2021, at 1:26 PM, Daniel Flehner Heen <flehnerhee...@gmail.com> wrote:
>>
>> I created the black exr and it is indeed small. It still takes a long
>> time to extract the channels, so it's good for testing.
>> My only concern is that some of the metadata contains confidential
>> information. Is there an easy way of wiping the custom metadata with
>> oiiotool?
>>
>> In parallel with writing this email, I'm checking with my supervisors
>> whether I can send you the image (preferably off list).
>>
>> On Wed, Nov 10, 2021 at 9:55 PM Larry Gritz <l...@larrygritz.com> wrote:
>>
>>> Interesting! As long as there's some easily measurable perf difference
>>> (like 2x is fine), I should be able to figure out what's going on.
>>> Though it's weird that your original image is more like 30x.
>>>
>>> Let's try something else. Assuming your original image is not something
>>> you can share, what about this:
>>>
>>>     oiiotool original.exr --mulc 0.0 -o black.exr
>>>
>>> That should make the pixels black but retain all the metadata, channel
>>> names, and data types of the original that had the large performance
>>> difference. It should be a small file (black compresses very well) and
>>> will obviously contain no imagery that is secret.
>>>
>>> In one "part" of an exr file, all channels are either deep or flat, so
>>> "deep" applies to all or none of the channels in that part/subimage.
>>>
>>> On Nov 10, 2021, at 12:30 PM, Daniel Flehner Heen <flehnerhee...@gmail.com> wrote:
>>>
>>> Hi, Larry!
>>>
>>> Thanks for looking into this. I created an image with IBA.noise() in 70
>>> channels and there is still a performance difference, though not that
>>> big: 2.3 seconds (python) vs 0.9 (oiiotool).
>>>
>>> Not sure if it makes a difference, but the slow image has a mix of
>>> float and half channels and is tiled 64x64.
>>> I don't know whether any of the channels are deep. Will ImageBuf.deep
>>> return True if any channel is deep? A buf.deep call returns False.
>>>
>>> On Wed, Nov 10, 2021 at 6:16 PM Larry Gritz <l...@larrygritz.com> wrote:
>>>
>>>> Well, that sure is weird! Usually, both the python bindings and the
>>>> oiiotool operation are pretty thin wrappers around the corresponding
>>>> ImageBufAlgo function, which does almost all the work. A big perf
>>>> difference is not expected.
>>>>
>>>> Can I assume that this is reproducible with any 70-channel image? That
>>>> is, can I just use oiiotool to generate a black image and see a big
>>>> speed disparity like this? Or is it in any way specific to the
>>>> particular image?
>>>>
>>>> Wednesdays are pretty packed, but I'll try to take a look at this
>>>> today if I can.
>>>>
>>>> On Nov 10, 2021, at 4:08 AM, Daniel Flehner Heen <flehnerhee...@gmail.com> wrote:
>>>>
>>>> Hi!
>>>>
>>>> I'm attempting to extract only the RGBA channels of an EXR with 70+
>>>> channels.
>>>>
>>>> Using:
>>>>
>>>>     oiiotool -i:ch=R,G,B,A /path/to/gigantic.exr -o manageable.exr
>>>>
>>>> takes about 8 seconds.
>>>>
>>>> In python (2.7, cough...):
>>>>
>>>>     buf = oiio.ImageBuf('/path/to/gigantic.exr')
>>>>     obuf = oiio.ImageBufAlgo.channels(buf, ('R', 'G', 'B', 'A'))
>>>>     obuf.write('manageable.exr')
>>>>
>>>> takes 4+ minutes.
>>>>
>>>> I tried extracting only one channel and it took the exact same amount
>>>> of time. I expect a performance hit using python, but this seems a bit
>>>> off. I suspect the python bindings are reading all the channels even
>>>> though I'm asking for a specific few.
>>>> I might of course be going about this completely wrong, in which case
>>>> a pointer in the right direction would be highly appreciated :)
>>>>
>>>> A straight read and write of the same file completes quickly.
>>>>
>>>> Thanks!
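For anyone who wants a shareable repro without a production EXR, a many-channel test image like the one Daniel built with IBA.noise() can be generated directly. A sketch follows; the resolution, tile size, and noise parameters are guesses for illustration, not his actual values:

    import OpenImageIO as oiio

    # 70-channel float image, tiled 64x64, filled with uniform noise:
    spec = oiio.ImageSpec(2372, 1250, 70, 'float')
    spec.tile_width = 64
    spec.tile_height = 64
    buf = oiio.ImageBuf(spec)
    oiio.ImageBufAlgo.noise(buf, 'uniform', 0.0, 1.0)
    buf.write('test_multichannel.exr')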
--
-Daniel
_______________________________________________
Oiio-dev mailing list
Oiio-dev@lists.openimageio.org
http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org