As with any general tool that tries to make the hardest cases work while keeping good performance in the average case, a specialized tool that does exactly one thing and no more is likely to be more efficient, especially on straightforward use cases.
Without trying it out myself, just going by what you've written in the email, I can think of a few things that might be very different between oiiotool and your python script:

1. Your python example parallelizes across the frame range, so each parallel task is truly independent and should have no locking or interference at all (though at the expense of possibly using a lot of memory, since many ImageBuf's will be active at once). But oiiotool handles the frames serially and tries to parallelize the work within each frame. Since all the threads are dealing with the same image, they will probably interfere with each other quite a lot, the most severe case being that the threads share a single underlying OpenEXR file for input and also for output. The simplest way to test this hypothesis is with a few other timing tests (sketched after this list): (a) compare to python when you don't use the python thread pool (i.e., handle the files serially) but also don't set the "threads" attribute, so OIIO tries to thread within the various operations you're doing; (b) compare python and oiiotool for cropping just one file; (c) try both for just one file with just one thread (set "threads" to 1 in python, and use --threads 1 for oiiotool).

2. If you are using OpenEXR >= 3.1, there is also the option of turning on the "exrcore" library, which is off by default but, when enabled, results in a lot less locking when multiple threads read from the same ImageInput. You can enable it in oiiotool with "--oiioattrib openexr:core 1" (a full example command follows the list below). This will eventually be the default, but only after the next OpenEXR release, which fixes a limitation where exrcore was broken for certain compression types. Even when enabled, though, it won't currently help the case of multiple cores wanting to write to a single file.

3. Your python script reads each file immediately into an ImageBuf, whereas oiiotool does that only for small files; for big files (and I assume anything 6k or larger, as in your example, certainly qualifies) it falls back on an underlying ImageCache to read parts of the image on demand. That is helpful if the images are really huge, or if it turns out you only need part of one, but when you need the whole image and it fits comfortably in memory, it is obviously slower: the I/O is broken into smaller chunks, and there is extra overhead in going through the cache versus having the whole image in one big flat buffer.

4. Also, oiiotool by default reads all images into 32-bit float buffers and caches for the in-memory representation, in order to maximally preserve the precision of whatever operations you do and to speed up any complex math (doing math on a whole float buffer is faster than converting to float, doing a single math op, and converting back on a pixel-by-pixel basis). But the thing you're doing is simply reading the buffer in, cropping (either padding with black or trimming pixels away), and writing it out again. The pixel values are just copied; there is no "math", so the speed and precision advantages of float buffers are irrelevant here, and you are just uselessly paying the overhead of converting to and from float (your source exr files are likely "half") while slogging 2x more data around in memory. Needless to say, your python script is not trying to be clever in this way; it just accepts whatever the native data type is.
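To make test (a) concrete, here is a minimal sketch of the serial variant of your script (same crop and paths as your example; the only changes are dropping the "threads" attribute and replacing the pool with a plain loop):

    import os
    import time

    import OpenImageIO as oiio

    # NOTE: no oiio.attribute("threads", 1) here -- let OIIO thread
    # internally within each operation instead.

    def crop(path):
        im = oiio.ImageBuf(path)
        new_im = oiio.ImageBufAlgo.crop(im, oiio.ROI(1000, 7640, 0, 5760))
        new_im.write("{0}/{1}".format("D:/tmp/exr_crop", os.path.basename(path)))

    dir_path = "D:/tmp/exr_src"
    files = ["{0}/{1}".format(dir_path, x) for x in os.listdir(dir_path)]

    st = time.time()
    for f in files:   # serial: one frame at a time
        crop(f)
    print("Total time: {0}".format(time.time() - st))

For tests (b) and (c) on the oiiotool side, something along these lines (the single-frame filename is just my guess at your naming scheme):

    oiiotool path/to/src.003935.exr --crop 6640x5760+1000+0 -o path/to/dst.003935.exr
    oiiotool --threads 1 path/to/src.003935.exr --crop 6640x5760+1000+0 -o path/to/dst.003935.exr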
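And to try the exrcore reader from point 2, your original command line just gains the attribute:

    oiiotool --oiioattrib openexr:core 1 path/to/src.3935-3954%06d.exr --crop 6640x5760+1000+0 --runstats -o path/to/dst.3935-3954%06d.exr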
One way to test the effects of #3 and #4 is to change the input from simply naming the file to using the -i command explicitly, with some modifiers:

    -i:now=1:native=1 path/to/src.3935-3954%06d.exr

The now=1 bypasses the ImageCache, eschewing any lazy reading, and reads the whole input image into memory (i.e., an ImageBuf) right then. And native=1 means "don't convert to float, just read it into an ImageBuf as whatever data type was in the file."

I'm very curious to see a comparison between (a) oiiotool cropping a single image when you use -i:now=1:native=1, versus (b) your python script operating on a single image (but don't bother setting threads=1). I bet those two times are going to be a lot more similar.

TL;DR after rereading everything I just wrote: it would not surprise me at all if almost all the runtime in the oiiotool case were simply the serialization of writing all of the output exr frames one by one, whereas the python case explicitly parallelizes this operation by handling many frames at once, independently, and even all the other factors I hypothesize about are relatively minor in comparison.

Food for thought about oiiotool and future enhancements:

* Is reliance on ImageCache for anything but fairly small images the right default? Should the size threshold at which the ImageCache kicks in be much larger, or should the cache be used only when explicitly requested?

* Should oiiotool scan the commands before doing any work, determine whether all the operations involve only pixel copies (no math per se), and in that case automatically use native=1 for inputs? That is, should the promotion to float buffers internally happen only when the command line implies a precision or speed advantage to doing so? (Down sides: can we screw this up and miss cases where we should have promoted? Can results or performance differ significantly just because one command in a long sequence changes slightly, leading to counter-intuitive behavior that is hard for users to reason about?)

* When file sequence wildcards are used, should oiiotool automatically try to parallelize across the file sequence rather than within each file operation? Are there oiiotool command lines people use that operate on file sequences but have iteration-to-iteration data dependencies, which would give wrong results if the sequence weren't processed serially in order? (I suppose it could be an option users explicitly set, choosing whether to serialize or parallelize file sequence operations.)

* Possibly simpler than the last item: what if we simply made -o asynchronous when it is the last operation on the command line? That is, -o puts the whole output task on the thread queue to do its thing, while the main thread moves on to the next iteration of the file sequence, allowing the output step (probably the most expensive part, as well as the hardest to parallelize) to overlap in time with the work on the next input file?

> On May 8, 2023, at 2:41 AM, Simon Björk <bjork.si...@gmail.com> wrote:
>
> I'm trying to crop a sequence of (8k) exr files and it seems like oiiotool is
> quite a bit slower than using the python bindings and multiprocessing.
>
> Is this expected? I was under the assumption that it's always better/faster
> to use oiiotool if possible. I've tried changing the --threads argument but
> the results are the same. I'm on Windows.
>
> oiiotool path/to/src.3935-3954%06d.exr --crop 6640x5760+1000+0 --runstats -o path/to/dst.3935-3954%06d.exr
>
> --------------------------------
> Time: 59 seconds
>
> import sys
> import os
> import time
> from multiprocessing.dummy import Pool as ThreadPool
>
> import OpenImageIO as oiio
>
> oiio.attribute("threads", 1)
>
> def crop(path):
>     im = oiio.ImageBuf(path)
>     new_im = oiio.ImageBufAlgo.crop(im, oiio.ROI(1000, 7640, 0, 5760))
>     new_filepath = "{0}/{1}".format("D:/tmp/exr_crop", os.path.basename(path))
>     new_im.write(new_filepath)
>
> dir_path = "D:/tmp/exr_src"
> files = ["{0}/{1}".format(dir_path, x) for x in os.listdir(dir_path)]
>
> st = time.time()
>
> pool = ThreadPool(48)
> results = pool.map(crop, files)
>
> et = time.time()
>
> print("Total time: {0}".format(et-st))
>
> --------------------------------
> Time: 6.9 seconds
>
> /Simon
>
> -------------------------------
> Simon Björk
> Compositor/TD
>
> +46 (0)70-2859503
> www.bjorkvisuals.com

--
Larry Gritz
l...@larrygritz.com