Re: [Oiio-dev] Storing data from ImageInput::read_image in planar channel memory layout

Larry Gritz Mon, 19 Oct 2020 10:58:59 -0700

What Nathan suggests is probably what's going on here. There's not only the 
overhead of breaking the process into more calls (all the way up and down the 
stack), but probably there is repeated work deep in the OpenEXR library that 
can't easily be avoided. There could also be caching or memory bandwidth 
implications that we could never eliminate when putting different channels into 
different buffers.


Frankly, I'm pretty happy that reading channels one at a time adds only 10-30% 
overhead, rather than ending up more expensive by a factor of nchannels!
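
For anyone following along, the "one channel at a time" pattern can be sketched 
in a few lines of Python. This is only an illustration under stated 
assumptions: `read_channels` and `read_planar` are hypothetical stand-ins that 
work on an in-memory interleaved list, not the OIIO API. They only mimic the 
chbegin/chend convention, where chbegin=c, chend=c+1 selects channel c.

```python
def read_channels(interleaved, nchannels, chbegin, chend):
    """Hypothetical stand-in for a channel-range read.

    Returns channels [chbegin, chend) of every pixel, in pixel order.
    """
    out = []
    for first in range(0, len(interleaved), nchannels):
        out.extend(interleaved[first + chbegin:first + chend])
    return out


def read_planar(interleaved, nchannels):
    """One call per channel, each filling its own single-channel buffer."""
    return [read_channels(interleaved, nchannels, c, c + 1)
            for c in range(nchannels)]


pixels = [1, 2, 3, 4, 5, 6]      # two RGB pixels, interleaved
planes = read_planar(pixels, 3)  # one list per channel
```

Each call walks the whole source again, which is the kind of repeated work 
that could plausibly account for some per-channel overhead in a real reader.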

If that 10-30% is unacceptable, I think we'd have no choice but to plumb the 
whole thing with a single call that supplies multiple buffer pointers. Even 
that would be a mixed victory: it might improve OpenEXR in particular (though 
we may not know by how much until we've already invested in completing the 
majority of the work), but not help at all for most other formats. Trimming 
that last 10% would need to be really important to somebody.
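
To illustrate the shape of such a call, here is a minimal sketch, assuming a 
hypothetical function `read_into_planes` that receives one destination buffer 
per channel and fills them all in a single pass. Nothing like this exists in 
OIIO today; the name, the signature, and the in-memory `array` source are all 
assumptions made for the example.

```python
from array import array


def read_into_planes(interleaved, nchannels, planes):
    """Hypothetical single call: scatter interleaved samples into
    caller-supplied per-channel buffers in one pass over the source."""
    if nchannels <= 0:
        raise ValueError("nchannels must be positive")
    if len(planes) != nchannels:
        raise ValueError("need exactly one destination buffer per channel")
    npixels, remainder = divmod(len(interleaved), nchannels)
    if remainder:
        raise ValueError("source length is not a multiple of nchannels")
    if any(len(plane) != npixels for plane in planes):
        raise ValueError("each destination must hold one value per pixel")
    for i, value in enumerate(interleaved):
        # Sample i belongs to channel i % nchannels of pixel i // nchannels.
        planes[i % nchannels][i // nchannels] = value


src = array("f", [1, 2, 3, 4, 5, 6])  # two RGB pixels, interleaved
r, g, b = (array("f", [0.0] * 2) for _ in range(3))
read_into_planes(src, 3, [r, g, b])
```

The source is traversed once no matter how many channels there are, which is 
the potential saving, at the cost of a wider API that every format reader 
would have to support.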

Note also that the OpenEXR team is at this moment thinking about ways to 
improve performance internally. So a more fruitful route to higher efficiency 
may be to attack the OpenEXR library internals, both making it more efficient 
generally and ensuring that the particular API call sequence that OIIO uses is 
not doing something unnecessarily wasteful in the exr library.

        -- lg


> On Oct 19, 2020, at 10:41 AM, Nathan R <nathanru...@gmail.com> wrote:
> 
> > We also observe that in our OpenEXR reader, the implementation of 
> > read_scanlines/read_tiles where you ask for a single channel is ALREADY as 
> > efficient as it can get, since it requests only that one channel from the 
> > exr library, and as far as we know, incurs no unnecessary copies in the 
> > process.
> 
> I can't remember if the EXR library does any of its own caching, but for any 
> multi-channel part in an EXR, I believe you could end up decompressing the 
> same data multiple times, since the channels are all compressed together 
> within each unit (scanline, scanline block, tile, etc.)
> 
> I may be glossing over some obvious details, but I thought I'd mention it in 
> case it could explain part or all of the 10-30% overhead John is observing.
> 
> -Nathan
> 
> On 10/17/2020 10:17 AM, Larry Gritz wrote:
>> John, or anybody else interested in this issue, you should follow the 
>> discussion happening in the [issue I 
>> created](https://github.com/OpenImageIO/oiio/issues/2744).
>> 
>> Jan correctly points out that there is a version of 
>> read_scanlines/read_tiles/read_image that takes chbegin and chend arguments, 
>> which can be used to read just one channel (chbegin=c, chend=c+1), and so 
>> calling it once per channel, directing that channel to the appropriate 
>> single-channel buffer, is surprisingly efficient. 
>> 
>> This may defy expectation, but there are several factors to keep in mind:
>> 
>> * Deep underneath, the format library is probably reading and decompressing 
>> the scanline or tile just once and then just copying the decompressed 
>> results if you are making subsequent calls requesting the same scanline or 
>> tile. So the total amount of disk read and decompress is no worse than a 
>> single interleaved read.
>> 
>> * Any requested data type conversion is happening for each channel 
>> individually, so the total conversion work is the same as if you'd asked for 
>> all channels together and interleaved (except maybe for a tiny bit of extra 
>> loop overhead).
>> 
>> * Only one channel per call has to be copied to the user buffer, so the 
>> total amount of data copied to the user area is the same as if you'd asked 
>> for all channels together (just split into nchannels chunks).
>> 
>> * The extraction of one channel into the user buffer that happens in our 
>> implementation of read_scanlines/read_tiles is multithreaded! For this 
>> reason, you SHOULD NOT read into a temp buffer and do the unscrambling on 
>> your own, as I previously advised, unless you are very confident that your 
>> implementation of the unscrambling is more efficient than my multithreaded 
>> version.
>> 
>> We also observe that in our OpenEXR reader, the implementation of 
>> read_scanlines/read_tiles where you ask for a single channel is ALREADY as 
>> efficient as it can get, since it requests only that one channel from the 
>> exr library, and as far as we know, incurs no unnecessary copies in the 
>> process.
>> 
>> So at most, we might want to beef up our internal TIFF reader implementation 
>> to fully exploit single-channel reads in the case where the tiff file itself 
>> is stored as separate channel planes.
>> 
>> John, I would be curious to hear if Gaffer is doing this the way that we now 
>> suspect is most efficient -- multiple calls requesting a single channel, 
>> rather than asking for all channels in a temp buffer and then descrambling 
>> yourself. We may already be very close to optimal efficiency, at least for 
>> some formats, and can do very focused changes to make other formats more 
>> efficient if needed.
>> 
>>      -- lg
>> 
>> 
>>> On Oct 16, 2020, at 1:27 AM, John Haddon <j...@image-engine.com> wrote:
>>> 
>>> On Fri, 16 Oct 2020 at 08:27, Larry Gritz <l...@larrygritz.com> wrote:
>>> Like I said, the lack of requests for this over the years is strong 
>>> circumstantial evidence that it's not an important enough case to enough 
>>> people to justify the work or the extra API complexity, though if lots of 
>>> people chime in that they have wanted this, I could be convinced.
>>> 
>>> This is definitely something we would use in Gaffer, as we're currently 
>>> doing the rescrambling for all IO. But right now I don't have any solid 
>>> numbers for how much overhead that represents, so would need to do some 
>>> profiling before I could say it justified the effort. My suspicion is we'd 
>>> need to optimise other aspects of the processing pipeline before it became 
>>> particularly beneficial.
>>> 
>>> On a more general note, not aimed at OIIO in particular, my experience is 
>>> that "nobody asked" isn't necessarily a good indication that all is well. 
>>> I'm regularly astounded at the ingenious contortions folks will go through 
>>> as a workaround before asking for something.
>>> 
>>> Cheers...
>>> John
>>> 
>>> _______________________________________________
>>> Oiio-dev mailing list
>>> Oiio-dev@lists.openimageio.org <mailto:Oiio-dev@lists.openimageio.org>
>>> http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org 
>>> <http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org>
>> 
>> --
>> Larry Gritz
>> l...@larrygritz.com
>> 
> 

--
Larry Gritz
l...@larrygritz.com
