There is a Perl program called exiftool that can read and write EXIF data without
loading the image data (or at least it doesn’t decode the image data). I don’t
know whether it would be faster than loading image data/properties with
ImageIO. Instantiating perl/exiftool for each image in a separate NSTask would
probably be pretty slow, so you could instead write a Perl script that uses your
bundled exiftool to load the EXIF data for many files at once and output the
results in a format your program could handle.
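
For what it's worth, the reason a tool can get at EXIF without decoding pixels
is that in a JPEG the EXIF block sits in an APP1 segment that normally comes
near the start of the file, ahead of the compressed image data. Below is a
minimal C sketch of that idea, not how exiftool or ImageIO actually do it.
find_exif_segment is a hypothetical helper name of my own. It assumes JPEG
input only, assumes you have already read the first chunk of the file into a
buffer (how many bytes is enough is a guess on my part; 64 to 128 KB seems
plausible), and it only locates the EXIF payload rather than parsing the date
tags out of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of any library): walk the JPEG segment
 * headers in buf and return the offset of the TIFF header inside the
 * Exif APP1 segment, or -1 if none is found in the bytes provided.
 * On success, *exif_len receives the number of Exif payload bytes that
 * follow the "Exif\0\0" identifier.
 *
 * Simplification: length-less markers (RSTn, TEM) are not handled,
 * since they are not expected before the start of scan.
 */
long find_exif_segment(const uint8_t *buf, size_t len, size_t *exif_len)
{
    assert(buf != NULL && exif_len != NULL);

    /* A JPEG file starts with the SOI marker FF D8. */
    if (len < 4 || buf[0] != 0xFF || buf[1] != 0xD8)
        return -1;

    size_t pos = 2;
    while (pos + 4 <= len) {
        if (buf[pos] != 0xFF)
            return -1;                  /* not at a marker: give up */

        uint8_t marker = buf[pos + 1];
        if (marker == 0xFF) {           /* fill byte before a marker */
            pos++;
            continue;
        }
        if (marker == 0xDA || marker == 0xD9)
            return -1;                  /* SOS or EOI: image data starts, no Exif seen */

        /* Segment length is big-endian and includes its own two bytes. */
        size_t seg_len = ((size_t)buf[pos + 2] << 8) | buf[pos + 3];
        if (seg_len < 2)
            return -1;                  /* malformed segment */

        if (marker == 0xE1 && seg_len >= 8 && pos + 10 <= len &&
            memcmp(buf + pos + 4, "Exif\0\0", 6) == 0) {
            *exif_len = seg_len - 8;    /* minus length field and identifier */
            return (long)(pos + 10);    /* TIFF header starts here */
        }

        pos += 2 + seg_len;             /* skip marker plus segment body */
    }
    return -1;
}
```

The caller would fread() only the first chunk of each file and pass it in, so
the bulk of the image data never has to come off the disk. Whether that beats
ImageIO on an external drive is something you would have to measure.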
Jim Crate
> On Jan 7, 2023, at 2:07 PM, Alex Zavatone via Cocoa-dev
> wrote:
>
> Hi Gabe. I’d add basic logging before you start each image and after you
> complete each image to see how long each one takes in each of your problem
> tests, so you can see just how slow it is on your problem platforms.
>
> Then you can add more logging to expose the problems and start to address
> them once you see where the bottlenecks are.
>
> I wonder if there is a method to load the EXIF data out of the files without
> opening them completely. That would seem like the ideal approach.
>
> Cheers,
> Alex Zavatone
>
>> On Jan 7, 2023, at 12:36 PM, Gabriel Zachmann wrote:
>>
>> Hi Alex, hi everyone,
>>
>> thanks a lot for the many suggestions!
>> And sorry for following up on this so late!
>> I hope you are still willing to engage in this discussion.
>>
>> Yes, Alex, I agree that the main question is:
>> how can I get the metadata of a large number of images (say, 100k-300k)
>> *without* actually loading the whole image files.
>> (For your reference: I am interested in the date tags embedded in the EXIF
>> dictionary, and those dates will be read just once per image, then cached in
>> a dictionary containing filename & dates, and that dictionary will get
>> stored on disk for future use by the app.)
>>
>>> CGImageSourceRef imageSourceRef =
>>> CGImageSourceCreateWithURL((CFURLRef)imageUrl, NULL);
>>
>> I have tried this:
>>
>> for ( NSString* filename in imagefiles )
>> {
>>     NSURL * imgurl = [NSURL fileURLWithPath: filename isDirectory: NO];
>>     CGImageSourceRef sourceref = CGImageSourceCreateWithURL(
>>         (__bridge CFURLRef) imgurl, NULL );
>> }
>>
>> This takes 1 minute for around 300k images stored on my internal SSD.
>> That would be OK.
>>
>> However! .. if performed on a folder stored on an external hard disk, I get
>> the following timings:
>>
>> - 20 min for 150k images (45 GB)
>> - 12 min for 150k images (45 GB), second time
>> - 150 sec for 25k images (18 GB)
>> - 170 sec for 25k images (18 GB), with the lines below (*)
>> - 80 sec for 22k (3 GB) images
>> - 80 sec for 22k (3 GB) images, with the lines below (*)
>>
>> All experiments were done on different folders on the same hard disk, WD
>> MyPassport Ultra, 1 TB, USB-A connector to Macbook Air M2.
>> Timings with the same number of files/GB were measured on the same folder.
>>
>> (*): these were timings where I added the following lines to the loop:
>>
>>     CFDictionaryRef fileProps = CGImageSourceCopyPropertiesAtIndex(
>>         sourceref, 0, NULL );
>>     bool success = CFDictionaryGetValueIfPresent( fileProps,
>>         kCGImagePropertyExifDictionary, (const void **) & exif_dict );
>>     CFDictionaryGetValueIfPresent( exif_dict,
>>         kCGImagePropertyExifDateTimeDigitized, (const void **) & dateref );
>>     iso_date = [isoDateFormatter_ dateFromString:
>>         (__bridge NSString * _Nonnull)(dateref) ];
>>     [datesAndTimes_ addObject: iso_date ];
>>
>> (Plus some error checking, which I omit here.)
>>
>> First of all, we can see that the vast majority of time is spent on
>> CGImageSourceCreateWithURL().
>> Second, there seem to be some caching effects, although I have a hard time
>> understanding that, but that is not the point.
>> Third, the durations are not linear; I guess it might have something to do
>> with the sizes of the files, too, but again, didn't investigate further.
>>
>> So, it looks to me like CGImageSourceCreateWithURL() really loads the
>> complete image file.
>>
>> I don't see why Ole Begemann (referenced in Alex's post) can claim his
>> approach does not load the whole image.
>>
>>
>> Some people suggested parallelizing the whole task, using
>> dispatch_queue_create or NSOperationQueue.
>> (Thanks Steve, Gary, Jack!)
>> Before restructuring my code for that, I would like to better understand why
>> you think that will speed up things.
>> The code above pretty much does no computations, so most of the time is, I
>> guess, spent on waiting for the data to arrive from hard disk.
>> So, why would several threads loading those images in parallel help
>> here? In my thinking, they will just compete for the same resource, i.e.,
>> the hard disk.
>>
>>
>> I also googled quite a bit, to no avail.
>>
>> Any and all hints, suggestions, and insights will be highly appreciated!
>> Best, Gab
>>
>>
>>>
>>
>>
>>> if (!imageSourceRef)