Re: Retrieving the EXIF date/time from 250k images

2023-01-15 Thread Gabriel Zachmann via Cocoa-dev
> Is there any reason why you don't want to do it yourself, simply loading the 
> first KBs of the file? I haven't been writing JPEG decoders for a long 
> time, but I'm pretty sure the EXIF (APPn) marker must be written before the 
> image data (the SOS marker(s)), so it is at the beginning of the file.

I had thought of that, too.
The reason is that I have no experience whatsoever with the JPEG file format.
Nor with the EXIF file format.
Also, I need to be able to parse at least JPEG, PNG, TIF, GIF, and I'd like to 
add HEIC.
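
For the JPEG case only, the marker walk described in the quoted text can be sketched in plain C, which also compiles as Objective-C. This is a minimal sketch rather than a tested parser: the function name find_exif_in_jpeg is made up for illustration, and it assumes the caller has already read the first few KB of the file (or mmapped it) into a buffer. PNG, TIFF, GIF, and HEIC each need their own container handling.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Walk the JPEG marker segments in buf[0..len) and look for the Exif APP1
 * segment. On success, returns the offset of the TIFF header (the byte right
 * after "Exif\0\0") and stores the number of TIFF bytes available in
 * *tiff_len, clamped to the end of the buffer. Returns -1 if no Exif segment
 * is found before SOS/EOI or before the buffer runs out.
 * (Hypothetical helper, not part of any Apple API.) */
long find_exif_in_jpeg(const uint8_t *buf, size_t len, size_t *tiff_len)
{
    if (len < 4 || buf[0] != 0xFF || buf[1] != 0xD8)
        return -1;                              /* no SOI marker: not a JPEG */

    size_t pos = 2;
    while (pos + 4 <= len) {
        if (buf[pos] != 0xFF)
            return -1;                          /* lost marker sync */
        uint8_t marker = buf[pos + 1];
        if (marker == 0xFF) {                   /* fill byte before a marker */
            pos++;
            continue;
        }
        if (marker == 0xDA || marker == 0xD9)   /* SOS or EOI: stop looking */
            return -1;
        if (marker == 0x01 || (marker >= 0xD0 && marker <= 0xD7)) {
            pos += 2;                           /* markers without a length */
            continue;
        }
        /* Segment length is big-endian and includes its own two bytes. */
        size_t seglen = ((size_t)buf[pos + 2] << 8) | buf[pos + 3];
        if (seglen < 2)
            return -1;
        if (marker == 0xE1 && seglen >= 8 && pos + 10 <= len &&
            memcmp(buf + pos + 4, "Exif\0\0", 6) == 0) {
            size_t start = pos + 10;
            size_t n = seglen - 8;              /* minus length and "Exif\0\0" */
            if (n > len - start)
                n = len - start;                /* segment exceeds the buffer */
            *tiff_len = n;
            return (long)start;
        }
        pos += 2 + seglen;                      /* skip to the next marker */
    }
    return -1;
}
```

The returned offset points at the TIFF header inside the Exif segment. If the segment is longer than the bytes that were read, the reported length is clamped, so the caller can tell that a larger prefix is needed.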

Best regards, Gabriel

> You could simply parse the first markers of the file (0xFF, 0x??) and look 
> for the EXIF one. The JPEG file format (JFIF) is pretty simple to decode (I 
> can help).
>
> Regards,
>
> Raphaël
>
>
> On Sun, Jan 8, 2023 at 3:30 PM Chris Ridd via Cocoa-dev 
>  wrote:
>
>
> > On 7 Jan 2023, at 23:31, Gabriel Zachmann via Cocoa-dev 
> >  wrote:
> >
> >>
> >> I wonder if there is a method to load the EXIF data out of the files 
> >> without opening them completely.  That would seem like the ideal approach.
> >
> > That's the approach I am looking for, but I haven't found any API to do 
> > that.
>
> I’m sure you’ve tried the following, but in case you haven’t:
>
> * investigate mmapping each file into memory and use a version of the API 
> that takes NSData/CFData instead of a file/URL to get the EXIF information.
>
> * if the files are indexed by Spotlight, can you get the EXIF information 
> from a NSMetadataQuery search?
>
> Chris
> ___
>
> Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
>
> Please do not post admin requests or moderator comments to the list.
> Contact the moderators at cocoa-dev-admins(at)lists.apple.com
>
> Help/Unsubscribe/Update your Subscription:
> https://lists.apple.com/mailman/options/cocoa-dev/lemoine.raphael%40gmail.com
>
> This email sent to lemoine.raph...@gmail.com
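
For reference, the Exif payload inside a JPEG APP1 segment is a small TIFF structure: a byte-order mark ("II" or "MM"), the magic number 42, and directories of 12-byte entries. A hedged sketch in C of pulling DateTimeDigitized (tag 0x9004, reached through the Exif sub-IFD pointer, tag 0x8769) out of such a block might look like the following. The function and helper names are hypothetical, and real-world files may need more validation than is shown here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read 16/32-bit integers in the byte order given by the TIFF header. */
static uint16_t rd16(const uint8_t *p, int le)
{
    return le ? (uint16_t)(p[0] | (p[1] << 8)) : (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p, int le)
{
    return le ? ((uint32_t)p[0] | (uint32_t)p[1] << 8 |
                 (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24)
              : ((uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
                 (uint32_t)p[2] << 8 | (uint32_t)p[3]);
}

/* Return a pointer to the 12-byte directory entry for `tag` in the IFD at
 * ifd_off, or NULL if the tag is absent or the IFD runs past the buffer. */
static const uint8_t *find_tag(const uint8_t *t, size_t len, uint32_t ifd_off,
                               uint16_t tag, int le)
{
    if (ifd_off > len || len - ifd_off < 2)
        return NULL;
    uint16_t count = rd16(t + ifd_off, le);
    for (uint16_t i = 0; i < count; i++) {
        size_t e = (size_t)ifd_off + 2 + (size_t)i * 12;
        if (e + 12 > len)
            return NULL;
        if (rd16(t + e, le) == tag)
            return t + e;
    }
    return NULL;
}

/* Copy the 19-character "YYYY:MM:DD HH:MM:SS" string of DateTimeDigitized
 * into out and NUL-terminate it. t points at the TIFF header, len is the
 * number of bytes available. Returns 1 on success, 0 otherwise. */
int exif_date_digitized(const uint8_t *t, size_t len, char out[20])
{
    int le;
    if (len < 8)
        return 0;
    if (t[0] == 'I' && t[1] == 'I')      le = 1;   /* little-endian */
    else if (t[0] == 'M' && t[1] == 'M') le = 0;   /* big-endian */
    else return 0;
    if (rd16(t + 2, le) != 42)
        return 0;

    /* IFD0 holds a pointer (tag 0x8769) to the Exif sub-IFD. */
    const uint8_t *e = find_tag(t, len, rd32(t + 4, le), 0x8769, le);
    if (!e)
        return 0;
    e = find_tag(t, len, rd32(e + 8, le), 0x9004, le);
    /* Expect type 2 (ASCII) with at least 20 bytes including the NUL. */
    if (!e || rd16(e + 2, le) != 2 || rd32(e + 4, le) < 20)
        return 0;
    uint32_t off = rd32(e + 8, le);      /* offset relative to the TIFF header */
    if (off > len || len - off < 19)
        return 0;
    memcpy(out, t + off, 19);
    out[19] = '\0';
    return 1;
}
```

All offsets in the directory entries are relative to the start of the TIFF header, which is why the function takes a pointer to that header rather than to the start of the file.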





Re: Retrieving the EXIF date/time from 250k images

2023-01-15 Thread James Crate via Cocoa-dev
There is a Perl program called exiftool that can read and set EXIF data without 
loading the image data (or at least it doesn’t decode the image data). I don’t 
know whether it would be faster than loading image data/properties with 
ImageIO. Instantiating perl/exiftool in a separate NSTask for each image would 
probably be pretty slow, so you could instead write a Perl script that uses 
your bundled exiftool to load the EXIF data for many files and outputs the 
results in a format your program can handle.

Jim Crate


> On Jan 7, 2023, at 2:07 PM, Alex Zavatone via Cocoa-dev 
>  wrote:
> 
> Hi Gabe.  I’d add basic logging before you start each image and after you 
> complete each image to see how long each one takes in each of the problem 
> tests, so you can see just how slow it is on your problem platforms.
> 
> Then you can add more logging to expose the problems and start to address 
> them once you see where the bottlenecks are.
> 
> I wonder if there is a method to load the EXIF data out of the files without 
> opening them completely.  That would seem like the ideal approach.
> 
> Cheers,
> Alex Zavatone
> 
>> On Jan 7, 2023, at 12:36 PM, Gabriel Zachmann  wrote:
>> 
>> Hi Alex, hi everyone,
>> 
>> thanks a lot for the many suggestions!
>> And sorry for following up on this so late!
>> I hope you are still willing to engage in this discussion.
>> 
>> Yes, Alex, I agree that the main question is:
>> how can I get the metadata of a large number of images (say, 100k-300k) 
>> *without* actually loading the whole image files?
>> (For your reference: I am interested in the date tags embedded in the EXIF 
>> dictionary, and those dates will be read just once per image, then cached in 
>> a dictionary containing filename & dates, and that dictionary will get 
>> stored on disk for future use by the app.)
>> 
>>> CGImageSourceRef imageSourceRef = 
>>> CGImageSourceCreateWithURL((CFURLRef)imageUrl, NULL);
>> 
>> I have tried this:
>> 
>>  for ( NSString* filename in imagefiles ) 
>>  {
>>      NSURL * imgurl = [NSURL fileURLWithPath: filename isDirectory: NO];
>>      CGImageSourceRef sourceref = CGImageSourceCreateWithURL( (__bridge CFURLRef) imgurl, NULL );
>>  }
>> 
>> This takes 1 minute for around 300k images stored on my internal SSD.
>> That would be OK.
>> 
>> However! .. if performed on a folder stored on an external hard disk, I get 
>> the following timings:
>> 
>> - 20 min for 150k images (45 GB) 
>> - 12 min for 150k images (45 GB), second time
>> - 150 sec for 25k images (18 GB)
>> - 170 sec for 25k images (18 GB), with the lines below (*)
>> - 80 sec for 22k (3 GB) images
>> - 80 sec for 22k (3 GB) images, with the lines below (*)
>> 
>> All experiments were done on different folders on the same hard disk, WD 
>> MyPassport Ultra, 1 TB, USB-A connector to Macbook Air M2.
>> Timings with the same number of files/GB were taken on the same folder, respectively.
>> 
>> (*): these were timings where I added the following lines to the loop:
>> 
>>   CFDictionaryRef fileProps = CGImageSourceCopyPropertiesAtIndex( sourceref, 0, NULL );
>>   bool success = CFDictionaryGetValueIfPresent( fileProps, kCGImagePropertyExifDictionary, (const void **) & exif_dict );
>>   CFDictionaryGetValueIfPresent( exif_dict, kCGImagePropertyExifDateTimeDigitized, (const void **) & dateref );
>>   iso_date = [isoDateFormatter_ dateFromString: (__bridge NSString * _Nonnull)(dateref) ];
>>   [datesAndTimes_ addObject: iso_date ];
>> 
>> (Plus some error checking, which I omit here.)
>> 
>> First of all, we can see that the vast majority of time is spent on 
>> CGImageSourceCreateWithURL().
>> Second, there seem to be some caching effects, although I have a hard time 
>> understanding that, but that is not the point.
>> Third, the durations are not linear; I guess it might have something to do 
>> with the sizes of the files, too, but again, didn't investigate further.
>> 
>> So, it looks to me like CGImageSourceCreateWithURL() really loads the 
>> complete image file.
>> 
>> I don't see why Ole Begemann (ref'ed in Alex' post) can claim his approach 
>> does not load the whole image.
>> 
>> 
>> Some people suggested parallelizing the whole task, using 
>> dispatch_queue_create or NSOperationQueue.
>> (Thanks Steve, Gary, Jack!)
>> Before restructuring my code for that, I would like to better understand why 
>> you think that will speed up things.
>> The code above pretty much does no computations, so most of the time is, I 
>> guess, spent on waiting for the data to arrive from hard disk.
>> So, why would several threads loading those images in parallel help 
>> here? In my thinking, they will just compete for the same resource, i.e., 
>> the hard disk.
>> 
>> 
>> I also googled quite a bit, to no avail.
>> 
>> Any and all hints, suggestions, and insights will be highly appreciated!
>> Best, Gab
>> 
>> 
>>> 
>> 
>> 
>>> if (!imageSourceRef)