Re: Retrieving the EXIF date/time from 250k images

2023-01-07 Thread Gary L. Wade via Cocoa-dev
You don’t seem to understand modern filesystems or modern hardware like some of 
us.
--
Gary L. Wade
https://www.garywade.com/ 

> On Jan 7, 2023, at 3:29 PM, Gabriel Zachmann  wrote:
> 
> *Maybe* ...
> that would mean that the filesystem performs predictive caching like the 
> CPU's cache / memory management does ... 
> 
> Also, I think you might even get worse performance: imagine several threads 
> reading different files *at the same time* - those files could lie on 
> different HD cylinders far apart from each other, so the disk heads would 
> have to move back and forth during those concurrent read operations, wouldn't 
> they? Unless the disk does some very clever caching / batching / prediction 
> itself ...
> 
> G.
> 
>> Since file systems and the associated hardware are designed to be efficient 
>> by caching data and knowing things like where one file or block is in 
>> relation to another, there’s a possibility these mechanisms could work to 
>> your advantage, pulling in the data while “in the neighborhood.” There’s no 
>> guarantee, of course, but I feel it’s worth considering.
>> --
>> Gary L. Wade
>> http://www.garywade.com/
>> 
>>> On Jan 7, 2023, at 10:37 AM, Gabriel Zachmann via Cocoa-dev 
>>>  wrote:
>>> 
>>> So, why would several threads loading those images in parallel help 
>>> here? In my thinking, they will just compete for the same resource, i.e., 
>>> the hard disk.
>>> 
>> 
> 

___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com


Re: Retrieving the EXIF date/time from 250k images

2023-01-07 Thread Gabriel Zachmann via Cocoa-dev
>
> I wonder if there is a method to load the EXIF data out of the files without 
> opening them completely.  That would seem like the ideal approach.

That's the approach I am looking for, but I haven't found any API to do that.
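
For JPEG files specifically, one way to approach this without any image API is to read only the first few kilobytes of each file and walk the segment markers by hand. What follows is a minimal sketch under that assumption, not a replacement for ImageIO. The function names (read_head, find_exif_offset, exif_date_digitized) are made up for illustration and do not come from this thread. The marker walk is simplified: it ignores fill bytes and assumes the Exif APP1 segment lies entirely within the bytes that were read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read at most `cap` bytes from the start of a file. */
size_t read_head(const char *path, uint8_t *buf, size_t cap)
{
    assert(path != NULL && buf != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    size_t n = fread(buf, 1, cap, f);
    fclose(f);
    return n;
}

/* Returns the offset of the TIFF header inside the Exif APP1 segment,
   or -1 if no such segment is found in the first `len` bytes. */
long find_exif_offset(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 4 || buf[0] != 0xFF || buf[1] != 0xD8)
        return -1;                          /* no JPEG SOI marker */
    size_t pos = 2;
    while (pos + 4 <= len) {
        if (buf[pos] != 0xFF)
            return -1;                      /* not at a marker */
        uint8_t marker = buf[pos + 1];
        if (marker == 0xDA || marker == 0xD9)
            return -1;                      /* SOS or EOI: metadata is over */
        size_t seglen = ((size_t)buf[pos + 2] << 8) | buf[pos + 3];
        if (seglen < 2)
            return -1;
        if (marker == 0xE1 && seglen >= 8 && pos + 10 <= len
            && memcmp(buf + pos + 4, "Exif\0\0", 6) == 0)
            return (long)(pos + 10);
        pos += 2 + seglen;                  /* skip marker + segment */
    }
    return -1;
}

static uint16_t rd16(const uint8_t *p, int le)
{
    return le ? (uint16_t)(p[0] | (p[1] << 8)) : (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p, int le)
{
    return le ? ((uint32_t)p[0] | ((uint32_t)p[1] << 8)
                 | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24))
              : (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
                 | ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

/* Find the 12-byte IFD entry with the given tag, or NULL. */
static const uint8_t *find_tag(const uint8_t *tiff, size_t len,
                               uint32_t ifd_off, uint16_t tag, int le)
{
    if (len < 2 || ifd_off > len - 2)
        return NULL;
    uint16_t count = rd16(tiff + ifd_off, le);
    for (uint16_t i = 0; i < count; i++) {
        size_t e = (size_t)ifd_off + 2 + 12u * i;
        if (e > len || len - e < 12)
            return NULL;
        if (rd16(tiff + e, le) == tag)
            return tiff + e;
    }
    return NULL;
}

/* Copies the Exif DateTimeDigitized string ("YYYY:MM:DD HH:MM:SS") into out.
   `tiff` points at the TIFF header; returns 1 on success, 0 otherwise. */
int exif_date_digitized(const uint8_t *tiff, size_t len, char out[20])
{
    int le;
    if (len < 8)
        return 0;
    if (tiff[0] == 'I' && tiff[1] == 'I')      le = 1;   /* little-endian */
    else if (tiff[0] == 'M' && tiff[1] == 'M') le = 0;   /* big-endian */
    else return 0;
    if (rd16(tiff + 2, le) != 42)
        return 0;
    const uint8_t *e = find_tag(tiff, len, rd32(tiff + 4, le), 0x8769, le);
    if (e == NULL)
        return 0;                           /* no Exif sub-IFD pointer */
    e = find_tag(tiff, len, rd32(e + 8, le), 0x9004, le);
    if (e == NULL || rd16(e + 2, le) != 2 || rd32(e + 4, le) != 20)
        return 0;                           /* tag missing or not ASCII[20] */
    uint32_t off = rd32(e + 8, le);
    if (off > len || len - off < 20)
        return 0;
    memcpy(out, tiff + off, 20);
    out[19] = '\0';
    return 1;
}
```

The idea would be to call read_head() with a buffer of, say, 64 KB (an arbitrary choice), pass the result to find_exif_offset(), and then call exif_date_digitized() on the bytes from that offset onward. Whether this is actually faster than CGImageSourceCreateWithURL() on the external disk would have to be measured; it also does nothing for HEIC, PNG or RAW files.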

Best, G.





Re: Retrieving the EXIF date/time from 250k images

2023-01-07 Thread Gabriel Zachmann via Cocoa-dev
*Maybe* ...
that would mean that the filesystem performs predictive caching like the CPU's 
cache / memory management does ...

Also, I think you might even get worse performance: imagine several threads 
reading different files *at the same time* - those files could lie on different 
HD cylinders far apart from each other, so the disk heads would have to move 
back and forth during those concurrent read operations, wouldn't they? Unless 
the disk does some very clever caching / batching / prediction itself ...

G.

> Since file systems and the associated hardware are designed to be efficient 
> by caching data and knowing things like where one file or block is in 
> relation to another, there’s a possibility these mechanisms could work to 
> your advantage, pulling in the data while “in the neighborhood.” There’s no 
> guarantee, of course, but I feel it’s worth considering.
> --
> Gary L. Wade
> http://www.garywade.com/
>
>> On Jan 7, 2023, at 10:37 AM, Gabriel Zachmann via Cocoa-dev 
>>  wrote:
>>
>> So, why would several threads loading those images in parallel help 
>> here? In my thinking, they will just compete for the same resource, i.e., 
>> the hard disk.
>>
>





Re: Retrieving the EXIF date/time from 250k images

2023-01-07 Thread Alex Zavatone via Cocoa-dev
Hi Gabe.  I’d add basic logging before you start each image and after you 
complete each image to see how much time each one takes in each of your problem 
tests, so you can see just how slow it is on your problem platforms.

Then you can add more logging to expose the problems and start to address them 
once you see where the bottlenecks are.
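
A minimal sketch of that kind of per-image timing, in plain C so it can wrap whatever call is being measured. The names elapsed_ms and time_one_file are hypothetical, and the `work` callback stands in for the per-image code (for example, the CGImageSourceCreateWithURL() call); it assumes a POSIX system where clock_gettime() with CLOCK_MONOTONIC is available.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Milliseconds between two CLOCK_MONOTONIC samples. */
double elapsed_ms(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) * 1000.0
         + (double)(end.tv_nsec - start.tv_nsec) / 1.0e6;
}

/* Hypothetical wrapper: runs work(path) once, logs how long it took
   (if a log stream is given) and returns the duration in milliseconds. */
double time_one_file(const char *path, void (*work)(const char *), FILE *log)
{
    assert(path != NULL && work != NULL);
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);   /* before the image */
    work(path);
    clock_gettime(CLOCK_MONOTONIC, &t1);   /* after the image */
    double ms = elapsed_ms(t0, t1);
    if (log != NULL)
        fprintf(log, "%s\t%.3f ms\n", path, ms);
    return ms;
}
```

Logging one tab-separated line per file makes it easy to sort the output afterwards and see whether a few large files or all files are slow.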

I wonder if there is a method to load the EXIF data out of the files without 
opening them completely.  That would seem like the ideal approach.

Cheers,
Alex Zavatone


Re: Retrieving the EXIF date/time from 250k images

2023-01-07 Thread Gary L. Wade via Cocoa-dev
Since file systems and the associated hardware are designed to be efficient by 
caching data and knowing things like where one file or block is in relation to 
another, there’s a possibility these mechanisms could work to your advantage, 
pulling in the data while “in the neighborhood.” There’s no guarantee, of 
course, but I feel it’s worth considering.
--
Gary L. Wade
http://www.garywade.com/

> On Jan 7, 2023, at 10:37 AM, Gabriel Zachmann via Cocoa-dev 
>  wrote:
> 
> So, why would several threads loading those images in parallel help 
> here? In my thinking, they will just compete for the same resource, i.e., 
> the hard disk.
> 



Re: Retrieving the EXIF date/time from 250k images

2023-01-07 Thread Gabriel Zachmann via Cocoa-dev
Hi Alex, hi everyone,

thanks a lot for the many suggestions!
And sorry for following up on this so late!
I hope you are still willing to engage in this discussion.

Yes, Alex, I agree that the main question is:
how can I get the metadata of a large number of images (say, 100k-300k) 
*without* actually loading the whole image files?
(For your reference: I am interested in the date tags embedded in the EXIF 
dictionary, and those dates will be read just once per image, then cached in a 
dictionary containing filename & dates, and that dictionary will get stored on 
disk for future use by the app.)

> CGImageSourceRef imageSourceRef = 
> CGImageSourceCreateWithURL((CFURLRef)imageUrl, NULL);

I have tried this:

   for ( NSString * filename in imagefiles )
   {
      NSURL * imgurl = [NSURL fileURLWithPath: filename isDirectory: NO];
      CGImageSourceRef sourceref =
         CGImageSourceCreateWithURL( (__bridge CFURLRef) imgurl, NULL );
      if ( sourceref != NULL )
         CFRelease( sourceref );   // returned by a Create function, so we own it
   }

This takes 1 minute for around 300k images stored on my internal SSD.
That would be OK.

However, when run on a folder stored on an external hard disk, I get the 
following timings:

  - 20 min for 150k images (45 GB)
  - 12 min for 150k images (45 GB), second time
  - 150 sec for 25k images (18 GB)
  - 170 sec for 25k images (18 GB), with the lines below (*)
  - 80 sec for 22k images (3 GB)
  - 80 sec for 22k images (3 GB), with the lines below (*)

All experiments were done on different folders on the same hard disk, a WD 
MyPassport Ultra, 1 TB, USB-A connector, attached to a MacBook Air M2.
Timings listed with the same number of files and GB were measured on the same 
folder.

(*): these were timings where I added the following lines to the loop:

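    // exif_dict (CFDictionaryRef), dateref (CFStringRef) and iso_date (NSDate *)
    // are assumed to be declared before the loop.
    CFDictionaryRef fileProps = CGImageSourceCopyPropertiesAtIndex( sourceref, 0, NULL );
    bool success = ( fileProps != NULL )
        && CFDictionaryGetValueIfPresent( fileProps, kCGImagePropertyExifDictionary,
                                          (const void **) & exif_dict )
        && CFDictionaryGetValueIfPresent( exif_dict, kCGImagePropertyExifDateTimeDigitized,
                                          (const void **) & dateref );
    if ( success )
    {
        iso_date = [isoDateFormatter_ dateFromString: (__bridge NSString *) dateref];
        if ( iso_date != nil )
            [datesAndTimes_ addObject: iso_date];
    }
    if ( fileProps != NULL )
        CFRelease( fileProps );   // returned by a Copy function, so we own it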

(Plus some error checking, which I omit here.)

First of all, we can see that the vast majority of time is spent on 
CGImageSourceCreateWithURL().
Second, there seem to be some caching effects, although I have a hard time 
understanding that, but that is not the point.
Third, the durations do not scale linearly with the number of images; I guess 
it might have something to do with the sizes of the files, too, but again, I 
didn't investigate further.

So, it looks to me like CGImageSourceCreateWithURL() really loads the complete 
image file.
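
The timings listed above can be divided out into time per image and into the throughput the disk would have had to deliver if every file really were read in full. The sketch below only does that arithmetic on the figures from this message. The helper names are made up, 1 GB is taken as 1000 MB, and the MB/s column is only meaningful under the assumption being questioned here (that whole files are read).

```c
#include <assert.h>
#include <stdio.h>

/* Average time per image in milliseconds. */
double ms_per_image(double seconds, double images)
{
    assert(images > 0.0);
    return seconds * 1000.0 / images;
}

/* Throughput in MB/s if all `gigabytes` were read in `seconds`
   (assumes 1 GB = 1000 MB). */
double mb_per_second(double gigabytes, double seconds)
{
    assert(seconds > 0.0);
    return gigabytes * 1000.0 / seconds;
}

struct run {
    const char *label;
    double seconds, images, gigabytes;
};

/* External-disk figures as reported in this message (without the (*) lines). */
static const struct run runs[] = {
    { "150k images, first run",  20.0 * 60.0, 150000.0, 45.0 },
    { "150k images, second run", 12.0 * 60.0, 150000.0, 45.0 },
    { "25k images",              150.0,        25000.0, 18.0 },
    { "22k images",               80.0,        22000.0,  3.0 },
};

/* Print one line per run: label, ms per image, implied MB/s. */
void print_runs(FILE *out)
{
    for (size_t i = 0; i < sizeof runs / sizeof runs[0]; i++)
        fprintf(out, "%-26s %6.2f ms/image  %7.1f MB/s if fully read\n",
                runs[i].label,
                ms_per_image(runs[i].seconds, runs[i].images),
                mb_per_second(runs[i].gigabytes, runs[i].seconds));
}
```

Calling print_runs(stdout) prints the table. For comparison, the internal-SSD run (1 minute for about 300k images) works out to roughly 0.2 ms per image.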

I don't see how Ole Begemann (referenced in Alex's post) can claim that his 
approach does not load the whole image.


Some people suggested parallelizing the whole task, using dispatch_queue_create 
or NSOperationQueue.
(Thanks Steve, Gary, Jack!)
Before restructuring my code for that, I would like to better understand why 
you think that will speed things up.
The code above does pretty much no computation, so most of the time is, I 
guess, spent waiting for the data to arrive from the hard disk.
So, why would several threads loading those images in parallel help here? 
In my thinking, they will just compete for the same resource, i.e., the hard 
disk.


I also googled quite a bit, to no avail.

Any and all hints, suggestions, and insights will be highly appreciated!
Best, Gab


>


> if (!imageSourceRef)
> return;
>
> CFDictionaryRef props = CGImageSourceCopyPropertiesAtIndex(imageSourceRef, 0, 
> NULL);
>
> NSDictionary *properties = (NSDictionary*)CFBridgingRelease(props);
>
> if (!properties) {
> return;
> }
>
> NSNumber *heightNumber = [properties objectForKey:@"PixelHeight"];
> NSNumber *widthNumber = [properties objectForKey:@"PixelWidth"];
> int height = 0;
> int width = 0;
>
> if (heightNumber) {
> height = [heightNumber intValue];
> }
> if (widthNumber) {
> width = [widthNumber intValue];
> }
>
>
> Or this link by Ole Begemann?
>
> https://oleb.net/blog/2011/09/accessing-image-properties-without-loading-the-image-into-memory/
>
> I love these questions.  I find out more about iOS programming by researching 
> other people’s problems than the ones that I’m currently faced with.
>
> Hopefully some of these will help.
>
> Cheers,
> Alex Zavatone


