Re: Retrieving the EXIF date/time from 250k images

2022-08-16 Thread Alex Zavatone via Cocoa-dev
I love experimenting with processes like this.  Create as many operations as 
you have cores on your box and run a few tests to see how quickly they 
complete.  Try it with 2x the number of cores, 1x, 1/2x, cores + 1, and 
cores - 1 to see if there are any substantial differences in time to completion.

But as the others have said, GCD and NSOperationQueue were made for these 
things.  If you’re actually trying to get work done and not researching for 
your own benefit, those are the ones to use.  They do all the heavy lifting for 
you and are easy to implement.

Cheers.
Alex Zavatone
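
A minimal sketch of the timing experiment described above, in plain C with 
pthreads rather than NSOperationQueue or GCD. The names time_run, 
candidate_counts and busy_worker are hypothetical, and the per-item work is a 
made-up arithmetic loop standing in for the EXIF read, so the absolute numbers 
will not match a real run over image files. The caller passes in the core 
count (on macOS, NSProcessInfo's processorCount is one place to get it).

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <time.h>

#define MAX_WORKERS 64

typedef struct { size_t begin, end; unsigned long sum; } slice;

/* Hypothetical CPU-bound stand-in for the per-image work. */
static void *busy_worker(void *arg) {
    slice *s = arg;
    unsigned long acc = 0;
    for (size_t i = s->begin; i < s->end; i++) acc += (unsigned long)(i % 7);
    s->sum = acc;
    return NULL;
}

/* Splits `items` units of work across `workers` threads. Returns the elapsed
   wall-clock seconds (or -1.0 for an invalid worker count) and stores a
   checksum so runs with different worker counts can be compared. */
static double time_run(size_t items, int workers, unsigned long *checksum) {
    if (workers < 1 || workers > MAX_WORKERS) return -1.0;
    pthread_t threads[MAX_WORKERS];
    slice slices[MAX_WORKERS];
    int created[MAX_WORKERS];
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int w = 0; w < workers; w++) {
        slices[w].begin = items * (size_t)w / (size_t)workers;
        slices[w].end = items * (size_t)(w + 1) / (size_t)workers;
        slices[w].sum = 0;
        created[w] = pthread_create(&threads[w], NULL, busy_worker, &slices[w]) == 0;
        if (!created[w]) busy_worker(&slices[w]);   /* fall back to running inline */
    }
    *checksum = 0;
    for (int w = 0; w < workers; w++) {
        if (created[w]) pthread_join(threads[w], NULL);
        *checksum += slices[w].sum;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Fills `out` with the worker counts suggested above: 2x, 1x and 1/2x the
   core count, plus cores + 1 and cores - 1, each clamped to [1, MAX_WORKERS]. */
static void candidate_counts(int cores, int out[5]) {
    if (cores < 1) cores = 1;
    int raw[5] = { 2 * cores, cores, cores / 2, cores + 1, cores - 1 };
    for (int i = 0; i < 5; i++) {
        if (raw[i] < 1) raw[i] = 1;
        if (raw[i] > MAX_WORKERS) raw[i] = MAX_WORKERS;
        out[i] = raw[i];
    }
}
```

Calling time_run once per entry of candidate_counts and printing the elapsed 
times gives the comparison. For the real task the work is likely to be 
dominated by file I/O, so the best worker count may differ from what a 
CPU-bound stand-in suggests.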


___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com


Re: Retrieving the EXIF date/time from 250k images

2022-08-16 Thread Jack Brindle via Cocoa-dev
Instead of using NSOperationQueue, I would use GCD to handle the tasks. Create 
a new concurrent queue (dispatch_queue_create() with a label and 
DISPATCH_QUEUE_CONCURRENT), then enqueue the individual items to the queue for 
processing (dispatch_async(), using the queue created above). Everything can be 
handled in blocks, including the completion routines. As Christian says, the 
problem then is that the results may not come back in the original order, so 
you will probably want to sort the returned objects when done. This should 
significantly reduce the time to do the whole task.

Jack
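
The "sort the returned objects when done" step can be sketched as follows. 
This is plain C with pthreads standing in for dispatch_async() on a concurrent 
queue, not GCD itself. The names tagged_result, collect_and_sort and fake_date 
are hypothetical. Each result carries the index of its input, results are 
appended in whatever order the workers finish, and a final qsort() by index 
restores the input order.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_THREADS 16

typedef struct { size_t index; long value; } tagged_result;

typedef struct {
    tagged_result *out;     /* filled in completion order */
    size_t filled;
    pthread_mutex_t lock;   /* guards `out` and `filled` */
} collector;

typedef struct { collector *c; size_t begin, end; } slice;

/* Hypothetical stand-in for the date read from file number `index`. */
static long fake_date(size_t index) { return (long)index + 1000; }

static void *slice_worker(void *arg) {
    slice *s = arg;
    for (size_t i = s->end; i > s->begin; i--) {      /* walk backwards on purpose */
        tagged_result r = { i - 1, fake_date(i - 1) };
        pthread_mutex_lock(&s->c->lock);
        s->c->out[s->c->filled++] = r;                /* append in completion order */
        pthread_mutex_unlock(&s->c->lock);
    }
    return NULL;
}

static int by_index(const void *a, const void *b) {
    size_t ia = ((const tagged_result *)a)->index;
    size_t ib = ((const tagged_result *)b)->index;
    return (ia > ib) - (ia < ib);
}

/* `out` must have room for `count` entries. Returns 0 on success. */
static int collect_and_sort(size_t count, tagged_result *out, int nthreads) {
    if (nthreads < 1 || nthreads > MAX_THREADS) return -1;
    collector c;
    c.out = out;
    c.filled = 0;
    pthread_mutex_init(&c.lock, NULL);
    pthread_t threads[MAX_THREADS];
    slice jobs[MAX_THREADS];
    int created[MAX_THREADS];
    for (int t = 0; t < nthreads; t++) {
        jobs[t].c = &c;
        jobs[t].begin = count * (size_t)t / (size_t)nthreads;
        jobs[t].end = count * (size_t)(t + 1) / (size_t)nthreads;
        created[t] = pthread_create(&threads[t], NULL, slice_worker, &jobs[t]) == 0;
        if (!created[t]) slice_worker(&jobs[t]);      /* fall back to running inline */
    }
    for (int t = 0; t < nthreads; t++)
        if (created[t]) pthread_join(threads[t], NULL);
    pthread_mutex_destroy(&c.lock);
    qsort(out, count, sizeof *out, by_index);         /* restore the input order */
    return 0;
}
```

Sorting costs O(n log n) after the fact. Writing each result straight into a 
pre-sized slot at its input index avoids the sort entirely.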





Re: Retrieving the EXIF date/time from 250k images

2022-08-16 Thread Steve Christensen via Cocoa-dev
You mentioned creating and managing threads on your own, but that’s what 
NSOperationQueue (and the lower-level DispatchQueue) does. It also will be more 
efficient with thread management since it has an intimate understanding of the 
capabilities of the processor, etc., and will work to do the “right thing” on a 
per-device basis.

By using NSOperationQueue and keeping each of the queue operations focused on 
a single file, you’re not complicating the management of what to do next, 
since most of that is handled for you. Let NSOperationQueue do the heavy 
lifting (scheduling work) and focus on your part of the task (performing the 
work).

Steve




Re: Retrieving the EXIF date/time from 250k images

2022-08-16 Thread Gabriel Zachmann via Cocoa-dev
That is a good idea.  Thanks a lot!

Maybe I can turn this into more fine-grained, dynamic load balancing (or 
latency hiding), as follows:
- create a number of threads (workers);
- as soon as a worker is finished with its "current" image, it gets the next 
  one (a piece of work) out of the list, processes it, and stores the iso_date 
  in the output array (dates_and_times).
Access to both the pointer to the next piece of work and the output array 
would need to be made exclusive, of course.

Best regards, Gabriel
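
A minimal sketch of this worker scheme in plain C with pthreads. The names 
work_list, run_workers and process_item are hypothetical, and process_item is 
a made-up stand-in for the EXIF read. A mutex guards the shared "next piece of 
work" index. In this sketch the output is a plain C array where each slot has 
exactly one writer, so the writes themselves are not locked. An NSMutableArray 
would still need exclusive access.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64

/* Hypothetical stand-in for reading one file's EXIF date; it derives a
   number from the index so the sketch stays self-contained. */
static long process_item(size_t index) { return (long)index * 2; }

typedef struct {
    size_t next;            /* index of the next unclaimed item */
    size_t count;           /* total number of items */
    long *results;          /* one slot per item, in input order */
    pthread_mutex_t lock;   /* guards `next` */
} work_list;

static void *worker(void *arg) {
    work_list *wl = arg;
    for (;;) {
        /* Claim the next item; only this step needs to be exclusive. */
        pthread_mutex_lock(&wl->lock);
        size_t i = wl->next;
        if (i < wl->count) wl->next++;
        pthread_mutex_unlock(&wl->lock);
        if (i >= wl->count) return NULL;       /* nothing left to do */
        wl->results[i] = process_item(i);      /* slot i has a single writer */
    }
}

/* Runs up to `nthreads` workers over `count` items. Returns 0 on success,
   -1 if the thread count is invalid or no worker could be started. */
static int run_workers(size_t count, long *results, int nthreads) {
    if (nthreads < 1 || nthreads > MAX_THREADS) return -1;
    work_list wl;
    wl.next = 0;
    wl.count = count;
    wl.results = results;
    pthread_mutex_init(&wl.lock, NULL);
    pthread_t threads[MAX_THREADS];
    int started = 0;
    while (started < nthreads &&
           pthread_create(&threads[started], NULL, worker, &wl) == 0)
        started++;
    /* Any workers that did start drain the whole list between them. */
    for (int t = 0; t < started; t++) pthread_join(threads[t], NULL);
    pthread_mutex_destroy(&wl.lock);
    return started > 0 ? 0 : -1;
}
```

Because each worker pulls a new item as soon as it finishes the previous one, 
slow files do not hold up the others, which is the dynamic load balancing 
described above.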






Re: Retrieving the EXIF date/time from 250k images

2022-08-16 Thread Steve Christensen via Cocoa-dev
One way to speed it up is to do as much work as possible in parallel. One 
approach (and this is just off the top of my head) is:

1. Create an NSOperationQueue, and add a single operation on that queue to 
manage the entire process. (This is because some parts of the process are 
synchronous and might take a while and you don’t want to block the UI thread.)

2. The operation would create another worker NSOperationQueue where operations 
are added that each process a single image file (the contents of your `for` 
loop).

3. The manager operation adds operations to the worker queue to process a 
reasonable chunk of the files (10? 50?) and then waits for those operations to 
complete. (NSOperationQueue has -waitUntilAllOperationsAreFinished for this.) 
It then repeats until all the image files have been processed.

4. As each chunk completes, it can report status to the UI thread via a 
notification or some other means.
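
The chunked control flow in steps 2 to 4 can be sketched in plain C with 
pthreads. This is an analogy for the NSOperationQueue version, not that API. 
The names process_in_chunks, progress_fn, count_calls and process_item are 
hypothetical. The sketch starts one thread per item, which a real operation 
queue would avoid by reusing a small pool of threads.

```c
#include <pthread.h>
#include <stddef.h>

#define CHUNK 16   /* items handed out per round; the "10? 50?" from step 3 */

typedef struct { size_t index; long *results; } item_job;
typedef void (*progress_fn)(size_t done, size_t total, void *ctx);

/* Hypothetical stand-in for processing one image file. */
static long process_item(size_t i) { return (long)i * 3 + 1; }

static void *item_worker(void *arg) {
    item_job *j = arg;
    j->results[j->index] = process_item(j->index);
    return NULL;
}

/* Example reporter: counts how many progress reports were delivered. */
static void count_calls(size_t done, size_t total, void *ctx) {
    (void)done;
    (void)total;
    ++*(int *)ctx;
}

/* Plays the role of the manager operation: hand out one chunk, wait for it
   to finish, report status, repeat until every item has been processed. */
static void process_in_chunks(size_t count, long *results,
                              progress_fn report, void *ctx) {
    for (size_t start = 0; start < count; start += CHUNK) {
        size_t n = count - start < CHUNK ? count - start : CHUNK;
        pthread_t threads[CHUNK];
        item_job jobs[CHUNK];
        int created[CHUNK];
        for (size_t k = 0; k < n; k++) {
            jobs[k].index = start + k;
            jobs[k].results = results;
            created[k] = pthread_create(&threads[k], NULL, item_worker, &jobs[k]) == 0;
            if (!created[k]) item_worker(&jobs[k]);   /* fall back to running inline */
        }
        for (size_t k = 0; k < n; k++)                /* the "wait until done" step */
            if (created[k]) pthread_join(threads[k], NULL);
        if (report) report(start + n, count, ctx);    /* status after each chunk */
    }
}
```

In the Cocoa version the report callback would post to the main thread, and 
the whole loop would run inside the manager operation so the UI thread is 
never blocked by the waits.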

Unlike your synchronous implementation, below, the order of updates to the 
dates_and_times array is indeterminate. A way to fix that is to pre-populate 
it with as many placeholder items (NSDate.distantPast?) as there are in 
imagefiles and then store iso_date at the same index as its corresponding 
filename. Another benefit is that there is a single memory allocation at the 
beginning rather than periodic resizes of the array (and copying the existing 
contents) as items are added.

And since all these operations are running on different threads, you need to 
protect access to your dates_and_times array because modifying it is not 
thread-safe. One quick way is to create an NSLock and lock it around the array 
update:

[theLock lock];
dates_and_times[index] = iso_date;
[theLock unlock];
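
The same two ideas, placeholder pre-population and a lock around each store, 
can be sketched in plain C with a pthread mutex in place of NSLock. The names 
date_table, NO_DATE, stride_worker and fill_table are hypothetical, and the 
"no EXIF date" rule (every fifth file) is invented purely to show that 
untouched slots keep their placeholder.

```c
#include <pthread.h>
#include <stddef.h>

#define NO_DATE (-1L)   /* placeholder, playing the role of NSDate.distantPast */
#define MAX_THREADS 16

typedef struct {
    long *dates;            /* one slot per input file, in input order */
    size_t count;
    size_t stored;          /* how many slots hold a real value */
    pthread_mutex_t lock;   /* guards `dates` and `stored` */
} date_table;

/* Pre-populates every slot so results can later be stored by index. */
static void date_table_init(date_table *t, long *storage, size_t count) {
    for (size_t i = 0; i < count; i++) storage[i] = NO_DATE;
    t->dates = storage;
    t->count = count;
    t->stored = 0;
    pthread_mutex_init(&t->lock, NULL);
}

/* Counterpart of the lock / assign / unlock sequence above. */
static void date_table_store(date_table *t, size_t index, long date) {
    pthread_mutex_lock(&t->lock);
    if (index < t->count) {
        if (t->dates[index] == NO_DATE) t->stored++;
        t->dates[index] = date;
    }
    pthread_mutex_unlock(&t->lock);
}

typedef struct { date_table *table; size_t first, step; } stride_job;

static void *stride_worker(void *arg) {
    stride_job *j = arg;
    for (size_t i = j->first; i < j->table->count; i += j->step) {
        if (i % 5 == 0) continue;                 /* pretend: file has no EXIF date */
        date_table_store(j->table, i, (long)i + 1000);
    }
    return NULL;
}

/* Thread k handles items k, k + nthreads, k + 2 * nthreads, and so on. */
static void fill_table(date_table *t, int nthreads) {
    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;
    pthread_t threads[MAX_THREADS];
    stride_job jobs[MAX_THREADS];
    int created[MAX_THREADS];
    for (int k = 0; k < nthreads; k++) {
        jobs[k].table = t;
        jobs[k].first = (size_t)k;
        jobs[k].step = (size_t)nthreads;
        created[k] = pthread_create(&threads[k], NULL, stride_worker, &jobs[k]) == 0;
        if (!created[k]) stride_worker(&jobs[k]); /* fall back to running inline */
    }
    for (int k = 0; k < nthreads; k++)
        if (created[k]) pthread_join(threads[k], NULL);
}
```

Whatever order the threads finish in, slot i always corresponds to input i, 
and a slot still holding the placeholder marks a file that yielded no date.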

Anyway, another way to look at the process.

Steve


> On Aug 14, 2022, at 2:22 PM, Gabriel Zachmann via Cocoa-dev 
>  wrote:
> 
> I would like to collect the date/time stored in an EXIF tag in a bunch of 
> images.
> 
> I thought I could do so with the following procedure
> (some details and error checking omitted for sake of clarity):
> 
> 
>    NSMutableArray * dates_and_times = [NSMutableArray arrayWithCapacity: [imagefiles count]];
>    CFDictionaryRef exif_dict;
>    CFStringRef dateref = NULL;
>    for ( NSString* filename in imagefiles )
>    {
>        NSURL * imgurl = [NSURL fileURLWithPath: filename isDirectory: NO];
>        // escapes any chars that are not allowed in URLs (space, &, etc.)
>        CGImageSourceRef image = CGImageSourceCreateWithURL( (__bridge CFURLRef) imgurl, NULL );
>        CFDictionaryRef fileProps = CGImageSourceCopyPropertiesAtIndex( image, 0, NULL );
>        bool success = CFDictionaryGetValueIfPresent( fileProps, kCGImagePropertyExifDictionary, (const void **) & exif_dict );
>        success = CFDictionaryGetValueIfPresent( exif_dict, kCGImagePropertyExifDateTimeDigitized, (const void **) & dateref );
>        NSString * date_str = [[NSString alloc] initWithString: (__bridge NSString * _Nonnull)( dateref ) ];
>        NSDate * iso_date = [isoDateFormatter_ dateFromString: date_str];
>        if ( iso_date )
>            [dates_and_times addObject: iso_date ];
>        CFRelease( fileProps );
>        CFRelease( image );  // the image source comes from a Create function, so it must be released too
>    }
> 
> 
> But I get the impression that this code actually loads each and every image.
> On my MacBook, it takes 3m30s for 250k images (130GB).
> 
> So, the big question is: can it be done faster?
> 
> I know the EXIF tags are part of the image file, but I was hoping it might be 
> possible to load only those EXIF dictionaries.
> Or are the CGImage functions above already clever enough to implement this 
> idea?
> 
> 
> Best regards, Gab.
