Re: Retrieving the EXIF date/time from 250k images
I love experimenting with processes like this. Create as many worker operations as you have cores on your box and run a few tests to see how quickly the operations complete. Try it with 2x the number of cores, 1x the number of cores, 1/2x the number of cores, the number of cores + 1, and the number of cores - 1 to see whether there are any substantial differences in time to completion.

But as the others have said, GCD and NSOperationQueue were made for these things. If you’re actually trying to get work done and not researching for your own benefit, those are the ones to use. They do all the heavy lifting for you and are easy to implement.

Cheers.
Alex Zavatone

> On Aug 16, 2022, at 2:37 PM, Jack Brindle via Cocoa-dev wrote:
>
> Instead of using NSOperationQueue, I would use GCD to handle the tasks.

___
Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com
This email sent to arch...@mail-archive.com
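The experiment described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes ARC, `SimulatedWork`, `CandidateWorkerCounts`, `TimeRun` and `CompareWorkerCounts` are hypothetical names, and the CPU-bound dummy loop stands in for the real per-file EXIF read, which is likely to be dominated by disk I/O and may therefore scale quite differently.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the per-file work (a real run would read one
// image's EXIF data here).
static void SimulatedWork(void) {
    volatile double sum = 0;
    for (int i = 0; i < 200000; i++) {
        sum += i * 0.5;
    }
}

// Worker counts to try: cores/2, cores-1, cores, cores+1 and 2*cores,
// with values below 1 and duplicates dropped.
static NSArray<NSNumber *> *CandidateWorkerCounts(NSInteger cores) {
    NSInteger raw[] = { cores / 2, cores - 1, cores, cores + 1, cores * 2 };
    NSMutableArray<NSNumber *> *counts = [NSMutableArray array];
    for (size_t i = 0; i < sizeof(raw) / sizeof(raw[0]); i++) {
        NSNumber *n = @(raw[i]);
        if (raw[i] >= 1 && ![counts containsObject:n]) {
            [counts addObject:n];
        }
    }
    return counts;
}

// Runs `itemCount` work items with at most `workers` running at once and
// returns the elapsed wall-clock time in seconds.
static NSTimeInterval TimeRun(NSInteger workers, NSUInteger itemCount) {
    NSOperationQueue *queue = [[NSOperationQueue alloc] init];
    queue.maxConcurrentOperationCount = workers;
    NSDate *start = [NSDate date];
    for (NSUInteger i = 0; i < itemCount; i++) {
        [queue addOperationWithBlock:^{ SimulatedWork(); }];
    }
    [queue waitUntilAllOperationsAreFinished];
    return [[NSDate date] timeIntervalSinceDate:start];
}

// Logs one timing line per candidate worker count.
static void CompareWorkerCounts(NSUInteger itemCount) {
    NSInteger cores = (NSInteger)[NSProcessInfo processInfo].activeProcessorCount;
    for (NSNumber *workers in CandidateWorkerCounts(cores)) {
        NSLog(@"%@ workers: %.3f s", workers,
              TimeRun(workers.integerValue, itemCount));
    }
}
```

Calling `CompareWorkerCounts(1000)` from a background thread logs one line per worker count. Repeating each measurement a few times would smooth out noise from caching and other system activity.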
Re: Retrieving the EXIF date/time from 250k images
Instead of using NSOperationQueue, I would use GCD to handle the tasks. Create a new concurrent queue (dispatch_queue_create() with the DISPATCH_QUEUE_CONCURRENT attribute), then enqueue the individual items on that queue for processing (dispatch_async(), using the queue created above). Everything can be handled in blocks, including the completion routines. As Christian says, the problem then is that the data may not come back in the original order, so you will probably want to sort the returned objects when done. This should significantly shorten the time needed for the whole task.

Jack

> On Aug 16, 2022, at 12:26 PM, Steve Christensen via Cocoa-dev wrote:
>
> You mentioned creating and managing threads on your own, but that’s what
> NSOperationQueue (and the lower-level DispatchQueue) does.
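A minimal sketch of the GCD approach described above, assuming ARC. `ProcessItem` and `ProcessAllConcurrently` are hypothetical names, and the uppercase conversion merely stands in for reading one file's EXIF date. The message mentions only dispatch_async(); using a dispatch group to wait for completion and a serial queue to guard the shared array are assumptions made for this example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical per-item work; a real version would read one file's EXIF date.
static NSString *ProcessItem(NSString *name) {
    return [name uppercaseString];
}

// Processes all items on a concurrent queue and returns the results in the
// same order as the input.
static NSArray<NSString *> *ProcessAllConcurrently(NSArray<NSString *> *names) {
    dispatch_queue_t workQueue =
        dispatch_queue_create("example.exif.work", DISPATCH_QUEUE_CONCURRENT);
    dispatch_queue_t resultQueue =
        dispatch_queue_create("example.exif.results", DISPATCH_QUEUE_SERIAL);
    dispatch_group_t group = dispatch_group_create();

    // Each entry is @[index, result]; entries arrive in completion order.
    NSMutableArray<NSArray *> *pairs =
        [NSMutableArray arrayWithCapacity:names.count];

    [names enumerateObjectsUsingBlock:^(NSString *name, NSUInteger idx, BOOL *stop) {
        dispatch_group_async(group, workQueue, ^{
            NSString *result = ProcessItem(name);
            // The serial queue makes mutation of the shared array exclusive.
            dispatch_sync(resultQueue, ^{
                [pairs addObject:@[@(idx), result]];
            });
        });
    }];

    // Block until every enqueued item has finished.
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);

    // Restore the original order by sorting on the stored index.
    [pairs sortUsingComparator:^NSComparisonResult(NSArray *a, NSArray *b) {
        return [(NSNumber *)a[0] compare:(NSNumber *)b[0]];
    }];
    NSMutableArray<NSString *> *ordered =
        [NSMutableArray arrayWithCapacity:pairs.count];
    for (NSArray *pair in pairs) {
        [ordered addObject:pair[1]];
    }
    return ordered;
}
```

Because dispatch_group_wait() blocks the caller, this would be called off the main thread. With hundreds of thousands of items that each block on file I/O, it may also be worth limiting how many are in flight at once.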
Re: Retrieving the EXIF date/time from 250k images
You mentioned creating and managing threads on your own, but that’s what NSOperationQueue (and the lower-level DispatchQueue) does for you. It will also be more efficient with thread management, since it has an intimate understanding of the capabilities of the processor, etc., and will work to do the “right thing” on a per-device basis.

By leveraging NSOperationQueue and keeping each of the queue’s operations focused on a single file, you’re not complicating the management of what to do next, since most of that is handled for you. Let NSOperationQueue do the heavy lifting (scheduling work) and focus on your part of the task (performing the work).

Steve

> On Aug 16, 2022, at 8:41 AM, Gabriel Zachmann wrote:
>
> That is a good idea. Thanks a lot!
>
> Maybe, I can turn this into more fine-grained, dynamic load balancing (or
> latency hiding), as follows:
> create a number of threads (workers);
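As a rough illustration of one operation per file, here is a sketch that assumes ARC and linking against Foundation and ImageIO. `ExifDateStringForFile` and `ExifDateStringsForFiles` are hypothetical names. The ImageIO calls and property keys are the ones used in the original question, with the error checking added that was omitted there.

```objc
#import <Foundation/Foundation.h>
#import <ImageIO/ImageIO.h>

// Returns the EXIF "DateTimeDigitized" string of the file at `path`, or nil
// if the file cannot be read or carries no such tag.
static NSString *ExifDateStringForFile(NSString *path) {
    NSURL *url = [NSURL fileURLWithPath:path isDirectory:NO];
    CGImageSourceRef source =
        CGImageSourceCreateWithURL((__bridge CFURLRef)url, NULL);
    if (source == NULL) {
        return nil;
    }
    CFDictionaryRef props = CGImageSourceCopyPropertiesAtIndex(source, 0, NULL);
    CFRelease(source);
    if (props == NULL) {
        return nil;
    }
    // Hand ownership of the copied dictionary to ARC.
    NSDictionary *properties = CFBridgingRelease(props);
    NSDictionary *exif =
        properties[(__bridge NSString *)kCGImagePropertyExifDictionary];
    NSString *date =
        exif[(__bridge NSString *)kCGImagePropertyExifDateTimeDigitized];
    return [date isKindOfClass:[NSString class]] ? [date copy] : nil;
}

// Reads the date string of every file, one operation per file, and returns
// a path -> date string dictionary (files without a date are left out).
static NSDictionary<NSString *, NSString *> *ExifDateStringsForFiles(
        NSArray<NSString *> *paths) {
    // Default concurrency: the queue decides how many operations run at once.
    NSOperationQueue *queue = [[NSOperationQueue alloc] init];
    NSMutableDictionary<NSString *, NSString *> *results =
        [NSMutableDictionary dictionary];
    NSLock *lock = [[NSLock alloc] init];

    for (NSString *path in paths) {
        [queue addOperationWithBlock:^{
            NSString *date = ExifDateStringForFile(path);
            if (date == nil) {
                return;
            }
            // NSMutableDictionary is not thread-safe, so guard the update.
            [lock lock];
            results[path] = date;
            [lock unlock];
        }];
    }
    [queue waitUntilAllOperationsAreFinished];
    return [results copy];
}
```

Keying the results by path sidesteps the ordering question. The caller can then walk its own file list in order and look up each date.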
Re: Retrieving the EXIF date/time from 250k images
That is a good idea. Thanks a lot!

Maybe I can turn this into more fine-grained, dynamic load balancing (or latency hiding), as follows:
create a number of threads (workers);
as soon as a worker is finished with its "current" image, it gets the next one (a piece of work) out of the list, processes it, and stores the iso_date in the output array (dates_and_times).
Access to both the pointer to the next piece of work and the output array would need to be made exclusive, of course.

Best regards, Gabriel
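The worker scheme outlined above might look something like this sketch, assuming ARC and macOS 10.12 or later (for -[NSThread initWithBlock:]). `ProcessOne` and `RunWorkers` are hypothetical names, doubling a number stands in for processing one image, and the dispatch group is used here only as a convenient way to wait for all threads to finish.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for processing one image.
static NSNumber *ProcessOne(NSNumber *item) {
    return @(item.integerValue * 2);
}

// Starts `workerCount` threads; each repeatedly claims the next unprocessed
// index, processes that item and stores the result at the same index.
static NSArray<NSNumber *> *RunWorkers(NSArray<NSNumber *> *work,
                                       NSUInteger workerCount) {
    NSUInteger count = work.count;
    NSMutableArray<NSNumber *> *output = [NSMutableArray arrayWithCapacity:count];
    for (NSUInteger i = 0; i < count; i++) {
        [output addObject:@0];  // placeholder, overwritten by a worker
    }

    NSLock *lock = [[NSLock alloc] init];
    __block NSUInteger next = 0;  // index of the next piece of work
    dispatch_group_t done = dispatch_group_create();

    for (NSUInteger w = 0; w < workerCount; w++) {
        dispatch_group_enter(done);
        NSThread *worker = [[NSThread alloc] initWithBlock:^{
            for (;;) {
                // Claim the next index exclusively.
                [lock lock];
                NSUInteger index = next++;
                [lock unlock];
                if (index >= count) {
                    break;
                }

                NSNumber *result = ProcessOne(work[index]);

                // The output array is shared, so guard the write as well.
                [lock lock];
                output[index] = result;
                [lock unlock];
            }
            dispatch_group_leave(done);
        }];
        [worker start];
    }

    // Wait until every worker has left the group.
    dispatch_group_wait(done, DISPATCH_TIME_FOREVER);
    return output;
}
```

A fast worker simply claims more indices than a slow one, which gives the dynamic load balancing. The lock is held only for the short bookkeeping steps, not while an item is being processed.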
Re: Retrieving the EXIF date/time from 250k images
One way to speed it up is to do as much work as possible in parallel. One way (and this is just off the top of my head) is:

1. Create an NSOperationQueue, and add a single operation to that queue to manage the entire process. (This is because some parts of the process are synchronous and might take a while, and you don’t want to block the UI thread.)
2. The operation would create another, worker NSOperationQueue, to which operations are added that each process a single image file (the contents of your `for` loop).
3. The manager operation adds operations to the worker queue to process a reasonable chunk of the files (10? 50?) and then waits for those operations to complete. (NSOperationQueue has -waitUntilAllOperationsAreFinished for this.) It then repeats until all the image files have been processed.
4. As each chunk completes, it can report status to the UI thread via a notification or some other means.

Unlike in your synchronous implementation, below, the order of updates to the dates_and_times array is indeterminate. A way to fix that is to pre-populate the array with as many placeholder items (NSDate.distantPast?) as there are entries in imagefiles and then store each iso_date at the same index as its corresponding filename. Another benefit is that there is a single memory allocation at the beginning rather than periodic resizes of the array (and copying of the existing contents) as items are added.

And since all these operations run on different threads, you need to protect access to your dates_and_times array, because modifying it is not thread-safe. One quick way is to create an NSLock and lock it around the array update:

    [theLock lock];
    dates_and_times[index] = iso_date;
    [theLock unlock];

Anyway, another way to look at the process.

Steve

> On Aug 14, 2022, at 2:22 PM, Gabriel Zachmann via Cocoa-dev wrote:
>
> I would like to collect the date/time stored in an EXIF tag in a bunch of
> images.
>
> I thought I could do so with the following procedure
> (some details and error checking omitted for sake of clarity):
>
>    NSMutableArray * dates_and_times = [NSMutableArray arrayWithCapacity: [imagefiles count]];
>    CFDictionaryRef exif_dict;
>    CFStringRef dateref = NULL;
>    for ( NSString* filename in imagefiles )
>    {
>        // fileURLWithPath: escapes any chars that are not allowed in URLs (space, &, etc.)
>        NSURL * imgurl = [NSURL fileURLWithPath: filename isDirectory: NO];
>        CGImageSourceRef image = CGImageSourceCreateWithURL( (__bridge CFURLRef) imgurl, NULL );
>        CFDictionaryRef fileProps = CGImageSourceCopyPropertiesAtIndex( image, 0, NULL );
>        bool success = CFDictionaryGetValueIfPresent( fileProps, kCGImagePropertyExifDictionary, (const void **) & exif_dict );
>        success = CFDictionaryGetValueIfPresent( exif_dict, kCGImagePropertyExifDateTimeDigitized, (const void **) & dateref );
>        NSString * date_str = [[NSString alloc] initWithString: (__bridge NSString * _Nonnull)( dateref ) ];
>        NSDate * iso_date = [isoDateFormatter_ dateFromString: date_str];
>        if ( iso_date )
>            [dates_and_times addObject: iso_date ];
>        CFRelease( fileProps );
>        CFRelease( image );   // the image source is owned (Create rule) and must be released too
>    }
>
> But, I get the impression, this code actually loads each and every image.
> On my MacBook, it takes 3m30s for 250k images (130GB).
>
> So, the big question is: can it be done faster?
>
> I know the EXIF tags are part of the image file, but I was hoping it might be
> possible to load only those EXIF dictionaries.
> Or are the CGImage functions above already clever enough to implement this
> idea?
>
> Best regards, Gab.
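Steps 2 to 4 and the placeholder idea above could be sketched as follows, assuming ARC. `DatesForItems` and its `dateForItem` and `progress` block parameters are hypothetical; the per-file work is passed in as a block so that the sketch stays independent of ImageIO.

```objc
#import <Foundation/Foundation.h>

// Processes `items` in chunks of `chunkSize` on a worker queue and returns
// one date per item, in input order. Items for which `dateForItem` returns
// nil keep the NSDate.distantPast placeholder.
static NSArray<NSDate *> *DatesForItems(
        NSArray<NSString *> *items,
        NSUInteger chunkSize,
        NSDate *(^dateForItem)(NSString *item),
        void (^progress)(NSUInteger done, NSUInteger total)) {
    NSUInteger total = items.count;
    if (chunkSize == 0) {
        chunkSize = 1;
    }

    // Pre-populate so that every result has a fixed slot at its own index.
    NSMutableArray<NSDate *> *dates = [NSMutableArray arrayWithCapacity:total];
    for (NSUInteger i = 0; i < total; i++) {
        [dates addObject:[NSDate distantPast]];
    }

    NSLock *lock = [[NSLock alloc] init];
    NSOperationQueue *workers = [[NSOperationQueue alloc] init];

    for (NSUInteger start = 0; start < total; start += chunkSize) {
        NSUInteger end = MIN(start + chunkSize, total);
        for (NSUInteger index = start; index < end; index++) {
            [workers addOperationWithBlock:^{
                NSDate *date = dateForItem(items[index]);
                if (date == nil) {
                    return;
                }
                // Modifying the array is not thread-safe; guard the update.
                [lock lock];
                dates[index] = date;
                [lock unlock];
            }];
        }
        // Wait for this chunk before queuing the next one.
        [workers waitUntilAllOperationsAreFinished];
        if (progress) {
            progress(end, total);
        }
    }
    return dates;
}
```

To follow step 1, the call itself would sit inside an operation on a separate manager queue, since it blocks until everything is processed. The `progress` block runs on the calling thread, so it would hop to the main queue before touching the UI.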