Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?
Hi Shea, There are heaps of tools that can assist you; you've already been pointed towards the excellent ExifTool in previous threads. The command-line version is very easy to work with, and I have made a few different tools that pull out or change EXIF data where required. It is a very versatile tool that handles many other metadata types on top of EXIF data (MS Office files, ID3, etc.). Other candidate tools are: Apache Tika - http://tika.apache.org/ - I use this quite a bit in testing and in wrangling various text-based objects. JHOVE - http://sourceforge.net/projects/jhove/ - this will pull out all the EXIF in a lump that you can then work with. We use it in the Rosetta validation stack, and it forms one of the processes that we use to automatically extract and capture EXIF data from supported image files. All these tools will give you a structured object (CSV, XML, etc.) that you can use to seed a next-step process, e.g. ingest into a CMS or repository. J -----Original Message----- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Swauger,Shea Sent: Wednesday, 18 December 2013 10:37 a.m. To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream? Hi all, I'm wondering if there is a systematic method that can extract metadata embedded in digital photographs and then ingest that metadata into a CMS and relate them to their corresponding images. We currently use DigiTool, if that makes a difference. Thanks! Shea Swauger Data Management Librarian Colorado State University
Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?
Alfresco uses Apache Tika to extract EXIF metadata from images. The Tika plugin that supports this is on GitHub at https://github.com/Alfresco/tika-exiftool . oh. On Dec 17, 2013 4:55 PM, Edward Summers e...@pobox.com wrote: I remember hearing somewhere that ExifTool is pretty good for extracting image metadata. edsu--
Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?
I did some experimentation wrapping the Perl Image::ExifTool module (along with Image::OCR::Tesseract) in some code that exposed it as a SOAP service for use in a Fedora Commons ingest service. It seemed to work well enough for bulk file processing in testing, though the approach of a custom ingest system, in general, was eventually abandoned when consultants were brought in. Were I to do it again, I'd probably also add a REST interface to the generic service wrapper. Rick -----Original Message----- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Edward Summers Sent: Tuesday, December 17, 2013 4:54 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream? I remember hearing somewhere that ExifTool is pretty good for extracting image metadata. edsu--
Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?
Piwigo does this, so you can look at the source code to see how. -Wilhelmina Randtke On Dec 17, 2013 3:37 PM, Swauger,Shea shea.swau...@colostate.edu wrote: Hi all, I'm wondering if there is a systematic method that can extract metadata embedded in digital photographs and then ingest that metadata into a CMS and relate them to their corresponding images. We currently use DigiTool, if that makes a difference. Thanks! Shea Swauger Data Management Librarian Colorado State University
[CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?
Hi all, I'm wondering if there is a systematic method that can extract metadata embedded in digital photographs and then ingest that metadata into a CMS and relate them to their corresponding images. We currently use DigiTool, if that makes a difference. Thanks! Shea Swauger Data Management Librarian Colorado State University
Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?
Exiftool is what you need. Easy to use and works on any platform. kyle On Tue, Dec 17, 2013 at 1:37 PM, Swauger,Shea shea.swau...@colostate.edu wrote: Hi all, I'm wondering if there is a systematic method that can extract metadata embedded in digital photographs and then ingest that metadata into a CMS and relate them to their corresponding images. We currently use DigiTool, if that makes a difference. Thanks! Shea Swauger Data Management Librarian Colorado State University
Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?
++1 ___ Andrea Medina-Smith Metadata Librarian NIST Gaithersburg andrea.medina-sm...@nist.gov 301-975-2592 Be Green! Think before you print this email. -----Original Message----- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Kyle Banerjee Sent: Tuesday, December 17, 2013 4:45 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream? Exiftool is what you need. Easy to use and works on any platform. kyle On Tue, Dec 17, 2013 at 1:37 PM, Swauger,Shea shea.swau...@colostate.edu wrote: Hi all, I'm wondering if there is a systematic method that can extract metadata embedded in digital photographs and then ingest that metadata into a CMS and relate them to their corresponding images. We currently use DigiTool, if that makes a difference. Thanks! Shea Swauger Data Management Librarian Colorado State University
Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?
Hi Shea, Well, one option you might explore is extracting metadata from images using ExifTool (http://www.sno.phy.queensu.ca/~phil/exiftool/) to a CSV or TXT file and then converting this file to whatever file format (e.g. XML) your CMS uses for batch import. So, semi-automated. We currently do the reverse: embed metadata into images and then ingest them into our IR (DSpace). Hope this helps, Monica On 12/17/2013 3:37 PM, Swauger,Shea wrote: Hi all, I'm wondering if there is a systematic method that can extract metadata embedded in digital photographs and then ingest that metadata into a CMS and relate them to their corresponding images. We currently use DigiTool, if that makes a difference. Thanks! Shea Swauger Data Management Librarian Colorado State University
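The CSV-to-XML route described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the exiftool executable is assumed to be on the PATH, and the <records>/<record>/<field> layout is made up for the example, since the actual batch-import format of a given CMS is not known here. The function names are hypothetical too.

```python
import csv
import io
import subprocess
import xml.etree.ElementTree as ET


def run_exiftool_csv(image_dir):
    """Return exiftool's CSV report for every file under image_dir.

    Assumes the exiftool executable is installed and on PATH.
    -csv writes one row per file; -r recurses into subdirectories.
    """
    result = subprocess.run(
        ["exiftool", "-csv", "-r", image_dir],
        capture_output=True,
        text=True,
    )
    return result.stdout


def csv_to_xml(csv_text):
    """Convert exiftool-style CSV text into a simple XML document.

    The element names used here are hypothetical placeholders, not
    the import schema of any particular CMS or repository.
    """
    root = ET.Element("records")
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The SourceFile column ties each record to its image file.
        record = ET.SubElement(root, "record", source=row.get("SourceFile", ""))
        for tag, value in row.items():
            if tag == "SourceFile" or not value:
                continue  # skip the key column and empty cells
            field = ET.SubElement(record, "field", name=tag)
            field.text = value
    return ET.tostring(root, encoding="unicode")
```

Something like csv_to_xml(run_exiftool_csv("photos/")) would then produce one XML string for a whole directory. The SourceFile column is what keeps each metadata record tied to its image.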
Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?
I use ExifTool to extract the EXIF metadata from images: http://www.sno.phy.queensu.ca/~phil/exiftool/ I do this dynamically for all of the 8,000+ photos on FreeLargePhotos.com. Here is an example of the text output: http://freelargephotos.com/photos/003805/exif.txt From there, you could parse that into whatever you wanted for import. Since you would have the filename, that may be sufficient to map it into the right place in DigiTool (but I'm unfamiliar with it). Roy On Tue, Dec 17, 2013 at 1:37 PM, Swauger,Shea shea.swau...@colostate.edu wrote: Hi all, I'm wondering if there is a systematic method that can extract metadata embedded in digital photographs and then ingest that metadata into a CMS and relate them to their corresponding images. We currently use DigiTool, if that makes a difference. Thanks! Shea Swauger Data Management Librarian Colorado State University
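Parsing that kind of text output could look something like the Python sketch below. It assumes the default ExifTool layout of one "Tag Name : value" pair per line and splits on the first colon only, so values that themselves contain colons (dates, for instance) stay intact. The function name is made up for illustration, and repeated tag names simply overwrite earlier ones.

```python
def parse_exiftool_text(text):
    """Parse 'Tag Name : value' lines into a dict of strings.

    Assumes one tag per line, as in exiftool's default text output.
    Lines without a colon are ignored; if a tag name repeats, the
    last value wins.
    """
    tags = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as
        # '2013:12:17 10:37:00' are kept whole.
        name, sep, value = line.partition(":")
        if not sep:
            continue
        tags[name.strip()] = value.strip()
    return tags
```

The resulting dict has a "File Name" entry whenever the output includes one, which is the piece you would use to match the metadata back to the right image on import.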
Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?
Hi, It is possible, at least the extraction part. I don't know enough about DigiTool to comment on the deposit part. We wrote a series of shell scripts using exiftool (as I see others are suggesting). The output is then put through a number of sed commands to produce a file that can be deposited into our digital preservation system, Rosetta. Edward On Tue, Dec 17, 2013 at 4:37 PM, Swauger,Shea shea.swau...@colostate.edu wrote: Hi all, I'm wondering if there is a systematic method that can extract metadata embedded in digital photographs and then ingest that metadata into a CMS and relate them to their corresponding images. We currently use DigiTool, if that makes a difference. Thanks! Shea Swauger Data Management Librarian Colorado State University
Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?
The extraction and ingestion seem like two different coins. Lots of tools can extract: ExifTool, ImageMagick, or whatever can pull out the data. The question then is how and where to insert it into the system you are using. So, not a pipedream. Indeed, extraction is very possible. The harder part might be figuring out how or where to store the data in your system, and then, assuming it's relevant, how and where your system actually displays or uses the data. Depending on your system, that's where the pipedream question comes into play, I think. Patrick On 12/17/2013 04:45 PM, Kyle Banerjee wrote: Exiftool is what you need. Easy to use and works on any platform. kyle On Tue, Dec 17, 2013 at 1:37 PM, Swauger,Shea shea.swau...@colostate.edu wrote: Hi all, I'm wondering if there is a systematic method that can extract metadata embedded in digital photographs and then ingest that metadata into a CMS and relate them to their corresponding images. We currently use DigiTool, if that makes a difference. Thanks! Shea Swauger Data Management Librarian Colorado State University
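To make the "where does it go" question a bit more concrete, here is a small Python sketch of a crosswalk step between extraction and ingest. Everything about the target side is an assumption: the Dublin Core element names are just one plausible set of destination fields, the mapping itself is illustrative rather than a recommended one, and nothing here reflects how DigiTool or any other system actually stores metadata.

```python
# Hypothetical crosswalk from EXIF tag names (as ExifTool reports them
# in its short-name form) to Dublin Core elements. Adjust the mapping
# to whatever fields your own system really has.
CROSSWALK = {
    "Artist": "dc:creator",
    "CreateDate": "dc:date",
    "ImageDescription": "dc:description",
    "Copyright": "dc:rights",
}


def to_record(extracted, crosswalk=CROSSWALK):
    """Split extracted tags into mapped fields and leftovers.

    Returns (record, unmapped): record maps each target field to a
    list of values; unmapped holds tags with no place in the target
    system, which is where the hard decisions usually are.
    """
    record = {}
    unmapped = {}
    for tag, value in extracted.items():
        target = crosswalk.get(tag)
        if target is None:
            unmapped[tag] = value
        else:
            record.setdefault(target, []).append(value)
    return record, unmapped
```

The unmapped dict is the interesting output: it lists the embedded data the target system has no obvious home for.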
Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?
I remember hearing somewhere that ExifTool is pretty good for extracting image metadata. edsu--