Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?

2013-12-19 Thread Jay Gattuso
Hi Shea, 

There are heaps of tools that can assist you; you've already been pointed 
towards the excellent ExifTool in other replies. The command line version is 
very easy to work with, and I have made a few different tools that pull out or 
change exif data where required. It is a very versatile tool that handles many 
other metadata types on top of exif data (MS Office files, ID3, etc.).

Other candidate tools are:

Apache Tika - http://tika.apache.org/ - I use this quite a bit in testing and 
in wrangling various text-based objects.

JHOVE - http://sourceforge.net/projects/jhove/ - this will pull out all the 
exif in a lump that you can then work with. We use it in the Rosetta 
validation stack, where it forms one of the processes that automatically 
extract and capture exif data from supported image files. 

All these tools will give you a structured object (CSV, XML etc) that you can 
use to seed a next step process, e.g. ingest into a CMS or repository. 
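
As a rough illustration of the "structured object" step, here is a minimal 
Python sketch that asks ExifTool for JSON output and indexes it by file. It 
assumes exiftool is installed and on the PATH; the function names are made up 
for this example and are not part of any tool mentioned in this thread.

```python
import json
import subprocess


def build_exiftool_command(paths):
    """Return the argument list for a JSON dump of the tags exiftool can read."""
    # -json asks exiftool to print one JSON object per input file.
    return ["exiftool", "-json", *paths]


def parse_exiftool_json(text):
    """Map each file's SourceFile path to its dictionary of tags."""
    records = json.loads(text)
    return {record["SourceFile"]: record for record in records}


def extract_metadata(paths):
    """Run exiftool on the given files (assumes exiftool is on the PATH)."""
    result = subprocess.run(
        build_exiftool_command(paths),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_exiftool_json(result.stdout)
```

Keying the result on SourceFile keeps each set of tags tied to its image, 
which is the relationship a later ingest step would need.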

J  
   

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
Swauger,Shea
Sent: Wednesday, 18 December 2013 10:37 a.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: 
Possible or Pipedream?

Hi all,

I'm wondering if there is a systematic method that can extract metadata 
embedded in digital photographs and then ingest that metadata into a CMS and 
relate them to their corresponding images. We currently use DigiTool, if that 
makes a difference.

Thanks!

Shea Swauger
Data Management Librarian
Colorado State University


Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?

2013-12-19 Thread Simon Spero
Alfresco uses Apache Tika to extract exif metadata from images. The Tika
plugin that supports this is on GitHub at
https://github.com/Alfresco/tika-exiftool

oh.

On Dec 17, 2013 4:55 PM, Edward Summers e...@pobox.com wrote:

 I remember hearing somewhere that ExifTool is pretty good for extracting
 image metadata.

 edsu--



Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?

2013-12-19 Thread Richard Sarvas
I did some experimentation wrapping the Perl Image::ExifTool module (along with 
Image::OCR::Tesseract) in some code that exposed it as a SOAP service for use 
in a Fedora Commons ingest service. It seemed to work well enough for bulk file 
processing in testing, though the approach of a custom ingest system, in 
general, was eventually abandoned when consultants were brought in. 

Were I to do it again I'd probably also add a REST interface to the generic 
service wrapper.


Rick


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Edward 
Summers
Sent: Tuesday, December 17, 2013 4:54 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: 
Possible or Pipedream?

I remember hearing somewhere that ExifTool is pretty good for extracting image 
metadata. 

edsu--


Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?

2013-12-19 Thread Wilhelmina Randtke
Piwigo does this, so you can look at the source code to see how.

-Wilhelmina Randtke
On Dec 17, 2013 3:37 PM, Swauger,Shea shea.swau...@colostate.edu wrote:

 Hi all,

 I'm wondering if there is a systematic method that can extract metadata
 embedded in digital photographs and then ingest that metadata into a CMS
 and relate them to their corresponding images. We currently use DigiTool,
 if that makes a difference.

 Thanks!

 Shea Swauger
 Data Management Librarian
 Colorado State University



[CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?

2013-12-17 Thread Swauger,Shea
Hi all,

I'm wondering if there is a systematic method that can extract metadata 
embedded in digital photographs and then ingest that metadata into a CMS and 
relate them to their corresponding images. We currently use DigiTool, if that 
makes a difference.

Thanks!

Shea Swauger
Data Management Librarian
Colorado State University


Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?

2013-12-17 Thread Kyle Banerjee
Exiftool is what you need. Easy to use and works on any platform.

kyle


On Tue, Dec 17, 2013 at 1:37 PM, Swauger,Shea shea.swau...@colostate.edu wrote:

 Hi all,

 I'm wondering if there is a systematic method that can extract metadata
 embedded in digital photographs and then ingest that metadata into a CMS
 and relate them to their corresponding images. We currently use DigiTool,
 if that makes a difference.

 Thanks!

 Shea Swauger
 Data Management Librarian
 Colorado State University



Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?

2013-12-17 Thread Medina-Smith, Andrea
++1

___
Andrea Medina-Smith
Metadata Librarian
NIST Gaithersburg
andrea.medina-sm...@nist.gov
301-975-2592


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Kyle 
Banerjee
Sent: Tuesday, December 17, 2013 4:45 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: 
Possible or Pipedream?

Exiftool is what you need. Easy to use and works on any platform.

kyle


On Tue, Dec 17, 2013 at 1:37 PM, Swauger,Shea shea.swau...@colostate.edu wrote:

 Hi all,

 I'm wondering if there is a systematic method that can extract 
 metadata embedded in digital photographs and then ingest that metadata 
 into a CMS and relate them to their corresponding images. We currently 
 use DigiTool, if that makes a difference.

 Thanks!

 Shea Swauger
 Data Management Librarian
 Colorado State University



Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?

2013-12-17 Thread Monica Rivero

Hi Shea,

Well, one option you might explore is extracting metadata from images 
using exiftool (http://www.sno.phy.queensu.ca/~phil/exiftool/) to a CSV 
or TXT file and then converting that file to whatever format (e.g. XML) 
your CMS uses for batch import. So, semi-automated.
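
A minimal Python sketch of the conversion step, assuming the CSV has one row 
per image with a SourceFile column (as exiftool's -csv option produces). The 
element names "records", "record" and "field" are placeholders invented for 
this example; a real import would have to follow whatever schema the CMS 
expects.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text):
    """Convert CSV with one row per image into a simple XML document."""
    root = ET.Element("records")  # hypothetical element names throughout
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, "record")
        for column, value in row.items():
            # Skip tags that are empty for this image.
            if column and value:
                field = ET.SubElement(record, "field", name=column)
                field.text = value
    return ET.tostring(root, encoding="unicode")
```

Because SourceFile travels with every record, the importing system can use it 
to match each set of fields to its image.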


We currently do the reverse, embed metadata into images and then ingest 
to our IR (DSpace).


hope this helps,
Monica

On 12/17/2013 3:37 PM, Swauger,Shea wrote:

Hi all,

I'm wondering if there is a systematic method that can extract metadata 
embedded in digital photographs and then ingest that metadata into a CMS and 
relate them to their corresponding images. We currently use DigiTool, if that 
makes a difference.

Thanks!

Shea Swauger
Data Management Librarian
Colorado State University



Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?

2013-12-17 Thread Roy Tennant
I use EXIFTool to extract the EXIF metadata from images:

http://www.sno.phy.queensu.ca/~phil/exiftool/

I do this dynamically for all of the 8,000+ photos on FreeLargePhotos.com.
Here is an example of the text output:

http://freelargephotos.com/photos/003805/exif.txt

From there, you could parse that into whatever you wanted for import. Since
you would have the filename, that may be sufficient to map it into the right
place in DigiTool (but I'm unfamiliar with it).
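
A minimal Python sketch of that parsing step, assuming the text output is one
"Label : value" pair per line, which is ExifTool's default layout. The function
name and the sample values in the usage note are made up for illustration.

```python
def parse_exif_text(text):
    """Parse 'Label : value' lines into a dictionary of tags."""
    tags = {}
    for line in text.splitlines():
        # Split on the first colon only, since values such as dates
        # contain colons of their own.
        label, separator, value = line.partition(":")
        if not separator:
            continue  # skip blank or malformed lines
        tags[label.strip()] = value.strip()
    return tags
```

For example, a line such as "File Name : example.jpg" would end up as
tags["File Name"] == "example.jpg", which gives you the key for matching the
metadata back to its image.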
Roy


On Tue, Dec 17, 2013 at 1:37 PM, Swauger,Shea shea.swau...@colostate.edu wrote:

 Hi all,

 I'm wondering if there is a systematic method that can extract metadata
 embedded in digital photographs and then ingest that metadata into a CMS
 and relate them to their corresponding images. We currently use DigiTool,
 if that makes a difference.

 Thanks!

 Shea Swauger
 Data Management Librarian
 Colorado State University



Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?

2013-12-17 Thread Edward M. Corrado
Hi,

It is possible, at least the extraction part. I don't know enough about
DigiTool to speak to the deposit part. We wrote a series of shell scripts
using exiftool (as I see others are suggesting). The output is then put
through a number of sed commands to produce a file that can be deposited
into our digital preservation system, Rosetta.
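
For readers who would rather not chain sed commands, the same kind of
line-by-line cleanup can be sketched in Python. The actual sed expressions are
not given in this message, so the two rules below are purely hypothetical
stand-ins, and the output is not the format Rosetta expects.

```python
import re

# Hypothetical cleanup rules; the real sed commands are not described here.
RULES = [
    # Turn the first 'Label   : value' separator on each line into 'Label=value'.
    (re.compile(r"^([^:\n]*?)[ \t]*:[ \t]*", re.MULTILINE), r"\1="),
    # Strip trailing whitespace from every line.
    (re.compile(r"[ \t]+$", re.MULTILINE), ""),
]


def apply_rules(text, rules=RULES):
    """Apply each (pattern, replacement) pair in order, like a sed pipeline."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text
```

Keeping the rules in a list means each substitution can be added, removed or
reordered independently, much like the individual sed commands in a script.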

Edward


On Tue, Dec 17, 2013 at 4:37 PM, Swauger,Shea shea.swau...@colostate.edu wrote:

 Hi all,

 I'm wondering if there is a systematic method that can extract metadata
 embedded in digital photographs and then ingest that metadata into a CMS
 and relate them to their corresponding images. We currently use DigiTool,
 if that makes a difference.

 Thanks!

 Shea Swauger
 Data Management Librarian
 Colorado State University



Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?

2013-12-17 Thread Patrick Murray-John

The extraction and ingestion seem like two different coins.

Lots of tools can extract. exiftool, or imagemagick, or whatever can 
extract the data.


Question then is how and where to insert it into the system you are using.

So, not a pipedream. Indeed extraction is very possible.

The harder part might be figuring out how or where to store the data in 
your system.


Then, assuming it's relevant, how/where your system actually displays or 
uses the data.


Depending on your system, that's where the pipedream question comes into 
play, I think.


Patrick

On 12/17/2013 04:45 PM, Kyle Banerjee wrote:

Exiftool is what you need. Easy to use and works on any platform.

kyle


On Tue, Dec 17, 2013 at 1:37 PM, Swauger,Shea shea.swau...@colostate.edu wrote:


Hi all,

I'm wondering if there is a systematic method that can extract metadata
embedded in digital photographs and then ingest that metadata into a CMS
and relate them to their corresponding images. We currently use DigiTool,
if that makes a difference.

Thanks!

Shea Swauger
Data Management Librarian
Colorado State University



Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?

2013-12-17 Thread Edward Summers
I remember hearing somewhere that ExifTool is pretty good for extracting image 
metadata. 

edsu--