Re: [CODE4LIB] Looking for a script to clean up OCR text files

2014-11-23 Thread Monica Rivero
Hi Erica, We are working on a similar project converting concert performances from the past 20 years for our School of Music. Though we use simple OCR for PDFs (to support full-text searching), we are selectively cleaning up OCR for metadata purposes. That is, taking the first page of

Re: [CODE4LIB] Automated Embedded Metadata Extraction in Photographs: Possible or Pipedream?

2013-12-17 Thread Monica Rivero
Hi Shea, Well, one option you might explore is extracting metadata from images using exiftool (http://www.sno.phy.queensu.ca/~phil/exiftool/) to a CSV or TXT file and then converting that file to whatever file format (e.g. XML) your tool uses for batch import into your CMS. So, semi-automated. We
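The CSV-to-XML step suggested above could be sketched roughly as follows. This is only an illustration, not the poster's actual workflow: it assumes the CSV came from something like `exiftool -csv`, whose first column is `SourceFile`, and the `records`/`record`/`field` element names are hypothetical stand-ins for whatever schema a given CMS expects for batch import.

```python
import csv
import io
import xml.etree.ElementTree as ET


def exif_csv_to_xml(csv_text):
    """Convert exiftool-style CSV text into an XML string.

    The element names used here (records, record, field) are made up
    for illustration; adapt them to your CMS's import schema.
    """
    root = ET.Element("records")
    for row in csv.DictReader(io.StringIO(csv_text)):
        # One <record> per image, keyed by the SourceFile column.
        record = ET.SubElement(root, "record", source=row.get("SourceFile") or "")
        for name, value in row.items():
            # Skip the key column, stray extra columns, and empty values.
            if name in (None, "SourceFile") or not value:
                continue
            # Store the tag name as an attribute, since tag names are
            # not guaranteed to be valid XML element names.
            field = ET.SubElement(record, "field", name=name)
            field.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "SourceFile,Make,Model\nphoto1.jpg,Canon,EOS 5D\nphoto2.jpg,Nikon,\n"
    print(exif_csv_to_xml(sample))
```

Keeping the tag name in an attribute rather than using it as the element name avoids problems with names that contain characters XML does not allow in element names.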

Re: [CODE4LIB] Question for Institutional Repository Folks

2013-10-28 Thread Monica Rivero
If you have Adobe Acrobat Professional, you can use the option File > Create > Combine Files into a Single PDF. This will combine the password-protected PDF with a coversheet PDF containing the metadata you are looking to add. Good luck! Monica On 10/28/2013 1:16 PM, Matthew Sherman

Re: [CODE4LIB] Tool to highlight differences in two files

2013-04-23 Thread Monica Rivero
Hi Wilhelmina, We've used oXygen and TextWrangler (Mac only). Regards, Monica On 4/23/2013 3:24 PM, Wilhelmina Randtke wrote: I would like to compare versions of a website scraped at different times to see what paragraphs on a page have changed. Does anyone here know of a tool for
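For a scripted alternative to the GUI tools named in the reply, paragraph-level comparison can be sketched with Python's standard `difflib` module. This is not what the poster used; it assumes the scraped pages have already been reduced to plain text with paragraphs separated by blank lines, and `changed_paragraphs` is a hypothetical helper name.

```python
import difflib


def split_paragraphs(text):
    """Split plain text into non-empty paragraphs on blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def changed_paragraphs(old_text, new_text):
    """Return (tag, old_paragraphs, new_paragraphs) for each non-equal block.

    tag is one of 'replace', 'delete', or 'insert', as reported by
    difflib.SequenceMatcher.get_opcodes().
    """
    old = split_paragraphs(old_text)
    new = split_paragraphs(new_text)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old[i1:i2], new[j1:j2]))
    return changes


if __name__ == "__main__":
    before = "Welcome to the library.\n\nHours: 9 to 5.\n\nContact us by email."
    after = "Welcome to the library.\n\nHours: 9 to 7.\n\nContact us by email."
    for tag, old_paras, new_paras in changed_paragraphs(before, after):
        print(tag, old_paras, new_paras)
```

Comparing whole paragraphs instead of lines keeps the report aligned with the question being asked (which paragraphs changed), at the cost of not showing the exact words that differ within a paragraph.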