date:20141123

Re: [CODE4LIB] Looking for a script to clean up OCR text files

2014-11-23 Thread Kevin Hawkins

It sounds like there are two sorts of things you need to clean up: a) OCR errors, and b) formatting (like unnecessary line breaks). For the former, I understand that Adobe Acrobat and ABBYY FineReader have built-in spellchecking tools. PrimeOCR, an expensive OCR package, has a related package
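For the formatting side (item b), a rough Python sketch of how unnecessary line breaks might be removed is below. It is not taken from any of the tools mentioned in this thread; the function name `unwrap_ocr_text` is made up for illustration, and it assumes that blank lines mark real paragraph boundaries and that a hyphen at the end of a line is always a word split by wrapping.

```python
import re


def unwrap_ocr_text(text: str) -> str:
    """Join hard-wrapped lines into paragraphs (hypothetical helper).

    Assumes blank lines separate paragraphs and that a trailing
    hyphen at a line end marks a word split by line wrapping.
    """
    # Normalise Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words hyphenated across a line break: "la-\nzy" -> "lazy".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Split into paragraphs on one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", text)
    # Inside each paragraph, collapse every whitespace run to one space.
    cleaned = [" ".join(p.split()) for p in paragraphs]
    # Drop empty paragraphs and separate the rest with one blank line.
    return "\n\n".join(p for p in cleaned if p)


if __name__ == "__main__":
    sample = "The quick brown\nfox jumps over the la-\nzy dog.\n\nSecond para-\ngraph here.\n"
    print(unwrap_ocr_text(sample))
```

One caveat: the hyphen rule will also merge genuine compounds that happen to break at the hyphen ("well-" / "known" becomes "wellknown"), so a dictionary check may be worth adding before trusting it on a whole collection.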

Re: [CODE4LIB] Library Hours Fail

2014-11-23 Thread Fitchett, Deborah

We'd been using Andrew Darby's method and ran into this problem earlier this year. A (now ex-)colleague coded Calibr (https://github.com/LincolnUniLTL/calibr) when we ran into this problem, and we've been running it since. Does depend on tidy CSV though. Deborah

Re: [CODE4LIB] Looking for a script to clean up OCR text files

2014-11-23 Thread Monica Rivero

Hi Erica, We are working on a similar project converting concert performances from the past 20 years for our School of Music. Though we use simple OCR for PDFs (supporting full-text searching), we are selectively cleaning up OCR for metadata purposes. That is taking the first page of
