Hi Padraic,
I have uploaded a shell script which happens to implement Robert
Haschart's recipe:
https://github.com/pietsch/Data-Munging/blob/master/ocr4pdf.sh
Enjoy!
Christian
On Fri, Oct 18, 2013 at 10:22:17AM +0100, Padraic Stack wrote:
I would love to see that bash script if you could
On Oct 16, 2013, at 10:56 AM, Robert Haschart rh...@virginia.edu wrote:
The abstract extraction routine I have been working on does use
tesseract internally for doing OCR when it encounters a document that
doesn't have usable full-text. I agree that tesseract is not that easy
to install,
Hi Eric,
On Thu, Oct 17, 2013 at 09:43:04AM -0400, Eric Lease Morgan wrote:
Robert, can you outline the process you used to get Tesseract to do
OCR against PDF documents? I installed Tesseract a few months ago,
but I couldn't figure out how to get it to work against PDF, only some
image files.
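For context on the question above: Tesseract generally expects image input rather than PDF, so a common workaround is to rasterize each PDF page and OCR the resulting images. The following is only a minimal sketch of that idea. It is not the recipe Robert describes, nor the contents of the linked ocr4pdf.sh. It assumes `pdftoppm` (from poppler-utils) and `tesseract` are installed; the function name `ocr_pdf` and the 300 dpi setting are illustrative choices, not something taken from this thread.

```shell
# Sketch: OCR a PDF by rendering its pages to PNG and running tesseract
# on each page image. ocr_pdf is a hypothetical helper name.
# Usage: ocr_pdf input.pdf output.txt
ocr_pdf() {
    pdf=$1
    out=$2

    if [ ! -f "$pdf" ]; then
        echo "ocr_pdf: no such file: $pdf" >&2
        return 1
    fi

    workdir=$(mktemp -d) || return 1

    # Render every page as a PNG at 300 dpi (an assumed, commonly used
    # resolution). Files are named <prefix>-<page>.png.
    if ! pdftoppm -r 300 -png "$pdf" "$workdir/page"; then
        rm -rf "$workdir"
        return 1
    fi

    : > "$out"
    # pdftoppm should zero-pad page numbers, so glob order is expected
    # to follow page order.
    for img in "$workdir"/page-*.png; do
        [ -e "$img" ] || continue
        base=${img%.png}
        # tesseract writes its text to <base>.txt
        if ! tesseract "$img" "$base" >/dev/null 2>&1; then
            rm -rf "$workdir"
            return 1
        fi
        cat "$base.txt" >> "$out"
    done

    rm -rf "$workdir"
}
```

Calling `ocr_pdf scanned.pdf scanned.txt` would then leave the recognized text of all pages in `scanned.txt`. Going through intermediate images keeps the sketch to two widely packaged tools, at the cost of some temporary disk space.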
On 10/17/2013 9:43 AM, Eric Lease Morgan wrote:
On Oct 16, 2013, at 10:56 AM, Robert Haschart rh...@virginia.edu wrote:
The abstract extraction routine I have been working on does use
tesseract internally for doing OCR when it encounters a document that
doesn't have usable full-text. I agree