Hi Padraic,
I have uploaded a shell script which happens to implement Robert
Haschart's recipe:
https://github.com/pietsch/Data-Munging/blob/master/ocr4pdf.sh
Enjoy!
Christian
On Fri, Oct 18, 2013 at 10:22:17AM +0100, Padraic Stack wrote:
> I would love to see that bash script if you could upload it.
On 10/17/2013 9:43 AM, Eric Lease Morgan wrote:
> On Oct 16, 2013, at 10:56 AM, Robert Haschart wrote:
>> The abstract extraction routine I have been working on does use
>> tesseract internally for doing OCR when it encounters a document that
>> doesn't have usable full-text. I agree that tesseract is not that easy...
Hi Eric,
On Thu, Oct 17, 2013 at 09:43:04AM -0400, Eric Lease Morgan wrote:
> Robert, can you outline the process you used to get Tesseract to do
> OCR against PDF documents? I installed Tesseract a few months ago,
> but I couldn't figure out how to get it to work against PDFs, only some
> image files.
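Tesseract reads image files (through the Leptonica library) rather than PDFs, so the usual workaround for the question above is to render each PDF page to an image first and then OCR the images. Below is a minimal sketch of that idea, assuming Poppler's `pdftoppm` and the `tesseract` command are both on PATH. It is not the linked ocr4pdf.sh and not necessarily Robert's recipe; `ocr_pdf` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: OCR a scanned PDF by rasterizing each page with pdftoppm and
# running tesseract on every page image. Assumes Poppler's pdftoppm and
# tesseract are installed. Not the linked ocr4pdf.sh; ocr_pdf is hypothetical.

# Usage: ocr_pdf input.pdf [output.txt]   (default output: input.txt)
ocr_pdf() {
  local pdf="$1"
  local out="${2:-${pdf%.pdf}.txt}"

  if [ -z "$pdf" ]; then
    echo "usage: ocr_pdf input.pdf [output.txt]" >&2
    return 2
  fi
  if [ ! -f "$pdf" ]; then
    echo "ocr_pdf: no such file: $pdf" >&2
    return 1
  fi

  # Fail early if either external tool is missing.
  local tool
  for tool in pdftoppm tesseract; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "ocr_pdf: $tool not found on PATH" >&2
      return 127
    fi
  done

  local workdir
  workdir=$(mktemp -d) || return 1

  # Render every page to a 300 dpi TIFF named page-N.tif. pdftoppm
  # zero-pads N for multi-digit page counts, so glob order is page order.
  if ! pdftoppm -r 300 -tiff "$pdf" "$workdir/page"; then
    rm -rf "$workdir"
    return 1
  fi

  : > "$out"   # start with an empty output file
  local img
  for img in "$workdir"/page-*.tif; do
    [ -e "$img" ] || continue   # no pages rendered: leave output empty
    # "tesseract IMAGE OUTBASE" writes its text to OUTBASE.txt
    if ! tesseract "$img" "${img%.tif}" >/dev/null 2>&1; then
      rm -rf "$workdir"
      return 1
    fi
    cat "${img%.tif}.txt" >> "$out"
  done

  rm -rf "$workdir"
}

# When run directly with arguments, OCR the given file.
if [ "$#" -gt 0 ]; then
  ocr_pdf "$@"
fi
```

Usage: `./ocr_pdf.sh scanned.pdf scanned.txt`, or source the file and call `ocr_pdf scanned.pdf`. The 300 dpi setting follows the common advice that Tesseract works best on images of at least that resolution; ImageMagick or Ghostscript could stand in for `pdftoppm` in the rasterizing step.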
> On Oct 16, 2013, at 10:56 AM, Robert Haschart wrote:
>> The abstract extraction routine I have been working on does use
>> tesseract internally for doing OCR when it encounters a document that
>> doesn't have usable full-text. I agree that tesseract is not that easy
>> to install, especially if (