John Miles wrote:

> None of the articles appear to be text-searchable, unfortunately, so that'll
> take a few kilowatt-hours of CPU time to fix.


On that subject, what do you use for that?

Personally I do something like this:
- pdftohtml
- index the html pages with mnogosearch
- dump the HTML and the index on a web server
- the PDFs are now searchable through a web interface (and from the command
line, obviously)
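A rough Python sketch of the steps above, assuming poppler-utils' `pdftohtml` is on the PATH. mnogosearch is a separate indexer with its own config, so it is NOT reproduced here; a tiny in-memory inverted index stands in for it just to show the idea. All function names are hypothetical.

```python
import html
import re
import subprocess
from collections import defaultdict
from pathlib import Path


def pdftohtml_cmd(pdf: Path, out_base: Path) -> list[str]:
    # -i: skip images; -noframes: one HTML file per PDF instead of a frameset
    return ["pdftohtml", "-i", "-noframes", str(pdf), str(out_base)]


def convert_all(pdf_dir: Path, html_dir: Path) -> None:
    # Step 1: run pdftohtml on every PDF in pdf_dir
    html_dir.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        subprocess.run(pdftohtml_cmd(pdf, html_dir / pdf.stem), check=True)


def html_to_text(markup: str) -> str:
    # Crude tag stripper, then decode entities such as &amp;
    return html.unescape(re.sub(r"<[^>]+>", " ", markup))


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Step 2 stand-in (NOT mnogosearch): word -> names of documents containing it
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    # AND query: documents that contain every word of the query
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

For example, `search(build_index({"a": "Cesium beam tube", "b": "Rubidium tube"}), "tube")` returns `{"a", "b"}`, while the query `"cesium tube"` narrows it to `{"a"}`.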

This works fine for PDFs that have embedded text, but it's obviously no good for
scanned pages that need OCR.
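One way to find out which PDFs actually need OCR is to try extracting their text first. A minimal sketch, assuming poppler-utils' `pdftotext` is installed; the 200-character threshold and the function names are arbitrary, hypothetical choices.

```python
import subprocess
from pathlib import Path


def extract_text(pdf: Path) -> str:
    # pdftotext writes to stdout when the output file name is "-"
    result = subprocess.run(
        ["pdftotext", str(pdf), "-"], capture_output=True, text=True, check=True
    )
    return result.stdout


def needs_ocr(text: str, min_chars: int = 200) -> bool:
    # Heuristic: a scan with no text layer yields little more than
    # whitespace and form feeds; the threshold is a guess
    return len("".join(text.split())) < min_chars


def triage(pdf_dir: Path) -> tuple[list[Path], list[Path]]:
    # Split a directory of PDFs into (has_text, needs_ocr) lists
    has_text: list[Path] = []
    scanned: list[Path] = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        (scanned if needs_ocr(extract_text(pdf)) else has_text).append(pdf)
    return has_text, scanned
```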

So basically the question is: does anyone know of any good open-source OCR
software for the job?
In the absence of better options, I'll probably give tesseract-ocr a spin and
see if it's any good for this.
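If tesseract does get a spin, a scanned PDF first has to be rasterized, since tesseract works on images. A minimal sketch, assuming `pdftoppm` (poppler-utils) and `tesseract` with English language data are installed; the 300 dpi default and the function names are assumptions, not tuned settings.

```python
import subprocess
import tempfile
from pathlib import Path


def render_cmd(pdf: Path, prefix: Path, dpi: int = 300) -> list[str]:
    # pdftoppm writes one PNG per page: prefix-1.png, prefix-2.png, ...
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def ocr_cmd(image: Path, lang: str = "eng") -> list[str]:
    # Using "stdout" as the output base makes tesseract print the text
    return ["tesseract", str(image), "stdout", "-l", lang]


def page_number(image: Path) -> int:
    # "page-07" -> 7; numeric sort works whether or not names are zero-padded
    return int(image.stem.rsplit("-", 1)[1])


def ocr_pdf(pdf: Path, dpi: int = 300) -> str:
    # Rasterize every page, OCR each image, join pages with form feeds
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(render_cmd(pdf, prefix, dpi), check=True)
        pages = []
        for image in sorted(Path(tmp).glob("page-*.png"), key=page_number):
            out = subprocess.run(
                ocr_cmd(image), capture_output=True, text=True, check=True
            )
            pages.append(out.stdout)
        return "\f".join(pages)
```

The returned text could be saved as a .txt file next to each PDF and indexed along with the pdftohtml output. Tesseract can also write a searchable PDF through its `pdf` output config, which would let the existing pdftohtml step pick up the text unchanged.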

regards,
Fred
_______________________________________________
time-nuts mailing list -- [email protected]
To unsubscribe, go to https://www.febo.com/cgi-bin/mailman/listinfo/time-nuts
and follow the instructions there.
