[tesseract-ocr] Recognizing known text (generating searchable PDF)

Erik Jensen Tue, 31 Mar 2015 23:46:46 -0700

I'm trying to generate a searchable and copyable PDF from a series of 
images. Using Tesseract works pretty well, but still results in a number of 
errors on each page. However, I already have a copy of the text that 
appears on each page, so all I really need is to find the location of each 
of the known glyphs on the page so I can put the overlay text in the 
correct location. Is there a way to use the known text to guide Tesseract's 
recognition to accomplish this?
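For reference, here is a minimal sketch of one possible workaround. It does not use any Tesseract feature for constraining recognition. Instead it lets Tesseract recognize the page normally, then aligns its word-level output against the known transcript and copies each OCR word's bounding box onto the matching known word. It assumes the OCR words have already been parsed (for example from Tesseract's hOCR output) into (text, bbox) tuples; that parsing step is not shown. The function name `align_known_text` and the bbox layout are hypothetical. Alignment uses Python's standard `difflib.SequenceMatcher`. The same idea could be applied per character instead of per word if glyph-level positions are needed.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Hypothetical box layout: (left, top, right, bottom) in image pixels.
BBox = Tuple[int, int, int, int]


def align_known_text(
    ocr_words: List[Tuple[str, BBox]],
    known_words: List[str],
) -> List[Tuple[str, Optional[BBox]]]:
    """Attach OCR bounding boxes to the known (correct) words.

    Exact matches, and equal-length 'replace' runs (likely misrecognized
    words), inherit the OCR word's box. Known words with no counterpart
    in the OCR output get None and must be positioned some other way.
    """
    # Compare case-insensitively so minor OCR case errors still align.
    ocr_tokens = [w.lower() for w, _ in ocr_words]
    known_tokens = [w.lower() for w in known_words]
    result: List[Tuple[str, Optional[BBox]]] = [(w, None) for w in known_words]

    matcher = SequenceMatcher(a=ocr_tokens, b=known_tokens, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal" or (tag == "replace" and i2 - i1 == j2 - j1):
            # One-to-one run: copy boxes across position by position.
            for k in range(i2 - i1):
                result[j1 + k] = (known_words[j1 + k], ocr_words[i1 + k][1])
    return result


if __name__ == "__main__":
    ocr = [("Tbe", (10, 10, 40, 30)), ("quick", (45, 10, 90, 30))]
    print(align_known_text(ocr, ["The", "quick"]))
```

Here the misread "Tbe" still receives its box because it sits in a one-to-one replace run next to an exact match. Words that Tesseract dropped entirely come back with `None`, which flags exactly the spots that need interpolation or manual placement.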


Thanks.
