Hello everyone,

I've been searching around this group for an answer to my question, but I 
couldn't find anything satisfactory so here it goes. For the attached 
image, the OCR result is the following:

Review the Main Idea state-


ment at the beginning of this


section. List five sources that a


historian'might use to write


a history of your Iife.Then,


eValIJate them for authenticity,

*reiiability (72 confidence)*, and bias.


The command I used to run OCR is `tesseract rotated.jpeg foo -psm 1 -c 
language_model_penalty_non_dict_word 1.0`. 


Tesseract does a good job overall, but fails to determine that 
"reiiability" should be "reliability" (among a few other words, but I'm 
curious about this case in particular). Can you please explain to me why 
Tesseract fails to find the dictionary word?


Assuming I cannot fix this discrepancy on the word-recognition level, can I 
utilize the API in some way to iterate over the words and only pick 
dictionary words from available choices? 


Since the DAWG is a graph, is it possible to ask Tesseract for a 
dictionary word that is, say, 1 or 2 characters away from the current best 
candidate? 
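
To make that last idea concrete, here is a minimal sketch in plain Python of 
the kind of lookup I have in mind. It does not use Tesseract's DAWG or its 
API at all: the word list and both function names are made up for 
illustration, and it simply scans a list and keeps the words within a given 
edit distance of the OCR candidate.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def nearby_dictionary_words(candidate, dictionary, max_distance=2):
    """Return dictionary words within max_distance edits, closest first."""
    scored = []
    for word in dictionary:
        d = edit_distance(candidate.lower(), word.lower())
        if d <= max_distance:
            scored.append((d, word))
    return [word for d, word in sorted(scored)]


# Hypothetical word list standing in for the real dictionary.
WORDS = ["reliability", "liability", "readability", "authenticity", "bias"]

print(nearby_dictionary_words("reiiability", WORDS))
```

With this toy list, "reliability" comes out first because it is a single 
substitution away from "reiiability". A linear scan like this would be slow 
on a full dictionary, which is why I'm asking whether the DAWG can be 
queried this way directly.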


Thanks a lot for your help,

Jakub
