I am having trouble with OCR accuracy and am hoping someone can walk me 
through how to debug the problem, so that I can apply the same techniques 
to other OCR-related issues I run into. Attached is a snippet of a document 
that is not OCR'd correctly. The output that I get is:

RE U'EST FO DICAL
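
One way to quantify how far this output is from the intended text is to match each OCR token against the expected words. The sketch below is only an illustration: it assumes the intended line is made up of the five words listed in eng.user-words further down, and the helper name `closest_words` and the 0.6 similarity cutoff are arbitrary choices for this example, not anything Tesseract provides.

```python
import difflib

# Assumption: the expected vocabulary is the eng.user-words list from this post.
EXPECTED_WORDS = ["REQUEST", "FOR", "INDEPENDENT", "MEDICAL", "REVIEW"]


def closest_words(ocr_text, vocabulary, cutoff=0.6):
    """Map each whitespace-separated OCR token to its closest vocabulary word.

    A token maps to None when no vocabulary word reaches the similarity cutoff.
    """
    result = {}
    for token in ocr_text.split():
        matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        result[token] = matches[0] if matches else None
    return result


if __name__ == "__main__":
    for token, word in closest_words("RE U'EST FO DICAL", EXPECTED_WORDS).items():
        print(f"{token!r} -> {word}")
```

Tokens that map to None, and expected words that no token maps to, point at the parts of the image where recognition or segmentation failed outright rather than merely dropping a character.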

The following config entries were added to *configs/use-userdict*:
load_system_dawg F
load_freq_dawg F
load_punc_dawg F
load_number_dawg F
load_unambig_dawg F
load_bigram_dawg F
load_fixed_length_dawgs F
user_words_suffix user-words
tessedit_write_images T
tessedit_dump_pageseg_images T

and *eng.user-words* has the following entries:
REQUEST
FOR
INDEPENDENT
MEDICAL
REVIEW

The following command line was used:

tesseract test.png stdout -l eng use-userdict
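
For anyone trying to reproduce this setup, the following Python sketch writes the two files described above and assembles the same command line. It assumes the usual layout in which the config file lives in a configs directory under tessdata and the word list sits next to the eng language data; the `tessdata_dir` argument and the helper names `write_setup` and `build_command` are hypothetical, so adjust them to your installation.

```python
from pathlib import Path

# Config entries exactly as listed in this post.
CONFIG_ENTRIES = [
    ("load_system_dawg", "F"),
    ("load_freq_dawg", "F"),
    ("load_punc_dawg", "F"),
    ("load_number_dawg", "F"),
    ("load_unambig_dawg", "F"),
    ("load_bigram_dawg", "F"),
    ("load_fixed_length_dawgs", "F"),
    ("user_words_suffix", "user-words"),
    ("tessedit_write_images", "T"),
    ("tessedit_dump_pageseg_images", "T"),
]

USER_WORDS = ["REQUEST", "FOR", "INDEPENDENT", "MEDICAL", "REVIEW"]


def write_setup(tessdata_dir):
    """Write configs/use-userdict and eng.user-words under tessdata_dir.

    Assumption: tessdata_dir is the directory holding the eng language data.
    """
    tessdata = Path(tessdata_dir)
    config_path = tessdata / "configs" / "use-userdict"
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(
        "".join(f"{name} {value}\n" for name, value in CONFIG_ENTRIES)
    )
    words_path = tessdata / "eng.user-words"
    words_path.write_text("\n".join(USER_WORDS) + "\n")
    return config_path, words_path


def build_command(image="test.png"):
    """Return the command line from this post as an argument list."""
    return ["tesseract", image, "stdout", "-l", "eng", "use-userdict"]
```

The argument list can be passed to `subprocess.run` once Tesseract is installed; nothing in the sketch runs Tesseract itself.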

