On Thursday, February 18, 2016 at 8:39:55 AM UTC-5, viraf wrote:
>
> So, I decided to manually remove the underline from the image and OCR it.
> The new image is attached.
>
> *$ tesseract test2.png stdout -l eng*
> REQUEST FOR INDEPENDENT NIEDICAL REVIEW
>
> *$ tesseract test2.png stdout -l eng use-userdict*
> REQUEST FOR INDEPENDENT IVIEDICAL REVIEW
>
> Having specified the user dictionary, I would have expected the output to
> be correct. Could someone please elaborate on why the difference?
> I have also observed that Tesseract correctly handles underlines in other
> places - so I am unclear on what is required here. What are the rules for
> handling text with underlines?
>
> Thanks
>
> - viraf
>
>
> On Thursday, February 18, 2016 at 1:08:37 AM UTC-5, viraf wrote:
>>
>> I am facing challenges with the accuracy of the OCR, and was hoping that
>> someone could guide me through the process of debugging the problem so that
>> I can apply these techniques to other OCR related issues that I face.
>> Attached is a snippet of a document that is not correctly OCR'd. The
>> output that I get is:
>>
>> RE U'EST FO DICAL
>>
>> The following config entries were added to *configs/use-userdict*
>> load_system_dawg F
>> load_freq_dawg F
>> load_punc_dawg F
>> load_number_dawg F
>> load_unambig_dawg F
>> load_bigram_dawg F
>> load_fixed_length_dawgs F
>> user_words_suffix user-words
>> tessedit_write_images T
>> tessedit_dump_pageseg_images T
>>
>> and *eng.user-words* has the following entries
>> REQUEST
>> FOR
>> INDEPENDENT
>> MEDICAL
>> REVIEW
>>
>> The following command line was used
>>
>> tesseract test.png stdout -l eng use-userdict
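When debugging runs like the ones quoted above, it can help to check each OCR output line against the user-words list mechanically rather than by eye. The following is only a rough sketch using Python's standard difflib module. It does not call Tesseract and does not explain why the dictionary was not applied. The names USER_WORDS and check_against_user_words and the 0.6 similarity cutoff are assumptions made for illustration and do not come from Tesseract or from the original posts.

```python
import difflib

# Words copied from the eng.user-words list quoted in the thread.
USER_WORDS = ["REQUEST", "FOR", "INDEPENDENT", "MEDICAL", "REVIEW"]


def check_against_user_words(ocr_line, user_words=USER_WORDS, cutoff=0.6):
    """Return (token, closest_user_word_or_None) for every OCR token
    that is not an exact entry in the user-words list.

    Hypothetical helper: the cutoff is an arbitrary similarity threshold.
    """
    misses = []
    for token in ocr_line.split():
        if token in user_words:
            continue  # exact dictionary hit, nothing to report
        close = difflib.get_close_matches(token, user_words, n=1, cutoff=cutoff)
        misses.append((token, close[0] if close else None))
    return misses


if __name__ == "__main__":
    # The two output lines reported for test2.png in the thread.
    for line in (
        "REQUEST FOR INDEPENDENT NIEDICAL REVIEW",
        "REQUEST FOR INDEPENDENT IVIEDICAL REVIEW",
    ):
        print(line, "->", check_against_user_words(line))
```

Running this over the output of each command makes it easy to see which tokens differ from the dictionary between the run with and the run without use-userdict.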

