On Thursday, February 18, 2016 at 8:39:55 AM UTC-5, viraf wrote:
>
> So I manually removed the underline from the image and ran OCR on it again.
> The new image is attached.
>
> *$ tesseract test2.png stdout -l eng*
> REQUEST FOR INDEPENDENT NIEDICAL REVIEW
>
> *$ tesseract test2.png stdout -l eng use-userdict*
> REQUEST FOR INDEPENDENT IVIEDICAL REVIEW
>
> Having specified the user dictionary, I would have expected the output to 
> be correct.  Could someone please explain why the results differ?
> I have also observed that Tesseract correctly handles underlines in other 
> places, so I am unclear on what is required here.  What are the rules for 
> handling text with underlines?
>
> Thanks
>
> - viraf
>
>
>
> On Thursday, February 18, 2016 at 1:08:37 AM UTC-5, viraf wrote:
>>
>> I am facing challenges with OCR accuracy and was hoping that someone 
>> could guide me through the process of debugging the problem, so that I 
>> can apply these techniques to other OCR-related issues that I face. 
>> Attached is a snippet of a document that is not correctly OCR'd.  The 
>> output that I get is:
>>
>> RE U'EST FO DICAL
>>
>> The following config entries were added to *configs/use-userdict*:
>> load_system_dawg F
>> load_freq_dawg F
>> load_punc_dawg F
>> load_number_dawg F
>> load_unambig_dawg F
>> load_bigram_dawg F
>> load_fixed_length_dawgs F
>> user_words_suffix user-words
>> tessedit_write_images T
>> tessedit_dump_pageseg_images T
>>
>> and *eng.user-words* has the following entries:
>> REQUEST
>> FOR
>> INDEPENDENT
>> MEDICAL
>> REVIEW
>>
>> The following command line was used:
>>
>> tesseract test.png stdout -l eng use-userdict
>>
>>
>>
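
For anyone trying to reproduce the setup quoted above, the following is a minimal sketch that recreates the two files exactly as listed in the message. It writes them to a scratch directory rather than to a real installation; the variable name `workdir` is hypothetical, and the assumption that Tesseract looks for the config under `tessdata/configs/` and for the word list as `tessdata/eng.user-words` should be checked against your own installation before copying anything.

```shell
#!/bin/sh
# Sketch only: recreate the config and word list from the quoted message
# in a scratch directory. Nothing here touches a real tessdata directory.
set -eu

# Hypothetical scratch location, not part of the original message.
workdir=$(mktemp -d)
mkdir -p "$workdir/configs"

# Config entries copied verbatim from the message.
cat > "$workdir/configs/use-userdict" <<'EOF'
load_system_dawg F
load_freq_dawg F
load_punc_dawg F
load_number_dawg F
load_unambig_dawg F
load_bigram_dawg F
load_fixed_length_dawgs F
user_words_suffix user-words
tessedit_write_images T
tessedit_dump_pageseg_images T
EOF

# Word list copied verbatim from the message, one word per line.
cat > "$workdir/eng.user-words" <<'EOF'
REQUEST
FOR
INDEPENDENT
MEDICAL
REVIEW
EOF

echo "config:    $workdir/configs/use-userdict"
echo "word list: $workdir/eng.user-words"
```

Once the files are in place under the tessdata directory, the command from the message (`tesseract test.png stdout -l eng use-userdict`) can be rerun unchanged. Keeping the files in a scratch directory first makes it easy to diff them against what is actually installed.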
