Hello Hari,
I faced the same problem.

When the image contains two different font types, Tesseract doesn't recognize
them properly. It recognizes the first text and ignores the next text if that
text's font size is bigger than the first one's.
I resolved it by cropping the image into two pieces. I'm using
ImageMagick (Java API) to clean and crop the images.
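Below is a minimal sketch of the "crop into two pieces" step. It assumes the two text blocks are separated by a horizontal band of blank rows. It uses plain java.awt.image rather than the ImageMagick Java API mentioned above. The class name SplitImage, the method splitInTwo, and the INK_THRESHOLD cutoff of 128 are hypothetical choices made for illustration, not part of the original workflow.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

class SplitImage {
    // Hypothetical cutoff: pixels with luminance below this count as "ink".
    static final int INK_THRESHOLD = 128;

    // Returns true if the pixel at (x, y) is darker than INK_THRESHOLD.
    static boolean isInk(BufferedImage img, int x, int y) {
        int rgb = img.getRGB(x, y);
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        int luminance = (r * 299 + g * 587 + b * 114) / 1000;
        return luminance < INK_THRESHOLD;
    }

    // Returns true if row y contains at least one ink pixel.
    static boolean rowHasInk(BufferedImage img, int y) {
        for (int x = 0; x < img.getWidth(); x++) {
            if (isInk(img, x, y)) return true;
        }
        return false;
    }

    // Cuts the image in the middle of the first blank band that lies between
    // the first block of text rows and the next one. Returns a single piece
    // (the original image) when there is no second block of text.
    static List<BufferedImage> splitInTwo(BufferedImage img) {
        int h = img.getHeight();
        boolean[] ink = new boolean[h];
        for (int row = 0; row < h; row++) ink[row] = rowHasInk(img, row);

        int y = 0;
        while (y < h && !ink[y]) y++; // skip the top margin
        while (y < h && ink[y]) y++;  // skip the first block of text
        int gapStart = y;
        while (y < h && !ink[y]) y++; // skip the blank gap

        List<BufferedImage> pieces = new ArrayList<>();
        if (y >= h) {                 // nothing below the gap: keep one piece
            pieces.add(img);
            return pieces;
        }
        int cut = (gapStart + y) / 2; // cut halfway through the gap
        pieces.add(img.getSubimage(0, 0, img.getWidth(), cut));
        pieces.add(img.getSubimage(0, cut, img.getWidth(), h - cut));
        return pieces;
    }
}
```

Each piece can then be saved with javax.imageio.ImageIO.write(piece, "png", file) and passed to tesseract separately, so each run only sees one font size.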

And I see you made the command unnecessarily complicated. (I have tesseract
on my PATH.)

C:\EA>tesseract Capture.PNG Capture -l eng
Tesseract Open Source OCR Engine v4.0.0-alpha.20180109 with Leptonica

C:\EA>tesseract Capture1.PNG Capture1 -l eng
Tesseract Open Source OCR Engine v4.0.0-alpha.20180109 with Leptonica

Tesseract returns the proper text when the text is centered in the image. This
is how I achieved it: crop, trim, and add a border.
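The trim and border steps can be sketched the same way. This version also uses java.awt.image instead of ImageMagick, which has its own trim and border operations. Trimming shrinks the image to the bounding box of its dark pixels. Adding an equal white border on every side then leaves the text centered. TrimAndPad, INK_THRESHOLD, and the padding value used below are hypothetical names and values.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

class TrimAndPad {
    // Hypothetical cutoff: pixels with luminance below this count as "ink".
    static final int INK_THRESHOLD = 128;

    // Returns true if the ARGB value is darker than INK_THRESHOLD.
    static boolean isInk(int rgb) {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return (r * 299 + g * 587 + b * 114) / 1000 < INK_THRESHOLD;
    }

    // Crops the image to the bounding box of its ink pixels.
    // A blank image is returned unchanged.
    static BufferedImage trim(BufferedImage img) {
        int minX = img.getWidth(), minY = img.getHeight(), maxX = -1, maxY = -1;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                if (isInk(img.getRGB(x, y))) {
                    minX = Math.min(minX, x);
                    minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        if (maxX < 0) return img;
        return img.getSubimage(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }

    // Returns a copy of the image surrounded by a white border of `pad` pixels
    // on every side, so the content ends up centered.
    static BufferedImage addBorder(BufferedImage img, int pad) {
        BufferedImage out = new BufferedImage(
                img.getWidth() + 2 * pad, img.getHeight() + 2 * pad,
                BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, out.getWidth(), out.getHeight());
        g.drawImage(img, pad, pad, null);
        g.dispose();
        return out;
    }
}
```

For example, addBorder(trim(piece), 10) gives a tightly cropped piece with a 10-pixel white margin. The margin size is something to tune for your own images.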

Datta

On Thu, Jun 28, 2018 at 3:33 PM Hari P <[email protected]> wrote:

> I am using tesseract v4.0 beta 1 and trying to OCR a remittance file. There
> is one section that contains CHECK NO, but tesseract doesn't seem to
> recognize it at all.
>
> I have tried setting dictionary words and setting the penalties for
> non-dictionary words to 1, yet nothing changed.
>
> tesseract capture.png captureoutput1 --user-words "C:\Program Files (x86)\Tesseract-OCR\tessdata\eng.user-words" -c load_system_dawg=0 -c load_freq_dawg=0 -c language_model_penalty_non_dict_word=1 -c language_model_penalty_non_freq_dict_word=1
>
> These are the words I have in eng.user-words.
>
> CHECK NO.
> CHECK
> NO
> check
> no
>
> Any idea how to fix this?
>
> Thanks,
> Hari
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/01ef5e64-3332-4b0f-a0aa-8ab9488083f1%40googlegroups.com
> For more options, visit https://groups.google.com/d/optout.
>


-- 
Best Regards,
Dattatraya Tembare
+1 914 721 6311

eneck no

150744