Hello Hari,

I faced the same problem. When an image contains two different fonts, Tesseract doesn't recognize it properly: it recognizes the first text and ignores the next one if that text's font size is bigger than the first. I resolved it by cropping the image into two pieces. I'm using ImageMagick (Java API) to clean and crop the images.
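Datta doesn't say how he decides where to cut. As a rough sketch only, here is one way to split an image into two pieces in plain Java, using java.awt.image.BufferedImage rather than the ImageMagick Java API he mentions. It assumes dark text on a light background and cuts at the middle of the first blank horizontal gap between two text bands. The class name SplitBands and the luminance threshold of 128 are hypothetical choices for illustration.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: splits an image into a top and a bottom piece at a blank gap.
class SplitBands {
    // Assumption: dark text on a light background; pixels darker than this count as ink.
    static final int INK_THRESHOLD = 128;

    static boolean isInk(int rgb) {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        int lum = (r * 299 + g * 587 + b * 114) / 1000; // Rec. 601 luma
        return lum < INK_THRESHOLD;
    }

    static boolean rowIsBlank(BufferedImage img, int y) {
        for (int x = 0; x < img.getWidth(); x++) {
            if (isInk(img.getRGB(x, y))) return false;
        }
        return true;
    }

    /** Middle row of the first blank gap lying between two inked rows, or -1 if there is none. */
    static int findSplitRow(BufferedImage img) {
        boolean seenInk = false;
        int gapStart = -1;
        for (int y = 0; y < img.getHeight(); y++) {
            if (!rowIsBlank(img, y)) {
                if (seenInk && gapStart >= 0) {
                    return (gapStart + y) / 2; // ink resumed after a gap: cut in its middle
                }
                seenInk = true;
                gapStart = -1;
            } else if (seenInk && gapStart < 0) {
                gapStart = y; // first blank row after some ink
            }
        }
        return -1;
    }

    /** Crops the image into top and bottom pieces at the given row (0 < row < height). */
    static BufferedImage[] splitAt(BufferedImage img, int row) {
        BufferedImage top = img.getSubimage(0, 0, img.getWidth(), row);
        BufferedImage bottom = img.getSubimage(0, row, img.getWidth(), img.getHeight() - row);
        return new BufferedImage[] { top, bottom };
    }
}
```

Each piece could then be written out with javax.imageio.ImageIO.write(piece, "png", file) and passed to tesseract on its own, so the two fonts are never on the same page. Cutting at a blank gap avoids slicing through a line of text, but it will not separate two fonts that share a row.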
Also, I see you made the command unnecessarily complicated. These are the commands I ran (I have the Tesseract path set up):

C:\EA>tesseract Capture.PNG Capture -l eng
Tesseract Open Source OCR Engine v4.0.0-alpha.20180109 with Leptonica

C:\EA>tesseract Capture1.PNG Capture1 -l eng
Tesseract Open Source OCR Engine v4.0.0-alpha.20180109 with Leptonica

Tesseract returns proper text if the text is centered in the image. This is how I achieved that: crop, trim, then add a border.

Datta

On Thu, Jun 28, 2018 at 3:33 PM Hari P <[email protected]> wrote:
> I am using tesseract v4.0 beta 1 and trying to OCR remittance file. There
> is one section which has CHECK NO, but tesseract doesn't seem to recognize
> it at all.
>
> I have tried with setting dictionary words and penalties to 1 for non
> dictionary words, yet no change.
>
> tesseract capture.png captureoutput1 --user-words "C:\Program Files (x86)\Tesseract-OCR\tessdata\eng.user-words" -c load_system_dawg=0 -c load_freq_dawg=0 -c language_model_penalty_non_dict_word=1 -c language_model_penalty_non_freq_dict_word=1
>
> These are the words I have in eng.user-words.
>
> CHECK NO.
> CHECK
> NO
> check
> no
>
> Any idea how to fix this?
>
> Thanks,
> Hari
>
> --
> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/01ef5e64-3332-4b0f-a0aa-8ab9488083f1%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
--
Best Regards,
Dattatraya Tembare
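For the "crop, trim, then add a border" step, here is a similarly hedged sketch in plain Java (again java.awt.image.BufferedImage, not the ImageMagick API Datta uses): trim crops away the blank margin around the inked pixels, and addBorder pads the result with white on every side so the text sits in the middle of the image. The class name TrimAndPad, the ink threshold, and the 10-pixel border in the usage note are assumptions; Datta doesn't give a border size.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: trims blank margins, then pads with a uniform white border.
class TrimAndPad {
    // Assumption: dark text on a light background.
    static final int INK_THRESHOLD = 128;
    static final int WHITE = 0xFFFFFF;

    static boolean isInk(int rgb) {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return (r * 299 + g * 587 + b * 114) / 1000 < INK_THRESHOLD;
    }

    /** Crops to the bounding box of inked pixels; returns the input unchanged if it has no ink. */
    static BufferedImage trim(BufferedImage img) {
        int minX = img.getWidth(), minY = img.getHeight(), maxX = -1, maxY = -1;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                if (isInk(img.getRGB(x, y))) {
                    minX = Math.min(minX, x);
                    minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        if (maxX < 0) return img; // nothing to trim around
        return img.getSubimage(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }

    /** Returns a new image with img copied into the centre of a white border `border` pixels wide. */
    static BufferedImage addBorder(BufferedImage img, int border) {
        int w = img.getWidth() + 2 * border, h = img.getHeight() + 2 * border;
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                out.setRGB(x, y, WHITE); // new TYPE_INT_RGB images start black, so fill first
            }
        }
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                out.setRGB(x + border, y + border, img.getRGB(x, y));
            }
        }
        return out;
    }
}
```

Applied to one cropped piece, this would be TrimAndPad.addBorder(TrimAndPad.trim(piece), 10). Trimming first makes the border equal on all four sides, which is what centers the text.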
eneck no
150744

