The "C" is missing from the output because Tesseract doesn't have enough margin around the text to read it. The image needs a proper margin (border) around the text.
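As a rough sketch of adding that margin, the snippet below builds an ImageMagick command that trims the image and pads it with a white border, followed by a Tesseract command for the padded file. The helper names, the 20-pixel border, and the file names are assumptions for illustration, and it assumes the ImageMagick `convert` and `tesseract` executables are on the PATH (with ImageMagick 7 the command is `magick`, and on Windows `convert` can clash with a system utility of the same name).

```python
import subprocess


def build_commands(src, padded, out_base, border_px=20, lang="eng"):
    """Return the (ImageMagick, Tesseract) argument lists for one image."""
    pad_cmd = [
        "convert", src,
        "-trim", "+repage",                     # trim uniform background edges
        "-bordercolor", "white",
        "-border", f"{border_px}x{border_px}",  # add an even white margin
        padded,
    ]
    ocr_cmd = ["tesseract", padded, out_base, "-l", lang]
    return pad_cmd, ocr_cmd


def ocr_with_margin(src, padded, out_base):
    """Run both steps; raises CalledProcessError if either tool fails."""
    for cmd in build_commands(src, padded, out_base):
        subprocess.run(cmd, check=True)
```

Calling `ocr_with_margin("Capture.PNG", "Capture_padded.png", "Capture")` would write the recognized text to `Capture.txt`. The border width is a guess; it may need tuning for a given scan.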
On Friday, June 29, 2018 at 12:39:06 PM UTC-4, Dattatraya Tembare wrote:
>
> Hello Hari,
> I faced the same problem.
>
> When there are 2 different type of fonts, Tesseract doesn't recognize it
> properly. It recognizes first text and ignores next text if the font size
> is bigger than first one.
> I resolved it by cropping the image into 2 pieces. I'm using
> ImageMagick (java api) to clean and crop the images.
>
> And I see you made a command unnecessarily complicated (I have tesseract
> path set up)
>
> C:\EA>tesseract Capture.PNG Capture -l eng
> Tesseract Open Source OCR Engine v4.0.0-alpha.20180109 with Leptonica
>
> C:\EA>tesseract Capture1.PNG Capture1 -l eng
> Tesseract Open Source OCR Engine v4.0.0-alpha.20180109 with Leptonica
>
> Tesseract will return proper text if the text is at center, how I
> achieved is -- crop, trim added a border
>
> Datta
>
> On Thu, Jun 28, 2018 at 3:33 PM Hari P <[email protected]> wrote:
>
>> I am using tesseract v4.0 beta 1 and trying to OCR remittance file. There
>> is one section which has CHECK NO, but tesseract doesn't seem to recognize
>> it at all.
>>
>> I have tried with setting dictionary words and penalties to 1 for non
>> dictionary words, yet no change.
>>
>> tesseract capture.png captureoutput1 --user-words "C:\Program Files (x86)\Tesseract-OCR\tessdata\eng.user-words" -c load_system_dawg=0 -c load_freq_dawg=0 -c language_model_penalty_non_dict_word=1 -c language_model_penalty_non_freq_dict_word=1
>>
>> These are the words I have in eng.user-words.
>>
>> CHECK NO.
>> CHECK
>> NO
>> check
>> no
>>
>> Any idea how to fix this?
>>
>> Thanks,
>> Hari
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/01ef5e64-3332-4b0f-a0aa-8ab9488083f1%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
> --
> Best Regards,
> Dattatraya Tembare
> +1 914 721 6311
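The crop-then-border workaround described in the quoted reply could look roughly like the sketch below. It uses the ImageMagick command line rather than the Java API mentioned there, and everything specific in it is an assumption for illustration: the helper name `crop_commands`, the `top.png`/`bottom.png` output names, the 20-pixel border, and the idea that the two font sizes are separated by a single horizontal row `split_y` that you already know.

```python
def crop_commands(src, width, height, split_y, border_px=20):
    """Build ImageMagick commands that split an image at row split_y.

    Returns a list of (output_file, command) pairs, one per piece.
    """
    if not 0 < split_y < height:
        raise ValueError("split_y must fall inside the image")
    regions = [
        ("top", f"{width}x{split_y}+0+0"),
        ("bottom", f"{width}x{height - split_y}+0+{split_y}"),
    ]
    commands = []
    for name, geometry in regions:
        out = f"{name}.png"
        commands.append((out, [
            "convert", src,
            "-crop", geometry, "+repage",           # cut out one region
            "-bordercolor", "white",
            "-border", f"{border_px}x{border_px}",  # pad it with a margin
            out,
        ]))
    return commands
```

Each command can be passed to `subprocess.run(cmd, check=True)`, after which each output file is run through `tesseract <file> <outputbase> -l eng` separately, so that the larger font no longer shares an image with the smaller one.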

