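Robert is pointing you in the right direction. Did you read the log you posted here? "Tesseract Open Source OCR Engine v3.04.01 with Leptonica". You are mixing Tesseract versions, so the problems are no surprise: the training script from your 4.0 checkout is calling /usr/bin/tesseract, which is still 3.04.01, and that version knows nothing about lstm.train.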
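To confirm this on your machine, something like the sketch below may help. It pulls the major version out of a version line and checks that it is at least 4, since the LSTM training step needs a 4.x binary. The function names tess_major and supports_lstm_training are made up for this example, and I am assuming the version line looks like "tesseract 3.04.01" or like the banner in your log; adjust the pattern if your build prints something else.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Tesseract or tesstrain.sh.

# Print the major version found in a version line, e.g.
# "Tesseract Open Source OCR Engine v3.04.01 with Leptonica" gives 3.
# Assumption: the first number in the line is the Tesseract version.
tess_major() {
    printf '%s\n' "$1" | sed -n 's/[^0-9]*\([0-9][0-9]*\)\.[0-9].*/\1/p'
}

# Succeed only if the line reports version 4 or newer.
supports_lstm_training() {
    major=$(tess_major "$1")
    [ -n "$major" ] && [ "$major" -ge 4 ]
}

# Example use against the binary named in the log (stderr is merged
# because older builds may print the version there):
#   line=$(/usr/bin/tesseract --version 2>&1 | head -n 1)
#   if supports_lstm_training "$line"; then
#       echo "OK for LSTM training: $line"
#   else
#       echo "Too old for LSTM training: $line"
#   fi
```

Run the check against /usr/bin/tesseract, the path shown in your log, and against the binary your source build installed ("which -a tesseract" lists every tesseract on your PATH). If the two report different versions, the script is picking up the old one.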
Zdenko

On Tue, 16 Oct 2018 at 8:26, Vinod Gattani <[email protected]> wrote:

> Hi,
> Typo: "Why the version is not 4.0.?"
> I installed using "git pull https://github.com/tesseract-ocr/tesseract".
> And then followed the instructions on training page.
>
> Regards
>
> On Tue, Oct 16, 2018 at 11:53 AM Robert Kamiński <[email protected]> wrote:
>
>> Hi,
>> "Why the version is 4.0." What do you mean by that? In logs it states
>> that it's 3.04v. "Tesseract Open Source OCR Engine v3.04.01 with
>> Leptonica".
>> The problem might be the fact that 4th version is using lstm files
>> whereas you have version 3.04 using box files instead. Try to check the
>> version of installed Tesseract. Also note that I'm not the expert here ^.^
>>
>> On Tue, 16 Oct 2018 at 08:04, Vinod Gattani <[email protected]> wrote:
>>
>>> Hi All,
>>>
>>> I have started a project to do OCR on Identity Cards. I am learning to
>>> train tesseract models with custom fonts.
>>>
>>> Please help me on this.
>>>
>>> Steps till now:
>>>
>>> 1. git pull https://github.com/tesseract-ocr/tesseract
>>> 2. Then I followed instructions on training package till command "sudo
>>> make training-install".
>>> 3. Downloaded eng.traineddata from
>>> https://github.com/tesseract-ocr/tessdata_best in tessdata folder
>>> 4. Command "src/training/tesstrain.sh --fonts_dir /usr/share/fonts
>>> --fontlist "Arial Bold" --lang eng --linedata_only
>>> --noextract_font_properties --langdata_dir ../langdata --tessdata_dir
>>> ./tessdata --output_dir ~/tesstutorial/engtrain"
>>>
>>> It is giving error:
>>> === Phase E: Generating lstmf files ===
>>> Using TESSDATA_PREFIX=./tessdata
>>> [Tue Oct 16 05:41:31 UTC 2018] /usr/bin/tesseract
>>> /tmp/tmp.4EGdp9wW57/eng.Arial_Bold.exp0.tif
>>> /tmp/tmp.4EGdp9wW57/eng.Arial_Bold.exp0 --psm 6 lstm.train
>>> Tesseract Open Source OCR Engine v3.04.01 with Leptonica
>>> fseek(data_file_, static_cast<size_t>(offset_table_[tessdata_type]),
>>> SEEK_SET) == 0:Error:Assert failed:in file ../ccutil/tessdatamanager.h,
>>> line 173
>>> ERROR: /tmp/tmp.4EGdp9wW57/eng.Arial_Bold.exp0.lstmf does not exist or
>>> is not readable
>>>
>>> Why the version is 4.0.
>>>
>>> Also, how do we download custom font for my Identity Cards.
>>>
>>> Regards,
>>>
>>> On Monday, 10 September 2018 15:05:15 UTC+5:30, [email protected]
>>> wrote:
>>>>
>>>> Thank you Shreeshrii for reply!
>>>>
>>>> Manual customization of these files might be kinda annoying. If I
>>>> need to experiment with the dawg files and I achieve something, I'll
>>>> surely let you know if there is any difference. Again thank you for your
>>>> help and time :)
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/279bc21a-199a-43be-b5d6-07bfdd2a833f%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8wxAd4YCEUwnU-bPf9FQ%2BtutmKdwSQXro_eo6cjLkNRHA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

