I tried using the Lohit Bengali font. Here is the command:

/usr/share/tesseract-ocr/./tesstrain.sh --fonts_dir /usr/share/fonts --lang ben --linedata_only --noextract_font_properties --langdata_dir /home/jennil/Desktop/pro/langdata-master --tessdata_dir /usr/share/tesseract-ocr/4.00/tessdata --output_dir /home/jennil/Desktop/pro/output/ben_output --fontlist “Lohit Bengali”

The error I got is:

=== Starting training for language 'ben'
[Mon Jul 23 01:18:01 EDT 2018] /usr/bin/text2image --fonts_dir=/usr/share/fonts --font=“Lohit --outputbase=/tmp/font_tmp.zAepRNq6Yo/sample_text.txt --text=/tmp/font_tmp.zAepRNq6Yo/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.zAepRNq6Yo
Could not find font named “Lohit.
Pango suggested font FreeMono.
Please correct --font arg.

=== Phase I: Generating training images ===
Rendering using “Lohit
Rendering using Bengali”
[Mon Jul 23 01:18:16 EDT 2018] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.zAepRNq6Yo --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.abQfzSYB19/ben/ben.Bengali”.exp0 --max_pages=3 --font=Bengali” --text=/home/jennil/Desktop/pro/langdata-master/ben/ben.training_text
[Mon Jul 23 01:18:16 EDT 2018] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.zAepRNq6Yo --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.abQfzSYB19/ben/ben.“Lohit.exp0 --max_pages=3 --font=“Lohit --text=/home/jennil/Desktop/pro/langdata-master/ben/ben.training_text
Could not find font named Bengali”.
Pango suggested font FreeMono.
Please correct --font arg.
Could not find font named “Lohit.
Pango suggested font FreeMono.
Please correct --font arg.
ERROR: /tmp/tmp.abQfzSYB19/ben/ben.Bengali”.exp0.box does not exist or is not readable
ERROR: /tmp/tmp.abQfzSYB19/ben/ben.“Lohit.exp0.box does not exist or is not readable
ERROR: /tmp/tmp.abQfzSYB19/ben/ben.“Lohit.exp0.box does not exist or is not readable

Please help me out.

*shreeshrii*, I read the link, but I am still confused about the fonts. The Lohit Bengali font is already installed on the system, so why is this happening? Some of the fonts that showed up when I ran

*text2image --fonts_dir /usr/share/fonts --list_available_fonts*

are:

101: Liberation Serif Italic
102: Likhan Medium
103: Lohit Assamese
*104: Lohit Bengali*
105: Lohit Devanagari
106: Lohit Gujarati
107: Lohit Gurmukhi
108: Lohit Kannada
109: Lohit Malayalam
110: Lohit Odia
111: Lohit Tamil
112: Lohit Tamil Classical
113: Lohit Telugu
114: Loma
115: Loma Bold
116: Loma Bold Oblique
117: Loma Oblique
118: Manjari
119: Manjari Bold
120: Manjari Thin
121: Meera
122: Mitra Mono
...

Lohit Bengali is in the list, so please tell me why I get this error. Do I need to do something else as well?

On Sun, Jul 22, 2018 at 11:00 AM, Shree Devi Kumar <[email protected]> wrote:

> See https://github.com/tesseract-ocr/tesseract/wiki/Fonts
>
> On Sun 22 Jul, 2018, 8:20 PM Jennil Thiyam, <[email protected]> wrote:
>
>> You guys helped me... now there is no error, but I don't know about the
>> fonts. I tried to train Bengali with the "lohit-bengali" font, thinking it
>> is already in the FONTS folder, but I got
>>
>> === Starting training for language 'ben'
>> [Sun Jul 22 10:48:33 EDT 2018] /usr/bin/text2image --fonts_dir=/usr/share/fonts/truetype --font=“lohit-bengali” --outputbase=/tmp/font_tmp.z6y7AIvqyI/sample_text.txt --text=/tmp/font_tmp.z6y7AIvqyI/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.z6y7AIvqyI
>> Could not find font named “lohit-bengali”.
>> Pango suggested font FreeMono.
>> Please correct --font arg.
>>
>> === Phase I: Generating training images ===
>> Rendering using “lohit-bengali”
>> [Sun Jul 22 10:48:34 EDT 2018] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.z6y7AIvqyI --fonts_dir=/usr/share/fonts/truetype --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.pBWa4wRHmt/ben/ben.“lohit-bengali”.exp0 --max_pages=3 --font=“lohit-bengali” --text=/home/jennil/Desktop/pro/langdata-master/ben/ben.training_text
>> Could not find font named “lohit-bengali”.
>> Pango suggested font FreeMono.
>> Please correct --font arg.
>> ERROR: /tmp/tmp.pBWa4wRHmt/ben/ben.“lohit-bengali”.exp0.box does not exist or is not readable
>> ERROR: /tmp/tmp.pBWa4wRHmt/ben/ben.“lohit-bengali”.exp0.box does not exist or is not readable
>>
>> So, please tell me: are all the fonts in this FONTS folder already
>> installed for Tesseract or not?
>>
>> On Sun, Jul 22, 2018 at 7:15 AM, Jennil Thiyam <[email protected]> wrote:
>>
>>> Oh, sorry for the mistake... I put two dashes, but it still says unrecognised.
>>>
>>> On Sun 22 Jul, 2018, 4:27 PM Shree Devi Kumar, <[email protected]> wrote:
>>>
>>>> needs two dashes,
>>>>
>>>> On Sun, Jul 22, 2018 at 12:29 PM <[email protected]> wrote:
>>>>
>>>>> Hello again, I fixed the error the way you said and that error is gone,
>>>>> but now the same "unrecognized" error occurs for output_dir.
>>>>> The error is
>>>>> ERROR: Unrecognized argument -–output_dir
>>>>>
>>>>> My command is
>>>>>
>>>>> /usr/share/tesseract-ocr/./tesstrain.sh \
>>>>> --fonts_dir /usr/share/fonts \
>>>>> --lang ben \
>>>>> --linedata_only \
>>>>> --noextract_font_properties \
>>>>> --langdata_dir /home/jennil/Desktop/pro/langdata-master/ben \
>>>>> --tessdata_dir /usr/share/tesseract-ocr/4.00/tessdata \
>>>>> -–output_dir /home/jennil/Desktop/pro/output/ben_output \
>>>>> --fontlist “Lohit Bengali”
>>>>>
>>>>> Please do help.
>>>>>
>>>>> On Saturday, July 21, 2018 at 1:42:41 PM UTC-4, shree wrote:
>>>>>>
>>>>>> --linedata_only\
>>>>>>
>>>>>> You need space before the continuation mark \
>>>>>>
>>>>>> On Sat 21 Jul, 2018, 10:00 PM , <[email protected]> wrote:
>>>>>>
>>>>>>> Can you please point out where to put the space?
>>>>>>>
>>>>>>> Thank you
>>>>>>>
>>>>>>> On Saturday, July 21, 2018 at 12:12:22 PM UTC-4, [email protected] wrote:
>>>>>>>>
>>>>>>>> My command is
>>>>>>>>
>>>>>>>> usr/share/tesseract-ocr/./tesstrain.sh \
>>>>>>>> --fonts_dir /usr/share/fonts \
>>>>>>>> --lang ben \
>>>>>>>> --linedata_only\
>>>>>>>> --noextract_font_properties \
>>>>>>>> --langdata_dir /home/jennil/Desktop/pro/langdata-master/ben\
>>>>>>>> --tessdata_dir /usr/share/tesseract-ocr/4.00/tessdata –output_dir /home/jennil/Desktop/pro/output/ben_output\
>>>>>>>> --fontlist “Lohit Bengali”
>>>>>>>>
>>>>>>>> and here is the error
>>>>>>>>
>>>>>>>> *ERROR: Unrecognized argument --linedata_only--noextract_font_properties*
>>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "tesseract-ocr" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>> To post to this group, send email to [email protected].
>>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/37073e8b-f628-438c-b1b9-648e90c405b8%40googlegroups.com
>>>>>>> For more options, visit https://groups.google.com/d/optout.
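
A note on the likely cause, going only by the log in this thread: text2image is called once with --font=“Lohit and once with --font=Bengali”, which suggests the font name was split at the space before tesstrain.sh ever saw it. That is what a shell does when a name is wrapped in typographic quotes (“ ”) rather than plain ASCII quotes (" "), because typographic quotes are ordinary characters to the shell and do not group words. The earlier "Unrecognized argument -–output_dir" error looks like the same kind of problem, since the second dash there is an en dash. The sketch below only illustrates those two points; count_args and has_non_ascii are hypothetical helper names made up for this example, not part of Tesseract, and nothing in it runs tesstrain.sh.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print how many arguments the shell passed on.
count_args() { echo "$#"; }

# Hypothetical helper: succeed if the string contains any byte outside
# printable ASCII (space through tilde), e.g. an en dash or a curly quote.
has_non_ascii() {
  [ -n "$(printf '%s' "$1" | LC_ALL=C tr -d ' -~')" ]
}

# Typographic quotes do not quote anything, so the font name is split at
# the space into two words: “Lohit and Bengali”
curly=$(count_args --fontlist “Lohit Bengali”)

# ASCII double quotes keep the font name together as a single argument.
ascii=$(count_args --fontlist "Lohit Bengali")

echo "typographic quotes: $curly arguments"
echo "ASCII quotes:       $ascii arguments"

# The flag copied from the earlier message has an en dash as its second dash.
for flag in '-–output_dir' '--output_dir'; do
  if has_non_ascii "$flag"; then
    echo "$flag: contains non-ASCII characters"
  else
    echo "$flag: plain ASCII"
  fi
done
```

If this is indeed what happened here, retyping the option by hand in the terminal as --fontlist "Lohit Bengali", with plain ASCII quotes, should pass the font name through as one argument. Commands pasted from an email client or word processor are a common source of such characters.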

