Even though the double quotes look fancy here, that is not the case in the command prompt. Thanks to all your help I am now able to run this command, but I still get a lot of messages that say

*Normalization failed for string <some character>*

and, at the end, this:

*Error writing unicharset!!*

Any help is welcome. I am very new to Tesseract and still trying to find my way in.

On Mon, Jul 23, 2018 at 9:29 AM, Lorenzo Bolzani <[email protected]> wrote:

> Please read the complete error message: it's telling you exactly where the
> problem is.
>
> I think you are using "fancy double quotes" or something like that rather
> than the normal ones.
>
> Are you doing cut and paste from some word processor? This is probably
> causing all the errors...
>
> 2018-07-23 9:48 GMT+02:00 Jennil Thiyam <[email protected]>:
>
>> I tried using Lohit Bengali and here is the command
>>
>> /usr/share/tesseract-ocr/./tesstrain.sh --fonts_dir /usr/share/fonts
>> --lang ben --linedata_only --noextract_font_properties --langdata_dir
>> /home/jennil/Desktop/pro/langdata-master --tessdata_dir
>> /usr/share/tesseract-ocr/4.00/tessdata --output_dir
>> /home/jennil/Desktop/pro/output/ben_output --fontlist “Lohit Bengali”
>>
>> and the error I got is
>>
>> == Starting training for language 'ben'
>> [Mon Jul 23 01:18:01 EDT 2018] /usr/bin/text2image
>> --fonts_dir=/usr/share/fonts --font=“Lohit
>> --outputbase=/tmp/font_tmp.zAepRNq6Yo/sample_text.txt
>> --text=/tmp/font_tmp.zAepRNq6Yo/sample_text.txt
>> --fontconfig_tmpdir=/tmp/font_tmp.zAepRNq6Yo
>> Could not find font named “Lohit.
>> Pango suggested font FreeMono.
>> Please correct --font arg.
>>
>> === Phase I: Generating training images ===
>> Rendering using “Lohit
>> Rendering using Bengali”
>> [Mon Jul 23 01:18:16 EDT 2018] /usr/bin/text2image
>> --fontconfig_tmpdir=/tmp/font_tmp.zAepRNq6Yo
>> --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32
>> --char_spacing=0.0 --exposure=0
>> --outputbase=/tmp/tmp.abQfzSYB19/ben/ben.Bengali”.exp0
>> --max_pages=3 --font=Bengali”
>> --text=/home/jennil/Desktop/pro/langdata-master/ben/ben.training_text
>> [Mon Jul 23 01:18:16 EDT 2018] /usr/bin/text2image
>> --fontconfig_tmpdir=/tmp/font_tmp.zAepRNq6Yo
>> --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32
>> --char_spacing=0.0 --exposure=0
>> --outputbase=/tmp/tmp.abQfzSYB19/ben/ben.“Lohit.exp0
>> --max_pages=3 --font=“Lohit
>> --text=/home/jennil/Desktop/pro/langdata-master/ben/ben.training_text
>> Could not find font named Bengali”.
>> Pango suggested font FreeMono.
>> Please correct --font arg.
>> Could not find font named “Lohit.
>> Pango suggested font FreeMono.
>> Please correct --font arg.
>> ERROR: /tmp/tmp.abQfzSYB19/ben/ben.Bengali”.exp0.box does not exist or is not readable
>> ERROR: /tmp/tmp.abQfzSYB19/ben/ben.“Lohit.exp0.box does not exist or is not readable
>> ERROR: /tmp/tmp.abQfzSYB19/ben/ben.“Lohit.exp0.box does not exist or is not readable
>>
>> Please help me out, *shreeshrii*.
>> I read the link, but I am still confused about the fonts... The Lohit
>> Bengali font is already on the system, so why is this happening?
>>
>> Some of the fonts that showed up when I ran *text2image --fonts_dir
>> /usr/share/fonts --list_available_fonts* are
>>
>> 101: Liberation Serif Italic
>> 102: Likhan Medium
>> 103: Lohit Assamese
>> *104: Lohit Bengali*
>> 105: Lohit Devanagari
>> 106: Lohit Gujarati
>> 107: Lohit Gurmukhi
>> 108: Lohit Kannada
>> 109: Lohit Malayalam
>> 110: Lohit Odia
>> 111: Lohit Tamil
>> 112: Lohit Tamil Classical
>> 113: Lohit Telugu
>> 114: Loma
>> 115: Loma Bold
>> 116: Loma Bold Oblique
>> 117: Loma Oblique
>> 118: Manjari
>> 119: Manjari Bold
>> 120: Manjari Thin
>> 121: Meera
>> 122: Mitra Mono
>> ...
>>
>> Lohit Bengali is in the list, so please tell me why I get the error. Do I
>> need to do something else too?
>>
>> On Sun, Jul 22, 2018 at 11:00 AM, Shree Devi Kumar <[email protected]> wrote:
>>
>>> See https://github.com/tesseract-ocr/tesseract/wiki/Fonts
>>>
>>> On Sun 22 Jul, 2018, 8:20 PM Jennil Thiyam, <[email protected]> wrote:
>>>
>>>> You guys helped me... now there is no error, but I don't know about the
>>>> fonts. I tried to train Bengali with the "lohit-bengali" font, thinking
>>>> it is already in the FONTS folder, but I got
>>>>
>>>> === Starting training for language 'ben'
>>>> [Sun Jul 22 10:48:33 EDT 2018] /usr/bin/text2image
>>>> --fonts_dir=/usr/share/fonts/truetype --font=“lohit-bengali”
>>>> --outputbase=/tmp/font_tmp.z6y7AIvqyI/sample_text.txt
>>>> --text=/tmp/font_tmp.z6y7AIvqyI/sample_text.txt
>>>> --fontconfig_tmpdir=/tmp/font_tmp.z6y7AIvqyI
>>>> Could not find font named “lohit-bengali”.
>>>> Pango suggested font FreeMono.
>>>> Please correct --font arg.
>>>>
>>>> === Phase I: Generating training images ===
>>>> Rendering using “lohit-bengali”
>>>> [Sun Jul 22 10:48:34 EDT 2018] /usr/bin/text2image
>>>> --fontconfig_tmpdir=/tmp/font_tmp.z6y7AIvqyI
>>>> --fonts_dir=/usr/share/fonts/truetype --strip_unrenderable_words
>>>> --leading=32 --char_spacing=0.0 --exposure=0
>>>> --outputbase=/tmp/tmp.pBWa4wRHmt/ben/ben.“lohit-bengali”.exp0
>>>> --max_pages=3 --font=“lohit-bengali”
>>>> --text=/home/jennil/Desktop/pro/langdata-master/ben/ben.training_text
>>>> Could not find font named “lohit-bengali”.
>>>> Pango suggested font FreeMono.
>>>> Please correct --font arg.
>>>> ERROR: /tmp/tmp.pBWa4wRHmt/ben/ben.“lohit-bengali”.exp0.box does not exist or is not readable
>>>> ERROR: /tmp/tmp.pBWa4wRHmt/ben/ben.“lohit-bengali”.exp0.box does not exist or is not readable
>>>>
>>>> So, please tell me: are all the fonts in this FONTS folder already
>>>> installed for Tesseract or not?
>>>>
>>>> On Sun, Jul 22, 2018 at 7:15 AM, Jennil Thiyam <[email protected]> wrote:
>>>>
>>>>> Oh, sorry for the mistake... I put two dashes, and it still says
>>>>> unrecognised.
>>>>>
>>>>> On Sun 22 Jul, 2018, 4:27 PM Shree Devi Kumar, <[email protected]> wrote:
>>>>>
>>>>>> It needs two dashes.
>>>>>>
>>>>>> On Sun, Jul 22, 2018 at 12:29 PM <[email protected]> wrote:
>>>>>>
>>>>>>> Hello again, I fixed the command the way you said and that error is
>>>>>>> gone, but now the same "unrecognized" error occurs for output_dir.
>>>>>>> The error is
>>>>>>> ERROR: Unrecognized argument -–output_dir
>>>>>>>
>>>>>>> My command is
>>>>>>>
>>>>>>> /usr/share/tesseract-ocr/./tesstrain.sh \
>>>>>>> --fonts_dir /usr/share/fonts \
>>>>>>> --lang ben \
>>>>>>> --linedata_only \
>>>>>>> --noextract_font_properties \
>>>>>>> --langdata_dir /home/jennil/Desktop/pro/langdata-master/ben \
>>>>>>> --tessdata_dir /usr/share/tesseract-ocr/4.00/tessdata \
>>>>>>> -–output_dir /home/jennil/Desktop/pro/output/ben_output \
>>>>>>> --fontlist “Lohit Bengali”
>>>>>>>
>>>>>>> Please do help.
>>>>>>>
>>>>>>> On Saturday, July 21, 2018 at 1:42:41 PM UTC-4, shree wrote:
>>>>>>>>
>>>>>>>> --linedata_only\
>>>>>>>>
>>>>>>>> You need a space before the continuation mark \
>>>>>>>>
>>>>>>>> On Sat 21 Jul, 2018, 10:00 PM , <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Can you please point out where to put the space?
>>>>>>>>>
>>>>>>>>> Thank you.
>>>>>>>>>
>>>>>>>>> On Saturday, July 21, 2018 at 12:12:22 PM UTC-4, [email protected] wrote:
>>>>>>>>>>
>>>>>>>>>> My command is
>>>>>>>>>>
>>>>>>>>>> usr/share/tesseract-ocr/./tesstrain.sh \
>>>>>>>>>> --fonts_dir /usr/share/fonts \
>>>>>>>>>> --lang ben \
>>>>>>>>>> --linedata_only\
>>>>>>>>>> --noextract_font_properties \
>>>>>>>>>> --langdata_dir /home/jennil/Desktop/pro/langdata-master/ben\
>>>>>>>>>> --tessdata_dir /usr/share/tesseract-ocr/4.00/tessdata
>>>>>>>>>> –output_dir /home/jennil/Desktop/pro/output/ben_output\
>>>>>>>>>> --fontlist “Lohit Bengali”
>>>>>>>>>>
>>>>>>>>>> and here is the error
>>>>>>>>>>
>>>>>>>>>> *ERROR: Unrecognized argument --linedata_only--noextract_font_properties*
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to [email protected].
>>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>>>>> To view this discussion on the web visit
>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/37073e8b-f628-438c-b1b9-648e90c405b8%40googlegroups.com.
>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
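For anyone reading this thread later, here is a minimal sketch in plain shell of the three problems pointed out in the replies above. It assumes a POSIX shell and the standard tr utility. The helper names count_args and has_non_ascii are made up for this illustration and are not part of Tesseract or tesstrain.sh. It shows that curly quotes do not group words into one argument, that a backslash with no space before it glues two options together, and that a byte-level check can flag pasted characters such as “ ” or –.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of Tesseract.

# Print how many arguments the shell actually passed.
count_args() {
    echo "$#"
}

# Succeed (exit status 0) if the string contains any byte outside
# tab, newline, carriage return and printable ASCII.
has_non_ascii() {
    # tr deletes the allowed bytes; anything left over is suspect.
    leftover=$(printf '%s' "$1" | LC_ALL=C tr -d '\11\12\15\40-\176')
    [ -n "$leftover" ]
}

# Curly quotes are ordinary characters to the shell, so the font name
# is split into two words: “Lohit and Bengali”.
count_args --fontlist “Lohit Bengali”     # prints 3

# Straight quotes keep the font name together as one argument.
count_args --fontlist "Lohit Bengali"     # prints 2

# No space before the backslash: the two options become one word.
count_args --linedata_only\
--noextract_font_properties               # prints 1

# With a space before the backslash they stay separate.
count_args --linedata_only \
--noextract_font_properties               # prints 2

# The en dash in this option is caught by the byte-level check.
if has_non_ascii '-–output_dir'; then
    echo "non-ASCII character found"
fi
```

One possible use is to paste the whole command into a text file and run the same tr check on that file before executing it; an empty result suggests the quotes and dashes are the plain ASCII ones.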

