Even though the double quotes look fancy here, that is not the case in the
command prompt.
With all your help I am able to run this command, but I still get a lot of
messages that say

*Normalization failed for string  <some character>*

and it ends with this:

*Error writing unicharset!!*

Any help is welcome. I am very new to Tesseract and trying to find my way
in.
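
One way to double-check what the shell actually receives is to scan the
command text for anything outside printable ASCII. This is only a minimal
sketch, assuming bash and GNU grep; check_ascii is a hypothetical helper
name, not part of Tesseract or tesstrain.sh.

```shell
#!/usr/bin/env bash
# check_ascii is a hypothetical helper (not a Tesseract tool): it reports
# whether a string contains any byte outside printable ASCII, such as the
# curly quotes or en dashes that word processors and mail clients substitute.
check_ascii() {
  if printf '%s\n' "$1" | LC_ALL=C grep -q '[^ -~]'; then
    echo "non-ascii"
  else
    echo "ok"
  fi
}

check_ascii '--fontlist "Lohit Bengali"'   # straight quotes
check_ascii '--fontlist “Lohit Bengali”'   # curly quotes
check_ascii '-–output_dir'                 # hyphen followed by an en dash
```

The first call should report ok and the other two non-ascii, which would
point at the quoting or dash characters rather than at the fonts.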

On Mon, Jul 23, 2018 at 9:29 AM, Lorenzo Bolzani <[email protected]>
wrote:

>
> Please read the complete error message: it's telling you exactly where the
> problem is.
>
> I think you are using "fancy double quotes" or something like that rather
> than the normal ones.
>
> Are you doing cut and paste from some word processor? This is probably
> causing all the errors...
>
>
>
> 2018-07-23 9:48 GMT+02:00 Jennil Thiyam <[email protected]>:
>
>> I tried using Lohit Bengali and here is the command
>>
>> /usr/share/tesseract-ocr/./tesstrain.sh --fonts_dir /usr/share/fonts
>> --lang ben --linedata_only --noextract_font_properties --langdata_dir
>> /home/jennil/Desktop/pro/langdata-master --tessdata_dir
>> /usr/share/tesseract-ocr/4.00/tessdata --output_dir
>> /home/jennil/Desktop/pro/output/ben_output --fontlist “Lohit Bengali”
>>
>> and the error i got is
>>
>> == Starting training for language 'ben'
>> [Mon Jul 23 01:18:01 EDT 2018] /usr/bin/text2image
>> --fonts_dir=/usr/share/fonts --font=“Lohit 
>> --outputbase=/tmp/font_tmp.zAepRNq6Yo/sample_text.txt
>> --text=/tmp/font_tmp.zAepRNq6Yo/sample_text.txt
>> --fontconfig_tmpdir=/tmp/font_tmp.zAepRNq6Yo
>> Could not find font named “Lohit.
>> Pango suggested font FreeMono.
>> Please correct --font arg.
>>
>> === Phase I: Generating training images ===
>> Rendering using “Lohit
>> Rendering using Bengali”
>> [Mon Jul 23 01:18:16 EDT 2018] /usr/bin/text2image
>> --fontconfig_tmpdir=/tmp/font_tmp.zAepRNq6Yo
>> --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32
>> --char_spacing=0.0 --exposure=0 
>> --outputbase=/tmp/tmp.abQfzSYB19/ben/ben.Bengali”.exp0
>> --max_pages=3 --font=Bengali”
>> --text=/home/jennil/Desktop/pro/langdata-master/ben/ben.training_text
>> [Mon Jul 23 01:18:16 EDT 2018] /usr/bin/text2image
>> --fontconfig_tmpdir=/tmp/font_tmp.zAepRNq6Yo
>> --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32
>> --char_spacing=0.0 --exposure=0 
>> --outputbase=/tmp/tmp.abQfzSYB19/ben/ben.“Lohit.exp0
>> --max_pages=3 --font=“Lohit
>> --text=/home/jennil/Desktop/pro/langdata-master/ben/ben.training_text
>> Could not find font named Bengali”.
>> Pango suggested font FreeMono.
>> Please correct --font arg.
>> Could not find font named “Lohit.
>> Pango suggested font FreeMono.
>> Please correct --font arg.
>> ERROR: /tmp/tmp.abQfzSYB19/ben/ben.Bengali”.exp0.box does not exist or
>> is not readable
>> ERROR: /tmp/tmp.abQfzSYB19/ben/ben.“Lohit.exp0.box does not exist or is
>> not readable
>> ERROR: /tmp/tmp.abQfzSYB19/ben/ben.“Lohit.exp0.box does not exist or is
>> not readable
>>
>> Please help me out, *shreeshrii*.
>> I read the link, but I am still confused about the fonts. The Lohit
>> Bengali font is already on the system, so why is this happening?
>>
>>
>> Some of the fonts listed when I ran *text2image --fonts_dir
>> /usr/share/fonts --list_available_fonts* are:
>>
>> 101: Liberation Serif Italic
>> 102: Likhan Medium
>> 103: Lohit Assamese
>> *104: Lohit Bengali*
>> 105: Lohit Devanagari
>> 106: Lohit Gujarati
>> 107: Lohit Gurmukhi
>> 108: Lohit Kannada
>> 109: Lohit Malayalam
>> 110: Lohit Odia
>> 111: Lohit Tamil
>> 112: Lohit Tamil Classical
>> 113: Lohit Telugu
>> 114: Loma
>> 115: Loma Bold
>> 116: Loma Bold Oblique
>> 117: Loma Oblique
>> 118: Manjari
>> 119: Manjari Bold
>> 120: Manjari Thin
>> 121: Meera
>> 122: Mitra Mono
>> ...
>>
>> Lohit Bengali is in the list, so please tell me why the error occurs. Do I
>> need to do something else as well?
>>
>>
>> On Sun, Jul 22, 2018 at 11:00 AM, Shree Devi Kumar <[email protected]>
>> wrote:
>>
>>> See https://github.com/tesseract-ocr/tesseract/wiki/Fonts
>>>
>>> On Sun 22 Jul, 2018, 8:20 PM Jennil Thiyam, <[email protected]>
>>> wrote:
>>>
>>>> You guys helped me, and the earlier error is gone now. But I don't
>>>> understand the fonts: I tried to train Bengali with the "lohit-bengali"
>>>> font, thinking it is already in the fonts folder, but I got
>>>>
>>>> === Starting training for language 'ben'
>>>> [Sun Jul 22 10:48:33 EDT 2018] /usr/bin/text2image
>>>> --fonts_dir=/usr/share/fonts/truetype --font=“lohit-bengali”
>>>> --outputbase=/tmp/font_tmp.z6y7AIvqyI/sample_text.txt
>>>> --text=/tmp/font_tmp.z6y7AIvqyI/sample_text.txt
>>>> --fontconfig_tmpdir=/tmp/font_tmp.z6y7AIvqyI
>>>> Could not find font named “lohit-bengali”.
>>>> Pango suggested font FreeMono.
>>>> Please correct --font arg.
>>>>
>>>> === Phase I: Generating training images ===
>>>> Rendering using “lohit-bengali”
>>>> [Sun Jul 22 10:48:34 EDT 2018] /usr/bin/text2image
>>>> --fontconfig_tmpdir=/tmp/font_tmp.z6y7AIvqyI
>>>> --fonts_dir=/usr/share/fonts/truetype --strip_unrenderable_words
>>>> --leading=32 --char_spacing=0.0 --exposure=0
>>>> --outputbase=/tmp/tmp.pBWa4wRHmt/ben/ben.“lohit-bengali”.exp0
>>>> --max_pages=3 --font=“lohit-bengali”
>>>> --text=/home/jennil/Desktop/pro/langdata-master/ben/ben.training_text
>>>> Could not find font named “lohit-bengali”.
>>>> Pango suggested font FreeMono.
>>>> Please correct --font arg.
>>>> ERROR: /tmp/tmp.pBWa4wRHmt/ben/ben.“lohit-bengali”.exp0.box does not
>>>> exist or is not readable
>>>> ERROR: /tmp/tmp.pBWa4wRHmt/ben/ben.“lohit-bengali”.exp0.box does not
>>>> exist or is not readable
>>>>
>>>> So, please tell me: are all the fonts in this fonts folder already
>>>> installed for Tesseract or not?
>>>>
>>>>
>>>> On Sun, Jul 22, 2018 at 7:15 AM, Jennil Thiyam <[email protected]>
>>>> wrote:
>>>>
>>>>> Oh, sorry for the mistake. I put two dashes, but it still says
>>>>> unrecognised.
>>>>>
>>>>> On Sun 22 Jul, 2018, 4:27 PM Shree Devi Kumar, <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> needs two dashes,
>>>>>>
>>>>>> On Sun, Jul 22, 2018 at 12:29 PM <[email protected]> wrote:
>>>>>>
>>>>>>> Hello again. I fixed the command the way you said and that error is
>>>>>>> gone, but now the same "unrecognized argument" error occurs for
>>>>>>> output_dir.
>>>>>>> The error is
>>>>>>> ERROR: Unrecognized argument -–output_dir
>>>>>>>
>>>>>>> my command is
>>>>>>>
>>>>>>> /usr/share/tesseract-ocr/./tesstrain.sh \
>>>>>>> --fonts_dir /usr/share/fonts \
>>>>>>> --lang ben \
>>>>>>> --linedata_only \
>>>>>>> --noextract_font_properties \
>>>>>>> --langdata_dir /home/jennil/Desktop/pro/langdata-master/ben \
>>>>>>> --tessdata_dir /usr/share/tesseract-ocr/4.00/tessdata \
>>>>>>> -–output_dir /home/jennil/Desktop/pro/output/ben_output \
>>>>>>> --fontlist “Lohit Bengali”
>>>>>>>
>>>>>>>
>>>>>>> please do help
>>>>>>>
>>>>>>> On Saturday, July 21, 2018 at 1:42:41 PM UTC-4, shree wrote:
>>>>>>>>
>>>>>>>> --linedata_only\
>>>>>>>>
>>>>>>>> You need space before the continuation mark \
>>>>>>>>
>>>>>>>> On Sat 21 Jul, 2018, 10:00 PM , <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Can you please point out where to put the space?
>>>>>>>>>
>>>>>>>>> thank you
>>>>>>>>>
>>>>>>>>> On Saturday, July 21, 2018 at 12:12:22 PM UTC-4,
>>>>>>>>> [email protected] wrote:
>>>>>>>>>>
>>>>>>>>>> My command is
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> usr/share/tesseract-ocr/./tesstrain.sh \
>>>>>>>>>> --fonts_dir /usr/share/fonts \
>>>>>>>>>> --lang ben \
>>>>>>>>>> --linedata_only\
>>>>>>>>>> --noextract_font_properties \
>>>>>>>>>> --langdata_dir /home/jennil/Desktop/pro/langdata-master/ben\
>>>>>>>>>> --tessdata_dir /usr/share/tesseract-ocr/4.00/tessdata
>>>>>>>>>> –output_dir /home/jennil/Desktop/pro/output/ben_output\
>>>>>>>>>> --fontlist “Lohit Bengali”
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> and here is the error
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *ERROR: Unrecognized argument
>>>>>>>>>> --linedata_only--noextract_font_properties*
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to [email protected].
>>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>>>>> To view this discussion on the web visit
>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/37073e8b-f62
>>>>>>>>> 8-438c-b1b9-648e90c405b8%40googlegroups.com
>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/37073e8b-f628-438c-b1b9-648e90c405b8%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>> .
>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> ____________________________________________________________
>>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>
>

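For reference, the first error in the quoted thread (two options fused into
--linedata_only--noextract_font_properties) can be reproduced without
Tesseract. A minimal sketch, assuming bash; show_args is a hypothetical
helper that only prints the arguments it receives.

```shell
#!/usr/bin/env bash
# show_args is a hypothetical helper: it prints each argument it receives
# on its own line, wrapped in angle brackets, so word boundaries are visible.
show_args() { printf '<%s>\n' "$@"; }

# No space before the backslash: the shell deletes the backslash-newline
# pair, so the two options are joined into a single word.
show_args --linedata_only\
--noextract_font_properties

# Space before the backslash: the options stay two separate arguments.
show_args --linedata_only \
--noextract_font_properties
```

The first call prints one bracketed word, the second prints two, which is
why the space before the continuation mark matters.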
