I tried using Lohit Bengali, and here is the command:

/usr/share/tesseract-ocr/./tesstrain.sh --fonts_dir /usr/share/fonts --lang
ben --linedata_only --noextract_font_properties --langdata_dir
/home/jennil/Desktop/pro/langdata-master --tessdata_dir
/usr/share/tesseract-ocr/4.00/tessdata --output_dir
/home/jennil/Desktop/pro/output/ben_output --fontlist “Lohit Bengali”

and the error I got is:

=== Starting training for language 'ben'
[Mon Jul 23 01:18:01 EDT 2018] /usr/bin/text2image
--fonts_dir=/usr/share/fonts --font=“Lohit
--outputbase=/tmp/font_tmp.zAepRNq6Yo/sample_text.txt
--text=/tmp/font_tmp.zAepRNq6Yo/sample_text.txt
--fontconfig_tmpdir=/tmp/font_tmp.zAepRNq6Yo
Could not find font named “Lohit.
Pango suggested font FreeMono.
Please correct --font arg.

=== Phase I: Generating training images ===
Rendering using “Lohit
Rendering using Bengali”
[Mon Jul 23 01:18:16 EDT 2018] /usr/bin/text2image
--fontconfig_tmpdir=/tmp/font_tmp.zAepRNq6Yo --fonts_dir=/usr/share/fonts
--strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0
--outputbase=/tmp/tmp.abQfzSYB19/ben/ben.Bengali”.exp0 --max_pages=3
--font=Bengali”
--text=/home/jennil/Desktop/pro/langdata-master/ben/ben.training_text
[Mon Jul 23 01:18:16 EDT 2018] /usr/bin/text2image
--fontconfig_tmpdir=/tmp/font_tmp.zAepRNq6Yo --fonts_dir=/usr/share/fonts
--strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0
--outputbase=/tmp/tmp.abQfzSYB19/ben/ben.“Lohit.exp0 --max_pages=3
--font=“Lohit
--text=/home/jennil/Desktop/pro/langdata-master/ben/ben.training_text
Could not find font named Bengali”.
Pango suggested font FreeMono.
Please correct --font arg.
Could not find font named “Lohit.
Pango suggested font FreeMono.
Please correct --font arg.
ERROR: /tmp/tmp.abQfzSYB19/ben/ben.Bengali”.exp0.box does not exist or is
not readable
ERROR: /tmp/tmp.abQfzSYB19/ben/ben.“Lohit.exp0.box does not exist or is not
readable
ERROR: /tmp/tmp.abQfzSYB19/ben/ben.“Lohit.exp0.box does not exist or is not
readable

Please help me out, *shreeshrii*.
I read the link, but I am still confused about the fonts. The Lohit
Bengali font is already installed on the system, so why is this happening?


Some of the fonts that showed up when I ran *text2image --fonts_dir
/usr/share/fonts --list_available_fonts* are:

101: Liberation Serif Italic
102: Likhan Medium
103: Lohit Assamese
*104: Lohit Bengali*
105: Lohit Devanagari
106: Lohit Gujarati
107: Lohit Gurmukhi
108: Lohit Kannada
109: Lohit Malayalam
110: Lohit Odia
111: Lohit Tamil
112: Lohit Tamil Classical
113: Lohit Telugu
114: Loma
115: Loma Bold
116: Loma Bold Oblique
117: Loma Oblique
118: Manjari
119: Manjari Bold
120: Manjari Thin
121: Meera
122: Mitra Mono
...

Lohit Bengali is in the list, so please tell me why I get this error. Do I
need to do something else as well?
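
A note on the log above: the font is requested as “Lohit and Bengali”, two separate names, which suggests that the typographic quotes (“ ”) in the command are not being treated as quotes by the shell. The following is a minimal sketch of that behaviour, assuming a POSIX shell; count_args is a hypothetical helper introduced only for illustration and is not part of tesstrain.sh.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

# Typographic quotes (U+201C / U+201D) are ordinary characters to the shell,
# so the font name is split at the space into two separate arguments.
curly=$(count_args --fontlist “Lohit Bengali”)

# ASCII double quotes group the two words into a single argument.
straight=$(count_args --fontlist "Lohit Bengali")

echo "typographic quotes: $curly arguments"
echo "ASCII quotes:       $straight arguments"
```

If this is what is happening here, retyping the last option with plain ASCII quotes, that is --fontlist "Lohit Bengali", should pass the font name to tesstrain.sh as a single argument.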


On Sun, Jul 22, 2018 at 11:00 AM, Shree Devi Kumar <[email protected]>
wrote:

> See https://github.com/tesseract-ocr/tesseract/wiki/Fonts
>
> On Sun 22 Jul, 2018, 8:20 PM Jennil Thiyam, <[email protected]>
> wrote:
>
>> Thanks, you guys helped me. That error is gone now, but I am not sure
>> about the fonts. I tried to train Bengali with the "lohit-bengali" font,
>> thinking it is already in the fonts folder, but I got:
>>
>> === Starting training for language 'ben'
>> [Sun Jul 22 10:48:33 EDT 2018] /usr/bin/text2image
>> --fonts_dir=/usr/share/fonts/truetype --font=“lohit-bengali”
>> --outputbase=/tmp/font_tmp.z6y7AIvqyI/sample_text.txt
>> --text=/tmp/font_tmp.z6y7AIvqyI/sample_text.txt
>> --fontconfig_tmpdir=/tmp/font_tmp.z6y7AIvqyI
>> Could not find font named “lohit-bengali”.
>> Pango suggested font FreeMono.
>> Please correct --font arg.
>>
>> === Phase I: Generating training images ===
>> Rendering using “lohit-bengali”
>> [Sun Jul 22 10:48:34 EDT 2018] /usr/bin/text2image
>> --fontconfig_tmpdir=/tmp/font_tmp.z6y7AIvqyI
>> --fonts_dir=/usr/share/fonts/truetype --strip_unrenderable_words
>> --leading=32 --char_spacing=0.0 --exposure=0
>> --outputbase=/tmp/tmp.pBWa4wRHmt/ben/ben.“lohit-bengali”.exp0 --max_pages=3
>> --font=“lohit-bengali”
>> --text=/home/jennil/Desktop/pro/langdata-master/ben/ben.training_text
>> Could not find font named “lohit-bengali”.
>> Pango suggested font FreeMono.
>> Please correct --font arg.
>> ERROR: /tmp/tmp.pBWa4wRHmt/ben/ben.“lohit-bengali”.exp0.box does not
>> exist or is not readable
>> ERROR: /tmp/tmp.pBWa4wRHmt/ben/ben.“lohit-bengali”.exp0.box does not
>> exist or is not readable
>>
>> So, please tell me: are all the fonts in this fonts folder already
>> installed for Tesseract or not?
>>
>>
>> On Sun, Jul 22, 2018 at 7:15 AM, Jennil Thiyam <[email protected]>
>> wrote:
>>
>>> Oh, sorry for the mistake. I put two dashes, but it still says unrecognized.
>>>
>>> On Sun 22 Jul, 2018, 4:27 PM Shree Devi Kumar, <[email protected]>
>>> wrote:
>>>
>>>> --output_dir needs two dashes.
>>>>
>>>> On Sun, Jul 22, 2018 at 12:29 PM <[email protected]> wrote:
>>>>
>>>>> Hello again. I modified the command the way you said and that error is
>>>>> gone, but now the same kind of "unrecognized argument" error occurs for
>>>>> output_dir. The error is:
>>>>> ERROR: Unrecognized argument -–output_dir
>>>>>
>>>>> My command is:
>>>>>
>>>>> /usr/share/tesseract-ocr/./tesstrain.sh \
>>>>> --fonts_dir /usr/share/fonts \
>>>>> --lang ben \
>>>>> --linedata_only \
>>>>> --noextract_font_properties \
>>>>> --langdata_dir /home/jennil/Desktop/pro/langdata-master/ben \
>>>>> --tessdata_dir /usr/share/tesseract-ocr/4.00/tessdata \
>>>>> -–output_dir /home/jennil/Desktop/pro/output/ben_output \
>>>>> --fontlist “Lohit Bengali”
>>>>>
>>>>>
>>>>> Please help.
>>>>>
>>>>> On Saturday, July 21, 2018 at 1:42:41 PM UTC-4, shree wrote:
>>>>>>
>>>>>> --linedata_only\
>>>>>>
>>>>>> You need a space before the continuation mark \
>>>>>>
>>>>>> On Sat 21 Jul, 2018, 10:00 PM , <[email protected]> wrote:
>>>>>>
>>>>>>> Can you please point out where to put the space?
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>> On Saturday, July 21, 2018 at 12:12:22 PM UTC-4, [email protected]
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> My command is
>>>>>>>>
>>>>>>>>
>>>>>>>> usr/share/tesseract-ocr/./tesstrain.sh \
>>>>>>>> --fonts_dir /usr/share/fonts \
>>>>>>>> --lang ben \
>>>>>>>> --linedata_only\
>>>>>>>> --noextract_font_properties \
>>>>>>>> --langdata_dir /home/jennil/Desktop/pro/langdata-master/ben\
>>>>>>>> --tessdata_dir /usr/share/tesseract-ocr/4.00/tessdata –output_dir /home/jennil/Desktop/pro/output/ben_output\
>>>>>>>> --fontlist “Lohit Bengali”
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> and here is the error
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *ERROR: Unrecognized argument
>>>>>>>> --linedata_only--noextract_font_properties*
>>>>>>>>
>>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "tesseract-ocr" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>> To post to this group, send email to [email protected].
>>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/37073e8b-
>>>>>>> f628-438c-b1b9-648e90c405b8%40googlegroups.com
>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/37073e8b-f628-438c-b1b9-648e90c405b8%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>> .
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> ____________________________________________________________
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>
>>>>
>>>
>>
>
