Re: [tesseract-ocr] Re: pango library doesn't recognize my font .

2018-03-13 Thread ShreeDevi Kumar
Change the double quotes to single quotes:

" to '

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
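
For anyone following along, here is a minimal sketch of that substitution. It assumes the font list is in the form produced by the text2image pipeline elsewhere in this thread (one double-quoted font name per line, followed by a backslash). The function name to_single_quotes is made up for this illustration and is not part of Tesseract.

```shell
# Hypothetical helper: rewrite every double quote on stdin as a single quote.
to_single_quotes() {
  tr '"' "'"
}

# Example: convert one line of a generated font list.
printf '  "Baekmuk Batang" \\\n' | to_single_quotes
```

To convert a whole file, something like to_single_quotes < fontslist.txt > fontslist.single.txt should work; the output file name is only an example.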

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduXJCcoBK95Pki8mj4J2hV8%2Bh4n%3DU8N5-TByx94K_YEHEA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: [tesseract-ocr] Re: pango library doesn't recognize my font .

2018-03-13 Thread 이경준





Thank you. I deleted the last line, as you told me.

Now I can see lots of Korean fonts.

Can all of the fonts listed there be used for training?

However, I get an error, something like:

argument fonts "(specifi_font') are not assigned



Re: [tesseract-ocr] Re: pango library doesn't recognize my font .

2018-03-13 Thread 이경준

On Wednesday, March 14, 2018 at 1:27:46 AM UTC+9, 이경준 wrote:
>
> Yes T_T
>
> I also saw this issue on GitHub:
> https://github.com/tesseract-ocr/tesseract/issues/688


Re: [tesseract-ocr] Re: pango library doesn't recognize my font .

2018-03-13 Thread 이경준
Yes T_T

I also saw this issue on GitHub:
https://github.com/tesseract-ocr/tesseract/issues/688



Re: [tesseract-ocr] Re: pango library doesn't recognize my font .

2018-03-13 Thread ShreeDevi Kumar
Did you use the fonts_dir where the fonts are installed?



Re: [tesseract-ocr] Re: pango library doesn't recognize my font .

2018-03-13 Thread 이경준
Thank you. I now have a fontslist file, but when I open it (vim fontslist.txt) there are no fonts in it.

Does that mean I cannot use Korean fonts?



Re: [tesseract-ocr] Re: pango library doesn't recognize my font .

2018-03-13 Thread ShreeDevi Kumar
Run the following command, after changing the directories to match your setup:

text2image --find_fonts \
  --fonts_dir /usr/share/fonts \
  --text ../langdata/kor/kor.training_text \
  --min_coverage .9 \
  --render_per_font false \
  --outputbase ../langdata/kor/kor \
  |& grep raw | sed -e 's/ :.*/" \\/g' | sed -e 's/^/  "/' \
  >../langdata/kor/fontslist.txt

and then check the selected fonts in
../langdata/kor/fontslist.txt
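
To see what the grep/sed part of that pipeline does without running text2image, here is a small sketch. The sample input line is an assumption about what a --find_fonts report line looks like (font name, then " : ", then coverage figures containing the word raw); check it against the real output on your system. format_fontlist is a name invented for this example.

```shell
# Keep only report lines that mention "raw", cut everything from " :" onward,
# and wrap the remaining font name as:   "Font Name" \
format_fontlist() {
  grep raw | sed -e 's/ :.*/" \\/g' | sed -e 's/^/  "/'
}

# Assumed sample line; real text2image output may differ in detail.
printf 'Baekmuk Batang : 100 hits = 95.00%%, raw = 90 = 92.00%%\n' | format_fontlist
```

An empty result means that no input line contained the word raw, which can happen, for example, if no font under --fonts_dir met --min_coverage.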



Re: [tesseract-ocr] Re: pango library doesn't recognize my font .

2018-03-13 Thread 이경준
Thank you. I have lots of Korean fonts installed, but only the Baekmuk fonts work.

I would really like to know why the Pango library doesn't recognize the others.



Re: [tesseract-ocr] Re: pango library doesn't recognize my font .

2018-03-13 Thread ShreeDevi Kumar
Remove these two lines and try again:

   --fonts_dir $fonts_dir \
   --fontlist $fonts_for_training \


These flags override what is given in language-specific.sh.

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
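
As a rough illustration of what "overrides" means here: a font list passed on the command line takes the place of the per-language default. The sketch below is not the code from tesstrain.sh or language-specific.sh; select_fonts and the default font names are placeholders chosen for this example.

```shell
# Print the fonts given as arguments, one per line.
# With no arguments, fall back to a (hypothetical) per-language default list.
select_fonts() {
  if [ "$#" -eq 0 ]; then
    set -- "Baekmuk Batang" "Baekmuk Gulim"
  fi
  printf '%s\n' "$@"
}

select_fonts                 # no explicit list: the defaults are used
select_fonts "My Own Font"   # explicit list: the defaults are ignored
```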



[tesseract-ocr] Re: pango library doesn't recognize my font .

2018-03-13 Thread 이경준


On Tuesday, March 13, 2018 at 7:40:27 PM UTC+9, 이경준 wrote:
>
> Hi Shree, my name is June. I have a question. I'm using the bash script
> you gave me.
>
> In the script:
>
> # the EVAL handles the quotes in the font list
> eval $tesstrain_dir/tesstrain.sh \
>    --lang $Lang \
>    --linedata_only \
>    --noextract_font_properties \
>    --exposures "0" \
>    --fonts_dir $fonts_dir \
>    --fontlist $fonts_for_training \
>    --langdata_dir $langdata_dir \
>    --training_text $langdata_dir/$Lang/$Lang.$plusTraining_text \
>    --tessdata_dir $bestdata_dir \
>    --output_dir $train_output_dir
>
> P.S. All of the variables are assigned (e.g.
> fonts_for_training="Baekmuk Batang").
>
> When I run the script above, I get an error and it doesn't work.
>
> So I had to delete "--fontlist $fonts_for_training" and make a pair of
> scripts, tesstrain1.sh and language-specific1.sh (for the training fonts).
>
> In that case it does work.
>
> I checked my system (Ubuntu 16.04.3 LTS) with
>
> $ fc-list
>
> for Korean fonts. I have lots of Korean fonts installed,
>
> but training with them doesn't work.
>
> Why doesn't the Pango library recognize the fonts I installed?
>
>
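
One thing worth checking in a script like this (a guess, not something confirmed in the thread): because of the eval, an unquoted $fonts_for_training holding a name with a space is split into separate words, so the font name arrives as two arguments. The sketch below uses a stand-in function, count_args, instead of tesstrain.sh.

```shell
# Stand-in for tesstrain.sh: just report how many arguments were received.
count_args() {
  echo "$#"
}

# Plain value: eval sees --fontlist Baekmuk Batang (3 arguments).
fonts_for_training="Baekmuk Batang"
eval count_args --fontlist $fonts_for_training

# Single quotes inside the value survive until eval re-parses the line,
# so the font name stays together (2 arguments).
fonts_for_training="'Baekmuk Batang'"
eval count_args --fontlist $fonts_for_training
```

If the real script behaves the same way, wrapping each font name in single quotes inside the variable would keep multi-word names intact.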

#!/bin/bash
# (C) Copyright 2014, Google Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# This script provides an easy way to execute various phases of training
# Tesseract.  For a detailed description of the phases, see
# https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract
#
# USAGE:
#
# tesstrain.sh
#    --fontlist FONTS            # A list of fontnames to train on.
#    --fonts_dir FONTS_PATH      # Path to font files.
#    --lang LANG_CODE            # ISO 639 code.
#    --langdata_dir DATADIR      # Path to tesseract/training/langdata directory.
#    --output_dir OUTPUTDIR      # Location of output traineddata file.
#    --overwrite                 # Safe to overwrite files in output_dir.
#    --linedata_only             # Only generate training data for lstmtraining.
#    --run_shape_clustering      # Run shape clustering (use for Indic langs).
#    --exposures EXPOSURES       # A list of exposure levels to use (e.g. "-1 0 1").
#
# OPTIONAL flags for input data. If unspecified we will look for them in
# the langdata_dir directory.
#    --training_text TEXTFILE    # Text to render and use for training.
#    --wordlist WORDFILE         # Word list for the language ordered by
#                                # decreasing frequency.
#
# OPTIONAL flag to specify location of existing traineddata files, required
# during feature extraction. If unspecified will use TESSDATA_PREFIX defined in
# the current environment.
#    --tessdata_dir TESSDATADIR  # Path to tesseract/tessdata directory.
#
# NOTE:
# The font names specified in --fontlist need to be recognizable by Pango using
# fontconfig. An easy way to list the canonical names of all fonts available on
# your system is to run text2image with --list_available_fonts and the
# appropriate --fonts_dir path.


source "$(dirname $0)/tesstrain_utils.sh"

ARGV=("$@")
parse_flags

mkdir -p ${TRAINING_DIR}
tlog "\n=== Starting training for language '${LANG_CODE}'"

source "$(dirname $0)/language-specific1.sh"
set_lang_specific_parameters ${LANG_CODE}

initialize_fontconfig

phase_I_generate_image 8
phase_UP_generate_unicharset
if ((LINEDATA)); then
  phase_E_extract_features "lstm.train" 8 "lstmf"
  make__lstmdata
else
  phase_D_generate_dawg
  phase_E_extract_features "box.train" 8 "tr"
  phase_C_cluster_prototypes "${TRAINING_DIR}/${LANG_CODE}.normproto"
  if [[ "${ENABLE_SHAPE_CLUSTERING}" == "y" ]]; then
    phase_S_cluster_shapes
  fi
  phase_M_cluster_microfeatures
  phase_B_generate_ambiguities
  make__traineddata
fi

tlog "\nCompleted training for language '${LANG_CODE}'\n"
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#