date:20180814

Re: [tesseract-ocr] Training tools don't get built when building tesseract from source

2018-08-14 Thread Shree Devi Kumar

│   │   ├── training
│   │   │   ├── ambiguous_words
│   │   │   ├── ambiguous_words.o
│   │   │   ├── boxchar.lo
│   │   │   ├── boxchar.o
│   │   │   ├── classifier_tester
│   │   │   ├── classifier_tester.o
│   │   │   ├── cntraining
│   │   │   ├── cntraining.o
│   │   │   ├── combine_lang_model
│   │   │   ├── combine_lang_model.o
│   │   │   ├── combine_tessdata
│   │   │   ├── combine_tessdata.o
│   │   │   ├── commandlineflags.lo
│   │   │   ├── commandlineflags.o
│   │   │   ├── commontraining.lo
│   │   │   ├── commontraining.o
│   │   │   ├── dawg2wordlist
│   │   │   ├── dawg2wordlist.o
│   │   │   ├── degradeimage.lo
│   │   │   ├── degradeimage.o
│   │   │   ├── fileio.lo
│   │   │   ├── fileio.o
│   │   │   ├── lang_model_helpers.lo
│   │   │   ├── lang_model_helpers.o
│   │   │   ├── libtesseract_tessopt.la
│   │   │   ├── libtesseract_training.la
│   │   │   ├── ligature_table.lo
│   │   │   ├── ligature_table.o
│   │   │   ├── lstmeval
│   │   │   ├── lstmeval.o
│   │   │   ├── lstmtester.lo
│   │   │   ├── lstmtester.o
│   │   │   ├── lstmtraining
│   │   │   ├── lstmtraining.o
│   │   │   ├── Makefile
│   │   │   ├── mergenf.o
│   │   │   ├── merge_unicharsets
│   │   │   ├── merge_unicharsets.o
│   │   │   ├── mftraining
│   │   │   ├── mftraining.o
│   │   │   ├── normstrngs.lo
│   │   │   ├── normstrngs.o
│   │   │   ├── pango_font_info.lo
│   │   │   ├── pango_font_info.o
│   │   │   ├── set_unicharset_properties
│   │   │   ├── set_unicharset_properties.o
│   │   │   ├── shapeclustering
│   │   │   ├── shapeclustering.o
│   │   │   ├── stringrenderer.lo
│   │   │   ├── stringrenderer.o
│   │   │   ├── tessopt.lo
│   │   │   ├── tessopt.o
│   │   │   ├── text2image
│   │   │   ├── text2image.o
│   │   │   ├── tlog.lo
│   │   │   ├── tlog.o
│   │   │   ├── unicharset_extractor
│   │   │   ├── unicharset_extractor.o
│   │   │   ├── unicharset_training_utils.lo
│   │   │   ├── unicharset_training_utils.o
│   │   │   ├── validate_grapheme.lo
│   │   │   ├── validate_grapheme.o
│   │   │   ├── validate_indic.lo
│   │   │   ├── validate_indic.o
│   │   │   ├── validate_javanese.lo
│   │   │   ├── validate_javanese.o
│   │   │   ├── validate_khmer.lo
│   │   │   ├── validate_khmer.o
│   │   │   ├── validate_myanmar.lo
│   │   │   ├── validate_myanmar.o
│   │   │   ├── validator.lo
│   │   │   ├── validator.o
│   │   │   ├── wordlist2dawg
│   │   │   └── wordlist2dawg.o




On Wed, Aug 15, 2018 at 9:35 AM Shree Devi Kumar wrote:

> libtool: install: /usr/bin/install -c .libs/combine_lang_model /usr/local/bin/combine_lang_model
> libtool: install: /usr/bin/install -c .libs/combine_tessdata /usr/local/bin/combine_tessdata
> libtool: install: /usr/bin/install -c .libs/dawg2wordlist /usr/local/bin/dawg2wordlist
> libtool: install: /usr/bin/install -c .libs/lstmeval /usr/local/bin/lstmeval
> libtool: install: /usr/bin/install -c .libs/lstmtraining /usr/local/bin/lstmtraining
> libtool: install: /usr/bin/install -c .libs/merge_unicharsets /usr/local/bin/merge_unicharsets
> libtool: install: /usr/bin/install -c .libs/set_unicharset_properties /usr/local/bin/set_unicharset_properties
> libtool: install: /usr/bin/install -c .libs/text2image /usr/local/bin/text2image
> libtool: install: /usr/bin/install -c .libs/unicharset_extractor /usr/local/bin/unicharset_extractor
> libtool: install: /usr/bin/install -c .libs/wordlist2dawg /usr/local/bin/wordlist2dawg
> libtool: install: /usr/bin/install -c .libs/ambiguous_words /usr/local/bin/ambiguous_words
> libtool: install: /usr/bin/install -c .libs/classifier_tester /usr/local/bin/classifier_tester
> libtool: install: /usr/bin/install -c .libs/cntraining /usr/local/bin/cntraining
> libtool: install: /usr/bin/install -c .libs/mftraining /usr/local/bin/mftraining
> libtool: install: /usr/bin/install -c .libs/shapeclustering /usr/local/bin/shapeclustering
>
>
> The files are installed in /usr/local/bin
>
>

-- 


Bhajans - Kirtans - Aartis (devotional songs) @ http://bhajans.ramparivar.com

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduWiENY6Hg3WaFOzX%3DH%2BQdj%2BxWZBDJ4zOxUOJSqEH3UiNQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: [tesseract-ocr] Training tools don't get built when building tesseract from source

2018-08-14 Thread Shree Devi Kumar

libtool: install: /usr/bin/install -c .libs/combine_lang_model /usr/local/bin/combine_lang_model
libtool: install: /usr/bin/install -c .libs/combine_tessdata /usr/local/bin/combine_tessdata
libtool: install: /usr/bin/install -c .libs/dawg2wordlist /usr/local/bin/dawg2wordlist
libtool: install: /usr/bin/install -c .libs/lstmeval /usr/local/bin/lstmeval
libtool: install: /usr/bin/install -c .libs/lstmtraining /usr/local/bin/lstmtraining
libtool: install: /usr/bin/install -c .libs/merge_unicharsets /usr/local/bin/merge_unicharsets
libtool: install: /usr/bin/install -c .libs/set_unicharset_properties /usr/local/bin/set_unicharset_properties
libtool: install: /usr/bin/install -c .libs/text2image /usr/local/bin/text2image
libtool: install: /usr/bin/install -c .libs/unicharset_extractor /usr/local/bin/unicharset_extractor
libtool: install: /usr/bin/install -c .libs/wordlist2dawg /usr/local/bin/wordlist2dawg
libtool: install: /usr/bin/install -c .libs/ambiguous_words /usr/local/bin/ambiguous_words
libtool: install: /usr/bin/install -c .libs/classifier_tester /usr/local/bin/classifier_tester
libtool: install: /usr/bin/install -c .libs/cntraining /usr/local/bin/cntraining
libtool: install: /usr/bin/install -c .libs/mftraining /usr/local/bin/mftraining
libtool: install: /usr/bin/install -c .libs/shapeclustering /usr/local/bin/shapeclustering


The files are installed in /usr/local/bin
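
To check that the tools from this log are actually reachable after installation, each name can be looked up on PATH. This is a minimal Python sketch that assumes a Unix-like system. The tool list is copied from the install log above, and `missing_tools` is a hypothetical helper, not part of Tesseract:

```python
import shutil

# Training tools named in the libtool install log above.
TRAINING_TOOLS = [
    "combine_lang_model", "combine_tessdata", "dawg2wordlist",
    "lstmeval", "lstmtraining", "merge_unicharsets",
    "set_unicharset_properties", "text2image", "unicharset_extractor",
    "wordlist2dawg", "ambiguous_words", "classifier_tester",
    "cntraining", "mftraining", "shapeclustering",
]


def missing_tools(tools=TRAINING_TOOLS, path=None):
    """Hypothetical helper: return the tools not found on PATH (or on `path`)."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All training tools found.")
```

If a tool is reported missing even though the log shows it being installed, `/usr/local/bin` is probably not on PATH for that shell.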

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduV3EDypRoWKMSp6U_7C--o8-r9UAcZSUuzdt99uWpoRBw%40mail.gmail.com.

Re: [tesseract-ocr] Re: LSTM files

2018-08-14 Thread zwwtsinghua

tesseract fas.B_Mitra.exp0.tif fas.B_Mitra.exp0 lstm.train
tesseract fas.B_Mitra.exp1.tif fas.B_Mitra.exp1 lstm.train
.
.
.
You can try these.
I'm not quite sure, though, since I haven't done it this way before.
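
The numbered commands above all follow one pattern, so they can be generated for every page instead of typed by hand. This is a minimal Python sketch with three assumptions: the `lang.font.expN.tif` naming used in this thread, a matching `.box` file next to each `.tif`, and `tesseract` on PATH. `lstmf_commands` and `run_all` are hypothetical helpers. Each command should write a `<base>.lstmf` file next to its image:

```python
import subprocess
from pathlib import Path


def lstmf_commands(tif_paths):
    """Hypothetical helper: one `tesseract <tif> <base> lstm.train` per image.

    The matching .box file is assumed to sit next to each .tif.
    """
    commands = []
    for tif in sorted(Path(p) for p in tif_paths):
        base = tif.with_suffix("")  # fas.B_Mitra.exp0.tif -> fas.B_Mitra.exp0
        commands.append(["tesseract", str(tif), str(base), "lstm.train"])
    return commands


def run_all(commands):
    """Hypothetical helper: run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    import sys
    # e.g. python make_lstmf.py fas.B_Mitra.exp*.tif  (hypothetical script name)
    run_all(lstmf_commands(sys.argv[1:]))
```

With a shell glob such as `fas.B_Mitra.exp*.tif`, every page of a font is covered in one run.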


On Tuesday, August 14, 2018 at 9:11:37 PM UTC+8, Zohreh Khosrobeygi wrote:
>
> OK, but I have several tif and box files for each font, for example:
> fas.B_Mitra.exp0.box
> fas.B_Mitra.exp0.tif
> fas.B_Mitra.exp1.box
> fas.B_Mitra.exp1.tif
> fas.B_Mitra.exp2.box
> fas.B_Mitra.exp2.tif
> .
> .
> .
> How can I make an lstmf file for each of them?
>
>
>
> On Tue, Aug 14, 2018 at 4:56 PM, > wrote:
>
>> I mean: put all the file paths in this file, then run lstmtraining.
>> # cat eng.training_files.txt
>> /home/tess-ocr/model_output/test//eng.Arial.exp0.lstmf
>> /home/tess-ocr/model_output/test//eng.Microsoft_YaHei.exp0.lstmf
>> /home/tess-ocr/model_output/test//eng.Times_New_Roman.exp0.lstmf
>>
>>
>> On Tuesday, August 14, 2018 at 6:04:48 PM UTC+8, Zohreh Khosrobeygi wrote:
>>>
>>> Sorry, I don't understand.
>>> Could you please explain this in more detail: "and then put all the lstm files
>>> together in training_files.txt"?
>>>
>>> On Tue, Aug 14, 2018 at 1:19 PM,  wrote:
>>>
>>>> You should run the tesseract command for each of your box/tif pairs:
>>>> tesseract ${dir}/lang.font.exp0.tif ${dir}/lang.font.exp0 lstm.train
>>>> and then put all the lstm files together in training_files.txt.
>>>>
>>>> On Monday, August 13, 2018 at 6:16:09 PM UTC+8, Zohreh Khosrobeygi wrote:
>>>>>
>>>>> Hi,
>>>>> I have been training the Persian language. My text is too large, so I had
>>>>> to generate 18 box files and 18 tifs for one text. Then I made one
>>>>> unicharset for all 18 files. Now when I try to make the lstmf files, it
>>>>> only creates one, "fas.B_Mitra.exp0.lstmf", and can't create the ones for
>>>>> 1 to 18.
>>>>> I tested something else: I created "fas.B_Nazanin.exp0.lstmf" too and
>>>>> ran lstmtraining, but it only used fas.B_Mitra.exp0.lstmf and did not
>>>>> use the other one.
>>>>> How can I make lstmf files for all my boxes?
>>>>> Thx.
>>>>>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/a42188a8-2b1a-40e2-9f29-519cc0e0db40%40googlegroups.com.

Re: [tesseract-ocr] Re: LSTM files

2018-08-14 Thread Khosrobeigy.zohreh

OK, but I have several tif and box files for each font, for example:
fas.B_Mitra.exp0.box
fas.B_Mitra.exp0.tif
fas.B_Mitra.exp1.box
fas.B_Mitra.exp1.tif
fas.B_Mitra.exp2.box
fas.B_Mitra.exp2.tif
.
.
.
How can I make an lstmf file for each of them?



On Tue, Aug 14, 2018 at 4:56 PM,  wrote:

> I mean: put all the file paths in this file, then run lstmtraining.
> # cat eng.training_files.txt
> /home/tess-ocr/model_output/test//eng.Arial.exp0.lstmf
> /home/tess-ocr/model_output/test//eng.Microsoft_YaHei.exp0.lstmf
> /home/tess-ocr/model_output/test//eng.Times_New_Roman.exp0.lstmf
>
>
> On Tuesday, August 14, 2018 at 6:04:48 PM UTC+8, Zohreh Khosrobeygi wrote:
>>
>> Sorry, I don't understand.
>> Could you please explain this in more detail: "and then put all the lstm files
>> together in training_files.txt"?
>>
>> On Tue, Aug 14, 2018 at 1:19 PM,  wrote:
>>
>>> You should run the tesseract command for each of your box/tif pairs:
>>> tesseract ${dir}/lang.font.exp0.tif ${dir}/lang.font.exp0 lstm.train
>>> and then put all the lstm files together in training_files.txt.
>>>
>>> On Monday, August 13, 2018 at 6:16:09 PM UTC+8, Zohreh Khosrobeygi wrote:
>>>>
>>>> Hi,
>>>> I have been training the Persian language. My text is too large, so I had
>>>> to generate 18 box files and 18 tifs for one text. Then I made one
>>>> unicharset for all 18 files. Now when I try to make the lstmf files, it
>>>> only creates one, "fas.B_Mitra.exp0.lstmf", and can't create the ones for
>>>> 1 to 18.
>>>> I tested something else: I created "fas.B_Nazanin.exp0.lstmf" too and
>>>> ran lstmtraining, but it only used fas.B_Mitra.exp0.lstmf and did not
>>>> use the other one.
>>>> How can I make lstmf files for all my boxes?
>>>> Thx.




-- 
Zohreh Khosrobeygi
University of Tehran, 2016
Tel: +989196042887
khosrobeygi.zo...@ut.ac.ir 

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAE1QSgz5Y6JxrWSBi5ODSbK0cmphFAwro7qW02b0-n_AujKdQQ%40mail.gmail.com.

[tesseract-ocr] Re: why such simple word can't be recognized?

2018-08-14 Thread zwwtsinghua

It's interesting. I've tried many ways to process the image: binary inversion,
cropping, resizing.
I've tried the OEMs of 3.0.0 and 4.0.0, and PSMs 3, 6 and 7.
I thought at least one of them might work, but none did; nothing was output at all.
Maybe this particular font just hits some weakness of Tesseract.
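
Trying engine and segmentation modes one at a time is tedious. A short script can run every combination and show what each one returns. This is a minimal Python sketch that assumes tesseract 4.x, which accepts `--oem`, `--psm`, `-c` and the output base `stdout`. The image name `ears.png` comes from the quoted question, the mode lists are examples only, and `sweep_commands` and `run_sweep` are hypothetical helpers. `--oem 0` needs traineddata that still includes the legacy engine. Passing the whitelist as a single argv element avoids shell quoting. The backslash in the original `\\'` is dropped here, on the assumption that only the apostrophe was wanted:

```python
import itertools
import subprocess

# Whitelist from the question, minus the shell-escaping backslash (assumption).
WHITELIST = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz,.-'"
OEMS = (0, 1)        # 0 = legacy engine, 1 = LSTM engine
PSMS = (3, 6, 7, 8)  # 3 = automatic, 6 = uniform block, 7 = single line, 8 = single word


def sweep_commands(image, oems=OEMS, psms=PSMS, whitelist=WHITELIST):
    """Hypothetical helper: yield (oem, psm, argv) for every combination."""
    for oem, psm in itertools.product(oems, psms):
        argv = ["tesseract", image, "stdout",
                "--oem", str(oem), "--psm", str(psm),
                "-c", f"tessedit_char_whitelist={whitelist}"]
        yield oem, psm, argv


def run_sweep(image):
    """Hypothetical helper: run every combination and print what it recognized."""
    for oem, psm, argv in sweep_commands(image):
        result = subprocess.run(argv, capture_output=True, text=True)
        print(f"oem={oem} psm={psm}: {result.stdout.strip()!r}")
```

For example, `run_sweep("ears.png")` prints one line per combination. If every mode returns nothing, that is obvious at a glance.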


On Tuesday, August 14, 2018 at 6:59:01 PM UTC+8, xll...@gmail.com wrote:
>
> I use OpenCV to extract characters from an image and combine them together,
> but Tesseract fails to recognize the result.
> I have tested with the parameters
> "-c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz,.-\\'"
> "-psm 7" and "-psm 8", still no luck.
> Please see the attachment, ears.png.
>
> But some others were recognized successfully, like godmother.png.
>
> Could anyone show me how, please?
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/2f569d8f-d70a-4740-9c51-1691d96f4541%40googlegroups.com.

Re: [tesseract-ocr] Re: LSTM files

2018-08-14 Thread zwwtsinghua

I mean: put all the file paths in this file, then run lstmtraining.
# cat eng.training_files.txt
/home/tess-ocr/model_output/test//eng.Arial.exp0.lstmf
/home/tess-ocr/model_output/test//eng.Microsoft_YaHei.exp0.lstmf
/home/tess-ocr/model_output/test//eng.Times_New_Roman.exp0.lstmf
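
The list file is just one `.lstmf` path per line, so it can be generated from the files on disk instead of written by hand. This is a minimal Python sketch that assumes all `.lstmf` files for one language sit in a single directory. `write_training_list` is a hypothetical helper:

```python
from pathlib import Path


def write_training_list(lstmf_dir, list_path):
    """Hypothetical helper: write every .lstmf in lstmf_dir, one absolute path per line.

    Returns the number of paths written.
    """
    paths = sorted(Path(lstmf_dir).resolve().glob("*.lstmf"))
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return len(paths)
```

For the layout above, `write_training_list("/home/tess-ocr/model_output/test", "eng.training_files.txt")` would produce a file like the one shown. That file is then passed to lstmtraining with `--train_listfile`. The returned count makes it easy to confirm that the `.lstmf` files for every font were picked up.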


On Tuesday, August 14, 2018 at 6:04:48 PM UTC+8, Zohreh Khosrobeygi wrote:
>
> Sorry, I don't understand.
> Could you please explain this in more detail: "and then put all the lstm files
> together in training_files.txt"?
>
> On Tue, Aug 14, 2018 at 1:19 PM, > wrote:
>
>> You should run the tesseract command for each of your box/tif pairs:
>> tesseract ${dir}/lang.font.exp0.tif ${dir}/lang.font.exp0 lstm.train
>> and then put all the lstm files together in training_files.txt.
>>
>> On Monday, August 13, 2018 at 6:16:09 PM UTC+8, Zohreh Khosrobeygi wrote:
>>>
>>> Hi,
>>> I have been training the Persian language. My text is too large, so I had
>>> to generate 18 box files and 18 tifs for one text. Then I made one
>>> unicharset for all 18 files. Now when I try to make the lstmf files, it
>>> only creates one, "fas.B_Mitra.exp0.lstmf", and can't create the ones for
>>> 1 to 18.
>>> I tested something else: I created "fas.B_Nazanin.exp0.lstmf" too and
>>> ran lstmtraining, but it only used fas.B_Mitra.exp0.lstmf and did not
>>> use the other one.
>>> How can I make lstmf files for all my boxes?
>>> Thx.
>>>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/67f4fb37-b3d2-4d11-83ff-d83607c48966%40googlegroups.com.

[tesseract-ocr] why such simple word can't be recognized?

2018-08-14 Thread xllacyx

I use OpenCV to extract characters from an image and combine them together,
but Tesseract fails to recognize the result.
I have tested with the parameters
"-c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz,.-\\'"
"-psm 7" and "-psm 8", still no luck.
Please see the attachment, ears.png.

But some others were recognized successfully, like godmother.png.

Could anyone show me how, please?
-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/8744457a-85cf-4359-b82e-b68597e28d55%40googlegroups.com.

Re: [tesseract-ocr] Re: LSTM files

2018-08-14 Thread Khosrobeigy.zohreh

Sorry, I don't understand.
Could you please explain this in more detail: "and then put all the lstm files
together in training_files.txt"?

On Tue, Aug 14, 2018 at 1:19 PM,  wrote:

> You should run the tesseract command for each of your box/tif pairs:
> tesseract ${dir}/lang.font.exp0.tif ${dir}/lang.font.exp0 lstm.train
> and then put all the lstm files together in training_files.txt.
>
> On Monday, August 13, 2018 at 6:16:09 PM UTC+8, Zohreh Khosrobeygi wrote:
>>
>> Hi,
>> I have been training the Persian language. My text is too large, so I had
>> to generate 18 box files and 18 tifs for one text. Then I made one
>> unicharset for all 18 files. Now when I try to make the lstmf files, it
>> only creates one, "fas.B_Mitra.exp0.lstmf", and can't create the ones for
>> 1 to 18.
>> I tested something else: I created "fas.B_Nazanin.exp0.lstmf" too and
>> ran lstmtraining, but it only used fas.B_Mitra.exp0.lstmf and did not
>> use the other one.
>> How can I make lstmf files for all my boxes?
>> Thx.
>>



-- 
Zohreh Khosrobeygi
University of Tehran, 2016
Tel: +989196042887
khosrobeygi.zo...@ut.ac.ir 

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAE1QSgz6kOnb7LO5J9ZbZ9zdbH40a%2BQnVm-_T37nTLr-b_OBtA%40mail.gmail.com.

[tesseract-ocr] Re: Error on combine_lang_model script; Null char=2 Invalid format in radical table at line 4: 3400 1.4 Creation of encoded unicharset failed!! Error writing recoder!!

2018-08-14 Thread zwwtsinghua

I've come across the same fault before.
It happened because I had simply moved langdata, cloned on Windows, to a Linux server.
As a consequence, the radical-stroke.txt file, which needs "LF" line endings,
ended up with "CR LF" line endings.
Everything worked after I converted this file.
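
The line endings can be converted with a tool such as `dos2unix`, or with a few lines of Python. This is a minimal sketch that assumes the file sits in the directory passed as `--script_dir` (`../langdata` in the quoted command). `to_lf` is a hypothetical helper. It works on raw bytes, so the UTF-8 content is not re-encoded:

```python
from pathlib import Path


def to_lf(path):
    """Hypothetical helper: rewrite a file in place with LF line endings.

    Returns True if the file contained CRLF endings and was rewritten.
    """
    p = Path(path)
    data = p.read_bytes()
    fixed = data.replace(b"\r\n", b"\n")  # CR LF -> LF; lone LFs are untouched
    if fixed != data:
        p.write_bytes(fixed)
    return fixed != data
```

For example, run `to_lf("../langdata/radical-stroke.txt")` and then rerun combine_lang_model. Other langdata files copied from Windows may need the same treatment.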

On Monday, August 6, 2018 at 12:11:33 PM UTC+8, Shandigutt wrote:
>
> Hi,
>
> I am trying to train Tesseract for the Sinhala language. I was following the
> training guidelines mentioned in the GitHub wiki. I get an error at the 4th
> step, "Creating Starter Traineddata". Please find below the command I
> executed:
>
> training/combine_lang_model --input_unicharset 
> ../training/sin/sin.unicharset --script_dir ../langdata --words 
> ../langdata/sin/sin.wordlist --puncs ../langdata/sin/sin.punc --numbers 
> ../langdata/sin/sin.numbers --output_dir ../training/combined_sin 
> --version_str 1.0 --lang sin
>
> I get the following output,
>
> Loaded unicharset of size 94 from file ../training/sin/sin.unicharset
> Setting unichar properties
> Setting script properties
> Warning: properties incomplete for index 4 = ී
> Warning: properties incomplete for index 6 = ි
> Warning: properties incomplete for index 11 = ු
> Warning: properties incomplete for index 15 = ්‌
> Warning: properties incomplete for index 33 = ූ
> Warning: properties incomplete for index 52 = ්‍ර
> Warning: properties incomplete for index 56 = ්‍ය
> Warning: properties incomplete for index 87 = ක්‍
> Warning: properties incomplete for index 93 = ර්‍
> Config file is optional, continuing...
> Null char=2
> Invalid format in radical table at line 4: 3400 1.4
> Creation of encoded unicharset failed!!
> Error writing recoder!!
> Reducing Trie to SquishedDawg
> Reducing Trie to SquishedDawg
> Reducing Trie to SquishedDawg
>
> For more information, I have attached my sin.unicharset and sin.config
> files.
>
> I use the following Tesseract version:
>
> tesseract -v
> tesseract 4.00.00dev-696-geba0ae3
>  leptonica-1.74.4
>   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
>
>  Found SSE
>
> I use the following OS:
>
> uname -a
> Linux shandigutt-laptop-ubuntu 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>
> I'd appreciate it if somebody could help me with this.
>
> Thanks
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/aed90864-5061-465a-a68f-2e1fcddf1e14%40googlegroups.com.

[tesseract-ocr] Re: LSTM files

2018-08-14 Thread zwwtsinghua

You should run the tesseract command for each of your box/tif pairs:
tesseract ${dir}/lang.font.exp0.tif ${dir}/lang.font.exp0 lstm.train
and then put all the lstm files together in training_files.txt.
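
Before running the command above on a whole directory, it can help to check for unmatched `.tif` or `.box` files, because tesseract reads the `.box` file as the ground truth for its image. This is a minimal Python sketch that assumes each pair shares a base name such as `fas.B_Mitra.exp0`. `find_pairs` is a hypothetical helper:

```python
from pathlib import Path


def find_pairs(directory):
    """Hypothetical helper: match .tif and .box files by base name.

    Returns (pairs, tif_only, box_only), each a sorted list of base names
    such as "fas.B_Mitra.exp0".
    """
    d = Path(directory)
    tifs = {p.stem for p in d.glob("*.tif")}   # fas.B_Mitra.exp0.tif -> fas.B_Mitra.exp0
    boxes = {p.stem for p in d.glob("*.box")}
    return sorted(tifs & boxes), sorted(tifs - boxes), sorted(boxes - tifs)
```

Only the names in `pairs` are worth passing to tesseract. Anything in the other two lists points to a missing or misnamed file.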

On Monday, August 13, 2018 at 6:16:09 PM UTC+8, Zohreh Khosrobeygi wrote:
>
> Hi,
> I have been training the Persian language. My text is too large, so I had to
> generate 18 box files and 18 tifs for one text. Then I made one unicharset
> for all 18 files. Now when I try to make the lstmf files, it only creates
> one, "fas.B_Mitra.exp0.lstmf", and can't create the ones for 1 to 18.
> I tested something else: I created "fas.B_Nazanin.exp0.lstmf" too and
> ran lstmtraining, but it only used fas.B_Mitra.exp0.lstmf and did not use
> the other one.
> How can I make lstmf files for all my boxes?
> Thx.
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/97a24af8-fb96-402a-a15b-1e6a7df405ca%40googlegroups.com.
