[tesseract-ocr] training font

2017-03-19 Thread Ava Nimaee
Hi, I need your help.
In tesseract-ocr for Persian, do we train separately for each font, or do we
train once for all fonts? Thanks.

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/8fd2ce87-545d-4e10-ad6d-5585b1cb8cfc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[tesseract-ocr] Re: training font

2017-04-08 Thread Ava Nimaee
Thank you.

On Wednesday, March 22, 2017 at 10:31:41 PM UTC+4:30, Saurabh Srivastav 
wrote:
>
> You can train it for a single font.
>
> On Sunday, March 19, 2017 at 1:23:50 PM UTC+5:30, Ava Nimaee wrote:
>>
>> Hi, I need your help.
>> In tesseract-ocr for Persian, do we train separately for each font, or do
>> we train once for all fonts? Thanks.
>>
>



[tesseract-ocr] what fonts does tesseract support?

2017-04-08 Thread Ava Nimaee
Hi, sorry, I would like to know which fonts Tesseract supports. Also, what
are Tesseract's priorities for training?
Thanks.



[tesseract-ocr] ERROR: Could not find training text file

2017-07-31 Thread Ava Nimaee
Hi, sorry. I used this command:
training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
  --linedata_only \
  --noextract_font_properties --langdata_dir langdata \
  --tessdata_dir tessdata \
  --fontlist "Times New Roman," --output_dir engtrain
Before that, I created the box file, the TIFF, and the unicharset output.
I cloned langdata for Tesseract v4.0,
but I get this error:
 === Phase I: Generating training images ===
ERROR: Could not find training text file langdata/eng/eng.training_text
I can't solve it, and I don't know where I should put training_text.txt.
It is the text file that I want to train on.
Thanks for your attention.
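
The error message names the path that was looked up, which has the shape
<langdata_dir>/<lang>/<lang>.training_text. A minimal sketch of a pre-flight
check, assuming that layout; the helper names training_text_path and
check_training_text are made up for illustration and are not part of
Tesseract:

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract): print the training-text path
# in the form shown by the error message:
#   <langdata_dir>/<lang>/<lang>.training_text
training_text_path() {
    langdata_dir=$1
    lang=$2
    printf '%s/%s/%s.training_text\n' "$langdata_dir" "$lang" "$lang"
}

# Return 0 if the file is present; otherwise report where it was expected.
check_training_text() {
    path=$(training_text_path "$1" "$2")
    if [ -f "$path" ]; then
        echo "found: $path"
    else
        echo "missing: $path" >&2
        return 1
    fi
}
```

Running check_training_text langdata eng from the same directory as the
tesstrain.sh call shows whether the relative langdata path resolves to the
file that the script expects.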



Re: [tesseract-ocr] Error:Assert failed:in file text2image.cpp, line 428

2017-07-29 Thread Ava Nimaee
I use Tesseract v4.0 on Ubuntu 16.04.

On Wednesday, July 26, 2017 at 11:20:25 AM UTC+4:30, shree wrote:
>
> Which version of tesseract are you using? Which platform?
>
> Try building the latest code from github and use that.
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Tue, Jul 25, 2017 at 9:02 PM, Ava Nimaee <beigy@gmail.com 
> > wrote:
>
>> Hi,
>> Sorry, but I can't solve this error. When I used "text2image 
>> --text=training_text.txt –outputbase=eng.Times New Roman,.exp0 
>> --font='Times New Roman,' --fonts_dir=/usr/share/fonts"
>> it shows me this:
>> Output file missing!
>> !FLAGS_outputbase.empty():Error:Assert failed:in file text2image.cpp, line 428
>> Segmentation fault (core dumped)
>> Can you please help me?
>>
>>
>
>



Re: [tesseract-ocr] Error:Assert failed:in file text2image.cpp, line 428

2017-07-29 Thread Ava Nimaee
Thanks for your help.

On Wednesday, July 26, 2017 at 11:20:25 AM UTC+4:30, shree wrote:
>
> Which version of tesseract are you using? Which platform?
>
> Try building the latest code from github and use that.
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Tue, Jul 25, 2017 at 9:02 PM, Ava Nimaee <beigy@gmail.com 
> > wrote:
>
>> Hi,
>> Sorry, but I can't solve this error. When I used "text2image 
>> --text=training_text.txt –outputbase=eng.Times New Roman,.exp0 
>> --font='Times New Roman,' --fonts_dir=/usr/share/fonts"
>> it shows me this:
>> Output file missing!
>> !FLAGS_outputbase.empty():Error:Assert failed:in file text2image.cpp, line 428
>> Segmentation fault (core dumped)
>> Can you please help me?
>>
>>
>
>



Re: [tesseract-ocr] ERROR: Could not find training text file

2017-08-04 Thread Ava Nimaee
Thanks a lot.

On Monday, July 31, 2017 at 4:10:14 PM UTC+4:30, shree wrote:
>
> Add a line similar to the following to your training command, pointing to 
> where you have your training text:
>
>   --training_text ../langdata/eng/eng.training_text \
>
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Mon, Jul 31, 2017 at 4:24 PM, Ava Nimaee <beigy@gmail.com 
> > wrote:
>
>> Hi, sorry. I used this command:
>> training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
>>   --linedata_only \
>>   --noextract_font_properties --langdata_dir langdata \
>>   --tessdata_dir tessdata \
>>   --fontlist "Times New Roman," --output_dir engtrain
>> Before that, I created the box file, the TIFF, and the unicharset output.
>> I cloned langdata for Tesseract v4.0,
>> but I get this error:
>>  === Phase I: Generating training images ===
>> ERROR: Could not find training text file langdata/eng/eng.training_text
>> I can't solve it, and I don't know where I should put training_text.txt.
>> It is the text file that I want to train on.
>> Thanks for your attention.
>>
>>
>
>



Re: [tesseract-ocr] Re: Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110

2017-08-16 Thread Ava Nimaee
Thanks a lot. You're right.
The path has to be complete: I used
/home/zohreh/Desktop/tesseract-master/z/engtrian/eng/eng.traineddata
instead of z/engtrain/eng/eng.traineddata.
It only works when the path is written from the root.
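
Since the full path worked where the relative one did not, one option is to
expand a relative path before passing it to the training tool. A minimal
sketch, using only cd and pwd; the helper name abs_path is hypothetical and
not part of Tesseract:

```shell
#!/bin/sh
# Hypothetical helper: print the absolute form of a path whose parent
# directory exists; fail if that directory cannot be entered.
abs_path() {
    dir=$(dirname "$1")
    base=$(basename "$1")
    (
        # Work in a subshell so the caller's working directory is unchanged.
        cd "$dir" 2>/dev/null || exit 1
        parent=$(pwd -P)
        # Avoid a doubled slash when the parent is the root directory.
        if [ "$parent" = / ]; then parent=; fi
        printf '%s/%s\n' "$parent" "$base"
    )
}
```

The result can then be substituted wherever the traineddata path is given, for
example as "$(abs_path z/engtrain/eng/eng.traineddata)". A non-zero exit
status means the directory part of the path does not exist, which is itself a
useful hint when a directory name is misspelled.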

On Wednesday, August 16, 2017 at 5:50:18 AM UTC+4:30, roberty...@gmail.com 
wrote:
>
> Hi, I haven't encountered this error.
>
> But you may want to check whether your traineddata is in the correct
> directory, and check the other paths as well.
>
> On Tuesday, August 15, 2017 at 5:45:17 PM UTC+8, Ava Nimaee wrote:
>>
>> Hi, thanks for your help.
>> I used your link, but I got this error:
>> mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110
>> Segmentation fault (core dumped)
>> I want to start training the Persian language, so I'm trying English
>> first. I created the box file and unicharset, and then
>> eng.charset_size=110.txt, eng.Times_New_Roman.exp0.lstmf, eng.traineddata,
>> eng.training_files.txt, and eng.unicharset.
>> All of those were created with this command:
>> training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
>>   --training_text training/langdata/eng/eng.training_text \
>>   --linedata_only \
>>   --noextract_font_properties --langdata_dir training/langdata \
>>   --tessdata_dir ./tessdata \
>>   --fontlist "Times New Roman," --output_dir ~/tesstutorial/engtrian
>> And now I have the error that I described above.
>>
>> On Monday, August 14, 2017 at 1:00:02 PM UTC+4:30, roberty...@gmail.com 
>> wrote:
>>>
>>> What problems do you encounter? Please give more information about the
>>> problems.
>>>
>>> I later used the new tutorial
>>> (https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-impact)
>>> to train data, and I didn't have any problems. I hope this helps.
>>>
>>



Re: [tesseract-ocr] Re: Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110

2017-08-16 Thread Ava Nimaee
Sorry, I have a question:
What is the output of this command? After running it I have a lot of files
like base44.409_2195.checkpoint, but in the tutorials I saw eng.lstm,
and I don't have that. Which command creates eng.lstm?

Thank you for your support so far.
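
On the two file types in this question: as far as the Tesseract 4.00 training
wiki describes it, the .checkpoint files are written by lstmtraining while it
runs, a starting .lstm file is extracted from an existing traineddata with
combine_tessdata -e, and a checkpoint is converted into a finished traineddata
with lstmtraining --stop_training. The sketch below only prints those two
commands so they can be inspected first; the function names are hypothetical,
and every path passed in is a placeholder rather than a value from this thread.

```shell
#!/bin/sh
# Print the command that extracts the LSTM model from an existing traineddata.
#   $1 = existing .traineddata file, $2 = where to write the .lstm file
print_extract_lstm() {
    printf 'combine_tessdata -e %s %s\n' "$1" "$2"
}

# Print the command that converts a training checkpoint into a traineddata.
#   $1 = checkpoint file, $2 = starter traineddata, $3 = output traineddata
print_finalize_checkpoint() {
    printf 'lstmtraining --stop_training --continue_from %s --traineddata %s --model_output %s\n' \
        "$1" "$2" "$3"
}
```

Printing rather than executing keeps the sketch safe to run on a machine
without the training tools installed; the printed line can be pasted into the
terminal once the paths look right.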


On Wednesday, August 16, 2017 at 5:50:18 AM UTC+4:30, roberty...@gmail.com 
wrote:
>
> Hi, I haven't encountered this error.
>
> But you may want to check whether your traineddata is in the correct
> directory, and check the other paths as well.
>
> On Tuesday, August 15, 2017 at 5:45:17 PM UTC+8, Ava Nimaee wrote:
>>
>> Hi, thanks for your help.
>> I used your link, but I got this error:
>> mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110
>> Segmentation fault (core dumped)
>> I want to start training the Persian language, so I'm trying English
>> first. I created the box file and unicharset, and then
>> eng.charset_size=110.txt, eng.Times_New_Roman.exp0.lstmf, eng.traineddata,
>> eng.training_files.txt, and eng.unicharset.
>> All of those were created with this command:
>> training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
>>   --training_text training/langdata/eng/eng.training_text \
>>   --linedata_only \
>>   --noextract_font_properties --langdata_dir training/langdata \
>>   --tessdata_dir ./tessdata \
>>   --fontlist "Times New Roman," --output_dir ~/tesstutorial/engtrian
>> And now I have the error that I described above.
>>
>> On Monday, August 14, 2017 at 1:00:02 PM UTC+4:30, roberty...@gmail.com 
>> wrote:
>>>
>>> What problems do you encounter? Please give more information about the
>>> problems.
>>>
>>> I later used the new tutorial
>>> (https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-impact)
>>> to train data, and I didn't have any problems. I hope this helps.
>>>
>>



Re: [tesseract-ocr] Re: Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110

2017-08-15 Thread Ava Nimaee
Hi, thanks for your help.
I used your link, but I got this error:
mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110
Segmentation fault (core dumped)
I want to start training the Persian language, so I'm trying English
first. I created the box file and unicharset, and then
eng.charset_size=110.txt, eng.Times_New_Roman.exp0.lstmf, eng.traineddata,
eng.training_files.txt, and eng.unicharset.
All of those were created with this command:
training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
  --training_text training/langdata/eng/eng.training_text \
  --linedata_only \
  --noextract_font_properties --langdata_dir training/langdata \
  --tessdata_dir ./tessdata \
  --fontlist "Times New Roman," --output_dir ~/tesstutorial/engtrian
And now I have the error that I described above.

On Monday, August 14, 2017 at 1:00:02 PM UTC+4:30, roberty...@gmail.com 
wrote:
>
> What problems do you encounter? Please give more information about the
> problems.
>
> I later used the new tutorial
> (https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-impact)
> to train data, and I didn't have any problems. I hope this helps.
>



Re: [tesseract-ocr] Re: Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110

2017-08-14 Thread Ava Nimaee
I have the traineddata at this path:
/home/zohreh/tesstutorial/engtrian/eng/eng.traineddata
I created it using:
training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
  --training_text training/langdata/eng/eng.training_text \
  --linedata_only \
  --noextract_font_properties --langdata_dir training/langdata \
  --tessdata_dir ./tessdata \
  --fontlist "Times New Roman," --output_dir ~/tesstutorial/engtrian
I also used the link that you sent me.
Sorry, shree, I tried a lot but I couldn't solve it.


On Monday, August 7, 2017 at 10:28:05 PM UTC+4:30, shree wrote:
>
> You also need to provide a traineddata file as input
>
> Please review the updated training instructions in the wiki and change the 
> training commands accordingly.
>
> On 07-Aug-2017 6:15 PM, "Ava Nimaee" <beigy@gmail.com > 
> wrote:
>
>> Hi, how did you solve it? I have this error too.
>> Please help me.
>>
>> On Friday, August 4, 2017 at 11:03:41 AM UTC+4:30, roberty...@gmail.com 
>> wrote:
>>>
>>> Hello,
>>>
>>> I used the 'git pull' command to update the code from
>>> https://github.com/tesseract-ocr/tesseract.git, and then recompiled and
>>> reinstalled Tesseract 4.0.
>>>
>>> But when I execute the command shown below to fine-tune the traineddata,
>>> an error appears:
>>> "mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file 
>>> ../lstm/lstmtrainer.h, line 110"
>>>
>>> lstmtraining --model_output ~/tesstutorial/chituned_from_chisim/chituned \
>>> --continue_from ~/tesstutorial/chituned_from_chisim/chi_sim.lstm \
>>> --train_listfile ~/tesstutorial/chitest/chi_sim.training_files.txt \
>>> --eval_listfile ~/tesstutorial/chitest/chi_sim.training_files.txt \
>>> --target_error_rate 0.01
>>>
>>> Nothing was wrong with Tesseract before I updated the code, but now it
>>> crashes with an assertion error. Why? Can you help me?
>>>
>>
>



[tesseract-ocr] Re: training font

2017-07-15 Thread Ava Nimaee
Sorry about my delay.
I use Tesseract v4.0.

On Saturday, April 8, 2017 at 11:02:33 PM UTC+4:30, peiman F. wrote:
>
> Which version of Tesseract are you using?
> Tesseract doesn't fully support Persian yet.
>



[tesseract-ocr] Re: train a new font for language of persian

2017-07-18 Thread Ava Nimaee
Sorry about my delay. I need to train some words such as لا
because in the previous version words like this were detected wrongly. Now I
would like to understand whether version 4.0 needs font detection, or whether
we can train all fonts together.
Also, is there a batch file for Tesseract 4.0? Can I have it?
Thanks a lot.
thanks alot


On Friday, May 5, 2017 at 7:01:03 PM UTC+4:30, shree wrote:
>
> There is already farsi/persian traineddata for tesseract-ocr 4.0-alpha at 
> https://github.com/tesseract-ocr/tessdata/raw/master/fas.traineddata
>
> Have you given it a try? Which font do you want to add to it?
>
> On Thursday, May 4, 2017 at 6:06:09 PM UTC+5:30, Ava Nimaee wrote:
>>
>> Hi everyone. I want to start using Tesseract for the first time. Where
>> should I start? I want to train a new font for the Persian language, but I
>> am confused.
>>
>



[tesseract-ocr] create boxfile and tiff

2017-07-25 Thread Ava Nimaee
Hi,
I used
text2image --text=training_text.txt --outputbase=eng.Times_New_Roman,.exp0 --font='Times_New_Roman,' --fonts_dir=/usr/share/fonts
but it shows this:

FcInitiReinitialize failed!!
Could not find font named Arial. Pango suggested font
Please correct --font arg.:Error:Assert failed:in file text2image.cpp, line 437
Segmentation fault (core dumped)

Sorry, I can't solve it. Can you help me?
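
One way to rule out a font-name mismatch is to compare the --font value with
the names that text2image itself reports through its --list_available_fonts
option. A minimal sketch, assuming the listing is plain text with one font
name per line; the helper name font_listed is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read a font listing on standard input and succeed if
# the wanted name occurs in it (fixed-string, case-insensitive match).
font_listed() {
    grep -qiF -- "$1"
}

# Intended use (commented out because it needs text2image to be installed):
#   text2image --list_available_fonts --fonts_dir=/usr/share/fonts 2>&1 |
#       font_listed "Times New Roman" || echo "font not found in that directory"
```

Because the match is literal, a name written with underscores, such as
Times_New_Roman, will not match a listing entry written with spaces, so this
also shows whether the spelling passed to --font is the one Pango knows.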



[tesseract-ocr] Error:Assert failed:in file text2image.cpp, line 428

2017-07-25 Thread Ava Nimaee
Hi,
Sorry, but I can't solve this error. When I used "text2image 
--text=training_text.txt –outputbase=eng.Times New Roman,.exp0 
--font='Times New Roman,' --fonts_dir=/usr/share/fonts"
it shows me this:
Output file missing!
!FLAGS_outputbase.empty():Error:Assert failed:in file text2image.cpp, line 428
Segmentation fault (core dumped)
Can you please help me?
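
In the command as posted, the outputbase option is written with a single
typographic dash (U+2013) instead of two ASCII hyphens, and its value contains
unquoted spaces. If the same characters were typed in the terminal, and not
just introduced by the mail client, text2image would not recognise the option,
which would be consistent with the !FLAGS_outputbase.empty() assertion. A
minimal sketch that flags such arguments before a command is run; the helper
name find_typographic_dashes is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: report arguments that begin with a typographic dash
# (U+2013 or U+2014) where an ASCII "--" was probably intended.
# Returns 0 when all arguments look fine and 1 when any is suspicious.
find_typographic_dashes() {
    en_dash=$(printf '\342\200\223')   # UTF-8 bytes of U+2013
    em_dash=$(printf '\342\200\224')   # UTF-8 bytes of U+2014
    found=0
    for arg in "$@"; do
        case $arg in
            "$en_dash"*|"$em_dash"*)
                echo "suspicious option: $arg" >&2
                found=1
                ;;
        esac
    done
    return "$found"
}
```

Passing the intended text2image arguments to this function first gives a quick
check. Quoting any value that contains spaces keeps it as one argument.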



Re: [tesseract-ocr] ERROR: Non-existent flag --traineddata

2017-08-05 Thread Ava Nimaee
Thanks for your attention.
I removed everything, installed the latest versions of Tesseract and
Leptonica again, and used this command:
training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
  --training_text training/langdata/eng/eng.training_text \
  --linedata_only \
  --noextract_font_properties --langdata_dir training/langdata \
  --tessdata_dir ./tessdata \
  --fontlist "Times New Roman," --output_dir ~/tesstutorial/engtrian

But I got a new error. Everything seems OK, but at the end I got this:

Setting unichar properties
Other case É of é is not in unicharset
Setting script properties
Failed to read data from: training/langdata/eng/eng.config
Null char=2
Invalid format in radical table at line 4: 3400 1.4
Creation of encoded unicharset failed!!
Error writing recoder!!
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg
Moving /tmp/tmp.GW5DOJr0rG/eng/eng.Times_New_Roman.exp0.lstmf to 
/home/zohreh/tesstutorial/engtrian

Completed training for language 'eng'
I don't have eng.config in my langdata. I cloned langdata from Tesseract's
Git repository.
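
The "Invalid format in radical table at line 4" message refers to a data file
in the langdata checkout, named radical-stroke.txt elsewhere in this thread. A
minimal sketch for printing the offending line so it can be compared with the
current upstream copy; it assumes the file sits at the top level of the
langdata directory, and the helper name show_radical_line is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print one line of radical-stroke.txt.
#   $1 = langdata directory (assumed to contain the file at its top level)
#   $2 = line number taken from the error message
show_radical_line() {
    file=$1/radical-stroke.txt
    if [ ! -f "$file" ]; then
        echo "missing: $file" >&2
        return 1
    fi
    sed -n "${2}p" "$file"
}
```

For the log above this would be called as show_radical_line training/langdata 4.
If the printed line differs from the same line in a fresh checkout, the local
langdata is probably out of date relative to the installed training tools.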


On Saturday, August 5, 2017 at 5:50:59 PM UTC+4:30, shree wrote:
>
> ​tesseract -v
> tesseract 4.00.00dev-594-g044e06e-2085
>  leptonica-1.74.4
>   libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.50 : libtiff 4.0.3 : zlib 
> 1.2.8
>
>  Found AVX
>  Found SSE
>
>
> The above version is working ok on linux
>
>  nice lstmtraining \
>    --old_traineddata ../tessdata/best/san.traineddata \
>    --continue_from ../tessdata/best/san.lstm \
>    --traineddata ../tesstutorial/vedic/san/san.traineddata \
>    --train_listfile ../tesstutorial/vedic/san.training_files.txt \
>    --eval_listfile ../tesstutorial/vedic/san.eval_files.txt \
>    --model_output ../tesstutorial/vedic/santune \
>    --max_iterations 200 \
>    --debug_interval 0
>
> Loaded file ../tessdata/best/san.lstm, unpacking...
> Warning: LSTMTrainer deserialized an LSTMRecognizer!
> Code range changed from 145 to 2308!!
> Num (Extended) outputs,weights in Series:
>   1,36,0,1:1, 0
> Num (Extended) outputs,weights in Series:
>   C3,3:9, 0
>   Ft16:16, 160
> Total weights = 160
>   [C3,3Ft16]:16, 160
>   Mp3,3:16, 0
>   Lfys48:48, 12480
>   Lfx96:96, 55680
>   Lrx96:96, 74112
>   Lfx192:192, 221952
>   Fc2308:2308, 445444
> Total weights = 809828
> Previous null char=2 mapped to 2
> Continuing from ../tessdata/best/san.lstm
> Loaded 138/138 pages (1-138) of document 
> ../tesstutorial/vedic/san.AA_NAGARI_SHREE_L3.exp0.lstmf
> Loaded 138/138 pages (1-138) of document 
> ../tesstutorial/vedic/san.AA_NAGARI_SHREE_L3.exp-1.lstmf
> Loaded 138/138 pages (1-138) of document 
> ../tesstutorial/vedic/san.Adobe_Devanagari.exp-2.lstmf
> Loaded 138/138 pages (1-138) of document 
> ../tesstutorial/vedic/san.Adobe_Devanagari.exp1.lstmf
>
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Sat, Aug 5, 2017 at 6:43 PM, ShreeDevi Kumar <shree...@gmail.com 
> > wrote:
>
>> Did you build the training tools again?
>>
>>
>> ShreeDevi
>> 
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Sat, Aug 5, 2017 at 6:37 PM, Ava Nimaee <beigy@gmail.com 
>> > wrote:
>>
>>> Yes, as you told me, I cloned the latest tesseract-master and installed it
>>> and Leptonica again, made the TIFF, box file, and unicharset, and then
>>> used these commands:
>>> training/tesstrain.sh \
>>>   --fonts_dir /usr/share/fonts \
>>>   --lang eng  \
>>>   --training_text langdata/eng/eng.training_text \
>>>   --linedata_only \
>>>   --noextract_font_properties  --langdata_dir langdata \
>>>   --tessdata_dir ./tessdata \
>>>   --fontlist "Times New Roman," \
>>>   --output_dir tesstutorial/engtrian
>>> 
>>> training/tesstrain.sh \
>>>   --fonts_dir /usr/share/fonts \
>>>   --lang eng  \
>>>   --training_text langdata/eng/eng.training_text \
>>>   --linedata_only \
>>>   --noextract_font_properties  --langdata_dir langdata \
>>>   --tessdata_dir ./tessdata \
>>>   --output_dir tesstutorial/engeval
>>> Finally, I ran the last command, the one that I said gave the error.
>>> For that last command I put langdata/eng in the engtrian folder.
>>>
>>>
>>> On Saturday, August 5, 2017 at 5:28:48 PM UTC+4:30, shree wrote:
>>>>
>>>> Are you using the latest source of programs from github for building 
>>>> tesseract?
>>>>

[tesseract-ocr] Re: Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110

2017-08-07 Thread Ava Nimaee
Hi, how did you solve it? I have this error too.
Please help me.

On Friday, August 4, 2017 at 11:03:41 AM UTC+4:30, roberty...@gmail.com 
wrote:
>
> Hello,
>
> I used the 'git pull' command to update the code from
> https://github.com/tesseract-ocr/tesseract.git, and then recompiled and
> reinstalled Tesseract 4.0.
>
> But when I execute the command shown below to fine-tune the traineddata,
> an error appears:
> "mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file 
> ../lstm/lstmtrainer.h, line 110"
>
> lstmtraining --model_output ~/tesstutorial/chituned_from_chisim/chituned \
> --continue_from ~/tesstutorial/chituned_from_chisim/chi_sim.lstm \
> --train_listfile ~/tesstutorial/chitest/chi_sim.training_files.txt \
> --eval_listfile ~/tesstutorial/chitest/chi_sim.training_files.txt \
> --target_error_rate 0.01
>
> Nothing was wrong with Tesseract before I updated the code, but now it
> crashes with an assertion error. Why? Can you help me?
>



Re: [tesseract-ocr] ERROR: Non-existent flag --traineddata

2017-08-07 Thread Ava Nimaee
I'll do that, thank you.

On Monday, August 7, 2017 at 12:38:39 PM UTC+4:30, shree wrote:
>
> There have been changes since then.
>
> Either update your git repository via
>
> git pull origin
>
> or 
>
> clone it again.
>
> ​
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Mon, Aug 7, 2017 at 12:26 PM, Ava Nimaee <beigy@gmail.com 
> > wrote:
>
>>  about 3 weeks ago
>>
>>
>> On Sunday, August 6, 2017 at 7:59:44 AM UTC+4:30, shree wrote:
>>>
>>> >Invalid format in radical table at line 4: 3400 1.4
>>>
>>> When did you clone langdata?
>>>
>>> Ray has updated radical-stroke.txt 11 days ago - see 
>>> https://github.com/tesseract-ocr/langdata/commit/3e32be3dc07be0994f3687664a44cb3246b5aa11
>>>
>>> ShreeDevi
>>> 
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Sat, Aug 5, 2017 at 10:56 PM, Ava Nimaee <beigy@gmail.com> wrote:
>>>
>>>> Thanks for your attention.
>>>> I removed everything, installed the latest versions of tesseract and 
>>>> leptonica again, and used this command:
>>>> training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
>>>>   --training_text training/langdata/eng/eng.training_text \
>>>>   --linedata_only \
>>>>   --noextract_font_properties --langdata_dir training/langdata \
>>>>   --tessdata_dir ./tessdata \
>>>>   --fontlist "Times New Roman," --output_dir ~/tesstutorial/engtrian
>>>>
>>>> But I got a new error. Everything seemed OK, but at the end I got this:
>>>>
>>>> Setting unichar properties
>>>> Other case É of é is not in unicharset
>>>> Setting script properties
>>>> Failed to read data from: training/langdata/eng/eng.config
>>>> Null char=2
>>>> Invalid format in radical table at line 4: 3400 1.4
>>>> Creation of encoded unicharset failed!!
>>>> Error writing recoder!!
>>>> Reducing Trie to SquishedDawg
>>>> Reducing Trie to SquishedDawg
>>>> Reducing Trie to SquishedDawg
>>>> Moving /tmp/tmp.GW5DOJr0rG/eng/eng.Times_New_Roman.exp0.lstmf to 
>>>> /home/zohreh/tesstutorial/engtrian
>>>>
>>>> Completed training for language 'eng'
>>>> I don't have eng.config in my langdata. I cloned langdata from 
>>>> Tesseract's git repository.
>>>>
>>>>
>>>> On Saturday, August 5, 2017 at 5:50:59 PM UTC+4:30, shree wrote:
>>>>>
>>>>> ​tesseract -v
>>>>> tesseract 4.00.00dev-594-g044e06e-2085
>>>>>  leptonica-1.74.4
>>>>>   libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.50 : libtiff 4.0.3 : 
>>>>> zlib 1.2.8
>>>>>
>>>>>  Found AVX
>>>>>  Found SSE
>>>>>
>>>>>
>>>>> The above version is working ok on linux
>>>>>
>>>>>  nice lstmtraining \
>>>>>    --old_traineddata ../tessdata/best/san.traineddata \
>>>>>    --continue_from ../tessdata/best/san.lstm \
>>>>>    --traineddata ../tesstutorial/vedic/san/san.traineddata \
>>>>>    --train_listfile ../tesstutorial/vedic/san.training_files.txt \
>>>>>    --eval_listfile ../tesstutorial/vedic/san.eval_files.txt \
>>>>>    --model_output ../tesstutorial/vedic/santune \
>>>>>    --max_iterations 200 \
>>>>>    --debug_interval 0
>>>>>
>>>>> Loaded file ../tessdata/best/san.lstm, unpacking...
>>>>> Warning: LSTMTrainer deserialized an LSTMRecognizer!
>>>>> Code range changed from 145 to 2308!!
>>>>> Num (Extended) outputs,weights in Series:
>>>>>   1,36,0,1:1, 0
>>>>> Num (Extended) outputs,weights in Series:
>>>>>   C3,3:9, 0
>>>>>   Ft16:16, 160
>>>>> Total weights = 160
>>>>>   [C3,3Ft16]:16, 160
>>>>>   Mp3,3:16, 0
>>>>>   Lfys48:48, 12480
>>>>>   Lfx96:96, 55680
>>>>>   Lrx96:96, 74112
>>>>>   Lfx192:192, 221952
>>>>>   Fc2308:2308, 445444
>>>>> Total weights = 809828
>>>>> Previous null char=2 mapped to 2
>>>>> Continuing from ../tessdata/best/san.lstm
>>>>> Loaded 13

Re: [tesseract-ocr] ERROR: Non-existent flag --traineddata

2017-08-05 Thread Ava Nimaee
Yes, as you said, I cloned the latest tesseract master, installed it and 
leptonica again, made the tiff and box files and the unicharset, and then 
used these commands:
training/tesstrain.sh \
  --fonts_dir /usr/share/fonts \
  --lang eng  \
  --training_text langdata/eng/eng.training_text \
  --linedata_only \
  --noextract_font_properties  --langdata_dir langdata \
  --tessdata_dir ./tessdata \
  --fontlist "Times New Roman," \
  --output_dir tesstutorial/engtrian

training/tesstrain.sh \
  --fonts_dir /usr/share/fonts \
  --lang eng  \
  --training_text langdata/eng/eng.training_text \
  --linedata_only \
  --noextract_font_properties  --langdata_dir langdata \
  --tessdata_dir ./tessdata \
  --output_dir tesstutorial/engeval
Finally, I ran the last command, the one I said gave the error.
For that last command I put langdata/eng in the engtrian folder.


On Saturday, August 5, 2017 at 5:28:48 PM UTC+4:30, shree wrote:
>
> Are you using the latest source of programs from github for building 
> tesseract?
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Sat, Aug 5, 2017 at 6:21 PM, Ava Nimaee <beigy@gmail.com 
> > wrote:
>
>> Hi,
>> I used this command:
>>
>> training/lstmtraining --debug_interval 100 \
>>   --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
>>   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>>   --model_output ~/tesstutorial/engoutput/base --learning_rate 20e-4 \
>>   --train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
>>   --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
>>   --max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log
>>
>> I put eng.traineddata in the right path, but I get an error:
>>
>> ERROR: Non-existent flag --traineddata
>>
>> Can you help me?
>>
>>
>
>



Re: [tesseract-ocr] ERROR: Non-existent flag --traineddata

2017-08-05 Thread Ava Nimaee
Yes, but I just can't get these commands to work:

make ScrollView.jar
export SCROLLVIEW_PATH=$PWD/java


On Saturday, August 5, 2017 at 5:44:20 PM UTC+4:30, shree wrote:
>
> did you build the training tools again?
>
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Sat, Aug 5, 2017 at 6:37 PM, Ava Nimaee <beigy@gmail.com 
> > wrote:
>
>> Yes, as you said, I cloned the latest tesseract master, installed it and 
>> leptonica again, made the tiff and box files and the unicharset, and then 
>> used these commands:
>> training/tesstrain.sh \
>>   --fonts_dir /usr/share/fonts \
>>   --lang eng  \
>>   --training_text langdata/eng/eng.training_text \
>>   --linedata_only \
>>   --noextract_font_properties  --langdata_dir langdata \
>>   --tessdata_dir ./tessdata \
>>   --fontlist "Times New Roman," \
>>   --output_dir tesstutorial/engtrian
>> 
>> training/tesstrain.sh \
>>   --fonts_dir /usr/share/fonts \
>>   --lang eng  \
>>   --training_text langdata/eng/eng.training_text \
>>   --linedata_only \
>>   --noextract_font_properties  --langdata_dir langdata \
>>   --tessdata_dir ./tessdata \
>>   --output_dir tesstutorial/engeval
>> Finally, I ran the last command, the one I said gave the error.
>> For that last command I put langdata/eng in the engtrian folder.
>>
>>
>> On Saturday, August 5, 2017 at 5:28:48 PM UTC+4:30, shree wrote:
>>>
>>> Are you using the latest source of programs from github for building 
>>> tesseract?
>>>
>>> ShreeDevi
>>> 
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Sat, Aug 5, 2017 at 6:21 PM, Ava Nimaee <beigy@gmail.com> wrote:
>>>
>>>> Hi,
>>>> I used this command:
>>>>
>>>> training/lstmtraining --debug_interval 100 \
>>>>   --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
>>>>   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>>>>   --model_output ~/tesstutorial/engoutput/base --learning_rate 20e-4 \
>>>>   --train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
>>>>   --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
>>>>   --max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log
>>>>
>>>> I put eng.traineddata in the right path, but I get an error:
>>>>
>>>> ERROR: Non-existent flag --traineddata
>>>>
>>>> Can you help me?
>>>>
>>>>
>>>
>>
>
>



Re: [tesseract-ocr] ERROR: Non-existent flag --traineddata

2017-08-05 Thread Ava Nimaee
Thanks a lot, I'll try again.

On Saturday, August 5, 2017 at 5:50:59 PM UTC+4:30, shree wrote:
>
> ​tesseract -v
> tesseract 4.00.00dev-594-g044e06e-2085
>  leptonica-1.74.4
>   libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.50 : libtiff 4.0.3 : zlib 
> 1.2.8
>
>  Found AVX
>  Found SSE
>
>
> The above version is working ok on linux
>
>  nice lstmtraining \
>    --old_traineddata ../tessdata/best/san.traineddata \
>    --continue_from ../tessdata/best/san.lstm \
>    --traineddata ../tesstutorial/vedic/san/san.traineddata \
>    --train_listfile ../tesstutorial/vedic/san.training_files.txt \
>    --eval_listfile ../tesstutorial/vedic/san.eval_files.txt \
>    --model_output ../tesstutorial/vedic/santune \
>    --max_iterations 200 \
>    --debug_interval 0
>
> Loaded file ../tessdata/best/san.lstm, unpacking...
> Warning: LSTMTrainer deserialized an LSTMRecognizer!
> Code range changed from 145 to 2308!!
> Num (Extended) outputs,weights in Series:
>   1,36,0,1:1, 0
> Num (Extended) outputs,weights in Series:
>   C3,3:9, 0
>   Ft16:16, 160
> Total weights = 160
>   [C3,3Ft16]:16, 160
>   Mp3,3:16, 0
>   Lfys48:48, 12480
>   Lfx96:96, 55680
>   Lrx96:96, 74112
>   Lfx192:192, 221952
>   Fc2308:2308, 445444
> Total weights = 809828
> Previous null char=2 mapped to 2
> Continuing from ../tessdata/best/san.lstm
> Loaded 138/138 pages (1-138) of document 
> ../tesstutorial/vedic/san.AA_NAGARI_SHREE_L3.exp0.lstmf
> Loaded 138/138 pages (1-138) of document 
> ../tesstutorial/vedic/san.AA_NAGARI_SHREE_L3.exp-1.lstmf
> Loaded 138/138 pages (1-138) of document 
> ../tesstutorial/vedic/san.Adobe_Devanagari.exp-2.lstmf
> Loaded 138/138 pages (1-138) of document 
> ../tesstutorial/vedic/san.Adobe_Devanagari.exp1.lstmf
>
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Sat, Aug 5, 2017 at 6:43 PM, ShreeDevi Kumar <shree...@gmail.com 
> > wrote:
>
>> did you build the training tools again?
>>
>>
>> ShreeDevi
>> 
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Sat, Aug 5, 2017 at 6:37 PM, Ava Nimaee <beigy@gmail.com 
>> > wrote:
>>
>>> Yes, as you said, I cloned the latest tesseract master, installed it and 
>>> leptonica again, made the tiff and box files and the unicharset, and then 
>>> used these commands:
>>> training/tesstrain.sh \
>>>   --fonts_dir /usr/share/fonts \
>>>   --lang eng  \
>>>   --training_text langdata/eng/eng.training_text \
>>>   --linedata_only \
>>>   --noextract_font_properties  --langdata_dir langdata \
>>>   --tessdata_dir ./tessdata \
>>>   --fontlist "Times New Roman," \
>>>   --output_dir tesstutorial/engtrian
>>> 
>>> training/tesstrain.sh \
>>>   --fonts_dir /usr/share/fonts \
>>>   --lang eng  \
>>>   --training_text langdata/eng/eng.training_text \
>>>   --linedata_only \
>>>   --noextract_font_properties  --langdata_dir langdata \
>>>   --tessdata_dir ./tessdata \
>>>   --output_dir tesstutorial/engeval
>>> Finally, I ran the last command, the one I said gave the error.
>>> For that last command I put langdata/eng in the engtrian folder.
>>>
>>>
>>> On Saturday, August 5, 2017 at 5:28:48 PM UTC+4:30, shree wrote:
>>>>
>>>> Are you using the latest source of programs from github for building 
>>>> tesseract?
>>>>
>>>> ShreeDevi
>>>> 
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>
>>>> On Sat, Aug 5, 2017 at 6:21 PM, Ava Nimaee <beigy@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>> I used this command:
>>>>>
>>>>> training/lstmtraining --debug_interval 100 \
>>>>>   --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
>>>>>   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>>>>>   --model_output ~/tesstutorial/engoutput/base --learning_rate 20e-4 \
>>>>>   --train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
>>>>>   --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
>>>>>   --max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log
>>>>>
>>>>> an

Re: [tesseract-ocr] ERROR: Non-existent flag --traineddata

2017-08-05 Thread Ava Nimaee
I'm using Linux (Ubuntu 16.04).

On Saturday, August 5, 2017 at 5:57:01 PM UTC+4:30, shree wrote:
>
> Are you using linux or windows?
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Sat, Aug 5, 2017 at 6:55 PM, Ava Nimaee <beigy@gmail.com 
> > wrote:
>
>> Thanks a lot, I'll try again.
>>
>>
>> On Saturday, August 5, 2017 at 5:50:59 PM UTC+4:30, shree wrote:
>>
>>> ​tesseract -v
>>> tesseract 4.00.00dev-594-g044e06e-2085
>>>  leptonica-1.74.4
>>>   libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.50 : libtiff 4.0.3 : 
>>> zlib 1.2.8
>>>
>>>  Found AVX
>>>  Found SSE
>>>
>>>
>>> The above version is working ok on linux
>>>
>>>  nice lstmtraining \
>>>    --old_traineddata ../tessdata/best/san.traineddata \
>>>    --continue_from ../tessdata/best/san.lstm \
>>>    --traineddata ../tesstutorial/vedic/san/san.traineddata \
>>>    --train_listfile ../tesstutorial/vedic/san.training_files.txt \
>>>    --eval_listfile ../tesstutorial/vedic/san.eval_files.txt \
>>>    --model_output ../tesstutorial/vedic/santune \
>>>    --max_iterations 200 \
>>>    --debug_interval 0
>>>
>>> Loaded file ../tessdata/best/san.lstm, unpacking...
>>> Warning: LSTMTrainer deserialized an LSTMRecognizer!
>>> Code range changed from 145 to 2308!!
>>> Num (Extended) outputs,weights in Series:
>>>   1,36,0,1:1, 0
>>> Num (Extended) outputs,weights in Series:
>>>   C3,3:9, 0
>>>   Ft16:16, 160
>>> Total weights = 160
>>>   [C3,3Ft16]:16, 160
>>>   Mp3,3:16, 0
>>>   Lfys48:48, 12480
>>>   Lfx96:96, 55680
>>>   Lrx96:96, 74112
>>>   Lfx192:192, 221952
>>>   Fc2308:2308, 445444
>>> Total weights = 809828
>>> Previous null char=2 mapped to 2
>>> Continuing from ../tessdata/best/san.lstm
>>> Loaded 138/138 pages (1-138) of document 
>>> ../tesstutorial/vedic/san.AA_NAGARI_SHREE_L3.exp0.lstmf
>>> Loaded 138/138 pages (1-138) of document 
>>> ../tesstutorial/vedic/san.AA_NAGARI_SHREE_L3.exp-1.lstmf
>>> Loaded 138/138 pages (1-138) of document 
>>> ../tesstutorial/vedic/san.Adobe_Devanagari.exp-2.lstmf
>>> Loaded 138/138 pages (1-138) of document 
>>> ../tesstutorial/vedic/san.Adobe_Devanagari.exp1.lstmf
>>>
>>>
>>> ShreeDevi
>>> 
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Sat, Aug 5, 2017 at 6:43 PM, ShreeDevi Kumar <shree...@gmail.com> 
>>> wrote:
>>>
>>>> did you build the training tools again?
>>>>
>>>>
>>>> ShreeDevi
>>>> 
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>
>>>> On Sat, Aug 5, 2017 at 6:37 PM, Ava Nimaee <beigy@gmail.com> wrote:
>>>>
>>>>> Yes, as you said, I cloned the latest tesseract master, installed it and 
>>>>> leptonica again, made the tiff and box files and the unicharset, and 
>>>>> then used these commands:
>>>>> training/tesstrain.sh \
>>>>>   --fonts_dir /usr/share/fonts \
>>>>>   --lang eng  \
>>>>>   --training_text langdata/eng/eng.training_text \
>>>>>   --linedata_only \
>>>>>   --noextract_font_properties  --langdata_dir langdata \
>>>>>   --tessdata_dir ./tessdata \
>>>>>   --fontlist "Times New Roman," \
>>>>>   --output_dir tesstutorial/engtrian
>>>>> 
>>>>> training/tesstrain.sh \
>>>>>   --fonts_dir /usr/share/fonts \
>>>>>   --lang eng  \
>>>>>   --training_text langdata/eng/eng.training_text \
>>>>>   --linedata_only \
>>>>>   --noextract_font_properties  --langdata_dir langdata \
>>>>>   --tessdata_dir ./tessdata \
>>>>>   --output_dir tesstutorial/engeval
>>>>> Finally, I ran the last command, the one I said gave the error.
>>>>> For that last command I put langdata/eng in the engtrian folder.
>>>>>
>>>>>
>>>>> On Saturday, August 5, 2017 at 5:28:48 PM UTC+4:30, shree wrote:
>>>>>>
>>>

[tesseract-ocr] ERROR: Non-existent flag --traineddata

2017-08-05 Thread Ava Nimaee
Hi,
I used this command:

training/lstmtraining --debug_interval 100 \
  --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
  --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
  --model_output ~/tesstutorial/engoutput/base --learning_rate 20e-4 \
  --train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
  --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
  --max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log

I put eng.traineddata in the right path, but I get an error:

ERROR: Non-existent flag --traineddata

can you help me?
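
A minimal sketch for checking whether the lstmtraining binary being run was built before the --traineddata flag existed. It assumes the binary prints its list of flags when called with --help; the helper name has_flag is made up for this example.

```shell
# has_flag BINARY FLAG
# Succeeds if the --help output of BINARY mentions --FLAG.
# Assumption: the binary lists its flags when called with --help.
has_flag() {
    "$1" --help 2>&1 | grep -q -- "--$2"
}

# Example (not run here):
#   has_flag training/lstmtraining traineddata \
#       || echo "this lstmtraining build does not know --traineddata"
```

If the check fails for the binary in training/, that would suggest the training tools still need to be rebuilt from the updated source.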



Re: [tesseract-ocr] Failed to load list of training filenames from

2017-08-05 Thread Ava Nimaee
We tried it, but for some words and fonts the results are not very good, so 
we decided to train it ourselves.

On Friday, August 4, 2017 at 7:30:04 PM UTC+4:30, shree wrote:
>
> Please try the OCR with the new tessdata/best/fas.traineddata - Farsi - 
> Persian and provide your feedback for Ray to improve the training.
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Fri, Aug 4, 2017 at 6:40 PM, Ava Nimaee <beigy@gmail.com 
> > wrote:
>
>> Thanks a lot.
>> I'm sorry, I have only just started training Tesseract 4.0 for Persian and 
>> I don't have any experience with it. I've tried a lot, but I keep running 
>> into errors.
>> Many thanks for your assistance with our project.
>>
>> On Friday, August 4, 2017 at 4:12:34 PM UTC+4:30, shree wrote:
>>>
>>> ​Please check tesseract training wiki for new instructions.
>>>
>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>>>
>>> Use the latest code from github.​
>>>
>>> ShreeDevi
>>> ________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Fri, Aug 4, 2017 at 5:03 PM, Ava Nimaee <beigy@gmail.com> wrote:
>>>
>>>> Hi, sorry, I have an error.
>>>> Can you help me?
>>>> I use this command:
>>>> lstmtraining -U ../tesstutorial/englayer_from_eng/eng.unicharset \
>>>>   --script_dir langdata --debug_interval 0 \
>>>>   --continue_from   ../tesstutorial/englayer_from_eng/eng.lstm \
>>>>   --append_index 5 --net_spec '[Lfx256 O1c105]' \
>>>>   --model_output ../tesstutorial/englayer_from_eng/englayer \
>>>>   --train_listfile ../tesstutorial/engtrain/eng.training_files.txt \
>>>>   --eval_listfile ../tesstutorial/engeval/eng.training_files.txt \
>>>>   --max_iterations 5
>>>> but I get an error:
>>>> Failed to load list of training filenames from 
>>>> ../tesstutorial/engtrain/eng.training_files.txt
>>>>
>>>>
>>>>
>>>
>>
>
>



[tesseract-ocr] Failed to load list of training filenames from

2017-08-04 Thread Ava Nimaee
Hi, sorry, I have an error.
Can you help me?
I use this command:
lstmtraining -U ../tesstutorial/englayer_from_eng/eng.unicharset \
  --script_dir langdata --debug_interval 0 \
  --continue_from   ../tesstutorial/englayer_from_eng/eng.lstm \
  --append_index 5 --net_spec '[Lfx256 O1c105]' \
  --model_output ../tesstutorial/englayer_from_eng/englayer \
  --train_listfile ../tesstutorial/engtrain/eng.training_files.txt \
  --eval_listfile ../tesstutorial/engeval/eng.training_files.txt \
  --max_iterations 5
but I get an error:
Failed to load list of training filenames from 
../tesstutorial/engtrain/eng.training_files.txt
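
This message can be narrowed down by looking at the list file itself. A minimal sketch, assuming the list file holds one .lstmf path per line (the helper name check_listfile is hypothetical); it reports whether the list file exists and whether every file it names can be found:

```shell
# check_listfile LISTFILE
# Prints each problem found and returns non-zero if the list file is
# missing, empty, or names a file that does not exist.
# Assumption: one training file path per line.
check_listfile() {
    list=$1
    if [ ! -s "$list" ]; then
        echo "missing or empty list file: $list"
        return 1
    fi
    status=0
    while IFS= read -r entry || [ -n "$entry" ]; do
        # Skip blank lines.
        [ -z "$entry" ] && continue
        if [ ! -f "$entry" ]; then
            echo "listed file not found: $entry"
            status=1
        fi
    done < "$list"
    return $status
}

# Example (not run here):
#   check_listfile ../tesstutorial/engtrain/eng.training_files.txt
```

Relative paths, both for the list file and inside it, are resolved against the directory the command is run from, so the check is best run from the same directory as lstmtraining.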




Re: [tesseract-ocr] Failed to load list of training filenames from

2017-08-04 Thread Ava Nimaee
Thanks a lot.
I'm sorry, I have only just started training Tesseract 4.0 for Persian and I 
don't have any experience with it. I've tried a lot, but I keep running into 
errors.
Many thanks for your assistance with our project.

On Friday, August 4, 2017 at 4:12:34 PM UTC+4:30, shree wrote:
>
> ​Please check tesseract training wiki for new instructions.
>
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>
> Use the latest code from github.​
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Fri, Aug 4, 2017 at 5:03 PM, Ava Nimaee <beigy@gmail.com 
> > wrote:
>
>> Hi, sorry, I have an error.
>> Can you help me?
>> I use this command:
>> lstmtraining -U ../tesstutorial/englayer_from_eng/eng.unicharset \
>>   --script_dir langdata --debug_interval 0 \
>>   --continue_from   ../tesstutorial/englayer_from_eng/eng.lstm \
>>   --append_index 5 --net_spec '[Lfx256 O1c105]' \
>>   --model_output ../tesstutorial/englayer_from_eng/englayer \
>>   --train_listfile ../tesstutorial/engtrain/eng.training_files.txt \
>>   --eval_listfile ../tesstutorial/engeval/eng.training_files.txt \
>>   --max_iterations 5
>> but I get an error:
>> Failed to load list of training filenames from 
>> ../tesstutorial/engtrain/eng.training_files.txt
>>
>>
>>
>
>



[tesseract-ocr] train a new font for language of persian

2017-05-04 Thread Ava Nimaee
Hi everyone. I am starting to use Tesseract for the first time. Where should 
I start? I want to train a new font for the Persian language, but I am 
confused.



Re: [tesseract-ocr] Re: Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110

2017-08-21 Thread Ava Nimaee
Hi shree, thanks a lot for your attention.
I corrected all the commands and can now generate files such as 
base70.229_1900.checkpoint, but those are the only files I get.
The tutorials, however, use eng.lstm. How can I create it? And what actually 
is eng.lstm?
Also, what is lstm-punc-dawg? Is it similar to the eng.punc file that 
Mr. Smith put in langdata/eng?
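
On the checkpoint files: the TrainingTesseract-4.00 wiki describes turning a checkpoint into a finished model by running lstmtraining once more with --stop_training. A minimal sketch, assuming a build that accepts --traineddata; the function name and the paths in the usage comment are placeholders rather than anything from this thread:

```shell
# finalize_checkpoint CHECKPOINT STARTER_TRAINEDDATA OUTPUT_TRAINEDDATA
# Converts a training checkpoint into a traineddata file.
# STARTER_TRAINEDDATA is the file that was passed as --traineddata
# during training.
finalize_checkpoint() {
    lstmtraining --stop_training \
        --continue_from "$1" \
        --traineddata "$2" \
        --model_output "$3"
}

# Example with placeholder paths (not run here):
#   finalize_checkpoint path/to/base_checkpoint \
#       path/to/eng/eng.traineddata path/to/output/eng.traineddata
```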

On Wednesday, August 16, 2017 at 8:07:47 PM UTC+4:30, shree wrote:
>
> Please check the updated tutorials in the wiki. There have been many 
> changes.
>
> On 16-Aug-2017 3:50 PM, "Ava Nimaee" <beigy@gmail.com > 
> wrote:
>
>> Sorry, I have a question:
>> What is the output of this command? After running it I have a lot of 
>> files like base44.409_2195.checkpoint, but in the tutorials I saw eng.lstm 
>> and I don't have that. Which command creates eng.lstm?
>>
>> Thank you for your support.
>>
>>
>> On Wednesday, August 16, 2017 at 5:50:18 AM UTC+4:30, 
>> roberty...@gmail.com wrote:
>>>
>>> Hi, I didn't encounter this error.
>>>
>>> But you may want to check whether your traineddata is in the correct 
>>> directory, and check the other paths as well.
>>>
>>> On Tuesday, August 15, 2017 at 5:45:17 PM UTC+8, Ava Nimaee wrote:
>>>>
>>>> Hi, thanks for your help.
>>>> I used your link, but I got this error:
>>>> mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file 
>>>> ../lstm/lstmtrainer.h, line 110
>>>> Segmentation fault (core dumped)
>>>> I want to start training the Persian language, so I am trying English 
>>>> first. I created the box file and unicharset, and then 
>>>> eng.charset_size=110.txt, eng.Times_New_Roman.exp0.lstmf, 
>>>> eng.traineddata, eng.training_files.txt and eng.unicharset.
>>>> All of those were created with this command:
>>>> training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
>>>>   --training_text training/langdata/eng/eng.training_text \
>>>>   --linedata_only \
>>>>   --noextract_font_properties --langdata_dir training/langdata \
>>>>   --tessdata_dir ./tessdata \
>>>>   --fontlist "Times New Roman," --output_dir ~/tesstutorial/engtrian
>>>> And now I get the error I told you about.
>>>>
>>>> On Monday, August 14, 2017 at 1:00:02 PM UTC+4:30, roberty...@gmail.com 
>>>> wrote:
>>>>>
>>>>> What problems do you encounter? Please give more information about 
>>>>> the problems.
>>>>>
>>>>> I later used the new tutorial 
>>>>> (https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-impact) 
>>>>> to train my data, and I didn't have any problems. Hope this helps.
>>>>>
>>
>



Re: [tesseract-ocr] Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110

2017-08-28 Thread Ava Nimaee
Hi shree,
I read the instructions on the training wiki page, but I don't have eng.lstm.
None of the commands creates eng.lstm. How can I create it? I even checked the
langdata that I downloaded from git, and it isn't there.
I have spent a lot of time on this, but I don't know how to create it.
Can you tell me?

On Monday, August 21, 2017 at 7:41:41 PM UTC+4:30, shree wrote:
>
> The lstm file is the LSTM model for the language. It is saved inside the
> traineddata file.
>
> Dawgs are compressed files created from lists of words, punctuation
> patterns, or number patterns.
>
> You can use dawg2wordlist to unpack them.
>
> Please follow the instructions on the training wiki page.
>
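
A minimal sketch of the two steps mentioned in the quoted reply: pulling the lstm component out of a traineddata file and turning a dawg back into a word list. It assumes the Tesseract training tools combine_tessdata and dawg2wordlist are installed and on PATH. The file names (tessdata/eng.traineddata, the "eng." output prefix, eng.wordlist) are placeholders, not paths taken from this thread, and whether a given traineddata file actually contains the lstm-unicharset and lstm-word-dawg components depends on how it was built. The helper only builds the command lines; nothing runs unless run_all() is called.

```python
import shutil
import subprocess


def unpack_commands(traineddata, prefix, wordlist):
    """Build the two command lines; nothing is executed here."""
    unicharset = prefix + "lstm-unicharset"
    word_dawg = prefix + "lstm-word-dawg"
    return [
        # Extract the LSTM model plus the two files dawg2wordlist needs.
        # With -e, each output name must end in the component's suffix.
        ["combine_tessdata", "-e", traineddata,
         prefix + "lstm", unicharset, word_dawg],
        # Convert the compressed word dawg back into a plain word list.
        ["dawg2wordlist", unicharset, word_dawg, wordlist],
    ]


def run_all(commands):
    """Run each command in order, skipping tools that are not installed."""
    for argv in commands:
        if shutil.which(argv[0]) is None:
            print("not found on PATH:", argv[0])
            continue
        subprocess.run(argv, check=True)


# Placeholder paths, for illustration only.
commands = unpack_commands("tessdata/eng.traineddata", "eng.", "eng.wordlist")
for argv in commands:
    print(" ".join(argv))
```

Calling run_all(commands) would execute the printed commands. The point of the sketch is that eng.lstm is not generated from langdata: it is extracted from an existing traineddata file that already contains an LSTM model.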



[tesseract-ocr] Create boxfile and unicharset for RTL language

2017-08-31 Thread Ava Nimaee
Hi, I need your help.
I need to create a box file and a unicharset for the Persian language. I used
the same commands that I used for Latin, but the results are reversed. Could
you please tell me how to do this?
Thanks



[tesseract-ocr] unicharset and boxfile for tesseract 4

2017-09-04 Thread Ava Nimaee
Hi,
I want to know about the unicharset and box file in Tesseract 4 for RTL scripts.
I trained, but the result is not good. Can anyone give me a link about this,
and also about xheight?



Re: [tesseract-ocr] Create boxfile and unicharset for RTL language

2017-09-01 Thread Ava Nimaee
I understand that the only difference between an RTL language and an LTR one
is in the unicharset.
I created the unicharset with its tool, but how can I create xheight for Persian?
Here is my unicharset after converting it to RTL:
36
NULL 0 NULL 0
Joined 7 0,69,188,255,486,1218,0,30,486,1188 Latin 1 0 1 Joined # Joined [4a 6f 69 6e 65 64 ]a
|Broken|0|1 f 0,69,186,255,892,2138,0,80,892,2058 Common 2 10 2 |Broken|0|1 # Broken
س‍ 1 0,255,0,255,0,0,0,0,0,0 Arabic 3 13 3 س‍ # س‍ [633 200d ]x
‍ل‍ 1 0,255,0,255,0,0,0,0,0,0 Inherited 4 18 4 ‍ل‍ # ‍ل‍ [200d 644 200d ]x
‍ا 1 0,255,0,255,0,0,0,0,0,0 Inherited 5 18 5 ‍ا # ‍ا [200d 627 ]x
م 1 0,64,134,241,51,272,0,46,56,313 Arabic 6 13 6 م # م [645 ]x
ع‍ 1 0,255,0,255,0,0,0,0,0,0 Arabic 7 13 7 ع‍ # ع‍ [639 200d ]x
‍ی‍ 1 0,255,0,255,0,0,0,0,0,0 Inherited 8 18 8 ‍ی‍ # ‍ی‍ [200d 6cc 200d ]x
‍ک‍ 1 0,255,0,255,0,0,0,0,0,0 Inherited 9 18 9 ‍ک‍ # ‍ک‍ [200d 6a9 200d ]x
‍م 1 0,255,0,255,0,0,0,0,0,0 Inherited 10 18 10 ‍م # ‍م [200d 645 ]x
م‍ 1 0,255,0,255,0,0,0,0,0,0 Arabic 11 13 11 م‍ # م‍ [645 200d ]x
‍ه‍ 1 0,255,0,255,0,0,0,0,0,0 Inherited 12 18 12 ‍ه‍ # ‍ه‍ [200d 647 200d ]x
‍ذ 1 0,255,0,255,0,0,0,0,0,0 Inherited 13 18 13 ‍ذ # ‍ذ [200d 630 ]x
ا 1 26,117,200,255,11,181,7,82,33,222 Arabic 14 13 14 ا # ا [627 ]x
ک‍ 1 0,255,0,255,0,0,0,0,0,0 Arabic 15 13 15 ک‍ # ک‍ [6a9 200d ]x
‍ج‍ 1 0,255,0,255,0,0,0,0,0,0 Inherited 16 18 16 ‍ج‍ # ‍ج‍ [200d 62c 200d ]x
ی‍ 1 0,255,0,255,0,0,0,0,0,0 Arabic 17 13 17 ی‍ # ی‍ [6cc 200d ]x
‍ی 1 0,255,0,255,0,0,0,0,0,0 Inherited 18 18 18 ‍ی # ‍ی [200d 6cc ]x
ش‍ 1 0,255,0,255,0,0,0,0,0,0 Arabic 19 13 19 ش‍ # ش‍ [634 200d ]x
‍م‍ 1 0,255,0,255,0,0,0,0,0,0 Inherited 20 18 20 ‍م‍ # ‍م‍ [200d 645 200d ]x
ل‍ 1 0,255,0,255,0,0,0,0,0,0 Arabic 21 13 21 ل‍ # ل‍ [644 200d ]x
‍ن 1 0,255,0,255,0,0,0,0,0,0 Inherited 22 18 22 ‍ن # ‍ن [200d 646 ]x
‍ب‍ 1 0,255,0,255,0,0,0,0,0,0 Inherited 23 18 23 ‍ب‍ # ‍ب‍ [200d 628 200d ]x
‍ز 1 0,255,0,255,0,0,0,0,0,0 Inherited 24 18 24 ‍ز # ‍ز [200d 632 ]x
‍ت 1 0,255,0,255,0,0,0,0,0,0 Inherited 25 18 25 ‍ت # ‍ت [200d 62a ]x
. 10 12,108,64,140,18,52,9,77,52,193 Common 26 6 26 . # . [2e ]p
و 1 0,68,137,238,65,290,0,27,62,256 Arabic 27 13 27 و # و [648 ]x
ن‍ 1 0,255,0,255,0,0,0,0,0,0 Arabic 28 13 28 ن‍ # ن‍ [646 200d ]x
‍س‍ 1 0,255,0,255,0,0,0,0,0,0 Inherited 29 18 29 ‍س‍ # ‍س‍ [200d 633 200d ]x
ن 1 0,88,163,255,68,321,0,52,76,354 Arabic 30 13 30 ن # ن [646 ]x
ب‍ 1 0,255,0,255,0,0,0,0,0,0 Arabic 31 13 31 ب‍ # ب‍ [628 200d ]x
‍و 1 0,255,0,255,0,0,0,0,0,0 Inherited 32 18 32 ‍و # ‍و [200d 648 ]x
پ‍ 1 0,255,0,255,0,0,0,0,0,0 Arabic 33 13 33 پ‍ # پ‍ [67e 200d ]x
‍ر 1 0,255,0,255,0,0,0,0,0,0 Inherited 34 18 34 ‍ر # ‍ر [200d 631 ]x
ی 1 0,71,148,225,95,253,0,45,103,279 Arabic 35 13 35 ی # ی [6cc ]x
But the "Inherited" script does not have any unicharset in langdata, and
without it the training result is not very good.
For example, I fine-tuned for "لا", which is part of "Inherited".
Can you please tell me how to create xheight for Persian fonts, and also
about "Inherited" and the appropriate RTL flags for the Persian language?
Thanks
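
A small sketch for inspecting a unicharset listing like the one above. It assumes the field order visible in that listing (unichar, properties, ten comma-separated glyph metrics, script, other_case, direction, mirror, normed form) and that the direction numbers follow the ICU bidi-class codes (13 = Arabic letter, 18 = boundary neutral, 6 = common separator, 10 = other neutral). In the listing, every "Inherited" entry has a code list that begins with 200d (ZERO WIDTH JOINER), so the sketch flags those entries. Reading the metrics string 0,255,0,255,0,0,0,0,0,0 as "not measured" is an assumption. The names Entry, parse_unicharset, and describe are hypothetical helpers, not part of Tesseract.

```python
from dataclasses import dataclass

ZWJ = "\u200d"  # ZERO WIDTH JOINER
# Metrics string seen on many entries in the listing; assumed to be a placeholder.
UNSET_METRICS = "0,255,0,255,0,0,0,0,0,0"
# Subset of ICU bidi-class codes that occur in the listing.
DIRECTION_NAMES = {0: "L", 6: "CS", 10: "ON", 13: "AL", 18: "BN"}


@dataclass
class Entry:
    unichar: str
    metrics: str
    script: str
    direction: int


def parse_unicharset(text):
    """Return the full entries of a unicharset dump, skipping short or wrapped lines."""
    entries = []
    for line in text.splitlines()[1:]:  # the first line is the entry count
        fields = line.split()
        # A full entry carries ten metrics; "NULL 0 NULL 0" and stray
        # continuation lines do not, so they are skipped.
        if len(fields) < 7 or fields[2].count(",") != 9:
            continue
        entries.append(Entry(fields[0], fields[2], fields[3], int(fields[5])))
    return entries


def describe(entry):
    """Summarize script, direction class, and anything that looks suspicious."""
    notes = [DIRECTION_NAMES.get(entry.direction, str(entry.direction))]
    if entry.unichar.startswith(ZWJ):
        notes.append("starts with ZWJ")
    if entry.metrics == UNSET_METRICS:
        notes.append("no measured metrics")
    return f"{entry.script}: " + ", ".join(notes)


# Three entries copied from the listing, written with escapes so the
# invisible joiner is explicit.
SAMPLE = (
    "4\n"
    "NULL 0 NULL 0\n"
    "\u0645 1 0,64,134,241,51,272,0,46,56,313 Arabic 6 13 6 \u0645 # \u0645 [645 ]x\n"
    "\u200d\u0627 1 0,255,0,255,0,0,0,0,0,0 Inherited 5 18 5 \u200d\u0627 # \u200d\u0627 [200d 627 ]x\n"
    ". 10 12,108,64,140,18,52,9,77,52,193 Common 26 6 26 . # . [2e ]p\n"
)

for entry in parse_unicharset(SAMPLE):
    print(describe(entry))
```

Run against the full file, this would list which entries carry the "Inherited" script and placeholder metrics, which may help when comparing against the Arabic unicharset in langdata.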
On Thursday, August 31, 2017 at 5:32:59 PM UTC+4:30, shree wrote:
>
> Use tesstrain.sh for training.
>
> It should apply the appropriate RTL flags for the Persian language.
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Thu, Aug 31, 2017 at 2:39 PM, Ava Nimaee <beigy@gmail.com 
> > wrote:
>
>> Hi, I need your help.
>> I need to create a box file and a unicharset for the Persian language. I used
>> the same commands that I used for Latin, but the results are reversed. Could
>> you please tell me how to do this?
>> Thanks
>>


[tesseract-ocr] Size of font

2018-05-10 Thread Ava Nimaee
Hi,
I have been training Tesseract for the Persian language. I have been doing this
from scratch, and now I want to know how I can set the font size.
Is the size actually important?
Thanks.
