Thank you for your help. I have checked it many times. Could you tell me 
where I am going wrong? It takes my 3 tiff/box pairs, for example, and copies 
them into the train directory. Then it overwrites the exp0.tif file with one 
rendered by the text2image tool from randomly generated text. Although the 
3 tiff/box pairs are accepted, it only creates the lstmf for the first file 
generated by text2image and ignores the rest. 
I have attached the generate_training_data.sh script and a screenshot of the 
folder where the lstmf files are generated.

One more doubt: when I use the lstm.train config, a text file also gets 
generated along with the lstmf file.
I have named the image files as per the convention:
tesseract  eng.Arial_Regular.exp0.png eng.Arial_Regular.exp0 lstm.train
The image is attached above, and the two generated files are also attached.
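
A minimal sketch of running the lstm.train command above once per tiff/box pair, rather than relying on tesstrain.sh to pick the pairs up. The function name make_lstmf and its two arguments are made up for illustration, and it assumes each .tif sits next to a .box file with the same base name:

```shell
#!/bin/sh
# Sketch only: make_lstmf is a hypothetical helper, not part of tesseract.
# Usage: make_lstmf DIR LANG
# For every DIR/LANG.*.expN.tif that has a matching .box file, run the
# same command form as above: tesseract <image> <outputbase> lstm.train
make_lstmf() {
    dir=$1
    lang=$2
    for tif in "$dir"/"$lang".*.exp[0-9]*.tif; do
        [ -e "$tif" ] || continue      # the glob matched nothing
        base=${tif%.tif}               # e.g. tiff/eng.Arial_Regular.exp0
        if [ ! -f "$base.box" ]; then
            echo "skipping $tif: no matching .box file" >&2
            continue
        fi
        tesseract "$tif" "$base" lstm.train
    done
}
```

Calling it as `make_lstmf tiff eng` would be expected to leave one .lstmf next to each pair in the tiff directory, which makes it easy to see whether any pair is being skipped.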
On Tuesday, June 18, 2019 at 3:08:19 PM UTC+5:30, shree wrote:
>
> It should work if your files follow similar naming convention.
>
> lang.xxxnnn.exp0.tif
> lang.xxxnnn.exp0.box
>
> Where lang is your language code, e.g. eng
>
> xxxnnn is any unique random string (fontname in files generated by 
> text2image)
>
>   
>
> On Tue, Jun 18, 2019 at 2:54 PM hrishikesh kaulwar <hpka...@gmail.com> 
> wrote:
>
>> Greetings,
>>     I just learned that tesstrain.sh has been modified to support 
>> user-provided box/tiff pairs by adding a tiff/box directory flag. I used 
>> that version of the tesseract source with my own tiff/box pairs. But when 
>> I ran tesstrain.sh, I found that it just copies the tiff/box pairs I 
>> provided to the training directory, while the .lstmf file is generated 
>> from the eng.training_text file. My tiff/box pairs are not being used to 
>> create the training data. Can someone point out what mistake I am making, 
>> or a way to use only user-provided tiff/box pairs to create training data?
>>  Thanks in advance.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesser...@googlegroups.com.
>> To post to this group, send email to tesser...@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/f49566cf-0b6c-4b84-8c47-014ee31d3f60%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
> -- 
>
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>



Attachment: eng.Arial_Regular.exp0.lstmf
Description: Binary data

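# Clear previous output so stale .lstmf files are not mistaken for new ones.
rm -rf train/*
# Build LSTM line data for eng, passing my own box/tiff pairs from ./tiff.
tesstrain.sh --fonts_dir fonts \
             --fontlist 'Arial Regular' \
             --lang eng \
             --linedata_only \
             --langdata_dir langdata_lstm \
             --tessdata_dir /usr/local/share/tessdata \
             --my_boxtiff_dir tiff \
             --save_box_tiff \
             --maxpages 2 \
             --output_dir train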
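
After a run of the script above, a quick check along these lines could show which of the supplied pairs ended up without an lstmf file. This is only a sketch: list_missing_lstmf is a made-up helper name, and it assumes each .lstmf keeps the base name of its .tif and is written directly into the output directory:

```shell
#!/bin/sh
# Sketch only: list_missing_lstmf is a hypothetical helper.
# Usage: list_missing_lstmf PAIR_DIR TRAIN_DIR
# Prints the base name of every .tif in PAIR_DIR that has no
# same-named .lstmf file in TRAIN_DIR.
list_missing_lstmf() {
    pair_dir=$1
    train_dir=$2
    for tif in "$pair_dir"/*.tif; do
        [ -e "$tif" ] || continue          # the glob matched nothing
        name=$(basename "$tif" .tif)
        [ -f "$train_dir/$name.lstmf" ] || echo "$name"
    done
}
```

With the directories used in the script, the call would be `list_missing_lstmf tiff train`; an empty result would mean every supplied pair has a corresponding lstmf file.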
