Thank you for your help. I have checked it many times. Could you tell me what I am doing wrong? For example, it takes my 3 tiff/box pairs and copies them into the train directory. Then it overwrites the exp0.tif file with one produced by the text2image tool from randomly generated text. Although the 3 tiff/box pairs are accepted, it creates an lstmf only for the first file generated by text2image and ignores the rest. I have attached my generate_training_data.sh script, and also a screenshot of the folder where the lstmf files are generated.
I have one more question: when I use the lstm.train command, a text file also gets generated along with the lstmf file. I have named the image files as per the convention:

tesseract eng.Arial_Regular.exp0.png eng.Arial_Regular.exp0 lstm.train

The image is attached above, and the two generated files are also attached.

On Tuesday, June 18, 2019 at 3:08:19 PM UTC+5:30, shree wrote:
> It should work if your files follow similar naming convention.
>
> lang.xxxnnn.exp0.tif
> lang.xxxnnn.exp0.box
>
> Where lang is your language code eg. eng
>
> xxxnnn is any unique random string (fontname in files generated by
> text2image)
>
> On Tue, Jun 18, 2019 at 2:54 PM hrishikesh kaulwar <hpka...@gmail.com> wrote:
>
>> Greetings,
>> I just got to know that tesstrain.sh is modified to support user
>> provided box/tiff pairs by adding a tiff/box directory flag. I used that
>> version of tesseract source to use my own tiff/box pairs. But when I ran
>> tesstrain.sh I got to know that it just copies tiff/box pairs provided by
>> me to training directory but .lstmf file is generated from
>> eng.training_text file. My tiff/box pairs are not getting used in creating
>> training data. Can someone point out what mistake I am making? or some way
>> to only use user provided tiff/box pairs to create training data?
>> Thanks in advance.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesser...@googlegroups.com.
>> To post to this group, send email to tesser...@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/f49566cf-0b6c-4b84-8c47-014ee31d3f60%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>
> --
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
eng.Arial_Regular.exp0.lstmf
Description: Binary data
rm -rf train/*
tesstrain.sh --fonts_dir fonts \
  --fontlist 'Arial Regular' \
  --lang eng \
  --linedata_only \
  --langdata_dir langdata_lstm \
  --tessdata_dir /usr/local/share/tessdata \
  --my_boxtiff_dir tiff \
  --save_box_tiff \
  --maxpages 2 \
  --output_dir train
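
Since the reply quoted above says the pairs should work if they follow the lang.xxxnnn.exp0.tif / lang.xxxnnn.exp0.box convention, it may be worth checking the tiff directory before running tesstrain.sh. The following is only a rough sketch of such a check. The function name check_pairs is made up for illustration and is not part of tesstrain.sh, and the pattern assumes a lowercase language code (letters and underscores) and a font/identifier string that contains no dots, which may be stricter than what Tesseract actually requires.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of tesstrain.sh): report .tif files in a
# directory whose names do not look like lang.xxxnnn.expN.tif, or that have
# no matching .box file next to them.
check_pairs() {
  local dir="$1" status=0 tif base name
  # Assumed pattern: lang code, dot, identifier without dots, dot, expN.
  local pattern='^[a-z_]+\.[^.]+\.exp[0-9]+$'
  for tif in "$dir"/*.tif; do
    # If the glob matched nothing, it is left as a literal string.
    [ -e "$tif" ] || { echo "no .tif files in $dir"; return 1; }
    base="${tif%.tif}"      # path without the .tif extension
    name="${base##*/}"      # file name without directory or extension
    if ! [[ "$name" =~ $pattern ]]; then
      echo "bad name: $name.tif"
      status=1
    fi
    if [ ! -f "$base.box" ]; then
      echo "missing box: $name.box"
      status=1
    fi
  done
  return "$status"
}
```

With the directory passed to --my_boxtiff_dir in the script above, it could be called as `check_pairs tiff && echo "all pairs look consistent"`. This only verifies file names and pairing; it does not explain by itself why tesstrain.sh generates the lstmf from the text2image output.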