Hello ShreeDevi, I solved the lstm.train error; I had given the wrong path.
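For anyone who hits the same "read_params_file: Can't open lstm.train" error in Phase E of tesstrain.sh (quoted further down): tesseract could not find the lstm.train config, and that appears to be why the .lstmf file was never written. Here is a minimal shell sketch for checking where that config is before rerunning. The helper name find_lstm_train_config is hypothetical, and the two directory layouts it checks are assumptions. The first treats the argument as the tessdata directory itself, as in the quoted log. The second treats it as the parent of tessdata, as TESSDATA_PREFIX was used in 3.x releases.

```shell
#!/usr/bin/env bash
# Sketch only: find_lstm_train_config is a hypothetical helper, and the
# two candidate layouts below are assumptions about where tesseract
# looks for config files.

# find_lstm_train_config [DIR]: print the path of the first readable
# lstm.train found; return 1 if there is none.
# DIR defaults to $TESSDATA_PREFIX, then to the TESSDATA_PREFIX value
# shown in the quoted Phase E log.
find_lstm_train_config() {
  local base="${1:-${TESSDATA_PREFIX:-/usr/share/tesseract-ocr/tessdata}}"
  local dir
  # Layout 1 (assumed): DIR is the tessdata directory itself.
  # Layout 2 (assumed): DIR is the parent of tessdata (3.x-style prefix).
  for dir in "$base/configs" "$base/tessdata/configs"; do
    if [ -r "$dir/lstm.train" ]; then
      echo "$dir/lstm.train"
      return 0
    fi
  done
  echo "lstm.train not found under $base" >&2
  return 1
}
```

For the fine-tuning run quoted below, "find_lstm_train_config /usr/share/tesseract-ocr/tessdata" would report whether the config is in place. If it is missing, one option is copying tessdata/configs/lstm.train from the tesseract-master source tree into that configs directory, assuming the source tree ships that file.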
    mkdir -p ~/tesstutorial/engoutput
    training/lstmtraining -U ~/tesstutorial/engtrain/eng.unicharset \
      --script_dir ../langdata --debug_interval 100 \
      --net_spec '[1,36,0,1 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256 O1c105]' \
      --model_output ~/tesstutorial/engoutput/base \
      --train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
      --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
      --max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log

1) Can you please tell me how to generate a unicharset file for my image files after generating box files with Tesseract?
2) Also, please explain the --net_spec parameter and what input should be given to it.

Thanks

On Wednesday, April 5, 2017 at 1:59:56 PM UTC+5:30, shree wrote:
>
> You do not have the lstm.train config file.
>
> - excuse the brevity, sent from mobile
>
> On 05-Apr-2017 1:55 PM, <[email protected]> wrote:
>
>> Following what you said, I tried two ways, and I am stuck at the LSTM step:
>>
>> Training
>>
>> Command used:
>>
>> /home/p/Documents/T/tesseract-master/training/lstmtraining -U /home/p/Documents/T/img_frm_3/eng.unicharset \
>>   --script_dir /home/p/Documents/T/TESS_4_ALPHA/langdata-master --debug_interval 100 \
>>   --net_spec '[1,36,0,1 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256 O1c105]' \
>>   --model_output /home/p/Documents/T/ \
>>   --train_listfile /home/p/Documents/T/img_frm_3/eng.ArialBold.exp0.txt \
>>   --eval_listfile /home/p/Documents/T/img_frm_3/eng.ArialBold.exp0.txt \
>>   --max_iterations 5000 &>/home/p/Documents/T/basetrain.log
>>
>> tail -f basetrain.log
>>
>> Error I am getting:
>>
>> Deserialize header failed: BnO. 005 SUBHISHIs TOWN CENTRE
>> Deserialize header failed: MOKILA SHAKARPALLY
>> Deserialize header failed: PHONE: 040-8989898989
>> Load of page 0 failed!
>> Load of images failed!!
>> Deserialize header failed: TIN: 8989898989
>> Deserialize header failed: Station 1D: 01 Time: 03:26:46 PM
>> Deserialize header failed: CASHIER ID:; 3001 Date: 21-02-2017
>> Deserialize header failed: (null)
>> Deserialize header failed: (null)
>>
>> Fine tuning:
>>
>> Command used:
>>
>> /home/plianto/Documents/Tvat/tesseract-master/training/tesstrain.sh \
>>   --fonts_dir /usr/share/fonts --lang eng --linedata_only \
>>   --training_text /home/plianto/Documents/Tvat/img_frm_3/eng.ArialBold.exp0.txt \
>>   --langdata_dir /home/plianto/Documents/Tvat/TESS_4_ALPHA/langdata-master \
>>   --tessdata_dir /usr/share/tesseract-ocr/tessdata \
>>   --fontlist "Arial Bold" \
>>   --output_dir /home/plianto/Documents/Tvat/engoutput/
>>
>> Error:
>>
>> === Phase E: Generating lstmf files ===
>> Using TESSDATA_PREFIX=/usr/share/tesseract-ocr/tessdata
>> [Wed Apr 5 13:53:05 IST 2017] /usr/local/bin/tesseract /tmp/tmp.KTk3WgBTWk/eng/eng.Arial_Bold.exp0.tif /tmp/tmp.KTk3WgBTWk/eng/eng.Arial_Bold.exp0 lstm.train
>> read_params_file: Can't open lstm.train
>> Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
>> Page 1
>> ERROR: /tmp/tmp.KTk3WgBTWk/eng/eng.Arial_Bold.exp0.lstmf does not exist or is not readable
>>
>> On Wednesday, April 5, 2017 at 9:07:40 AM UTC+5:30, shree wrote:
>>>
>>> Read
>>>
>>> https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM
>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Finetune
>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replacing-Top-Layer-Example
>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replace-Top-Layer
>>>
>>> and
>>>
>>> https://github.com/tesseract-ocr/tesseract/wiki/Documentation
>>> https://github.com/tesseract-ocr/tesseract/wiki/Fonts
>>> https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage
>>> https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
>>> https://github.com/tesseract-ocr/tesseract/wiki/FAQ
>>>
>>> ShreeDevi
>>> ____________________________________________________________
>>> Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com
>>>
>>> On Wed, Apr 5, 2017 at 12:54 AM, <[email protected]> wrote:
>>>
>>>> Can you please share some of your experience in this thread? There are no posts about training Tesseract 4.
>>>>
>>>> 1) Also, is there any way to add a newly trained data file to an old trained data file, without replacing the old file?
>>>> 2) If we don't know which fonts may appear in our images, how should we proceed with training Tesseract?
>>>>
>>>> On Tuesday, April 4, 2017 at 9:27:06 PM UTC+5:30, Saurabh Srivastav wrote:
>>>>>
>>>>> Yes, I trained Tesseract for an English font and got it to read the characters from an image.
>>>>>
>>>>> Thanks,
>>>>> Saurabh Srivastav
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/9c88494c-6d80-4b31-b247-dbbacd48bc19%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/9c88494c-6d80-4b31-b247-dbbacd48bc19%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
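In the quoted Training attempt, every "Deserialize header failed:" message is followed by a line of the receipt transcription. This suggests that lstmtraining treated each line of eng.ArialBold.exp0.txt as the name of a training file. The tutorial command instead passes eng.training_files.txt to --train_listfile and --eval_listfile, and that file is assumed here to hold one .lstmf path per line. The minimal shell sketch below (build_listfile and check_listfile are hypothetical helper names) builds such a list from a directory of .lstmf files and flags entries that are not readable .lstmf paths.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, and the list-file format
# (one .lstmf path per line) is an assumption based on the tutorial's
# eng.training_files.txt.

# build_listfile DIR OUT: write every *.lstmf found under DIR to OUT,
# one path per line, in sorted order.
build_listfile() {
  find "$1" -name '*.lstmf' | sort > "$2"
}

# check_listfile FILE: print each entry that is not a readable .lstmf
# path; return 0 if all entries look usable, 1 if any do not (or the
# list is empty), 2 if FILE itself cannot be read.
check_listfile() {
  local listfile="$1" bad=0 n=0 line
  if [ ! -r "$listfile" ]; then
    echo "cannot read list file: $listfile" >&2
    return 2
  fi
  # "|| [ -n "$line" ]" keeps a last line that lacks a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue              # ignore blank lines
    n=$((n + 1))
    case "$line" in
      *.lstmf) ;;                            # expected kind of entry
      *) echo "not an .lstmf path: $line"; bad=1; continue ;;
    esac
    if [ ! -r "$line" ]; then
      echo "missing or unreadable: $line"
      bad=1
    fi
  done < "$listfile"
  if [ "$n" -eq 0 ]; then
    echo "no entries in $listfile"
    bad=1
  fi
  return "$bad"
}
```

For example, run build_listfile on the directory that holds the .lstmf files from Phase E, then run check_listfile on the resulting list before passing it to --train_listfile. Running check_listfile on eng.ArialBold.exp0.txt itself would report each transcription line as "not an .lstmf path".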

