I have trained Tesseract 3 with 64 fonts using the respective box and .tr 
files. Now I want to use the same training data to train Tesseract 4, 
after creating the starter traineddata as described under "Using tesstrain":

The setup for running tesstrain.sh is the same as for base Tesseract. Use 
--linedata_only option for LSTM training. Note that it is beneficial to 
have more training text and make more pages though, as neural nets don't 
generalize as well and need to train on something similar to what they will 
be running on. If the target domain is severely limited, then all the dire 
warnings about needing a lot of training data may not apply, but the 
network specification may need to be changed.

Training data is created using tesstrain.sh 
<https://github.com/tesseract-ocr/tesseract/blob/master/src/training/tesstrain.sh> 
as follows (note that your fonts location may vary):

training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
  --noextract_font_properties --langdata_dir ../langdata \
  --tessdata_dir ./tessdata --output_dir ~/tesstutorial/engtrain

The above command makes LSTM training data equivalent to the data used to 
train base Tesseract for English. For making a general-purpose LSTM-based 
OCR engine, it is woefully inadequate, but makes a good tutorial demo.
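
One way to confirm that the command above actually produced LSTM training data is to count the .lstmf files in the output directory. This is only a minimal sketch: count_lstmf is a hypothetical helper, not part of Tesseract, and it assumes the .lstmf files sit directly inside the directory passed to --output_dir.

```shell
#!/bin/sh
# count_lstmf is a hypothetical helper, not a Tesseract tool.
# It prints the number of .lstmf files directly inside the given directory.
# Assumption: tesstrain.sh wrote its .lstmf files at the top level of
# the directory given to --output_dir.
count_lstmf() {
  dir="$1"
  # -maxdepth 1 keeps the search to the directory itself;
  # tr strips the padding that some wc implementations add.
  find "$dir" -maxdepth 1 -type f -name '*.lstmf' | wc -l | tr -d ' '
}
```

For the output directory used above, this would be called as `count_lstmf ~/tesstutorial/engtrain`. A result of 0 would suggest that the --linedata_only step did not complete.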

Now try this to make eval data for the 'Impact' font:

training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
  --noextract_font_properties --langdata_dir ../langdata \
  --tessdata_dir ./tessdata \
  --fontlist "Impact Condensed" --output_dir ~/tesstutorial/engeval



Now I want to continue the training using my previous training data. The 
problem is that the previous data consists of .tr files and box files, 
whereas Tesseract 4 requires .lstmf files.
Is there any way to do this?
