Re: [tesseract-ocr] Can "traineddata" for Tesseract 3 be used for Tesseract 4?

ShreeDevi Kumar Wed, 13 Jun 2018 08:08:09 -0700

If you have box/TIFF pairs in Tesseract 4 format, you can generate the .lstmf
files by running:

tesseract lang.file.exp0.tif lang.file.exp0 lstm.train

lstm.train is a config file; with it, Tesseract writes lang.file.exp0.lstmf.
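To do this for a whole set of fonts, the command can be run in a loop. This is a minimal sketch, not an official Tesseract script: it assumes a directory of box/TIFF pairs named like lang.file.exp0.tif / lang.file.exp0.box whose box files are already in Tesseract 4 format, and tesseract on the PATH. The function name make_lstmf_all, the default list-file name all-lstmf.txt and the TESSERACT override variable are hypothetical. Only the .tif/.box pairs are inputs here; the Tesseract 3 .tr files are not used. The helper also collects the .lstmf paths, one per line, which is the kind of list file lstmtraining reads through its --train_listfile (or --eval_listfile) option.

```shell
# Hypothetical helper, not part of Tesseract: run the lstm.train
# command once per box/TIFF pair in a directory and write the paths of
# the resulting .lstmf files, one per line, to a list file.
# Assumes Tesseract 4 box files and tesseract on the PATH; set
# TESSERACT to use a different binary.
make_lstmf_all() {
  local dir="${1:-.}"
  local list="${2:-$dir/all-lstmf.txt}"
  local tif base
  : > "$list"                      # start from an empty list file
  for tif in "$dir"/*.tif; do
    [ -e "$tif" ] || continue      # the glob matched no .tif files
    base="${tif%.tif}"             # lang.file.exp0.tif -> lang.file.exp0
    if [ ! -f "$base.box" ]; then
      echo "skipping $tif: no $base.box" >&2
      continue
    fi
    # Same call as above; with lstm.train, Tesseract writes $base.lstmf.
    "${TESSERACT:-tesseract}" "$tif" "$base" lstm.train
    if [ -f "$base.lstmf" ]; then
      echo "$base.lstmf" >> "$list"
    fi
  done
}
```

For example, make_lstmf_all ~/mydata ~/mydata/eng.training_files.txt (both paths hypothetical) processes every pair in ~/mydata and skips, with a warning, any .tif that has no matching .box file.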

ShreeDevi


On Wed, Jun 13, 2018 at 6:46 PM chandra churh chatterjee <[email protected]> wrote:

> I have trained Tesseract 3 with 64 fonts using the respective box and .tr
> files. Now I want to use the same data for training Tesseract 4,
> after creating the starter traineddata as described under "Using tesstrain.sh":
>
> The setup for running tesstrain.sh is the same as for base Tesseract. Use
> the --linedata_only option for LSTM training. Note that it is beneficial to
> have more training text and make more pages though, as neural nets don't
> generalize as well and need to train on something similar to what they will
> be running on. If the target domain is severely limited, then all the dire
> warnings about needing a lot of training data may not apply, but the
> network specification may need to be changed.
>
> Training data is created using tesstrain.sh
> <https://github.com/tesseract-ocr/tesseract/blob/master/src/training/tesstrain.sh>
> as follows. Note that the location of your fonts may vary:
>
> training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
>   --noextract_font_properties --langdata_dir ../langdata \
>   --tessdata_dir ./tessdata --output_dir ~/tesstutorial/engtrain
>
> The above command makes LSTM training data equivalent to the data used to
> train base Tesseract for English. For making a general-purpose LSTM-based
> OCR engine, it is woefully inadequate, but makes a good tutorial demo.
>
> Now try this to make eval data for the 'Impact' font:
>
> training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
>   --noextract_font_properties --langdata_dir ../langdata \
>   --tessdata_dir ./tessdata \
>   --fontlist "Impact Condensed" --output_dir ~/tesstutorial/engeval
>
>
>
> Now I want to continue training using my previous data, but the problem
> is that it consists of .tr and box files, whereas Tesseract 4 requires
> .lstmf files. Is there any solution?
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/f3d6c64e-7763-478e-b047-a64edd032d99%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/f3d6c64e-7763-478e-b047-a64edd032d99%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
