Training Tesseract 4.0 from images is not officially supported. Several
people have succeeded in doing LSTM training with box/tiff pairs, but it
requires hacks or programming on their part to create 4.0.0-compatible box
files.

tesstrain.sh creates box/tiff files in the /tmp directory; these are used
to create the lstmf files for LSTM training. tesstrain.sh can create either
a 3.0x-compatible traineddata or a 4.0.0-compatible starter traineddata,
depending on the options chosen. For 4.0.0, this starter traineddata, along
with the lstmf files, is used for LSTM training.
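
As a rough sketch of that flow, the Python below only assembles the two
command lines involved: tesstrain.sh to produce the lstmf files and starter
traineddata, then lstmtraining to consume them. The flags are the ones shown
on the 4.0 LSTM training wiki page; every path, the language code, the
iteration count, and the assumed layout of the tesstrain.sh output directory
are placeholders of mine, not values from this thread, so check them against
your own build.

```python
import shlex
from pathlib import Path


def tesstrain_cmd(lang, langdata_dir, tessdata_dir, fonts_dir, output_dir):
    """Build a tesstrain.sh command for 4.0.0-style (LSTM) training data."""
    return [
        "tesstrain.sh",
        "--lang", lang,
        "--linedata_only",               # ask for lstmf line data, not a 3.0x model
        "--noextract_font_properties",
        "--langdata_dir", str(langdata_dir),
        "--tessdata_dir", str(tessdata_dir),
        "--fonts_dir", str(fonts_dir),
        "--output_dir", str(output_dir),
    ]


def lstmtraining_cmd(lang, output_dir, continue_from, model_output, max_iterations):
    """Build an lstmtraining command that fine-tunes from an existing .lstm model.

    Assumption: tesstrain.sh left <output_dir>/<lang>/<lang>.traineddata
    (the starter traineddata) and <output_dir>/<lang>.training_files.txt
    (the list of lstmf files). Verify this on your system.
    """
    out = Path(output_dir)
    return [
        "lstmtraining",
        "--continue_from", str(continue_from),
        "--traineddata", str(out / lang / f"{lang}.traineddata"),
        "--train_listfile", str(out / f"{lang}.training_files.txt"),
        "--model_output", str(model_output),
        "--max_iterations", str(max_iterations),
    ]


if __name__ == "__main__":
    # All paths below are hypothetical placeholders.
    step1 = tesstrain_cmd("eng", "langdata", "tessdata", "/usr/share/fonts", "train_out")
    step2 = lstmtraining_cmd("eng", "train_out", "eng.lstm", "model_out/base", 400)
    for cmd in (step1, step2):
        print(shlex.join(cmd))
```

Printing the commands rather than running them keeps the sketch safe to try;
pass the lists to subprocess.run once the paths match your setup.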

The format of traineddata files for 3.0x and 4.0.0 is different.

For the different components of a traineddata file, see

https://github.com/tesseract-ocr/tesseract/blob/master/doc/combine_tessdata.1.asc
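
To see the format difference for yourself, one option is to list or unpack
the components of a 3.0x and a 4.0.0 traineddata file and compare them. The
sketch below just builds the combine_tessdata commands for that, using the
-d (list) and -u (unpack) options from the man page linked above; the file
names and the output prefix are hypothetical.

```python
import shlex


def list_components_cmd(traineddata):
    """Command that lists the components stored in a traineddata file."""
    return ["combine_tessdata", "-d", str(traineddata)]


def unpack_components_cmd(traineddata, prefix):
    """Command that unpacks every component to files starting with `prefix`.

    The prefix is used as-is, so it normally ends with a dot,
    e.g. "unpacked/eng." (a hypothetical path).
    """
    return ["combine_tessdata", "-u", str(traineddata), str(prefix)]


if __name__ == "__main__":
    # Hypothetical file names for illustration.
    print(shlex.join(list_components_cmd("eng.traineddata")))
    print(shlex.join(unpack_components_cmd("eng.traineddata", "unpacked/eng.")))
```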

For creating 4.0-compatible box files, see

https://github.com/tesseract-ocr/langdata/issues/83#issuecomment-375247341

https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM#training-tesseract-lstm-engine

Please note that all these are unsupported options.
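
For those who do try the unsupported box/tiff route, the step that turns an
image plus its box file into an lstmf file is usually a tesseract run with
the lstm.train config. The sketch below builds that command; it assumes the
box file sits next to the image with the same base name, that the box file
is already in a 4.0.0-compatible format (see the links above), and that page
segmentation mode 6 suits your images. The file names are hypothetical.

```python
import shlex
from pathlib import Path


def lstmf_cmd(image, psm=6):
    """Build the tesseract command that writes <base>.lstmf for one image.

    Assumes <base>.box exists beside the image; psm=6 is only a guess
    at a sensible default and may need changing for your data.
    """
    image = Path(image)
    base = image.with_suffix("")  # output base name, without the extension
    return ["tesseract", str(image), str(base), "--psm", str(psm), "lstm.train"]


def expected_lstmf(image):
    """Path where the lstmf file should appear after the command runs."""
    return Path(image).with_suffix(".lstmf")


if __name__ == "__main__":
    # Hypothetical image name for illustration.
    print(shlex.join(lstmf_cmd("data/page1.tif")))
    print(expected_lstmf("data/page1.tif"))
```

The resulting lstmf paths, one per line in a text file, are what
lstmtraining reads through its --train_listfile option.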


ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Apr 13, 2018 at 12:09 PM, <denniscf...@berkeley.edu> wrote:

> Hi all,
>
> I read in a different post that training Tesseract 4.0 from images is not
> supported, is this true? I have been able to successfully train Tesseract
> 4.0 so far using font data. When using tesstrain.sh, the script creates a
> number of files, including an lstmf file alongside the usual traineddata
> file (and there are some others like unicharset). I was wondering if it is
> possible to use the traineddata generation from image and boxfile described
> in the Tesseract 3.0 training instructions to create these training files
> to train Tesseract 4.0. Tesseract 3.0 instructions already produce a
> traineddata file, how can I generate the lstmf file (and the others) if it
> is possible?
>
> Thank you,
> Dennis
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/tesseract-ocr/bc664de6-5386-45b3-ae4d-70ac5338938c%
> 40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/bc664de6-5386-45b3-ae4d-70ac5338938c%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

