To those who come across this old thread:
Training from single-line images and their ground truth is now possible using
the Makefile in the tesstrain repo.
https://stackoverflow.com/questions/43352918/how-do-i-train-tesseract-4-with-image-data-instead-of-a-font-file
The above link has a good
Please take a look at tesstrain_utils.sh and language-specific.sh in the
training directory for more details about how training works.
As mentioned before, training with box/tiff pairs is not supported.
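For the tesstrain route, the Makefile trains from line images paired with plain-text transcriptions rather than from box/tiff pairs. Below is a minimal sketch for checking a ground-truth folder before running make. It assumes the naming convention we understand tesstrain to use: each `name.tif` or `name.png` line image sits next to a `name.gt.txt` transcription. The function name `check_ground_truth` is hypothetical, so please confirm the expected folder layout and make targets in the tesstrain README for your version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report line images that lack a matching transcription.
# Assumes tesstrain-style naming: foo.tif or foo.png paired with foo.gt.txt
# in the same folder (verify against the tesstrain README).
check_ground_truth() {
  local dir="$1" missing=0 img base
  for img in "$dir"/*.tif "$dir"/*.png; do
    [ -e "$img" ] || continue          # glob matched nothing, skip it
    base="${img%.*}"                   # strip the image extension
    if [ ! -f "$base.gt.txt" ]; then
      echo "missing transcription: $img"
      missing=$((missing + 1))
    fi
  done
  echo "$missing image(s) without ground truth"
  [ "$missing" -eq 0 ]                 # non-zero exit status if anything is missing
}
```

Usage, assuming a model called `foo` and the `data/foo-ground-truth` folder that recent versions of the Makefile look in by default: `check_ground_truth data/foo-ground-truth && make training MODEL_NAME=foo`.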
On Mon 16 Apr, 2018, 8:19 AM , wrote:
Hi Shree,
Thanks for your help; I was able to successfully train with the box files.
Is it possible to not provide any font data at all? Theoretically, if I were
training for a document for which no font data was available on the
web, what would I do then?
In tesstrain.sh, after I copy the
Hi Dennis,
1. Copy 4.0-format box/tiff pairs to the langdata/$lang directory or any other
folder of your choice.
2. Modify tesstrain.sh to copy these files to your /tmp directory; see the
following for where the lines need to be added:
source "$(dirname $0)/tesstrain_utils.sh"
ARGV=("$@")
parse_flags
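The two steps above boil down to copying your own box/tiff pairs into the /tmp working directory that tesstrain.sh uses. Here is a minimal sketch of that copy step, assuming each pair shares a basename (`foo.box` with `foo.tif`). The function name `copy_box_tiff_pairs` is hypothetical, and so are the variable names in the usage line: `LANGDATA_ROOT`, `LANG_CODE` and `TRAINING_DIR` are our assumption about what tesstrain_utils.sh defines once `parse_flags` has run, so check them against your copy of the script.

```shell
# Hypothetical helper: copy every .box/.tif pair from SRC to DST,
# skipping (and reporting on stderr) any .box file whose .tif partner is missing.
copy_box_tiff_pairs() {
  local src="$1" dst="$2" box base copied=0
  mkdir -p "$dst"
  for box in "$src"/*.box; do
    [ -e "$box" ] || continue            # no .box files found
    base="${box%.box}"
    if [ -f "$base.tif" ]; then
      cp "$box" "$base.tif" "$dst"/
      copied=$((copied + 1))
    else
      echo "skipping $box: no matching .tif" >&2
    fi
  done
  echo "copied $copied pair(s) to $dst"
}
```

Possible usage after `parse_flags` (variable names assumed, see above): `copy_box_tiff_pairs "${LANGDATA_ROOT}/${LANG_CODE}" "${TRAINING_DIR}"`. If your version of the script clears its working directory, place the call after that step so the copied files survive.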
Hi Shree,
Thanks for your reply. Is there any option to use tesstrain.sh in Tesseract
4.0 to generate the traineddata and lstm files from the images and
box files? Or do I still have to go through the process listed in the
Tesseract 3.0 instructions? In which case, I would be able to
Training Tesseract 4.0 from images is not officially supported. Some
people have had success doing LSTM training with box/tiff pairs, but it
requires hacks/programming on their part to create 4.0.0-compatible box
files.
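For reference, a classic box file has one line per symbol, `<symbol> <left> <bottom> <right> <top> <page>`, with coordinates measured from the bottom-left corner of the image. One approach people use to get LSTM-ready boxes for a single-line image is a line-level "WordStr" entry followed by a box whose symbol is a tab character, marking the end of the line. The sketch below writes such a file. Treat the exact WordStr layout as an assumption to verify against the Tesseract version you train with. The function name `write_wordstr_box` and the choice of a 1-pixel-wide end-of-line box at the right edge are ours.

```shell
# Hypothetical helper: write a WordStr-style box file for one text-line image.
# Args: output path, image width, image height (pixels), transcription text.
# Assumed layout (verify for your Tesseract version):
#   WordStr <left> <bottom> <right> <top> <page> #<text>
#   <TAB> <left> <bottom> <right> <top> <page>    (end-of-line marker)
write_wordstr_box() {
  local out="$1" width="$2" height="$3" text="$4"
  {
    # One box spanning the whole line image on page 0; text is passed as %s
    # so characters like % or \ in the transcription are written literally.
    printf 'WordStr 0 0 %d %d 0 #%s\n' "$width" "$height" "$text"
    # Tab-symbol box just past the right edge to mark the end of the line.
    printf '\t %d 0 %d %d 0\n' "$width" "$((width + 1))" "$height"
  } > "$out"
}
```

Example with made-up values: `write_wordstr_box line001.box 1200 48 "Hello world"` for a 1200x48 pixel image `line001.tif`.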
tesstrain.sh creates box/tiff files in the /tmp directory; these
Hi all,
I read in a different post that training Tesseract 4.0 from images is not
supported. Is this true? I have been able to successfully train Tesseract
4.0 so far using font data. When using tesstrain.sh, the script creates a
number of files, including an lstmf file alongside the usual
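The .lstmf files are what lstmtraining consumes: its `--train_listfile` option takes a plain-text file listing .lstmf paths, one per line. If you end up generating .lstmf files yourself, here is a minimal sketch for building that list, assuming all the files sit somewhere under one directory. The function name `make_lstmf_list` and the paths in the usage line are hypothetical.

```shell
# Hypothetical helper: collect every .lstmf file under DIR into LISTFILE,
# one absolute path per line, sorted so the list is reproducible.
make_lstmf_list() {
  local dir="$1" listfile="$2"
  find "$(cd "$dir" && pwd)" -type f -name '*.lstmf' | LC_ALL=C sort > "$listfile"
  # Report how many training files were found.
  echo "$(wc -l < "$listfile" | tr -d ' ') lstmf file(s) listed in $listfile"
}
```

Usage with made-up paths: `make_lstmf_list /tmp/my-training-dir eng.training_files.txt`, then pass `--train_listfile eng.training_files.txt` to lstmtraining along with the other options your run needs.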