Re: [tesseract-ocr] Training Tesseract 4.0 using images?

2020-06-16 Thread shree
To those who come across this old thread: training from single-line images and their ground truth is now possible using the Makefile in the tesstrain repo. https://stackoverflow.com/questions/43352918/how-do-i-train-tesseract-4-with-image-data-instead-of-a-font-file The above link has a good
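As a rough illustration of the input layout: to my understanding, the tesstrain Makefile reads single-line images, each paired with a transcription file of the same base name ending in .gt.txt, from a directory such as data/<MODEL_NAME>-ground-truth. The sketch below is written in shell like the training scripts discussed in this thread. It assumes .tif or .png images and only reports images that lack a transcription; the function name find_unpaired_images is hypothetical and not part of tesstrain.

```sh
# Hypothetical helper: report line images that have no matching
# <basename>.gt.txt transcription in the same directory.
# Assumption: images use the .tif or .png extension.
find_unpaired_images() {
  gt_dir="$1"
  for image in "$gt_dir"/*.tif "$gt_dir"/*.png; do
    [ -e "$image" ] || continue      # the glob matched nothing
    base="${image%.*}"               # strip the image extension
    if [ ! -f "$base.gt.txt" ]; then
      echo "missing transcription: $(basename "$image")"
    fi
  done
}
```

For example, run find_unpaired_images data/foo-ground-truth before make training MODEL_NAME=foo, and fix any reported pairs first.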

Re: [tesseract-ocr] Training Tesseract 4.0 using images?

2018-04-15 Thread ShreeDevi Kumar
Please take a look at tesstrain_utils.sh and language-specific.sh in the training directory for more details about how training works. As mentioned before, training with box/tiff pairs is not supported.

Re: [tesseract-ocr] Training Tesseract 4.0 using images?

2018-04-15 Thread denniscfeng
Hi Shree, Thanks for your help; I was able to successfully train with the box files. Is it possible to not provide any font data at all? Theoretically, if I were training on a document whose font is not available anywhere on the web, what would I do then? In tesstrain.sh, after I copy the

Re: [tesseract-ocr] Training Tesseract 4.0 using images?

2018-04-15 Thread ShreeDevi Kumar
Hi Dennis,
1. Copy 4.0-format box/tiff pairs to the langdata/$lang directory or any other folder of your choice.
2. Modify tesstrain.sh to copy these files to your /tmp directory. See the following for where the lines need to be added:
    source "$(dirname $0)/tesstrain_utils.sh"
    ARGV=("$@")
    parse_flags
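Step 2 above edits tesstrain.sh itself. As a standalone sketch of just the copy step, the shell function below copies each NAME.box / NAME.tif pair from a source folder into a working directory and skips box files without a matching image. The .tif extension, the function name copy_box_tif_pairs, and both directory arguments are assumptions for illustration; where to hook this into tesstrain.sh is what the message above points to.

```sh
# Hypothetical helper: copy every NAME.box / NAME.tif pair from SRC to DEST.
# Box files without a matching .tif are skipped with a warning on stderr.
copy_box_tif_pairs() {
  pair_src="$1"
  pair_dest="$2"
  mkdir -p "$pair_dest"
  for box in "$pair_src"/*.box; do
    [ -e "$box" ] || continue          # the glob matched nothing
    tif="${box%.box}.tif"              # assumed image name for this box file
    if [ -f "$tif" ]; then
      cp "$box" "$tif" "$pair_dest"/
    else
      echo "skipping $(basename "$box"): no matching .tif" >&2
    fi
  done
}
```

Keeping the pairing check in one place means a stray box file never reaches the training directory without its image.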

Re: [tesseract-ocr] Training Tesseract 4.0 using images?

2018-04-15 Thread denniscfeng
Hi Shree, Thanks for your reply. Is there any option to use tesstrain.sh in Tesseract 4.0 to generate the traineddata and lstm files from the images and box files? Or do I still have to go through the process described in the Tesseract 3.0 instructions? In which case, I would be able to

Re: [tesseract-ocr] Training Tesseract 4.0 using images?

2018-04-13 Thread ShreeDevi Kumar
Training Tesseract 4.0 from images is not officially supported. Different people have had success doing LSTM training with box/tiff pairs, but it requires hacks/programming on their part to create 4.0.0-compatible box files. tesstrain.sh creates box/tiff files in the /tmp directory; these
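For readers wondering what a 4.0-compatible box file for a whole text line can look like: my understanding is that later tesstrain tooling writes one "WordStr" box covering the entire line image, with the transcription after a "#", followed by a tab line marking the end of the text line. The sketch below assumes that format, and assumes coordinates are left, bottom, right, top, page with the origin at the bottom-left; check it against the Tesseract box-file documentation before relying on it. make_wordstr_box is a hypothetical helper, and the image width and height are passed in rather than read from the image.

```sh
# Hypothetical helper: print a WordStr-style box file for one single-line image.
# Assumption: "0 0 WIDTH HEIGHT 0" (left bottom right top page) covers the
# whole image, and a tab "character" box ends the text line.
make_wordstr_box() {
  box_width="$1"
  box_height="$2"
  box_text="$3"
  # One box spanning the entire line, with the transcription after '#'.
  printf 'WordStr 0 0 %d %d 0 #%s\n' "$box_width" "$box_height" "$box_text"
  # End-of-line marker using the same full-image box.
  printf '\t 0 0 %d %d 0\n' "$box_width" "$box_height"
}
```

For a 600x40 line image: make_wordstr_box 600 40 "Hello world" > line.box.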

[tesseract-ocr] Training Tesseract 4.0 using images?

2018-04-13 Thread denniscfeng
Hi all, I read in a different post that training Tesseract 4.0 from images is not supported. Is this true? So far I have been able to successfully train Tesseract 4.0 using font data. When using tesstrain.sh, the script creates a number of files, including an lstmf file alongside the usual
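The .lstmf files mentioned here are what lstmtraining consumes; it reads a plain-text list of their paths through its --train_listfile option. A minimal sketch, assuming all the .lstmf files sit under one directory, that builds such a list one path per line; write_lstmf_list is a hypothetical name.

```sh
# Hypothetical helper: write the paths of all .lstmf files under DIR to
# LISTFILE, one per line and sorted, for use with --train_listfile.
# Paths are written as find reports them, so pass an absolute DIR if
# lstmtraining will run from a different working directory.
write_lstmf_list() {
  lstmf_dir="$1"
  listfile="$2"
  find "$lstmf_dir" -type f -name '*.lstmf' | sort > "$listfile"
}
```

For example, write_lstmf_list /tmp/train my_list.txt, then pass --train_listfile my_list.txt to lstmtraining together with its other required options.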