Re: [tesseract-ocr] Training tesseract 4.0 using images?

2018-04-15 Thread ShreeDevi Kumar
Please take a look at tesstrain_utils.sh and language-specific.sh in the training directory for more details about how training works. As mentioned before, training with box/tiff pairs is not supported.

Re: [tesseract-ocr] Training tesseract 4.0 using images?

2018-04-15 Thread denniscfeng
Hi Shree, Thanks for your help; I was able to successfully train with the box files. Is it possible to not provide any font data at all? Theoretically, if I were training for a document that did not have any font data available on the web, what would I do then? In tesstrain.sh, after I copy the

[tesseract-ocr] Change text from training

2018-04-15 Thread Fanatico
What is the correct way to change the training text of a traineddata that I'm working on? I'm training a new traineddata and it has started to give some results, but now I want to change the text used to train it and continue from where I stopped. How can I do that?

Re: [tesseract-ocr] Training tesseract 4.0 using images?

2018-04-15 Thread ShreeDevi Kumar
Hi Dennis,

1. Copy 4.0 format box/tiff pairs to the langdata/$lang directory or any other folder of your choice.
2. Modify tesstrain.sh to copy these files to your /tmp directory. See the following for where the lines need to be added:

    source "$(dirname $0)/tesstrain_utils.sh"
    ARGV=("$@")
    parse_flags
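The second step (copying the box/tiff pairs into the temporary training directory) can be sketched as a standalone bash function. This is a minimal sketch, not the actual lines for tesstrain.sh (those are cut off in this excerpt). It assumes the images use the `.tif` extension and share a basename with their `.box` file; the function name `copy_box_tiff_pairs` and the example destination path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of tesstrain.sh): copy matching box/tiff pairs
# from a source folder (e.g. langdata/$lang) into a training work directory,
# e.g. one under /tmp. Assumes images end in .tif and share a basename with
# their .box file. Prints the number of pairs copied to stdout.
set -euo pipefail

copy_box_tiff_pairs() {
  local src_dir="$1" dest_dir="$2"
  local copied=0
  mkdir -p "$dest_dir"
  shopt -s nullglob            # no match -> empty loop, not a literal "*.tif"
  for tif in "$src_dir"/*.tif; do
    local base="${tif%.tif}"   # strip the .tif suffix to find the box file
    if [[ -f "${base}.box" ]]; then
      cp "$tif" "${base}.box" "$dest_dir"/
      copied=$((copied + 1))
    else
      echo "warning: no box file for $tif, skipping" >&2
    fi
  done
  shopt -u nullglob
  echo "$copied"
}

# Example (hypothetical paths):
#   copy_box_tiff_pairs "langdata/eng" "/tmp/eng-train"
```

Images without a matching box file are skipped with a warning on stderr rather than copied, since a tiff without its box file is not a usable training pair.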

Re: [tesseract-ocr] Training tesseract 4.0 using images?

2018-04-15 Thread denniscfeng
Hi Shree, Thanks for your reply. Is there any option to use tesstrain.sh in Tesseract 4.0 to generate the traineddata and lstm files using the images and box files? Or do I still have to go through the process as listed in the Tesseract 3.0 instructions? In which case, I would be able to