Re: [tesseract-ocr] I am unable to train a new font to tesseract, I am getting a deserialize failed error

2023-11-22 Thread Des Bw
Probably your issue is related to this one: https://github.com/tesseract-ocr/tesseract/issues/792 Are you on Windows or Ubuntu? You might try upgrading Tesseract to version 5. I am not well versed in Tesseract, so my knowledge is very limited. On Thursday, November 23, 2023 at

Re: [tesseract-ocr] I am unable to train a new font to tesseract, I am getting a deserialize failed error

2023-11-22 Thread Adepu Sai Rahul
The tif files are not corrupted, and the box files are not zero-size. On Thursday, November 23, 2023 at 12:51:49 PM UTC+5:30 desal...@gmail.com wrote: > Make sure that the tif files are not corrupted; or the box files are not > zero size. > > Des > > On 23 Nov 2023 at 9:26:39 AM, Adepu Sai

Re: [tesseract-ocr] I am unable to train a new font to tesseract, I am getting a deserialize failed error

2023-11-22 Thread Des Bw
Make sure that the tif files are not corrupted and that the box files are not zero-size. Des On 23 Nov 2023 at 9:26:39 AM, Adepu Sai Rahul wrote: > > chinnu@SaiRahul2507:~/tesseract_tutorial/tesstrain$ > TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME=Y145 > START_MODEL=eng
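One way to act on this advice is to scan the ground-truth directory before training. The following Python sketch is only an illustration, not part of tesstrain: it flags empty `.box` files and `.tif` files whose first four bytes are not a TIFF header (`II*\0` little-endian or `MM\0*` big-endian). The function name `find_suspect_files` is hypothetical, and a valid header does not prove the rest of the image is intact.

```python
from pathlib import Path

# A TIFF file begins with one of these two 4-byte signatures.
TIFF_MAGIC = (b"II*\x00", b"MM\x00*")


def find_suspect_files(gt_dir):
    """Hypothetical helper: return (file name, reason) pairs for suspicious files.

    Only checks for zero-size .box files and .tif files without a TIFF
    signature; it does not fully decode the images.
    """
    problems = []
    gt = Path(gt_dir)
    for box in sorted(gt.glob("*.box")):
        if box.stat().st_size == 0:
            problems.append((box.name, "empty box file"))
    for tif in sorted(gt.glob("*.tif")):
        with tif.open("rb") as fh:
            header = fh.read(4)
        if header not in TIFF_MAGIC:
            problems.append((tif.name, "missing TIFF header"))
    return problems
```

For example, `find_suspect_files("data/Y145-ground-truth")` would check a directory named after the MODEL_NAME in the command above; that path is an assumption about the tesstrain layout, so point it at wherever your files actually live.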

[tesseract-ocr] I am unable to train a new font to tesseract, I am getting a deserialize failed error

2023-11-22 Thread Adepu Sai Rahul
chinnu@SaiRahul2507:~/tesseract_tutorial/tesstrain$ TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME=Y145 START_MODEL=eng TESSDATA=../tesseract/tessdata MAX_ITERATIONS=200 You are using make version: 4.3 lstmtraining \ --debug_interval 0 \ --traineddata

[tesseract-ocr] Re: Training Metrics

2023-11-22 Thread Des Bw
The character error rate (CER) is the most common measure of the quality of your training. Train with a large dataset and run it for a couple of epochs, so that your CER gets down to 0.01 (1%) or as close to it as possible. That is the most common strategy. On Wednesday, November 22, 2023 at 4:50:45 PM UTC+3 smon...@gmail.com wrote: > As
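To turn "a couple of epochs" into an iteration budget (for example a value to pass as MAX_ITERATIONS, as in the make command earlier in this thread), a rough sketch is below. It assumes that lstmtraining consumes one text line per iteration, so one epoch is one pass over all training lines; that assumption and the helper name `iterations_for_epochs` are mine, not from the thread.

```python
import math


def iterations_for_epochs(num_training_lines, epochs):
    """Hypothetical helper: iterations needed to see every line `epochs` times.

    Assumes one training line per iteration; rounds up for fractional epochs.
    """
    if num_training_lines <= 0 or epochs <= 0:
        raise ValueError("both arguments must be positive")
    return math.ceil(num_training_lines * epochs)
```

With 5000 training lines and 3 epochs this gives 15000 iterations.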

[tesseract-ocr] Re: Training Metrics

2023-11-22 Thread Des Bw
Most people seem to watch the character error rate. That is supposed to be the most important indicator of accuracy. I think a character error rate below 1% is what most people aim for. On Wednesday, November 22, 2023 at 4:50:45 PM UTC+3 smon...@gmail.com wrote: > As I am training my model I got
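For checking a model's output against ground truth yourself, the usual definition of CER is the edit (Levenshtein) distance between the recognized text and the reference, divided by the length of the reference. The Python sketch below implements that textbook definition; Tesseract's internal "char error" figure is computed by its own code and may not match it exactly.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

One wrong character in a 100-character reference gives a CER of 0.01, which is the 1% target mentioned above.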

[tesseract-ocr] Re: Training from Scratch

2023-11-22 Thread Des Bw
From my limited experience, you need a lot more data than that to train from scratch. If you can't produce more data than that, you might first try fine-tuning, and then try training by removing the top layer of the best model. On Wednesday, November 22, 2023 at 4:46:53 PM UTC+3 smon...@gmail.com

[tesseract-ocr] Training Metrics

2023-11-22 Thread Simon
While training my model I came across the following metrics, e.g.: At iteration 6345/6500/6500, Mean rms=6.246%, delta=7.139%, char train=68.07%, word train=92.2%, skip ratio=0%, New best char error = 68.07 wrote checkpoint. Unfortunately I can't find any proper and detailed
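To track these metrics across a run, one option is to extract them from the training log. The Python sketch below parses lines with exactly the shape quoted above; the regular expression is written against that single example and may need adjusting for other Tesseract versions. The "New best char error = 68.07" message in the same line suggests that "char train" is an error percentage, not an accuracy. The sketch keeps the three iteration counters as a plain tuple without interpreting them.

```python
import re

# Written against the example progress line quoted in this message.
PROGRESS_RE = re.compile(
    r"At iteration (\d+)/(\d+)/(\d+), "
    r"Mean rms=([\d.]+)%, delta=([\d.]+)%, "
    r"char train=([\d.]+)%, word train=([\d.]+)%, "
    r"skip ratio=([\d.]+)%"
)


def parse_progress(line):
    """Return the metrics in a progress line as a dict, or None if it doesn't match."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    groups = m.groups()
    it1, it2, it3 = (int(g) for g in groups[:3])
    rms, delta, char, word, skip = (float(g) for g in groups[3:])
    return {
        "iterations": (it1, it2, it3),
        "mean_rms": rms,
        "delta": delta,
        "char_train": char,
        "word_train": word,
        "skip_ratio": skip,
    }
```

Feeding every line of a training log through `parse_progress` and keeping the non-None results gives a series you can plot to see whether the character error is still falling.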

[tesseract-ocr] Training from Scratch

2023-11-22 Thread Simon
As it is not really possible to combine my traineddata trained from scratch with an existing one, I have decided to also train my traineddata model on numbers. Therefore I wrote a script which synthetically generates ground-truth data with text2image. This script uses dozens of different fonts and
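The original script isn't shown, so the following is only a sketch of the general idea, under stated assumptions: it generates lines of random digit groups as training text and builds one `text2image` command per font using the `--text`, `--outputbase`, `--font` and `--fonts_dir` options (check `text2image --help` on your build). The helper names, font names and directories are hypothetical placeholders, and the commands are only built here, not executed.

```python
import random
from pathlib import Path


def random_number_lines(n_lines, rng, max_tokens=6):
    """Hypothetical helper: lines of random digit groups to use as training text."""
    lines = []
    for _ in range(n_lines):
        count = rng.randint(1, max_tokens)
        lines.append(" ".join(str(rng.randint(0, 99999)) for _ in range(count)))
    return lines


def text2image_commands(text_file, fonts, out_dir, fonts_dir):
    """Hypothetical helper: one text2image argument list per font (not run here)."""
    cmds = []
    for font in fonts:
        # One output base per font, e.g. gt/DejaVu_Sans -> gt/DejaVu_Sans.tif/.box
        base = Path(out_dir) / font.replace(" ", "_")
        cmds.append([
            "text2image",
            f"--text={text_file}",
            f"--outputbase={base}",
            f"--font={font}",
            f"--fonts_dir={fonts_dir}",
        ])
    return cmds
```

To use it, write the lines from `random_number_lines(100, random.Random(0))` to a text file, make sure the output directory exists, and run each command with `subprocess.run(cmd, check=True)`. Passing the arguments as a list avoids shell quoting problems with font names that contain spaces.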