[tesseract-ocr] Re: Training from Scratch

Des Bw Wed, 22 Nov 2023 06:27:06 -0800

From my limited experience, you need a lot more data than that to train 
from scratch. If you can't generate more data than that, you might first try 
fine-tuning, and then try training by removing the top layer of the best model.


On Wednesday, November 22, 2023 at 4:46:53 PM UTC+3 [email protected] wrote:

> Since it is not really possible to combine my traineddata trained from 
> scratch with an existing one, I have decided to also train my traineddata 
> model on numbers. 
> Therefore I wrote a script which synthetically generates groundtruth data 
> with text2image. 
> This script uses dozens of different fonts and creates numbers for the 
> following formats. 
> X.XXX
> X.XX
> X,XX
> X,XXX
> I generated 10,000 files to train the numbers. Unfortunately, numbers 
> get recognized pretty poorly with the best model (most of the time only 
> "0.", "0" or "0," gets recognized). 
> So I wanted to ask whether that is not enough training (ground truth) data 
> for proper recognition when I train several fonts. 
> Thanks in advance for your help. 
>
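
For reference, the number formats listed in the quoted message could be 
generated along these lines. This is only a minimal sketch, not the original 
poster's script: the names FORMATS, random_number and make_training_lines are 
made up for illustration, and it only produces the text lines, not the images.

```python
import random

# The four formats from the quoted message; "X" marks a digit position.
FORMATS = ["X.XXX", "X.XX", "X,XX", "X,XXX"]


def random_number(fmt, rng):
    """Replace each X in fmt with a random digit, keeping the separators."""
    return "".join(str(rng.randrange(10)) if ch == "X" else ch for ch in fmt)


def make_training_lines(count, seed=0):
    """Return `count` number strings, cycling through the formats in order."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    return [random_number(FORMATS[i % len(FORMATS)], rng) for i in range(count)]


if __name__ == "__main__":
    # Print a few sample lines, two per format.
    for line in make_training_lines(8):
        print(line)
```

The resulting lines could then be written to a text file and rendered with 
text2image, once per font. Raising `count` is a cheap way to produce more 
ground truth if 10,000 files turn out to be too few.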

