From my limited experience, you need a lot more data than that to train from scratch. If you can't generate more data than that, you might first try fine-tuning, and then try training by removing the top layer of the best model.
On Wednesday, November 22, 2023 at 4:46:53 PM UTC+3 smon...@gmail.com wrote:

> As it is not properly possible to combine my traineddata from scratch with
> an existing one, I have decided to also train my traineddata model on numbers.
> Therefore I wrote a script which synthetically generates ground truth data
> with text2image.
> This script uses dozens of different fonts and creates numbers in the
> following formats:
> X.XXX
> X.XX
> X,XX
> X,XXX
> I generated 10,000 files to train the numbers. But unfortunately numbers
> get recognized pretty poorly with the best model (most of the time only "0.",
> "0" or "0," gets recognized).
> So I wanted to ask whether this is not enough training data (ground truth) for
> proper recognition when I train several fonts.
> Thanks in advance for your help.
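
For reference, the ground truth strings for the four formats in the quoted message could be generated roughly as below. This is only a minimal sketch, not the original poster's script: the names `FORMATS`, `random_number` and `make_ground_truth` are made up for illustration, and it only produces the text lines, which would still have to be rendered with text2image separately.

```python
import random

# The four formats from the quoted message; "X" stands for one random digit.
FORMATS = ["X.XXX", "X.XX", "X,XX", "X,XXX"]


def random_number(fmt, rng):
    """Replace every X in fmt with a random digit, keeping the separators."""
    return "".join(rng.choice("0123456789") if ch == "X" else ch for ch in fmt)


def make_ground_truth(count, seed=0):
    """Return `count` strings, cycling through FORMATS so each is equally common.

    The seed is an assumption made here so that runs are reproducible.
    """
    rng = random.Random(seed)
    return [random_number(FORMATS[i % len(FORMATS)], rng) for i in range(count)]


if __name__ == "__main__":
    # Print a few sample lines, one ground truth string per line.
    for line in make_ground_truth(8):
        print(line)
```

Drawing every digit uniformly and cycling through the formats should keep the digits and both separators about equally frequent, so it may be worth checking that the 10,000 generated files are balanced in the same way.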