The character error rate (CER) is the most common measure of the quality of your training. Train with a large amount of data and run it for a couple of epochs, so that your CER gets as close to 0.01 as possible. That is the most common strategy.
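If it helps to see what that number means, here is a minimal sketch of how a character error rate is usually defined: the edit distance between the recognized text and the ground truth, divided by the length of the ground truth. This is the textbook definition, not Tesseract's own code, and the function names `levenshtein` and `char_error_rate` are made up for the example. I am assuming the "char train" value in the training log is this kind of rate expressed as a percentage.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the ground-truth text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical ground truth and OCR output, for illustration only.
    truth = "character error rate"
    ocr = "charakter eror rate"
    print(f"CER = {char_error_rate(truth, ocr):.2%}")
```

Lower is better: a value of 0 means the output matches the ground truth exactly, and the value can exceed 1 when the output contains many extra characters.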
On Wednesday, November 22, 2023 at 4:50:45 PM UTC+3 smon...@gmail.com wrote:

> As I am training my model I got in contact with the following metrics:
>
> E.g.:
> At iteration 6345/6500/6500, Mean rms=6.246%, delta=7.139%, char train=68.07%, word train=92.2%, skip ratio=0%, New best char error = 68.07 wrote checkpoint.
>
> Unfortunately I don't find any proper and detailed description or explanation of these metrics on the web.
>
> To evaluate the metrics this information would be really helpful, as right now it feels more like guessing what values are "good". As most developers are lacking in experience it is pretty hard to tell what values are "good" or "bad".