Hi

I am trying to train my own Tesseract model (version 4, by replacing the top 
layer as described in the tutorial). Apart from some inexplicable OCR problems (see 
https://github.com/tesseract-ocr/tesseract/issues/734#issuecomment-299132760), 
I observe quite large differences when I compare the output of my model with 
that of one of the standard models. 

I trained a model down to a convergence level of 0.005 (*below* the default 
value of 0.01) and then evaluated it on a small amount of the data it was 
trained on. The confidence values produced by my model are between 40 and 55 
(even for very frequent and unambiguous words), whereas a standard model 
achieves 80-95, and 50-70 for visually ambiguous words. 
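
For anyone who wants to reproduce the comparison, here is a rough sketch of how per-word confidences can be collected from both models. It assumes the tesseract command-line tool is on the PATH and that its TSV output has the usual "level", "conf" and "text" columns, with word rows at level 5. The function names are made up for illustration, and the model name and paths in the usage comment are placeholders.

```python
import csv
import io
import statistics
import subprocess


def run_tesseract_tsv(image_path, lang, tessdata_dir=None):
    """Run the tesseract CLI with the 'tsv' config and return the TSV text."""
    cmd = ["tesseract", image_path, "stdout", "-l", lang]
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", tessdata_dir]
    cmd.append("tsv")  # config name that switches the output format to TSV
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def word_confidences(tsv_text):
    """Return (word, confidence) pairs for the word-level rows (level 5)."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    pairs = []
    for row in reader:
        text = (row.get("text") or "").strip()
        # Page, block, paragraph and line rows carry conf = -1 and no text.
        if row.get("level") == "5" and text:
            pairs.append((text, float(row["conf"])))
    return pairs


def summarize(pairs):
    """Return word count, mean, min and max confidence for the given pairs."""
    confs = [conf for _, conf in pairs]
    if not confs:
        return {"words": 0, "mean": None, "min": None, "max": None}
    return {
        "words": len(confs),
        "mean": statistics.mean(confs),
        "min": min(confs),
        "max": max(confs),
    }


# Hypothetical usage (model name and paths are placeholders):
#   mine = summarize(word_confidences(
#       run_tesseract_tsv("sample.png", "mymodel", "/path/to/my/tessdata")))
#   stock = summarize(word_confidences(run_tesseract_tsv("sample.png", "eng")))
```

Running both models on the same images and comparing the two summaries (or the per-word pairs side by side) should make the gap described above easy to see.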

Has anyone achieved confidence levels close to those of the tessdata models? 
If so, how did you do it? Or are the standard Tesseract models 
overfitted? (Try to OCR a common but misspelled word ;)

Cheers,
Alex
