Hi,
I keep running into duplicated letters with my custom fine-tuned
models.

For example, an M becomes MH.
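
One way to put a number on this is to count the characters the model inserts relative to the ground truth. The following is a minimal sketch using Python's standard `difflib` module. The helper name `inserted_chars` is hypothetical and not part of ocrd-train or Tesseract.

```python
import difflib


def inserted_chars(truth: str, ocr: str) -> list[str]:
    """Return the runs of characters present in the OCR output but not in the truth.

    difflib.SequenceMatcher aligns the two strings. Only 'insert' opcodes are
    collected; 'replace' opcodes (substitutions) are deliberately ignored,
    since the question here is about extra, duplicated letters.
    """
    extra = []
    matcher = difflib.SequenceMatcher(None, truth, ocr, autojunk=False)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            extra.append(ocr[j1:j2])
    return extra
```

For the example above, `inserted_chars("M", "MH")` returns `["H"]`. Summing the lengths over a set of labelled crops gives an insertion count that can be compared between models or settings.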

I'm using ocrd-train with real image crops, and I noticed that the .lstmf
files are generated with psm 6.

At runtime I use psm 7. Could this mismatch make a difference? From a
quick test, it does not seem to be the case.

The problem gets worse if I use psm 13 for recognition, which is why I'm
wondering whether there is a relation.
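
A more systematic check than a quick test would be to run the same crops through the Tesseract CLI under each page segmentation mode and compare the outputs. This sketch assumes `tesseract` (4.x or later, which accepts `--psm`) is on the PATH and that the fine-tuned model is installed under a traineddata name such as `ocrb`. That name, and the helper functions, are hypothetical.

```python
import subprocess


def tesseract_cmd(image: str, lang: str, psm: int) -> list[str]:
    # Using "stdout" as the output base makes tesseract print the recognized
    # text instead of writing it to a file.
    return ["tesseract", image, "stdout", "-l", lang, "--psm", str(psm)]


def ocr(image: str, lang: str, psm: int) -> str:
    # Runs the CLI once and returns the recognized text without trailing whitespace.
    result = subprocess.run(
        tesseract_cmd(image, lang, psm),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def compare_psm(
    images: list[str], lang: str, modes: tuple[int, ...] = (6, 7, 13)
) -> dict[str, dict[int, str]]:
    """Map each image path to its OCR text under each page segmentation mode."""
    return {img: {psm: ocr(img, lang, psm) for psm in modes} for img in images}
```

Crops whose psm 6, 7 and 13 outputs differ are the ones worth inspecting. Comparing the outputs at psm 6 against the others would also show whether matching the training setting changes anything.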

Is there something else that I'm doing wrong that might lead to this
problem? Or something I can improve?

I have only one font (OCR-B), at a fixed height of 44 px plus a 2 px white margin.

According to this post, the sweet spot seems to be closer to 30 px (for most
fonts):

https://groups.google.com/forum/?#!msg/tesseract-ocr/Wdh_JJwnw94/cHjYD3cDEQAJ
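
To try that, the crops could be rescaled so the glyphs are about 30 px tall. The following is a sketch of the arithmetic only. The 30 px target comes from the linked post, and `scaled_size` is a hypothetical helper. Whether this actually helps with OCR-B is untested here.

```python
def scaled_size(width: int, height: int, text_height: int, target: int = 30) -> tuple[int, int]:
    """Return (width, height) after scaling so that text_height becomes about target px.

    The same factor is applied to both dimensions, so the aspect ratio is kept.
    max(1, ...) guards against rounding a very small image down to zero.
    """
    factor = target / text_height
    return max(1, round(width * factor)), max(1, round(height * factor))
```

For a 100 x 46 crop with 44 px text, this gives (68, 31). With Pillow, the resulting tuple could be passed to `Image.resize` before generating training data and again at recognition time, so both sides see the same scale.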

Thanks, bye

Lorenzo

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAMgOLLwi%3DywrYE%3D_z4J4%3D_ZpxDLAtw8vEFS27nUzLBuHASBsUg%40mail.gmail.com.
