[tesseract-ocr] Tesseract unable to recognise Ubuntu and Inter fonts, it returned - 1809_Homer

Kehinde Adeoya Fri, 20 May 2022 06:00:52 -0700

I have successfully trained Tesseract on two new fonts, Ubuntu and Inter. 
I am using Tesseract 3.05 and tessdata 3.04.


1. Tesseract does not recognize the new fonts and instead keeps returning 
an unexpected font name: it reports 1809_Homer for both Ubuntu and Inter. 
This makes me wonder whether something went wrong with the training.
2. Tesseract does not seem to treat font-weight: 700 and font-weight: bold 
as equivalent. They are the same weight, but Tesseract reports text set 
with font-weight: 700 as a normal (non-bold) font. What can I do to remedy 
this?
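
For reference on the second point: CSS defines the keyword bold as the numeric weight 700 (and normal as 400), so both declarations should render identically. The sketch below only illustrates that equivalence. The helper name normalize_font_weight is hypothetical and is not part of Tesseract or any library.

```python
# CSS keyword weights and their numeric equivalents.
KEYWORD_WEIGHTS = {"normal": 400, "bold": 700}


def normalize_font_weight(value):
    """Return the numeric weight for a CSS font-weight value.

    Hypothetical helper for illustration only; it handles the two
    absolute keywords and plain numeric values, nothing else.
    """
    text = str(value).strip().lower()
    if text in KEYWORD_WEIGHTS:
        return KEYWORD_WEIGHTS[text]
    return int(text)


# Both spellings map to the same numeric weight.
print(normalize_font_weight("bold") == normalize_font_weight(700))
```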

This is how I trained the new tessdata:
PANGOCAIRO_BACKEND=fc sh tesstrain.sh \
  --fontlist "Ubuntu" "Ubuntu Bold" "Ubuntu Bold Italic" "Ubuntu Italic" \
    "Ubuntu Light" "Ubuntu Light Italic" "Ubuntu Medium" \
    "Ubuntu Medium Italic" "Inter" "Inter Bold" "Inter Heavy" \
    "Inter Light" "Inter Medium" "Inter Semi-Bold" "Inter Ultra-Bold" \
    "Inter weight=250" \
  --fonts_dir /Library/Fonts \
  --lang nld \
  --langdata_dir /tessapp/langdata \
  --output_dir /fonts/samples \
  --training_text /tessapp/langdata/nld/nld.training_text \
  --tessdata_dir /tessapp/tesseract-3.05.02/tessdata \
  --langdata_dir /tessapp/langdata
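
Because several of the font names contain spaces, each one has to reach tesstrain.sh as a single argument. A minimal sketch of the same invocation assembled as an argument list is given here, which makes the argument boundaries explicit. The helper name build_tesstrain_args is hypothetical, the paths and flags are copied from the command above, and the repeated --langdata_dir flag is given only once. The script only prints the command, it does not run it.

```python
import shlex

# Font names copied from the command above; each list element is passed
# as exactly one argument, even when it contains a space.
FONTS = [
    "Ubuntu", "Ubuntu Bold", "Ubuntu Bold Italic", "Ubuntu Italic",
    "Ubuntu Light", "Ubuntu Light Italic", "Ubuntu Medium",
    "Ubuntu Medium Italic", "Inter", "Inter Bold", "Inter Heavy",
    "Inter Light", "Inter Medium", "Inter Semi-Bold", "Inter Ultra-Bold",
    "Inter weight=250",
]


def build_tesstrain_args(fonts):
    """Return the tesstrain.sh argument list (hypothetical helper)."""
    return [
        "sh", "tesstrain.sh",
        "--fontlist", *fonts,
        "--fonts_dir", "/Library/Fonts",
        "--lang", "nld",
        "--langdata_dir", "/tessapp/langdata",
        "--output_dir", "/fonts/samples",
        "--training_text", "/tessapp/langdata/nld/nld.training_text",
        "--tessdata_dir", "/tessapp/tesseract-3.05.02/tessdata",
    ]


args = build_tesstrain_args(FONTS)
# Print a shell-quoted version, with the environment variable as a prefix.
print("PANGOCAIRO_BACKEND=fc " + shlex.join(args))
```

To actually run it, the list could be passed to subprocess.run together with an env mapping that sets PANGOCAIRO_BACKEND to fc.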

The training produced this output file:
nld.traineddata

