[tesseract-ocr] Tesseract not detecting Ubuntu and Inter Google fonts but returning the wrong font - 1809_Homer

Kehinde Adeoya Fri, 20 May 2022 05:46:35 -0700

I have successfully trained two new fonts, Ubuntu and Inter. I am using 
Tesseract 3.05 with tessdata 3.04.
1. Tesseract does not recognize them and keeps returning an unrelated font 
name. For Ubuntu it returned 1809_Homer, which makes me wonder whether 
something is wrong with the training.
2. Tesseract does not seem to treat font-weight: 700 and font-weight: bold 
as equivalent. They are the same weight, but Tesseract reports 
font-weight: 700 text as a normal (non-bold) font. What can I do to remedy 
this?


This is how I trained the new tessdata:
PANGOCAIRO_BACKEND=fc sh tesstrain.sh \
  --fontlist "Ubuntu" "Ubuntu Bold" "Ubuntu Bold Italic" "Ubuntu Italic" \
    "Ubuntu Light" "Ubuntu Light Italic" "Ubuntu Medium" \
    "Ubuntu Medium Italic" "Inter" "Inter Bold" "Inter Heavy" \
    "Inter Light" "Inter Medium" "Inter Semi-Bold" "Inter Ultra-Bold" \
    "Inter weight=250" \
  --fonts_dir /Library/Fonts \
  --lang nld \
  --langdata_dir /tessapp/langdata \
  --output_dir /fonts/samples \
  --training_text /tessapp/langdata/nld/nld.training_text \
  --tessdata_dir /tessapp/tesseract-3.05.02/tessdata

This is the output I got:
nld.traineddata
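
One check that may be relevant here, sketched under assumptions rather than 
confirmed for this setup: the Tesseract 3.x training documentation describes 
a font_properties file in which each line reads 
"fontname italic bold fixed serif fraktur", with no spaces allowed in the 
font name and each attribute given as a 0 or 1 flag. If the training run 
takes font attributes from such a file, a font with no matching entry, or an 
entry whose bold flag is 0, might explain both symptoms. The helper below is 
hypothetical (the name missing_font_props is made up for illustration), and 
it assumes font names are mapped to entries by replacing spaces with 
underscores and dropping commas.

```shell
# Hypothetical helper: list fonts that have no entry in a Tesseract 3.x
# font_properties file. Expected line format in that file:
#   <fontname> <italic> <bold> <fixed> <serif> <fraktur>
# Usage: missing_font_props FONT_PROPERTIES_FILE "Font Name" ["Font Name" ...]
missing_font_props() {
    props_file=$1
    shift
    for font in "$@"; do
        # Assumption: "Ubuntu Bold" is looked up as "Ubuntu_Bold"
        # (spaces become underscores, commas are dropped).
        key=$(printf '%s' "$font" | tr ' ' '_' | tr -d ',')
        # awk exits 0 only if some line has the key as its first field.
        if ! awk -v k="$key" '$1 == k { found = 1 } END { exit !found }' "$props_file"; then
            printf '%s\n' "$key"
        fi
    done
}
```

For example, missing_font_props /tessapp/langdata/font_properties "Ubuntu" 
"Ubuntu Bold" "Inter" "Inter Bold" would print one line for each font that 
lacks an entry, assuming the file lives at that path. An empty result only 
means the names are present; the bold and italic flags on each line would 
still need to be checked by hand.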

