I have successfully trained Tesseract on two new fonts, Ubuntu and Inter. I am using Tesseract 3.0.5 and Tessdata-3.0.4.

1. Tesseract does not report the fonts I trained. Instead it keeps returning an unrelated font name: for text set in Ubuntu it returned 1809_Homer, which makes me wonder whether something is wrong with the training.
2. Tesseract does not seem to treat font-weight: 700 and font-weight: bold as the same thing. The two are equivalent, but Tesseract reports text with font-weight: 700 as a normal (non-bold) font.

What can I do to remedy this?
This is how I trained the new tessdata:

    PANGOCAIRO_BACKEND=fc sh tesstrain.sh \
      --fontlist "Ubuntu" "Ubuntu Bold" "Ubuntu Bold Italic" "Ubuntu Italic" \
        "Ubuntu Light" "Ubuntu Light Italic" "Ubuntu Medium" "Ubuntu Medium Italic" \
        "Inter" "Inter Bold" "Inter Heavy" "Inter Light" \
        "Inter Medium" "Inter Semi-Bold" "Inter Ultra-Bold" "Inter weight=250" \
      --fonts_dir /Library/Fonts \
      --lang nld \
      --langdata_dir /tessapp/langdata \
      --output_dir /fonts/samples \
      --training_text /tessapp/langdata/nld/nld.training_text \
      --tessdata_dir /tessapp/tesseract-3.05.02/tessdata \
      --langdata_dir /tessapp/langdata

The output I got was nld.traineddata.
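With sixteen multi-word font names in the training command above, it may be worth confirming that each name reaches the script as a single argument. The following is only a dry-run sketch in POSIX sh: it loads the same arguments into the positional parameters and prints them one per line, without calling tesstrain.sh or Tesseract. The ARG_COUNT variable is a name introduced here for illustration, the PANGOCAIRO_BACKEND=fc prefix is left out because nothing is executed, and the repeated --langdata_dir flag is listed once since both occurrences carry the same value.

```shell
#!/bin/sh
# Dry run only: nothing here invokes tesstrain.sh or Tesseract.
# Load the arguments from the post into the positional parameters.
set -- \
  --fontlist \
  "Ubuntu" "Ubuntu Bold" "Ubuntu Bold Italic" "Ubuntu Italic" \
  "Ubuntu Light" "Ubuntu Light Italic" "Ubuntu Medium" "Ubuntu Medium Italic" \
  "Inter" "Inter Bold" "Inter Heavy" "Inter Light" \
  "Inter Medium" "Inter Semi-Bold" "Inter Ultra-Bold" "Inter weight=250" \
  --fonts_dir /Library/Fonts \
  --lang nld \
  --langdata_dir /tessapp/langdata \
  --output_dir /fonts/samples \
  --training_text /tessapp/langdata/nld/nld.training_text \
  --tessdata_dir /tessapp/tesseract-3.05.02/tessdata

# ARG_COUNT is a hypothetical helper variable, not something tesstrain.sh reads.
ARG_COUNT=$#

# Print one argument per line in brackets; a font name that had been split
# on spaces would show up as several bracketed lines instead of one.
for arg in "$@"; do
  printf '[%s]\n' "$arg"
done
```

If the listing looks right, the same parameters could then be passed on unchanged with PANGOCAIRO_BACKEND=fc sh tesstrain.sh "$@".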

