I have successfully trained two new fonts, Ubuntu and Inter. I am using Tesseract 3.05 and tessdata 3.04.
1. Tesseract does not recognize the new fonts by name. Instead it keeps returning an unexpected font name, 1809_Homer, for both Ubuntu and Inter. This makes me wonder whether something went wrong with the training.

2. Tesseract does not seem to treat font-weight: 700 and font-weight: bold as the same. They are equivalent, but Tesseract sees font-weight: 700 as a normal (non-bold) font. What can I do to fix this?

This is how I trained the new tessdata:

    PANGOCAIRO_BACKEND=fc sh tesstrain.sh \
      --fontlist "Ubuntu" "Ubuntu Bold" "Ubuntu Bold Italic" "Ubuntu Italic" \
                 "Ubuntu Light" "Ubuntu Light Italic" "Ubuntu Medium" "Ubuntu Medium Italic" \
                 "Inter" "Inter Bold" "Inter Heavy" "Inter Light" "Inter Medium" \
                 "Inter Semi-Bold" "Inter Ultra-Bold" "Inter weight=250" \
      --fonts_dir /Library/Fonts \
      --lang nld \
      --langdata_dir /tessapp/langdata \
      --output_dir /fonts/samples \
      --training_text /tessapp/langdata/nld/nld.training_text \
      --tessdata_dir /tessapp/tesseract-3.05.02/tessdata

The output was nld.traineddata.
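On question 2: CSS defines the keyword bold as the numeric weight 700 (and normal as 400), so the two spellings should map to the same face. The minimal POSIX sh sketch below normalizes a CSS font-weight value and prints its common weight name, which can help when matching CSS weights to names in a --fontlist. The function names normalize_weight and weight_name are hypothetical helpers, not part of Tesseract or tesstrain.sh, and this only illustrates the CSS equivalence; it does not change how Tesseract classifies a font as bold.

```sh
#!/bin/sh
# normalize_weight (hypothetical helper): print the numeric CSS font-weight
# for "normal", "bold" or a number in 1..1000; keyword values follow CSS.
normalize_weight() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        normal) echo 400 ;;
        bold) echo 700 ;;
        *[!0-9]*|'') echo "invalid weight: $1" >&2; return 1 ;;
        *)
            if [ "$1" -ge 1 ] && [ "$1" -le 1000 ]; then
                echo "$1"
            else
                echo "weight out of range: $1" >&2
                return 1
            fi ;;
    esac
}

# weight_name (hypothetical helper): print the common CSS name for a
# standard weight, or nothing for a non-standard value such as 250.
weight_name() {
    w=$(normalize_weight "$1") || return 1
    case "$w" in
        100) echo "Thin" ;;
        200) echo "Extra Light" ;;
        300) echo "Light" ;;
        400) echo "Normal" ;;
        500) echo "Medium" ;;
        600) echo "Semi Bold" ;;
        700) echo "Bold" ;;
        800) echo "Extra Bold" ;;
        900) echo "Black" ;;
        *) ;;
    esac
}

# Example: both spellings of bold resolve to the same weight and name.
for v in bold 700; do
    printf '%s -> %s (%s)\n' "$v" "$(normalize_weight "$v")" "$(weight_name "$v")"
done
```

Run as-is, the loop prints the same weight (700) and name (Bold) for both inputs. Fonts may label some weights differently (for example, Inter's "Ultra-Bold" and "Heavy" correspond to the CSS names Extra Bold and Black), so treat the name table as a guide rather than the exact face names Pango expects.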

