I have some drawings (see picture) and I have the font used on them. I am trying to train Tesseract on this font, with no success.
[image: 2022-04-22_185212.png]
Here is my command and its output:
./tesstrain.sh --lang eng --langdata_dir /usr/share/tesseract-ocr/langdata --tessdata_dir /usr/share/tesseract-ocr/tessdata --fonts_dir fonts --fontlist "PAS_GTNF"
=== Starting training for language 'eng'
[Fri Apr 22 15:23:14 UTC 2022] /usr/bin/text2image --fonts_dir=fonts --font=PAS_GTNF --outputbase=/tmp/font_tmp.cCxEGH0jBv/sample_text.txt --text=/tmp/font_tmp.cCxEGH0jBv/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.cCxEGH0jBv
Stripped 1 unrenderable words
Rendered page 0 to file /tmp/font_tmp.cCxEGH0jBv/sample_text.txt.tif
=== Phase I: Generating training images ===
Rendering using PAS_GTNF
[Fri Apr 22 15:23:14 UTC 2022] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.cCxEGH0jBv --fonts_dir=fonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.lJZrntRcEh/eng/eng.PAS_GTNF.exp0 --max_pages=3 --font=PAS_GTNF --text=/usr/share/tesseract-ocr/langdata/eng/eng.training_text
Stripped 5 unrenderable words
Rendered page 0 to file /tmp/tmp.lJZrntRcEh/eng/eng.PAS_GTNF.exp0.tif
=== Phase UP: Generating unicharset and unichar properties files ===
[Fri Apr 22 15:23:15 UTC 2022] /usr/bin/unicharset_extractor --output_unicharset /tmp/tmp.lJZrntRcEh/eng/eng.unicharset --norm_mode 1 /tmp/tmp.lJZrntRcEh/eng/eng.PAS_GTNF.exp0.box
Extracting unicharset from box file /tmp/tmp.lJZrntRcEh/eng/eng.PAS_GTNF.exp0.box
Other case D of d is not in unicharset
Other case N of n is not in unicharset
Other case p of P is not in unicharset
Other case M of m is not in unicharset
Other case H of h is not in unicharset
Other case T of t is not in unicharset
Other case R of r is not in unicharset
Other case G of g is not in unicharset
Other case E of e is not in unicharset
Other case J of j is not in unicharset
Other case I of i is not in unicharset
Other case F of f is not in unicharset
Other case K of k is not in unicharset
Wrote unicharset file /tmp/tmp.lJZrntRcEh/eng/eng.unicharset
[Fri Apr 22 15:23:15 UTC 2022] /usr/bin/set_unicharset_properties -U /tmp/tmp.lJZrntRcEh/eng/eng.unicharset -O /tmp/tmp.lJZrntRcEh/eng/eng.unicharset -X /tmp/tmp.lJZrntRcEh/eng/eng.xheights --script_dir=/usr/share/tesseract-ocr/langdata
Loaded unicharset of size 32 from file /tmp/tmp.lJZrntRcEh/eng/eng.unicharset
Setting unichar properties
Other case D of d is not in unicharset
Other case N of n is not in unicharset
Other case p of P is not in unicharset
Other case M of m is not in unicharset
Other case H of h is not in unicharset
Other case T of t is not in unicharset
Other case R of r is not in unicharset
Other case G of g is not in unicharset
Other case E of e is not in unicharset
Other case J of j is not in unicharset
Other case I of i is not in unicharset
Other case F of f is not in unicharset
Other case K of k is not in unicharset
Setting script properties
Failed to load script unicharset from:/usr/share/tesseract-ocr/langdata/Latin.unicharset
Warning: properties incomplete for index 3 = d
Warning: properties incomplete for index 4 = \
Warning: properties incomplete for index 5 = 0
Warning: properties incomplete for index 6 = .

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/3e9c4344-b061-4518-ad40-404c14b187bbn%40googlegroups.com.
Attachments:
PAS_GTNF.ttf (application/font-ttf)
PAS_GTF.ttf (application/font-ttf)
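The output mixes three kinds of warning: "Stripped N unrenderable words" from text2image, "Other case X of y is not in unicharset" (printed twice, once by unicharset_extractor and once by set_unicharset_properties), and "Failed to load script unicharset from:...". The sketch below is a hypothetical helper, not part of Tesseract or tesstrain.sh, that pulls those three out of a saved copy of the log so they are easier to read. The names `summarize_log` and `LogSummary` are made up for illustration, and the regular expressions are assumptions based only on the message wording in this particular log; other Tesseract versions may word them differently.

```python
import re
from dataclasses import dataclass, field

# Patterns assumed from the message text in the tesstrain.sh output above.
OTHER_CASE = re.compile(r"Other case (\S) of (\S) is not in unicharset")
STRIPPED = re.compile(r"Stripped (\d+) unrenderable words")
SCRIPT_FAIL = re.compile(r"Failed to load script unicharset from:\s*(\S+)")


@dataclass
class LogSummary:
    # character present in the unicharset -> its other-case partner that is absent
    missing_other_case: dict = field(default_factory=dict)
    # words dropped by text2image, one number per rendering pass, in log order
    stripped_words: list = field(default_factory=list)
    # script unicharset paths that could not be loaded
    failed_script_files: list = field(default_factory=list)


def summarize_log(log_text: str) -> LogSummary:
    """Collect the three warning types from tesstrain.sh output (hypothetical helper)."""
    summary = LogSummary()
    for missing, present in OTHER_CASE.findall(log_text):
        # The same warning appears once per tool; the dict keeps one entry per character.
        summary.missing_other_case[present] = missing
    summary.stripped_words = [int(n) for n in STRIPPED.findall(log_text)]
    summary.failed_script_files = SCRIPT_FAIL.findall(log_text)
    return summary


if __name__ == "__main__":
    # Excerpt of the log as pasted (whitespace-separated, like the original message).
    log = (
        "Stripped 1 unrenderable words "
        "Stripped 5 unrenderable words "
        "Other case D of d is not in unicharset "
        "Other case p of P is not in unicharset "
        "Other case D of d is not in unicharset "
        "Failed to load script unicharset from:/usr/share/tesseract-ocr/langdata/Latin.unicharset "
        "Warning: properties incomplete for index 3 = d"
    )
    s = summarize_log(log)
    print(s.missing_other_case)   # {'d': 'D', 'P': 'p'}
    print(s.stripped_words)       # [1, 5]
    print(s.failed_script_files)  # ['/usr/share/tesseract-ocr/langdata/Latin.unicharset']
```

Run over the full log, the first field would list the 13 letters whose other case never reached the 32-entry unicharset, which makes it easier to compare against the glyphs the font actually contains.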

