I have some drawings (see picture) and the font that is used on them. I am 
trying to train Tesseract on this font, with no success.
[image: 2022-04-22_185212.png]
Here is my output:
./tesstrain.sh --lang eng \
  --langdata_dir /usr/share/tesseract-ocr/langdata \
  --tessdata_dir /usr/share/tesseract-ocr/tessdata \
  --fonts_dir fonts --fontlist "PAS_GTNF"

=== Starting training for language 'eng'
[Fri Apr 22 15:23:14 UTC 2022] /usr/bin/text2image --fonts_dir=fonts 
--font=PAS_GTNF --outputbase=/tmp/font_tmp.cCxEGH0jBv/sample_text.txt 
--text=/tmp/font_tmp.cCxEGH0jBv/sample_text.txt 
--fontconfig_tmpdir=/tmp/font_tmp.cCxEGH0jBv
Stripped 1 unrenderable words
Rendered page 0 to file /tmp/font_tmp.cCxEGH0jBv/sample_text.txt.tif

=== Phase I: Generating training images ===
Rendering using PAS_GTNF
[Fri Apr 22 15:23:14 UTC 2022] /usr/bin/text2image 
--fontconfig_tmpdir=/tmp/font_tmp.cCxEGH0jBv --fonts_dir=fonts 
--strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 
--outputbase=/tmp/tmp.lJZrntRcEh/eng/eng.PAS_GTNF.exp0 --max_pages=3 
--font=PAS_GTNF 
--text=/usr/share/tesseract-ocr/langdata/eng/eng.training_text
Stripped 5 unrenderable words
Rendered page 0 to file /tmp/tmp.lJZrntRcEh/eng/eng.PAS_GTNF.exp0.tif

=== Phase UP: Generating unicharset and unichar properties files ===
[Fri Apr 22 15:23:15 UTC 2022] /usr/bin/unicharset_extractor 
--output_unicharset /tmp/tmp.lJZrntRcEh/eng/eng.unicharset --norm_mode 1 
/tmp/tmp.lJZrntRcEh/eng/eng.PAS_GTNF.exp0.box
Extracting unicharset from box file 
/tmp/tmp.lJZrntRcEh/eng/eng.PAS_GTNF.exp0.box
Other case D of d is not in unicharset
Other case N of n is not in unicharset
Other case p of P is not in unicharset
Other case M of m is not in unicharset
Other case H of h is not in unicharset
Other case T of t is not in unicharset
Other case R of r is not in unicharset
Other case G of g is not in unicharset
Other case E of e is not in unicharset
Other case J of j is not in unicharset
Other case I of i is not in unicharset
Other case F of f is not in unicharset
Other case K of k is not in unicharset
Wrote unicharset file /tmp/tmp.lJZrntRcEh/eng/eng.unicharset
[Fri Apr 22 15:23:15 UTC 2022] /usr/bin/set_unicharset_properties -U 
/tmp/tmp.lJZrntRcEh/eng/eng.unicharset -O 
/tmp/tmp.lJZrntRcEh/eng/eng.unicharset -X 
/tmp/tmp.lJZrntRcEh/eng/eng.xheights 
--script_dir=/usr/share/tesseract-ocr/langdata
Loaded unicharset of size 32 from file 
/tmp/tmp.lJZrntRcEh/eng/eng.unicharset
Setting unichar properties
Other case D of d is not in unicharset
Other case N of n is not in unicharset
Other case p of P is not in unicharset
Other case M of m is not in unicharset
Other case H of h is not in unicharset
Other case T of t is not in unicharset
Other case R of r is not in unicharset
Other case G of g is not in unicharset
Other case E of e is not in unicharset
Other case J of j is not in unicharset
Other case I of i is not in unicharset
Other case F of f is not in unicharset
Other case K of k is not in unicharset
Setting script properties
Failed to load script unicharset 
from:/usr/share/tesseract-ocr/langdata/Latin.unicharset

Warning: properties incomplete for index 3 = d
Warning: properties incomplete for index 4 = \
Warning: properties incomplete for index 5 = 0

Warning: properties incomplete for index 6 = .
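
The "Other case" warnings in the log above are printed twice and are easier to read as a de-duplicated list. Below is a minimal sketch in Python for tallying them. It is not part of the tesstrain tooling. The function name `missing_case_pairs` and the sample text are hypothetical, and it assumes that "Other case D of d" means that `d` is in the unicharset while its other-case counterpart `D` is not.

```python
import re

# Matches lines such as "Other case D of d is not in unicharset".
CASE_WARNING = re.compile(r"^Other case (\S) of (\S) is not in unicharset$")


def missing_case_pairs(log_text):
    """Return sorted, de-duplicated (present, missing) character pairs.

    Assumption: in "Other case X of Y", Y is in the unicharset and X is not.
    """
    pairs = set()
    for line in log_text.splitlines():
        match = CASE_WARNING.match(line.strip())
        if match:
            missing, present = match.groups()
            pairs.add((present, missing))
    return sorted(pairs)


# Hypothetical sample: a few warning lines copied from the log, one repeated.
sample_log = """\
Other case D of d is not in unicharset
Other case p of P is not in unicharset
Other case D of d is not in unicharset
"""

for present, missing in missing_case_pairs(sample_log):
    print(f"'{present}' is in the unicharset, but '{missing}' is not")
```

Running the same function over the full saved log would list every character for which only one case ended up in the box file, which may help when checking which glyphs the font actually rendered.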


Attachment: PAS_GTNF.ttf
Description: application/font-ttf

Attachment: PAS_GTF.ttf
Description: application/font-ttf
