Hello,
I am following the new tutorial to fine-tune the traineddata. I want to add
some specific symbols to the existing chi_sim.traineddata model.
First, I ran this command to create the new training data:

    training/tesstrain.sh --fonts_dir /usr/share/fonts --lang chi_sim \
      --linedata_only --noextract_font_properties \
      --langdata_dir ../langdata --fontlist "SIMSUN" \
      --tessdata_dir ./tessdata --output_dir ~/tesstutorial/trainspecial

But some specific symbols could not be added to the unicharset file.
Part of the output is shown below:
=== Phase UP: Generating unicharset and unichar properties files ===
[2017年 08月 14日 星期一 15:59:17 CST] /usr/local/bin/unicharset_extractor -D
/tmp/tmp.78WyISy4o7/chi_sim/
/tmp/tmp.78WyISy4o7/chi_sim/chi_sim.SIMSUN.exp0.box
Extracting unicharset from
/tmp/tmp.78WyISy4o7/chi_sim/chi_sim.SIMSUN.exp0.box
Wrote unicharset file /tmp/tmp.78WyISy4o7/chi_sim//unicharset.
[2017年 08月 14日 星期一 15:59:17 CST] /usr/local/bin/set_unicharset_properties
-U /tmp/tmp.78WyISy4o7/chi_sim/chi_sim.unicharset -O
/tmp/tmp.78WyISy4o7/chi_sim/chi_sim.unicharset -X
/tmp/tmp.78WyISy4o7/chi_sim/chi_sim.xheights --script_dir=../langdata
Loaded unicharset of size 1129 from file
/tmp/tmp.78WyISy4o7/chi_sim/chi_sim.unicharset
Setting unichar properties
Other case Л of л is not in unicharset
Other case Υ of υ is not in unicharset
Other case Π of π is not in unicharset
Other case Β of β is not in unicharset
Mirror ∼ of ∽ is not in unicharset
Mirror ⧵ of ∕ is not in unicharset
Other case σ of Σ is not in unicharset
Other case Ρ of ρ is not in unicharset
Mirror 》 of 《 is not in unicharset
Other case j of J is not in unicharset
Mirror 【 of 】 is not in unicharset
Mirror 「 of 」 is not in unicharset
Other case K of k is not in unicharset
Mirror { of } is not in unicharset
Other case q of Q is not in unicharset
Mirror 〗 of 〖 is not in unicharset
Setting script properties
Warning: properties incomplete for index 57 = )
Warning: properties incomplete for index 60 = :
Warning: properties incomplete for index 64 = !
Warning: properties incomplete for index 67 = ?
Warning: properties incomplete for index 73 = >
Warning: properties incomplete for index 81 = ;
Warning: properties incomplete for index 82 = ~
Warning: properties incomplete for index 90 = .
Warning: properties incomplete for index 98 = (
Warning: properties incomplete for index 99 = ゜
Warning: properties incomplete for index 115 = <
Warning: properties incomplete for index 190 = ,
Writing unicharset to file /tmp/tmp.78WyISy4o7/chi_sim/chi_sim.unicharset
This shows that some specific symbols, such as 'Л', '》', ..., cannot be
added to the unicharset.
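
To get the full list of affected symbols out of the log, a small script
along these lines could be used. It is only a sketch: the function name
missing_symbols is my own, and the pattern assumes the "Other case X of Y"
and "Mirror X of Y" message format shown in the output above, with the
first symbol being the one that is not in the unicharset.

```python
import re

# Matches the two message shapes seen in the log, e.g.
#   "Other case Л of л is not in unicharset"
#   "Mirror 》 of 《 is not in unicharset"
# Group 2 is the symbol reported as missing (an assumption based on the log).
MISSING_RE = re.compile(
    r"^(Other case|Mirror) (\S+) of (\S+) is not in unicharset$"
)


def missing_symbols(log_text):
    """Return the symbols reported as missing, in first-seen order, no duplicates."""
    seen = []
    for line in log_text.splitlines():
        m = MISSING_RE.match(line.strip())
        if m and m.group(2) not in seen:
            seen.append(m.group(2))
    return seen


if __name__ == "__main__":
    sample = (
        "Other case Л of л is not in unicharset\n"
        "Mirror 》 of 《 is not in unicharset\n"
        "Setting script properties\n"
        "Warning: properties incomplete for index 57 = )\n"
    )
    print(missing_symbols(sample))
```

Running it over the saved tesstrain.sh output would give one list of
symbols to check against the training text and the unicharset.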
How can I add these symbols to the unicharset? Should I add them manually?