Hello,

I am following the new tutorial to fine-tune the traineddata. I want to add
some specific symbols to the existing chi_sim.traineddata model.

First, I ran the following command to create the new training data:

    training/tesstrain.sh --fonts_dir /usr/share/fonts --lang chi_sim \
      --linedata_only --noextract_font_properties \
      --langdata_dir ../langdata --fontlist "SIMSUN" \
      --tessdata_dir ./tessdata --output_dir ~/tesstutorial/trainspecial

However, some specific symbols were not added to the unicharset file.

Part of the output was as follows:

=== Phase UP: Generating unicharset and unichar properties files ===
[2017年 08月 14日 星期一 15:59:17 CST] /usr/local/bin/unicharset_extractor -D /tmp/tmp.78WyISy4o7/chi_sim/ /tmp/tmp.78WyISy4o7/chi_sim/chi_sim.SIMSUN.exp0.box
Extracting unicharset from /tmp/tmp.78WyISy4o7/chi_sim/chi_sim.SIMSUN.exp0.box
Wrote unicharset file /tmp/tmp.78WyISy4o7/chi_sim//unicharset.
[2017年 08月 14日 星期一 15:59:17 CST] /usr/local/bin/set_unicharset_properties -U /tmp/tmp.78WyISy4o7/chi_sim/chi_sim.unicharset -O /tmp/tmp.78WyISy4o7/chi_sim/chi_sim.unicharset -X /tmp/tmp.78WyISy4o7/chi_sim/chi_sim.xheights --script_dir=../langdata
Loaded unicharset of size 1129 from file /tmp/tmp.78WyISy4o7/chi_sim/chi_sim.unicharset
Setting unichar properties
Other case Л of л is not in unicharset
Other case Υ of υ is not in unicharset
Other case Π of π is not in unicharset
Other case Β of β is not in unicharset
Mirror ∼ of ∽ is not in unicharset
Mirror ⧵ of ∕ is not in unicharset
Other case σ of Σ is not in unicharset
Other case Ρ of ρ is not in unicharset
Mirror 》 of 《 is not in unicharset
Other case j of J is not in unicharset
Mirror 【 of 】 is not in unicharset
Mirror 「 of 」 is not in unicharset
Other case K of k is not in unicharset
Mirror { of } is not in unicharset
Other case q of Q is not in unicharset
Mirror 〗 of 〖 is not in unicharset
Setting script properties
Warning: properties incomplete for index 57 = )
Warning: properties incomplete for index 60 = :
Warning: properties incomplete for index 64 = !
Warning: properties incomplete for index 67 = ?
Warning: properties incomplete for index 73 = >
Warning: properties incomplete for index 81 = ;
Warning: properties incomplete for index 82 = ~
Warning: properties incomplete for index 90 = .
Warning: properties incomplete for index 98 = (
Warning: properties incomplete for index 99 = ゜
Warning: properties incomplete for index 115 = <
Warning: properties incomplete for index 190 = ,
Writing unicharset to file /tmp/tmp.78WyISy4o7/chi_sim/chi_sim.unicharset


This output shows that some specific symbols, such as 'Л', '》', ..., were not
added to the unicharset.
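
To make the list of affected symbols explicit, here is a minimal Python sketch
that collects them from the log. It assumes that the first symbol named in each
"Other case X of Y" or "Mirror X of Y" line is the one missing from the
unicharset. The function name missing_symbols is only illustrative and is not
part of the Tesseract tools.

```python
import re

# Matches lines such as "Other case Л of л is not in unicharset"
# and "Mirror 》 of 《 is not in unicharset".
MISSING_RE = re.compile(
    r"^(?:Other case|Mirror) (\S+) of (\S+) is not in unicharset$"
)


def missing_symbols(log_text):
    """Return the symbols reported as absent, in order of first appearance.

    Assumption: the first symbol on each matching line is the missing one.
    """
    seen = []
    for line in log_text.splitlines():
        match = MISSING_RE.match(line.strip())
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


if __name__ == "__main__":
    sample = (
        "Setting unichar properties\n"
        "Other case Л of л is not in unicharset\n"
        "Mirror 》 of 《 is not in unicharset\n"
        "Setting script properties\n"
    )
    print(missing_symbols(sample))
```

Running it over the full output of set_unicharset_properties would give one
list of symbols to check against the training text.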


How can I add these symbols to the unicharset? Should I add them manually?
