https://github.com/tesseract-ocr/tesseract/pull/1134/files
should fix it.
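
For anyone hitting the same assert: 0xffffffe8 is not a random value. It is the byte 0xE8 sign-extended to 32 bits. That is what a byte stored in a signed `char` becomes when it is widened to an integer. 0xE8 is the UTF-8 lead byte of many CJK characters (U+8000 to U+8FFF), so the box text itself may well be fine. The bytes may instead be mis-converted before the codepoint check. That is my reading, not something confirmed against the PR. The bash sketch below only reproduces the arithmetic. The helper names are made up for illustration. The `iconv` function is a quick way to check that the training text really is valid UTF-8.

```shell
#!/usr/bin/env bash
# Sketch only: reproduces the arithmetic behind "Invalid Unicode codepoint:
# 0xffffffe8". Function names are hypothetical, not part of Tesseract.

# Sign-extend one byte (0-255) from 8 to 32 bits and print it in hex,
# mimicking a signed char being widened to a 32-bit integer.
sign_extend_byte() {
  local b=$(( $1 & 0xFF ))
  if (( b > 0x7F )); then
    b=$(( b - 0x100 ))                      # reinterpret as -128..127
  fi
  printf '0x%x\n' $(( b & 0xFFFFFFFF ))     # keep only the low 32 bits
}

# Print the first byte of a string's UTF-8 encoding as two hex digits.
first_utf8_byte() {
  printf '%s' "$1" | od -An -tx1 -N1 | tr -d ' \n'
}

# Succeed (exit status 0) only if the file is well-formed UTF-8.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

lead=$(first_utf8_byte '能')   # U+80FD is encoded as e8 83 bd
echo "lead byte: 0x$lead"      # prints: lead byte: 0xe8
sign_extend_byte "0x$lead"     # prints: 0xffffffe8

# Example check on the training text (path taken from the command quoted below):
# is_valid_utf8 ../langdata/chi_sim/chi_sim.training_text && echo "valid UTF-8"
```

Suppose the training text passes the `iconv` check. Then the bad value most likely comes from how the tool converts the bytes, not from the data. That would fit with this needing a code fix rather than a data fix.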


On Thursday, September 14, 2017 at 1:50:26 PM UTC+5:30, 
[email protected] wrote:
>
>  Hello,
>
> I'm trying to train my own traineddata model with Tesseract 4.0, following the
> commands in the *TrainingTesseract 4.00* tutorial. The first command, which
> creates the training data, is shown below:
>
> training/tesstrain.sh --fonts_dir /usr/share/fonts --lang chi_sim \
>   --linedata_only --noextract_font_properties --langdata_dir ../langdata \
>   --fontlist "SIMSUN" --tessdata_dir ./tessdata \
>   --output_dir ~/tesstutorial/trainspecial
>
>
> And the execution log for this command is as follows:
>
> === Phase I: Generating training images ===
> Rendering using SIMSUN
> [Thu Sep 14 16:01:57 CST 2017] /usr/local/bin/text2image 
> --fontconfig_tmpdir=/tmp/font_tmp.whlzhytMkp --fonts_dir=/usr/share/fonts 
> --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 
> --outputbase=/tmp/tmp.8JcoYdZI17/chi_sim/chi_sim.SIMSUN.exp0 --max_pages=3 
> --font=SIMSUN --text=../langdata/chi_sim/chi_sim.training_text
> Rendered page 0 to file /tmp/tmp.8JcoYdZI17/chi_sim/chi_sim.SIMSUN.exp0.tif
>
> === Phase UP: Generating unicharset and unichar properties files ===
> [Thu Sep 14 16:01:58 CST 2017] /usr/local/bin/unicharset_extractor 
> --output_unicharset /tmp/tmp.8JcoYdZI17/chi_sim/chi_sim.unicharset 
> --norm_mode 1 /tmp/tmp.8JcoYdZI17/chi_sim/chi_sim.SIMSUN.exp0.box
> Extracting unicharset from box file 
> /tmp/tmp.8JcoYdZI17/chi_sim/chi_sim.SIMSUN.exp0.box
> Invalid Unicode codepoint: 0xffffffe8
> IsValidCodepoint(ch):Error:Assert failed:in file normstrngs.cpp, line 225
> ERROR: /tmp/tmp.8JcoYdZI17/chi_sim/chi_sim.unicharset does not exist or is 
> not readable
>
>
> But an error occurs at this step, which shows that extracting
> chi_sim.unicharset failed. I checked the directory
> /tmp/tmp.8JcoYdZI17/chi_sim/,
> and the chi_sim.unicharset file does not exist.
>
> How can I fix this error? Can you help me? Thanks.
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/28ccb34e-699a-486e-b56d-abe032d4c042%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
