Your simple unicharset does not look right. Instead, put the characters
you need in a text file and create the unicharset from that file:

unicharset_extractor \
--output_unicharset cp.unicharset \
--norm_mode 1 \
cp.syllables.txt
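
If you do not have such a character file yet, here is a minimal sketch of
one way to build it from some training text. The file sample_text.txt and
its contents are made up for illustration, and the unicharset_extractor
step only runs if the tool is on your PATH.

```shell
#!/bin/sh
# Hypothetical sample text; replace this with your own training text.
printf 'abc cab\nbca 012\n' > sample_text.txt

# Split the text into one character per line, drop whitespace-only lines,
# and de-duplicate. For non-ASCII text this assumes a UTF-8 locale so that
# multi-byte characters stay whole.
grep -o . sample_text.txt | grep -v '^[[:space:]]*$' | sort -u > cp.syllables.txt

# Build the unicharset from the character list, only if the tool is installed.
if command -v unicharset_extractor >/dev/null 2>&1; then
  unicharset_extractor --output_unicharset cp.unicharset --norm_mode 1 cp.syllables.txt
fi
```

After this runs, cp.syllables.txt holds one distinct character per line, so
it is easy to review by eye before generating the unicharset.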

combine_lang_model \
--input_unicharset cp.unicharset \
--script_dir ~/langdata \
--output_dir ./ \
--lang cp-recoder
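
The "Failed to load script unicharset" message in your output names a file
under --script_dir, so it can help to check that directory before running
combine_lang_model. A rough sketch, assuming the langdata checkout should
provide Common.unicharset, Latin.unicharset and radical-stroke.txt (adjust
the list to the scripts your unicharset actually uses). check_script_dir is
a hypothetical helper, not part of Tesseract.

```shell
#!/bin/sh
# check_script_dir DIR: report which expected langdata files are missing.
# Hypothetical helper; the file list is an assumption about what
# combine_lang_model reads from --script_dir.
check_script_dir() {
  dir=$1
  missing=0
  for f in Common.unicharset Latin.unicharset radical-stroke.txt; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check the directory before building the starter traineddata.
if check_script_dir "$HOME/langdata"; then
  echo "script_dir looks complete"
else
  echo "script_dir is incomplete"
fi
```

The function returns non-zero when any listed file is absent, so it can
guard the combine_lang_model call in a script.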

See the attached zip file for the traineddata generated with and without
the recoder option.


On Tue, Mar 12, 2019 at 9:59 PM 童虎 <[email protected]> wrote:

> I used this command, following a post, to create an xx.tessdata:
>
> combine_lang_model \
> --input_unicharset cp.unicharset \
> --script_dir /Users/th/source/langdata \
> --output_dir output \
> --pass_through_recoder \
> --lang cp
>
> and the cp.unicharset is very simple:
>  https://gist.github.com/huhuang03/62391f632d420f7e293e61e878953535
>
> but the shell output is
>
> Loaded unicharset of size 15 from file mh.unicharset
> Setting unichar properties
> Setting script properties
> Failed to load script unicharset from:/Users/th/source/langdata/ommon.unicharset
> Error writing unicharset!!
>
>
>
> I can't understand why it looks up ommon.unicharset instead of
> Common.unicharset.
>
> And maybe I have something else wrong?
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/7a201c4b-25c9-4817-85be-f01c29355a36%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/7a201c4b-25c9-4817-85be-f01c29355a36%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>


-- 

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com


<<attachment: cp.zip>>
