> it says "Modify langdata/eng/eng.training_text to include some samples
> of ±."

That is part of a training tutorial, where the goal is to add a new
character, ±, to eng.traineddata so that the fine-tuned traineddata can
recognize it.

It is only an example. You have to modify it based on what you need.
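
As a rough illustration of what "include some samples" could look like
(this is my own sketch, not something from the tutorial), the snippet below
counts how often the new character occurs in a piece of training text and
appends a few lines that contain it. The helper names and the sample
sentences are hypothetical; use text that resembles what you actually want
to recognize.

```python
def count_char(text: str, target: str) -> int:
    """Return how many times `target` occurs in `text`."""
    return text.count(target)


def add_samples(text: str, samples: list[str]) -> str:
    """Append sample lines to the training text, one per line."""
    if text and not text.endswith("\n"):
        text += "\n"
    return text + "\n".join(samples) + "\n"


# Hypothetical sample lines containing the new character.
SAMPLES = [
    "The tolerance is ±0.05 mm.",
    "Accuracy: ±2 % of reading",
    "Voltage 5 V ±10 %",
]

# Stand-in for the contents of a training text file (assumption:
# the real file is plain UTF-8 text, one sample line per row).
original = "The quick brown fox jumps over the lazy dog.\n"
updated = add_samples(original, SAMPLES)

print(count_char(original, "±"), "->", count_char(updated, "±"))
```

To apply this to a real file you would read langdata/eng/eng.training_text
as UTF-8, pass its contents through add_samples, and write the result back.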

Please read the documentation, for example:

https://github.com/tesseract-ocr/tesseract/blob/master/doc/combine_tessdata.1.asc#components
https://github.com/tesseract-ocr/tesseract/blob/master/doc/combine_lang_model.1.asc

On Wed, Jan 30, 2019 at 3:25 PM 易鑫 <[email protected]> wrote:

> Hello, everyone:
>
>      I am confused about "Fine Tuning for ± a few characters" in the wiki
> (https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for--a-few-characters),
> where it says "Modify langdata/eng/eng.training_text to include some samples
> of ±."
>
>      My question is: why should we do that, and what is the
> eng.training_text file used for?
>
> These are the files in the langdata/eng folder:
>
>
> -rwxrwxrwx 1 yixin yixin      249 Jan 23 16:29 desired_characters
> -rwxrwxrwx 1 yixin yixin     2235 Jan 23 16:29 eng.numbers
> -rwxrwxrwx 1 yixin yixin     6082 Jan 23 16:29 eng.punc
> -rwxrwxrwx 1 yixin yixin     6801 Jan 23 16:29 eng.training_text
> -rwxrwxrwx 1 yixin yixin    80847 Jan 23 16:29 eng.training_text.bigram_freqs
> -rwxrwxrwx 1 yixin yixin     1063 Jan 23 16:29 eng.training_text.unigram_freqs
> -rwxrwxrwx 1 yixin yixin     1058 Jan 23 16:29 eng.unicharambigs
> -rwxrwxrwx 1 yixin yixin 15836450 Jan 23 16:29 eng.word.bigrams
> -rwxrwxrwx 1 yixin yixin  3852057 Jan 23 16:29 eng.wordlist
>
> What are these files used for?
>
> I think desired_characters corresponds to the unicharset, and I can see
> that it contains 119 different characters in total. eng.numbers
> corresponds to the number dawg, eng.punc to the punctuation pattern dawg,
> and eng.wordlist to the word dawg. Am I right?
>
> And what are the other files used for? Thank you in advance.
>
> Sorry for my poor English.
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/0853a38c-6426-42d6-9c8d-de4062b50832%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/0853a38c-6426-42d6-9c8d-de4062b50832%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>


-- 

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

