Thank you for your quick response. So, if I understand you correctly, I have 
to change the original files in the "~/tesseract-ocr/langdata_lstm/eng" 
directory.
What about the eng.wordlist file in this directory? I think it is useless 
for numbers only. I only want to detect numbers between 0 and 1000; should I 
create my own wordlist that includes these numbers?

desired_characters  eng.numbers  eng.punc  eng.singles_text  eng.training_text  
eng.unicharambigs  eng.unicharset  eng.wordlist  okfonts.txt
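
To make the wordlist question concrete, this is roughly how I would generate 
such a list. It is only a sketch: the function names are my own, and I am 
assuming the wordlist is plain UTF-8 text with one entry per line. Whether a 
list like this actually helps number-only fine-tuning is exactly what I am 
unsure about.

```python
from pathlib import Path


def number_wordlist(lo=0, hi=1000):
    """Return the decimal strings for lo..hi (inclusive), in order."""
    return [str(n) for n in range(lo, hi + 1)]


def write_wordlist(path, words):
    """Write one entry per line as UTF-8, ending with a newline.

    Assumption: one entry per line is the format the training tools expect.
    """
    Path(path).write_text("\n".join(words) + "\n", encoding="utf-8")


# Build the list in memory; nothing is written until write_wordlist is called.
words = number_wordlist()
print(len(words), words[:3], words[-1])
```

Calling write_wordlist("eng.wordlist.numbers", words) would write the list to 
a separate file (a made-up name), so the original eng.wordlist stays untouched 
until it is clear that replacing it is the right step.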

On Sunday, 20 January 2019 08:56:23 UTC+2, shree wrote:
>
> It depends on what you are fine tuning for.
>
> I had changed the punc and numbers files so that only those punctuation 
> characters were used which were in the unicharset. E.g., for a digits 
> traineddata covering 0-9, the decimal point, comma and minus sign, I removed 
> all other punctuation marks and kept only . , and -
>
> Similarly the numbers file was modified for the patterns expected.
>
> On Sun, 20 Jan 2019, 12:15 nahibi <[email protected]> wrote:
>
>> Hello,
>>
>> I am trying to fine-tune Tesseract 4.0 as explained here:
>>
>>
>> https://github.com/Shreeshrii/tessdata_shreetest/commit/b69b7e6ba6c7b0bd15f1b5541ac8fa5746383ad4
>>
>> "- custom training text, punc and numbers files are used by updating the 
>> files in langdata/eng folder"
>>
>>
>> I do not know what I have to do with the punc and numbers files. 
>> Do I have to create new files in the same directory as the custom training 
>> text file?
>> Do I have to replace the original ones in 
>> "~/tesseract-ocr/langdata_lstm/eng"?
>> Something else?
>>
>> Best Regards
>> nahibi 
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/2b4acaf3-61a9-4878-891d-20df6e990953%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>

