Thank you for your quick response. So, if I understand you correctly, I have to change the original files in the "~/tesseract-ocr/langdata_lstm/eng" directory. What about the eng.wordlist file in this directory? I think it is useless for numbers only. I only want to detect numbers in the range 0..1000; should I create my own wordlist that includes these numbers?
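For illustration, a custom wordlist covering 0..1000 could be generated along these lines. This is only a sketch: it assumes the wordlist is plain UTF-8 text with one entry per line, and the helper names `build_number_wordlist` and `write_wordlist` are made up for this example. Whether a numbers-only wordlist actually helps fine tuning is not confirmed in this thread.

```python
from pathlib import Path


def build_number_wordlist(low: int = 0, high: int = 1000) -> list[str]:
    """Return every integer in [low, high] as a decimal string."""
    return [str(n) for n in range(low, high + 1)]


def write_wordlist(path: Path, words: list[str]) -> None:
    """Write one entry per line (assumed wordlist layout)."""
    path.write_text("\n".join(words) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Writes to the current directory; copy the result into the
    # langdata directory only after backing up the original file.
    write_wordlist(Path("eng.wordlist"), build_number_wordlist())
```

Running the script writes 1001 lines, "0" through "1000", to eng.wordlist in the current directory.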
desired_characters
eng.numbers
eng.punc
eng.singles_text
eng.training_text
eng.unicharambigs
eng.unicharset
eng.wordlist
okfonts.txt

On Sunday, 20 January 2019 at 08:56:23 UTC+2, shree wrote:
>
> It depends on what you are fine tuning for.
>
> I had changed the punc and numbers file so that only those punctuation
> characters were used which were in the unicharset, e.g. for a digits trained
> data which is for 0-9 and decimal point, comma and minus sign, I removed
> all other punctuation marks and kept only . , and -
>
> Similarly the numbers file was modified for the patterns expected.
>
> On Sun, 20 Jan 2019, 12:15 nahibi <[email protected]> wrote:
>
>> Hello,
>>
>> I try to finetune tesseract 4.0 like it is explained here:
>>
>> https://github.com/Shreeshrii/tessdata_shreetest/commit/b69b7e6ba6c7b0bd15f1b5541ac8fa5746383ad4
>>
>> "- custom training text, punc and numbers files are used by updating the
>> files in langdata/eng folder"
>>
>> I do not know what I have to do with the punc and numbers files.
>> Do I have to create new files in the same directory like custom training
>> text file?
>> Do I have to replace the original ones from
>> "~/tesseract-ocr/langdata_lstm/eng"?
>> Something else?
>>
>> Best Regards
>> nahibi
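The punc edit shree describes in the quoted reply (keeping only . , and - for digits-only data) could be scripted roughly as follows. This is a sketch under assumptions: it treats eng.punc as plain text with one punctuation pattern per line, and `filter_punc_lines` and `filter_punc_file` are hypothetical helper names, not part of the Tesseract tooling. The numbers file would still need to be adjusted by hand for the expected patterns.

```python
from pathlib import Path

# Characters kept in shree's digits example: decimal point, comma, minus sign.
ALLOWED = frozenset(".,-")


def filter_punc_lines(lines, allowed=ALLOWED):
    """Keep non-empty entries made up only of allowed characters."""
    kept = []
    for line in lines:
        entry = line.strip()
        if entry and set(entry) <= allowed:
            kept.append(entry)
    return kept


def filter_punc_file(src: Path, dst: Path, allowed=ALLOWED) -> int:
    """Filter src into dst (assumed one pattern per line); return entries kept."""
    kept = filter_punc_lines(src.read_text(encoding="utf-8").splitlines(), allowed)
    dst.write_text("\n".join(kept) + "\n", encoding="utf-8")
    return len(kept)
```

Writing to a separate output path (for example `filter_punc_file(Path("eng.punc"), Path("eng.punc.digits"))`) keeps the original langdata file intact until the result has been checked.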

