I replaced the original punc and numbers files in "~/tesseract-ocr/langdata_lstm/eng" and deleted all other files. But when I check the generated eng.unicharset file in my output folder "~/tesstut/testy1/output/eng", it still contains letters. I do not think this is normal, so I am probably doing something wrong.
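To make the check above repeatable, here is a minimal Python sketch that lists the alphabetic entries in a unicharset file. It assumes the usual layout: the first line holds the entry count, and every following line starts with the character itself, followed by its properties. The function names and the list of special entries are my own choices for illustration, not part of tesseract.

```python
from pathlib import Path

# Entries that tesseract adds on its own; they are not real letters.
# (Assumption: these are the special entries seen in typical unicharset files.)
SPECIAL = {"NULL", "Joined", "|Broken|0|1"}


def letters_in_unicharset(lines):
    """Return the entries that contain at least one alphabetic character."""
    found = []
    for line in lines[1:]:  # skip the first line, which holds the entry count
        parts = line.split()
        if not parts:
            continue
        unichar = parts[0]  # first field is the character (or ligature) itself
        if unichar in SPECIAL:
            continue
        if any(ch.isalpha() for ch in unichar):
            found.append(unichar)
    return found


def check_file(path):
    """Read a unicharset file and return its alphabetic entries."""
    text = Path(path).expanduser().read_text(encoding="utf-8")
    return letters_in_unicharset(text.splitlines())
```

Calling `check_file("~/tesstut/testy1/output/eng/eng.unicharset")` returns a list of the entries that contain letters; an empty list would mean that the file holds only digits, punctuation, and the special entries.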
On Sunday, January 20, 2019 at 08:45:25 UTC+2, nahibi wrote:
> Hello,
>
> I am trying to fine-tune tesseract 4.0 as explained here:
>
> https://github.com/Shreeshrii/tessdata_shreetest/commit/b69b7e6ba6c7b0bd15f1b5541ac8fa5746383ad4
>
> "- custom training text, punc and numbers files are used by updating the
> files in langdata/eng folder"
>
> I do not know what I have to do with the punc and numbers files.
> Do I have to create new files in the same directory as the custom training
> text file?
> Do I have to replace the original ones from
> "~/tesseract-ocr/langdata_lstm/eng"?
> Something else?
>
> Best Regards
> nahibi

