I replaced the original punc and numbers files in
"~/tesseract-ocr/langdata_lstm/eng"
and deleted all other files.
But when I check the generated eng.unicharset file in my output folder
"~/tesstut/testy1/output/eng",
it still contains letters.
I don't think this is normal, so I am probably doing something wrong.
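
For reference, this is roughly how the generated unicharset can be checked for letters. It is only a minimal sketch: it assumes the first line of the file holds the entry count, that each following line starts with the character itself, and that "NULL", "Joined" and "|Broken|0|1" are special entries rather than real characters. The function name letters_in_unicharset is just a name chosen for this example, not part of Tesseract.

```python
from pathlib import Path

# Assumed special entries that do not represent real characters.
SPECIAL = {"NULL", "Joined", "|Broken|0|1"}


def letters_in_unicharset(lines):
    """Return the unicharset entries that contain an alphabetic character."""
    found = []
    for line in lines[1:]:  # assumption: the first line is the entry count
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        unichar = parts[0]  # assumption: the character is the first field
        if unichar in SPECIAL:
            continue
        if any(ch.isalpha() for ch in unichar):
            found.append(unichar)
    return found


if __name__ == "__main__":
    # Output folder and file name as mentioned above.
    path = Path.home() / "tesstut/testy1/output/eng/eng.unicharset"
    if path.exists():
        lines = path.read_text(encoding="utf-8").splitlines()
        print(letters_in_unicharset(lines))
```

If the printed list is not empty, the unicharset still contains letters.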


On Sunday, 20 January 2019 08:45:25 UTC+2, nahibi wrote:
>
> Hello,
>
> I am trying to fine-tune Tesseract 4.0 as explained here:
>
>
> https://github.com/Shreeshrii/tessdata_shreetest/commit/b69b7e6ba6c7b0bd15f1b5541ac8fa5746383ad4
>
> "- custom training text, punc and numbers files are used by updating the 
> files in langdata/eng folder"
>
>
> I do not know what I have to do with the punc and numbers files.
> Do I have to create new files in the same directory as the custom training
> text file?
> Do I have to replace the original ones in
> "~/tesseract-ocr/langdata_lstm/eng"?
> Something else?
>
> Best Regards
> nahibi 
>
>
