[tesseract-ocr] How to prepare fonts folder to train from scratch

Essam Zaky Tue, 24 Mar 2020 13:05:28 -0700

Hi all,

I would like to build *.traineddata files from scratch, especially for English and
Arabic.


Let's take English as an example.
My question is: how should I prepare the fonts folder?

I read the
https://github.com/tesseract-ocr/tesseract/blob/master/src/training/language-specific.sh
file and found that it contains only about 32 font names.
Should I add the other Latin fonts installed on the training machine to
"language-specific.sh"?
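To see exactly which names the script lists before deciding whether to extend it, a small Python sketch can pull the quoted entries out of a bash array. It assumes the fonts are stored in an array such as `LATIN_FONTS=( "Arial Bold" \ ... )`; the variable name and the sample text below are illustrative assumptions, so check them against the real file.

```python
import re

# Hypothetical excerpt in the layout assumed here: a bash array of quoted
# font names, one per line, each followed by a line-continuation backslash.
SAMPLE = '''
LATIN_FONTS=(
    "Arial Bold" \\
    "Arial" \\
    "Courier New Bold" \\
)
'''


def extract_font_array(script_text: str, var_name: str) -> list[str]:
    """Return the double-quoted entries of `var_name=( ... )` in a shell script."""
    # \b stops e.g. "FONTS" from matching inside "LATIN_FONTS"; re.S lets the
    # non-greedy body span several lines up to the first closing parenthesis.
    block = re.search(rf"\b{re.escape(var_name)}=\((.*?)\)", script_text, re.S)
    if block is None:
        return []
    # Each font name is a double-quoted string inside the array body.
    return re.findall(r'"([^"]+)"', block.group(1))


fonts = extract_font_array(SAMPLE, "LATIN_FONTS")
print(len(fonts), fonts)  # 3 ['Arial Bold', 'Arial', 'Courier New Bold']
```

On the real file, calling `extract_font_array` on the text of language-specific.sh with the appropriate array name would apply the same logic, provided the array really uses this layout.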


Using the "font manager" tool, I found about 147 fonts installed on the training
machine.
I opened
https://github.com/tesseract-ocr/langdata_lstm/blob/master/eng/okfonts.txt
and it contains 4567 font names.
Should I search for, download, and install all the missing fonts on the training
machine?
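Before downloading anything, it may help to see which of those names are actually missing. A minimal sketch, assuming both lists are plain text with one font name per line: `okfonts.txt` for the wanted names, and a second, hypothetical file (called `installed.txt` here) holding the names present on the machine. That second list could, for instance, come from text2image's `--list_available_fonts` option if your build provides it.

```python
from pathlib import Path


def read_names(path: Path) -> list[str]:
    """Read one font name per line, skipping blank lines."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def missing_fonts(wanted: list[str], installed: list[str]) -> list[str]:
    """Return the names in `wanted` that do not appear in `installed`.

    The order of `wanted` is preserved. Matching ignores case and surrounding
    whitespace, which is a simplification: the rendering tool may be stricter.
    """
    have = {name.strip().casefold() for name in installed}
    return [name for name in wanted if name.strip().casefold() not in have]


wanted = ["Arial", "Arial Bold", "DejaVu Sans"]
installed = ["arial", "DejaVu Sans "]
print(missing_fonts(wanted, installed))  # ['Arial Bold']
```

With real files this would be `missing_fonts(read_names(Path("okfonts.txt")), read_names(Path("installed.txt")))`, where `installed.txt` is the hypothetical list described above.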

Should I collect all the font files from the training machine, create a new fonts
folder "HOME/.fonts", and copy all the fonts into that folder?
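If the fonts are gathered into one folder, the copying itself can be sketched in Python as below. The set of extensions and the choice to skip files whose names already exist in the target are assumptions for illustration. The resulting folder would then be handed to the training scripts through their fonts-directory option (for example `--fonts_dir` in tesstrain.sh, if your version supports it).

```python
import shutil
import tempfile
from pathlib import Path

# Extensions treated as font files in this sketch; this set is an assumption.
FONT_EXTENSIONS = {".ttf", ".otf", ".ttc"}


def collect_fonts(source_dirs: list[Path], target_dir: Path) -> list[Path]:
    """Copy font files found under `source_dirs` (recursively) into one flat
    `target_dir`. Files whose name already exists in the target are skipped."""
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src_dir in source_dirs:
        # sorted() materializes the listing before any copying starts.
        for path in sorted(src_dir.rglob("*")):
            if path.is_file() and path.suffix.lower() in FONT_EXTENSIONS:
                dest = target_dir / path.name
                if dest.exists():
                    continue  # keep the first copy instead of overwriting
                shutil.copy2(path, dest)  # copy2 also preserves timestamps
                copied.append(dest)
    return copied


if __name__ == "__main__":
    # Small self-check in a throwaway directory tree.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a").mkdir()
        (root / "a" / "Foo.ttf").write_bytes(b"")
        (root / "a" / "Foo.afm").write_text("")
        copied = collect_fonts([root / "a"], root / "fonts")
        print([p.name for p in copied])  # ['Foo.ttf']
```

Copying into a flat folder loses the original directory layout, so two different files with the same name would collide; skipping duplicates keeps the first one found.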

I see the fonts have different extensions (*.ttf, *.otf, *.afm, ...).
Do all font types work for training, or do I need a specific type?
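One way to get an overview of what is on the machine is to sort the font files by extension. In the sketch below, .ttf, .otf and .ttc are treated as outline fonts; that grouping is an assumption for illustration, not a statement of what the training tools accept. An .afm file (Adobe Font Metrics) is set aside because it stores only metrics such as character widths, bounding boxes and kerning, not glyph shapes.

```python
from collections import defaultdict
from pathlib import Path

# Outline formats grouped together in this sketch (an assumption).
OUTLINE_FONTS = {".ttf", ".otf", ".ttc"}
# Metrics-only files: an .afm file has no glyph outlines, so it cannot
# render text on its own.
METRICS_ONLY = {".afm"}


def classify_font_files(paths: list[str]) -> dict[str, list[str]]:
    """Group file names into 'outline', 'metrics_only' and 'other'."""
    groups = defaultdict(list)
    for p in map(Path, paths):
        ext = p.suffix.lower()
        if ext in OUTLINE_FONTS:
            groups["outline"].append(p.name)
        elif ext in METRICS_ONLY:
            groups["metrics_only"].append(p.name)
        else:
            groups["other"].append(p.name)  # needs a manual check
    return dict(groups)


print(classify_font_files(["Arial.ttf", "Amiri-Regular.otf", "Times.afm", "README.txt"]))
# {'outline': ['Arial.ttf', 'Amiri-Regular.otf'], 'metrics_only': ['Times.afm'], 'other': ['README.txt']}
```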


I will post a separate question about the required text data.

Thanks for your help



Regards
Essam

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/e605a197-000c-444a-9969-dd10346f2028%40googlegroups.com.
