[tesseract-ocr] Re: How to prepare fonts folder to train from scratch

Essam Zaky Wed, 25 Mar 2020 01:55:18 -0700

@Lorenozo 
I need to do this because the accuracy of the current Arabic model is not
as good as the English one, and I have a lot of fonts that need to be added
to the Arabic model. Adding them by fine-tuning will affect the model, so I
need to build it from scratch and make the model more generalized.
So I need to know what was done for the English model and use it as a
reference for building a new Arabic model.
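
To make the font question concrete, the comparison between the fonts
installed on the training machine and the names in okfonts.txt (described in
the quoted message below) could be sketched roughly like this. It is only a
sketch: it assumes okfonts.txt lists one font name per line, and the
function names and sample data are hypothetical, not part of the Tesseract
training tools.

```python
def parse_font_list(text):
    """Return the set of non-empty, stripped font names (one per line)."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def missing_fonts(wanted_text, installed_names):
    """List wanted fonts that are not installed (case-insensitive match)."""
    wanted = parse_font_list(wanted_text)
    installed = {name.strip().casefold() for name in installed_names}
    return sorted(name for name in wanted if name.casefold() not in installed)


# Hypothetical sample data standing in for okfonts.txt and the installed fonts.
okfonts_sample = "Arial\nArial Bold\nDejaVu Sans\n"
installed_sample = ["DejaVu Sans", "arial"]

print(missing_fonts(okfonts_sample, installed_sample))
```

In practice the installed names would come from whatever the system reports,
and exact name matching may well be too strict for real font lists.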



On Tuesday, 24 March 2020 at 10:05:03 PM UTC+2, Essam Zaky wrote:
>
> Hi all,
>
> I would like to build *.traineddata from scratch, especially for English
> and Arabic.
>
> So let's take English as an example.
> My question is: how do I prepare the fonts folder?
>
> I read the file
> https://github.com/tesseract-ocr/tesseract/blob/master/src/training/language-specific.sh
> and found that it contains only about 32 font names.
> Should I add the other Latin fonts installed on the training machine to
> that file, "language-specific.sh"?
>
>
> I used the "font manager" tool and found about 147 fonts installed on the
> training machine.
> I opened
> https://github.com/tesseract-ocr/langdata_lstm/blob/master/eng/okfonts.txt
> and it contains 4567 font names.
> Should I search for, download, and install all the missing fonts on the
> training machine?
>
> Should I collect all the font files from the training machine, create a
> new fonts folder "HOME/.fonts", and copy all the fonts into that folder?
>
> I see that fonts have different extensions ("*.ttf", "*.otf", "*.afm", ...).
> Do all font types work for training, or do I need a specific type?
>
>
> I will post another question about the required text data.
>
> Thanks for help
>
>
>
> Regards
> Essam
>

