@Lorenozo I need to do this because the accuracy of the current Arabic model is not as good as the English one, and I have a lot of fonts that I need to add to the Arabic model. Adding them by fine-tuning would affect the existing model, so I need to train from scratch and make the model more general. That is why I want to know what was done for the English model, so I can use it as a reference for building a new Arabic model.
On Tuesday, 24 March 2020 at 10:05:03 PM UTC+2, Essam Zaky wrote:
>
> Hi all,
>
> I would like to build *.traineddata from scratch, specifically for English and Arabic.
>
> Let's take English as an example. My question is: how should I prepare the fonts folder?
>
> I read the file
> https://github.com/tesseract-ocr/tesseract/blob/master/src/training/language-specific.sh
> and found that it contains only about 32 font names.
> Should I add the other Latin fonts installed on the training machine to "language-specific.sh"?
>
> I used the "Font Manager" tool and found about 147 fonts installed on the training machine.
> I opened
> https://github.com/tesseract-ocr/langdata_lstm/blob/master/eng/okfonts.txt
> and it contains 4567 font names.
> Should I search for, download, and install all the missing fonts on the training machine?
>
> Should I collect all the font files from the training machine, create a new fonts folder "HOME/.fonts", and paste all the fonts into that folder?
>
> I see that fonts have different extensions ("*.ttf , *.otf , *.afm , ...").
> Do all font types work for training, or do I need a specific type?
>
> I will write another question about the required text data.
>
> Thanks for the help.
>
> Regards
> Essam
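The comparison described in the quoted message (fonts installed on the training machine versus the names in okfonts.txt) could be sketched roughly as below. This is only an illustration, not part of the Tesseract tooling: the function names and the file `installed_fonts.txt` are hypothetical, and it assumes both lists are plain text with one font name per line, spelled the same way in both.

```python
def read_font_list(path):
    """Read a plain-text file with one font name per line (assumed format)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def missing_fonts(wanted, installed):
    """Return the names in `wanted` that do not appear in `installed`.

    Matching ignores case and repeated whitespace; the order of `wanted`
    is preserved and duplicates are reported once.
    """
    def norm(name):
        return " ".join(name.split()).casefold()

    have = {norm(name) for name in installed}
    seen = set()
    result = []
    for name in wanted:
        key = norm(name)
        if key and key not in have and key not in seen:
            seen.add(key)
            result.append(" ".join(name.split()))
    return result


# Hypothetical usage, assuming both files exist locally:
#   wanted = read_font_list("okfonts.txt")
#   installed = read_font_list("installed_fonts.txt")
#   for name in missing_fonts(wanted, installed):
#       print(name)
```

The resulting list only shows which names are absent from the machine. Whether all of them are actually needed for training is the open question in this thread.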

