Re: [tesseract-ocr] Re: How to prepare fonts folder to train from scratch

2020-03-25 Thread Essam Zaky
My goal is to recognize Arabic (with numbers and punctuation) plus English. Some English lines contain Arabic words, and some Arabic lines contain English words. I did some page layout analysis, split the text into lines, and am trying to detect the language of each word based on word geometry

Re: [tesseract-ocr] Re: How to prepare fonts folder to train from scratch

2020-03-25 Thread Shree Devi Kumar
The issue with Arabic is related to RTL processing and how punctuation and digits are handled. If your training text does not have them, you will have greater success. On Wed, Mar 25, 2020, 15:32 Essam Zaky wrote: > Thanks @Lorenzo and @Shree > I will try fine-tuning, and if the result
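
As a rough illustration of the point about punctuation and digits, here is a minimal Python sketch that drops lines containing digits or punctuation from a candidate training text. It is only an assumption about how one might pre-filter text, not part of Tesseract or tesstrain; the function names `has_digit_or_punct` and `filter_training_lines` are hypothetical.

```python
import unicodedata


def has_digit_or_punct(line: str) -> bool:
    """Return True if the line contains a Unicode decimal digit or punctuation.

    Hypothetical helper: uses Unicode general categories, so it covers
    ASCII digits, Arabic-Indic digits, and Arabic punctuation such as
    the Arabic comma. Math symbols like '+' (category Sm) are not caught.
    """
    for ch in line:
        cat = unicodedata.category(ch)
        # 'Nd' = decimal digit; any category starting with 'P' = punctuation
        if cat == "Nd" or cat.startswith("P"):
            return True
    return False


def filter_training_lines(lines):
    """Keep only non-empty lines that contain no digits or punctuation."""
    return [ln for ln in lines if ln.strip() and not has_digit_or_punct(ln)]


if __name__ == "__main__":
    sample = [
        "مرحبا بالعالم",    # plain Arabic words: kept
        "السعر 25",         # contains ASCII digits: dropped
        "مرحبا، كيف الحال",  # contains the Arabic comma: dropped
        "hello world",      # plain English words: kept
    ]
    for line in filter_training_lines(sample):
        print(line)
```

Whether filtering like this is appropriate depends on the target: a model that must recognize numbers and punctuation still needs them somewhere in its training data.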

[tesseract-ocr] Re: How to prepare fonts folder to train from scratch

2020-03-25 Thread Essam Zaky
Thanks @Lorenzo and @Shree. I will try fine-tuning, and if the results are still not satisfactory I will switch back to building from scratch. On Tuesday, March 24, 2020 at 10:05:03 PM UTC+2, Essam Zaky wrote: > > Hi all, > > I would like to build *.traineddata from scratch, specifically for English and >

Re: [tesseract-ocr] Re: How to prepare fonts folder to train from scratch

2020-03-25 Thread Lorenzo Bolzani
I think fine-tuning may work very well in this case; there is no need to train from scratch. Training from scratch does not guarantee better results, especially if you don't do it correctly. I suggest trying fine-tuning first to see if the results are good enough for you. In this way you get comfortable

[tesseract-ocr] Re: How to prepare fonts folder to train from scratch

2020-03-25 Thread Essam Zaky
@Lorenzo I need to do that because the accuracy of the current Arabic model is not as good as English, and I have a lot of fonts that I need to add to the Arabic model. Adding them by fine-tuning will affect the model, so I need to build from scratch and make the model more generalized. So I need to know what is

Re: [tesseract-ocr] Re: How to prepare fonts folder to train from scratch

2020-03-25 Thread Shree Devi Kumar
AFAIK Ray is involved in other projects at Google, so you are unlikely to get a reply from him. See https://github.com/tesseract-ocr/tesstrain/wiki for training done by @stweil on a similar scale for Fraktur. The pages list the hardware requirements, time taken, etc. Please check that you have enough

[tesseract-ocr] Re: How to prepare fonts folder to train from scratch

2020-03-25 Thread Essam Zaky
Thanks @shreeshrii. Would you answer the questions based on your experience? Also, is it possible to get help from Ray? On Tuesday, March 24, 2020 at 10:05:03 PM UTC+2, Essam Zaky wrote: > > Hi all, > > I would like to build *.traineddata from scratch, specifically for English and > Arabic > > So