> On Mon, Aug 17, 2015 at 6:07 AM, ShreeDevi Kumar <[email protected]> wrote:
>
>> Ray was looking for comparative feedback regarding the new traineddata for RTL languages, so this will be useful.
>>
>>>> Ray - https://groups.google.com/forum/#!msg/tesseract-dev/qcFtWCAAlT8/SZ4xBS5DHwwJ
Another caveat worth noting is that I only tested a small fraction of these languages (maybe 25?). I suspect, for instance, that all the Arabic-based languages except ara don't work very well. I would be interested in any more feedback on how bad the results are for any of them, and will take suggestions into account for the next version after 3.04.

>> As far as I know, Google Docs does not use the Tesseract OCR engine for recognizing text.
>
> Interesting. Can you please clarify the source of your knowledge?
>
>> Its OCR accuracy is also better than Tesseract's for some Indian languages. However, it doesn't seem to handle TIFF files, and it processes only the first 10 pages of a PDF.
>
> https://support.google.com/drive/answer/176692?hl=en
>
>> On Sun, Aug 16, 2015 at 7:14 PM, Hossein Razizadeh <[email protected]> wrote:
>>
>>> It seems 'fas' is for Persian, but there are no cube files for it, which leads to poor results. The Arabic language files work much better on Persian images. There is another 'per' folder for Persian, but it doesn't even contain a '.traineddata' file. Does anyone know whether Google Docs uses Tesseract as its OCR engine? Google Docs performs OCR on Persian images with good accuracy!
>>>
>>> On Saturday, July 18, 2015 at 8:14:07 AM UTC+4:30, Jeff Breidenbach wrote:
>>>>
>>>> I think 'fas' is the language code for Persian.
>>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/edd64e28-9e52-4b44-80cc-0aaa442caa85%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/edd64e28-9e52-4b44-80cc-0aaa442caa85%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> For more options, visit https://groups.google.com/d/optout.

