Hi Mayce,

ABBYY does not disclose how it trains its system. Google releases some information about Tesseract, but certainly not all of it. Are you concerned about a particular type of document? You can train Tesseract yourself to focus on a given domain, or ask about it here (check the archives first).

--Sven
On Mon, Sep 26, 2011 at 9:08 AM, Mayce Al <[email protected]> wrote:
> Hi All,
> I was looking for more information about which datasets have been used to
> train Tesseract and ABBYY to recognize English documents. I could not find
> further information, except that Tesseract is tested on UNLV-datasets.
> Does anyone have any idea about this?
> Best Regards
> Mayce
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>

--
"All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king."

