On Oct 18, 4:18 pm, "Robert B." <[email protected]> wrote:
> Hi all,
>
> Does anyone know if OCRopus is up to the challenge of recognizing a
> page from, say, a French-English dictionary? Such a dictionary would
> feature two columns, italic characters, accented characters, French
> words, and words that would not appear in any model of the language
> (for example, a breakdown of syllables).
>
> How far off is OCRopus from recognizing such a page? What is the work
> that would need to be done?

Layout analysis and italics are there.

For accented characters, we have spent most of the last year doing what's necessary to support Unicode; this prompted the move to Python. We're slowly getting the bugs out and will release once that works.

Language modeling like what you need is already fully supported through the use of OpenFST language models; you can create weighted combinations of, say, a dictionary and a syllabic model.

Tom
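
To make the idea of a "weighted combination" of a dictionary and a syllabic model concrete, here is a minimal sketch in plain Python. It is not OCRopus or OpenFST code: the function names, the toy lexicon, the syllable table, and the penalty values are all hypothetical, chosen only for illustration. Each model assigns a cost (a negative log probability) to a string, and the combination takes the cheaper of the two after adding a per-model penalty, which is roughly what a weighted union of two transducers computes under tropical (min, +) weights.

```python
import math

def dictionary_cost(word, lexicon):
    """Cost of `word` under a toy dictionary model; infinite if absent."""
    return lexicon.get(word, math.inf)

def syllable_cost(word, syllables):
    """Cheapest split of `word` into known syllables (dynamic programming)."""
    best = [math.inf] * (len(word) + 1)
    best[0] = 0.0  # the empty prefix costs nothing
    for end in range(1, len(word) + 1):
        for start in range(end):
            piece = word[start:end]
            if piece in syllables and best[start] < math.inf:
                best[end] = min(best[end], best[start] + syllables[piece])
    return best[len(word)]

def combined_cost(word, lexicon, syllables, dict_penalty=0.0, syll_penalty=2.0):
    """Weighted union: add a penalty to each model, keep the cheaper path."""
    return min(dict_penalty + dictionary_cost(word, lexicon),
               syll_penalty + syllable_cost(word, syllables))

# Hypothetical toy data; costs are made up for the example.
LEXICON = {"bonjour": 1.0, "maison": 1.5}
SYLLABLES = {"bon": 0.5, "jour": 0.5, "mai": 0.5, "son": 0.5}

if __name__ == "__main__":
    for candidate in ("bonjour", "bonson", "xyz"):
        print(candidate, combined_cost(candidate, LEXICON, SYLLABLES))
```

With these assumed penalties, a real dictionary word is preferred through the dictionary path, a non-word made of valid syllables is still accepted at a higher cost, and a string neither model covers gets infinite cost. Raising or lowering `syll_penalty` controls how readily the recognizer falls back to the syllabic model.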
