[ocropus] Re: Recognizing multi-language text with accented characters?

Tom Sun, 14 Nov 2010 23:03:04 -0800

On Oct 18, 4:18 pm, "Robert B." <[email protected]> wrote:
> Hi all,
>
> Does anyone know if OCRopus is up to the challenge of recognizing a
> page from, say, a French-English dictionary?
> Such a dictionary would
> feature two columns, italic characters, accented characters, French
> words, and words that would not appear in any model of the language
> (for example, a breakdown of syllables).
>
> How far off is OCRopus from recognizing such a page? What is the work
> that would need to be done?

Layout analysis and italics are already supported.

For accented characters, we have spent most of the last year doing
what's necessary to support Unicode; this prompted the move to
Python.  We're slowly getting the bugs out and will release once
that works.

Language modeling like what you need is already fully supported
through the use of OpenFST language models; you can create weighted
combinations of, say, a dictionary and a syllabic model.
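
To illustrate what a weighted combination means, here is a minimal
sketch in plain Python. It does not use OpenFST or any OCRopus API.
The toy word list, the syllable table, the "-" syllable separator, and
all function names are hypothetical and exist only for this example.
Costs are negative log probabilities, and the combined cost takes the
cheaper of the two weighted models, which is how a weighted union
behaves under shortest-path scoring.

```python
import math

# Hypothetical toy models; the probabilities are made up for illustration.
DICTIONARY = {"bonjour": 0.5, "hello": 0.5}
SYLLABLES = {"bon": 0.25, "jour": 0.25, "hel": 0.25, "lo": 0.25}


def dictionary_cost(token):
    """Negative log probability of a whole word, or infinity if unknown."""
    p = DICTIONARY.get(token)
    return -math.log(p) if p else math.inf


def syllable_cost(token, sep="-"):
    """Cost of a syllable breakdown such as 'bon-jour' (unigram model)."""
    total = 0.0
    for syllable in token.split(sep):
        p = SYLLABLES.get(syllable)
        if not p:
            return math.inf
        total += -math.log(p)
    return total


def combined_cost(token, dict_weight=0.8):
    """Weighted union of both models: each model pays the cost of its
    mixture weight, and the cheaper path wins. dict_weight must lie
    strictly between 0 and 1."""
    syllable_weight = 1.0 - dict_weight
    return min(
        -math.log(dict_weight) + dictionary_cost(token),
        -math.log(syllable_weight) + syllable_cost(token),
    )


for token in ("bonjour", "bon-jour", "xyz"):
    print(token, combined_cost(token))
```

A token that neither model accepts gets an infinite cost. Raising
dict_weight makes dictionary words cheaper relative to syllable
breakdowns.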

Tom

