Re: Accentuated characters in non english text

tmb Mon, 08 Jun 2009 15:37:57 -0700

The latest version of OCRopus uses a new line recognizer that is
different from Tesseract.  The new line recognizer has not been
trained on accented characters, so it's not recognizing them.


It would be good if someone could train European accented character
models (of course, we first need to document better how to do this).

Tom

On Jun 8, 6:29 pm, Gabriel <[email protected]> wrote:
> I have downloaded and compiled the latest sources from
> http://mercurial.iupr.org with scons.
> My OS is Ubuntu 9.04.
>
> I used OCRopus to recognize French text with a sample page in PNG
> format.
>   ocropus page page01.png
>
> Before that, I ran
>   export tesslanguage=fra
> to make sure that Tesseract used the right language.
>
> The result was good, but accented characters are not recognized.
>
> But when I run Tesseract directly (with the same page in TIFF
> format), it seems OK with accented characters.
>
> Do I need special dictionaries for OCRopus?
>
> Gabriel
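
To see which accented characters one recognizer drops compared with
another (for example OCRopus output versus Tesseract output for the same
page), the two text results can be compared character by character. The
following is only a minimal sketch in Python and is not part of OCRopus
or Tesseract: the function names accented_chars and missing_accents are
hypothetical, and the sample strings are made up for illustration. It
assumes both outputs have already been read into Unicode strings.

```python
import unicodedata
from collections import Counter


def accented_chars(text):
    """Count characters that decompose into a base character plus combining marks."""
    counts = Counter()
    # Normalize to NFC first so that "e" + combining accent is seen as one character.
    for ch in unicodedata.normalize("NFC", text):
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) > 1 and any(
            unicodedata.combining(c) for c in decomposed[1:]
        ):
            counts[ch] += 1
    return counts


def missing_accents(reference, candidate):
    """Accented characters that occur more often in reference than in candidate."""
    # Counter subtraction keeps only the positive differences.
    return accented_chars(reference) - accented_chars(candidate)


if __name__ == "__main__":
    # Made-up sample strings standing in for the two recognizers' outputs.
    tesseract_text = "\u00e9t\u00e9 \u00e0 Paris"
    ocropus_text = "ete a Paris"
    print(missing_accents(tesseract_text, ocropus_text))
```

A non-empty result lists the accented characters that the second output
lost, which gives a rough idea of which character models would need to
be trained.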
