Re: Accentuated characters in non english text

tmb Mon, 08 Jun 2009 15:39:26 -0700

I should add that there will be Tesseract support again; it's just not
part of the default build right now.


In fact, an easy way to train a new language is to use Tesseract
to bootstrap the new line recognizer. Any language that Tesseract
supports can be trained that way, and the OCRopus recognizer can end
up performing better than the Tesseract recognizer even though it was
initially trained on Tesseract output.
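
To make the bootstrapping idea concrete, here is a rough sketch of
collecting Tesseract transcripts for a directory of text line images.
It is not the actual OCRopus training tooling: the function names, the
directory layout and the ".tif" extension are assumptions made for
illustration. Only the "tesseract <image> <outputbase> -l <lang>"
command-line form is standard.

```python
import subprocess
from pathlib import Path


def tesseract_command(image, lang="fra"):
    """Build the Tesseract call for one line image (hypothetical helper).

    Tesseract appends ".txt" to the output base itself, so the
    transcript ends up next to the image with the same stem.
    """
    image = Path(image)
    out_base = image.with_suffix("")
    return ["tesseract", str(image), str(out_base), "-l", lang]


def bootstrap_transcripts(line_dir, lang="fra"):
    """Run Tesseract on every line image and return (image, text) pairs.

    Assumes line_dir already holds one TIFF image per text line.
    """
    pairs = []
    for image in sorted(Path(line_dir).glob("*.tif")):
        subprocess.run(tesseract_command(image, lang), check=True)
        text = image.with_suffix(".txt").read_text(encoding="utf-8").strip()
        if text:  # skip lines where Tesseract produced no text
            pairs.append((image, text))
    return pairs
```

The resulting (image, transcript) pairs would then serve as
approximate ground truth for training the line recognizer. Since they
contain Tesseract's mistakes, correcting a sample by hand is probably
worthwhile.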

Tom

On Jun 8, 6:29 pm, Gabriel <[email protected]> wrote:
> I have downloaded and compiled the latest sources from
> http://mercurial.iupr.org
> with scons.
> My OS is Ubuntu 9.04.
>
> I used OCRopus to recognize French text with a sample page in PNG
> format.
>   ocropus page page01.png
>
> Before that, I ran
>   export tesslanguage=fra
> to make sure that Tesseract used the right language.
>
> The result was nice, but accented characters are not recognized.
>
> But when I run Tesseract directly (with the same page in TIFF
> format), it seems OK with accented characters.
>
> Do I need special dictionaries for OCRopus?
>
> Gabriel
