[ocropus] Re: OCRopus 0.5.4 and UTF-8 encoding

Tom Wed, 25 Jul 2012 12:00:53 -0700

OCRopus hasn't been trained with accented characters, so it can't recognize 
them for now.


OCRopus internals should be largely Unicode-compliant now and we have 
trained version 0.5 with German.  Current tip versions contain new language 
modeling and recognition code that may still break with Unicode, but that 
should be fixed by 0.6.

Tom

On Monday, July 23, 2012 11:35:04 PM UTC+2, c.kruk wrote:
>
> I prepared a few sample PNG files containing Polish-language text set in
> different TeX fonts. I processed them with OCRopus and found that the program
> ignores all diacritic characters, replacing them with similar ASCII
> characters. For example, the phrase "pójdź kińże tę chmurność w głąb
> flaszy" is rendered as "pdjd2 kih2e tg ChmurnosC W glqb flaszy".
>
>
> I read a little about the previous OCRopus versions, which used the
> Tesseract program, and learned that UTF-8 recognition was one of the biggest
> advantages of these applications. The new OCRopus is still poorly
> documented, so I don't know why it ignores UTF-8 encoded characters.
>
>
> I use the simple 'ocropus file.png' command. What should I do to make
> OCRopus handle UTF-8? Should I pass some switch on the command line?
> Should I train OCRopus on the TeX fonts? Or should I install some
> additional packages?
>
>
> I have no idea what I could do. Any help would be welcome.
>
>
> 

