European characters problem, some are ok, some not.

tobltobs Fri, 06 Mar 2009 04:44:53 -0800

Hello

I am running tesseract 2.03 on Debian 4 (compiled from source, not the
packaged version) from the command line.
It is a great piece of software and runs quite well.
But I have problems with European characters like the German umlauts.


The only non-ASCII character which is recognized is the é. All other
special characters are not recognized. It doesn't make any difference
which language I specify. The results with the eurotext test image are
always the same.

If I open deu.unicharset with nano, a few lines look strange (some
entries show up as unreadable bytes), like
�^�^� 0
> 0
ö 3
¢ 0
$ 0
é 3
�^�^� 0
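
To tell whether those strange lines are really broken in the file or
are only displayed wrongly by nano, the raw bytes can be checked
directly. The following is only a rough sketch, under the assumption
that deu.unicharset is supposed to be UTF-8; the function name
find_invalid_utf8_lines is made up for this example and is not part
of tesseract.

```python
def find_invalid_utf8_lines(data: bytes):
    """Return (line_number, raw_bytes) for every line that is not valid UTF-8.

    Assumption: the file is meant to be UTF-8 encoded. Lines that fail to
    decode were probably written in another encoding or are damaged.
    """
    bad_lines = []
    for number, raw in enumerate(data.split(b"\n"), start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            # Keep the raw bytes so they can be inspected as hex later.
            bad_lines.append((number, raw))
    return bad_lines
```

Calling it with the file contents, for example
find_invalid_utf8_lines(open("deu.unicharset", "rb").read()), gives the
lines that do not decode. An empty result would suggest the file itself
is fine and the odd display comes from the terminal or editor settings.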

The result of running tesseract on the eurotext image is:

The (quick) [brown] {fox} jumps!
Over the $43,456.78 <lazy> #90 dog
& duck/goose, as 12.5% of E-mail
from [email protected] is spam.
Der ,.schnelle” braune Fuchs springt
iiber den faulen Hund. Le renard brun
<<rapide» saute par·dessus le chien
paresseux. La volpe marrone rapida
salta sopra il cane pigro. El zorro
marrén répido salta sobre el perro

Does anybody have an idea where the problem is?
