I fixed the problem with the strange characters in deu.unicharset. It was caused by an incorrectly configured locale setting. But the problem that no umlauts are recognized is still not fixed. On another system with Ubuntu 8.04 I don't have this problem. Both the svn version and the packaged version recognize umlauts. Is this a bug in version 2.03? As the system with the umlaut problem is a production system, I would like to have a good chance that a new installation or build will fix the problem before I shut down the server.
Thanks for your help.

On Mar 6, 1:44 pm, tobltobs <[email protected]> wrote:
> Hello
>
> I am running tesseract 2.03 on Debian 4 (compiled, not the packaged
> version) from the command line.
> It is a great piece of software and runs quite well.
> But I have problems with European characters like the German umlauts.
>
> The only non-ASCII character which is recognized is the é. All other
> special characters are not recognized. It doesn't make any difference
> which language I specify. The results with the eurotext test image are
> always the same.
>
> If I open deu.unicharset with nano I have a few lines which look
> strange, like
> ^ ^ 0> 0
>
> ö 3
> ¢ 0
> $ 0
> é 3
> ^ ^ 0
>
> The result of tesseract on the eurotext image is:
>
> The (quick) [brown] {fox} jumps!
> Over the $43,456.78 <lazy> #90 dog
> & duck/goose, as 12.5% of E-mail
> from [email protected] is spam.
> Der ,.schnelle” braune Fuchs springt
> iiber den faulen Hund. Le renard brun
> <<rapide» saute par·dessus le chien
> paresseux. La volpe marrone rapida
> salta sopra il cane pigro. El zorro
> marrén répido salta sobre el perro
>
> Does anybody have an idea where the problem is?

