Sounds like maybe a bad version of Tess -- which version do you have? The latest svn would be good for what you're doing.

Sven
On Saturday, October 1, 2011, Slavko Kocjancic <[email protected]> wrote:
> On 29.9.2011 21:39, Sven Pedersen wrote:
>>
>> Thanks Calomer.
>>
>> Bonny, is the language you're trying to improve using a different set
>> of characters (alphabet)? If so, you'll need to do a lot of training
>> as Calomer described. Otherwise you'll just need some tweaks. The font
>> may be an issue.
>> --Sven
>>
>
> It seems that I was not clear enough, or my English is just not good enough, so I will try to explain again.
> I have scans of English text, but the text contains a lot of foreign names (using only English characters).
> When I apply the OCR, the text is recognized without problems, but the names are often wrong, and the confidence (I use the command line and hOCR output) is low on those words (names).
>
> As I want to proofread the text, I wrote an application that shows the text in an editor and the image in another window. I take the confidence from the hOCR to highlight text that Tesseract thinks may be wrong. All the names are marked red in the example, as they are not in the dictionary. (I use the prebuilt eng.traineddata.) The attached page is just the index, and those names appear in the book many times. So I wonder whether I can put those words (names) in eng.user-words to improve the confidence. I don't want to train new characters or a new font; I just want to add new words to the dictionary, to be used only for this particular book. Is that possible?
>
> So far I have discovered that just adding the text file eng.user-words has no effect. What steps are required to enable it?
>
> Hopefully it's clear enough now.
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>

--
"All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king."
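
For anyone building a similar proofreading tool, here is a rough Python sketch of the confidence-highlighting step Slavko describes: pull each word and its confidence out of the hOCR and list the ones below a cutoff. It assumes the hOCR marks words as `ocrx_word` spans with an `x_wconf` value in the `title` attribute, which is what the hOCR spec describes; whether a given Tesseract build writes exactly that is not confirmed in this thread. The names `HocrWordParser` and `low_confidence_words`, the cutoff of 70, and the sample markup are all made up for illustration.

```python
import re
from html.parser import HTMLParser

# Matches the word confidence inside an hOCR title attribute,
# e.g. title="bbox 10 20 110 60; x_wconf 58" (format assumed from the hOCR spec).
WCONF_RE = re.compile(r"x_wconf\s+(-?\d+(?:\.\d+)?)")


class HocrWordParser(HTMLParser):
    """Collects (text, confidence) pairs from ocrx_word spans."""

    def __init__(self):
        super().__init__()
        self.words = []     # finished (text, confidence) pairs
        self._conf = None   # confidence of the word currently being read
        self._depth = 0     # span nesting depth inside the current word
        self._text = []     # text fragments of the current word

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            # A span nested inside a word: track it so we close correctly.
            self._depth += 1
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "ocrx_word" in classes:
            match = WCONF_RE.search(attrs.get("title") or "")
            self._conf = float(match.group(1)) if match else None
            self._depth = 1
            self._text = []

    def handle_endtag(self, tag):
        if tag != "span" or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.words.append(("".join(self._text).strip(), self._conf))

    def handle_data(self, data):
        if self._depth:
            self._text.append(data)


def low_confidence_words(hocr, threshold=70.0):
    """Return (word, confidence) pairs whose confidence is below threshold."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return [(t, c) for t, c in parser.words if c is not None and c < threshold]


# Made-up hOCR fragment: one ordinary word and one low-confidence name.
SAMPLE = (
    '<span class="ocr_line" title="bbox 0 0 400 40">'
    '<span class="ocrx_word" title="bbox 0 0 90 40; x_wconf 93">Index</span> '
    '<span class="ocrx_word" title="bbox 100 0 260 40; x_wconf 41">Hrastnik</span>'
    '</span>'
)

if __name__ == "__main__":
    for word, conf in low_confidence_words(SAMPLE):
        print(word, conf)
```

Words without an `x_wconf` value are skipped rather than treated as zero, so missing confidence data does not flood the list. This only reports what Tesseract already says; it does not change the dictionary or the confidence itself.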

