You should try cube with multiple languages (e.g. "-l eng+deu") and the European scripts to see if a combination works well. Honestly, though, unless this list is very long, it would be easier to OCR it and correct the scannos (the OCR equivalent of typos) by hand. Otherwise, you'll need to train, starting from English rather than from scratch. Follow the FAQ and let us know if you have problems.

-Sven
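
For reference, here is a rough sketch of how the multi-language run suggested above could be scripted. It only assumes the standard command-line form `tesseract <image> <outputbase> -l <langs>`; the helper name `build_tesseract_command` and the file names `guide.png` / `guide` are placeholders of mine, not anything from the original question, and the eng+deu pair is just the combination mentioned above.

```python
import os
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, languages):
    """Return the argv list for a multi-language Tesseract run.

    Hypothetical helper: it only assembles the command line and does
    not check that the language data files are actually installed.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    # Tesseract takes several languages joined with '+', e.g. "eng+deu".
    lang_arg = "+".join(languages)
    return ["tesseract", image_path, output_base, "-l", lang_arg]


if __name__ == "__main__":
    # "guide.png" and "guide" are placeholder names for the scanned list.
    cmd = build_tesseract_command("guide.png", "guide", ["eng", "deu"])
    print(" ".join(cmd))
    # Only run OCR if the binary and the image are really there.
    if shutil.which("tesseract") and os.path.exists("guide.png"):
        subprocess.run(cmd, check=True)
```

Swapping the language list (say `["eng", "fra"]`) makes it easy to compare a few combinations on the same image before deciding whether training is worth it.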
On Wed, Jan 9, 2013 at 10:57 PM, TerrenceW <[email protected]> wrote:
> I have an image containing English words with their phonetic
> equivalents printed alongside (a pronunciation guide.) It's actually
> for a spelling bee. The organizers of the bee would very much like to
> get this image as a textual list (so they can manipulate it, pull
> words from it randomly, and so on) and they came to me for help. I've
> used Tesseract before, but only to recognize English and French text.
>
> The pronunciation guide contains text with lots of umlauts, upside
> down letter e's, etc. More specialized characters than are contained
> in French (which is one thing I tried, in an effort to improve the
> recognition.) Does anyone have any advice for recognizing characters
> like this? Should I start training Tesseract to recognize them? I've
> never done any training before, which is why I'm a bit reluctant.
>
> Thanks in advance!
>
> Terrence
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

--
"All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king."

