Thanks Sven. The problem is that no legend appears to have been provided. I am looking to *automatically* produce a word-to-phonemes mapping or list (using a program), and it's hard to determine where one phoneme ends and another starts. A character such as "a" might be used in more than one phoneme, e.g. "ax", "ae", so simple greedy matching won't work.
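To illustrate the segmentation problem, here is a minimal sketch, assuming the phoneme inventory were already known. The inventory below is entirely made up (it is not the dictionary's actual symbol set), and the function names are my own. It shows a longest-match-first splitter getting stuck on a string that a backtracking search can still split.

```python
from functools import lru_cache


def greedy(text, inventory):
    """Longest-match-first split; returns None when it gets stuck."""
    symbols = sorted(inventory, key=len, reverse=True)
    out, pos = [], 0
    while pos < len(text):
        for sym in symbols:
            if text.startswith(sym, pos):
                out.append(sym)
                pos += len(sym)
                break
        else:
            # No symbol matches at this position: greedy has no way back.
            return None
    return out


def segmentations(text, inventory):
    """Return every way to split text into symbols from inventory."""
    symbols = frozenset(inventory)
    max_len = max(map(len, symbols))

    @lru_cache(maxsize=None)
    def split_from(start):
        if start == len(text):
            return ((),)  # exactly one way to split the empty remainder
        results = []
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            piece = text[start:end]
            if piece in symbols:
                for rest in split_from(end):
                    results.append((piece,) + rest)
        return tuple(results)

    return [list(s) for s in split_from(0)]


# Hypothetical inventory, for illustration only.
INVENTORY = {"a", "ae", "e", "er", "k", "t"}

print(greedy("kaert", INVENTORY))         # None
print(segmentations("kaert", INVENTORY))  # [['k', 'a', 'er', 't']]
print(segmentations("ae", INVENTORY))     # [['a', 'e'], ['ae']]
```

The last call shows the remaining difficulty: when more than one split is valid, the program still needs some extra information (a legend, or the original typography) to pick the right one.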
Now if only I could get my hands on the Abbyy Fine Reader project file ... I'd represent each phoneme by a unique character for a start and go from there.

On Wednesday, January 16, 2013 3:20:04 PM UTC, sventech wrote:
> That particular dictionary has already been OCRed with Abbyy Fine Reader:
>
> http://archive.org/stream/everymansenglish00jone/everymansenglish00jone_djvu.txt
>
> Although not perfect, a little cleanup would render that text quite usable.
> --Sven
>
> On Wed, Jan 16, 2013 at 8:44 AM, Sven Pedersen <[email protected]> wrote:
>> You would need to train tesseract to recognize those symbols. The web
>> page outlines how to do that.
>> --Sven
>>
>> On Tue, Jan 15, 2013 at 6:43 PM, <[email protected]> wrote:
>>> Is Tesseract-OCR capable of recognizing phonetic symbols? I would like
>>> to extract the phonetic transcriptions of the following (out of copyright)
>>> document
>>> http://archive.org/stream/everymansenglish00jone#page/2/mode/2up
>>>
>>> Regards,
>>>
>>> - Olumide
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to [email protected]
>>> To unsubscribe from this group, send email to
>>> [email protected]
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>
>> --
>> ``All that is gold does not glitter,
>> not all those who wander are lost;
>> the old that is strong does not wither,
>> deep roots are not reached by the frost.
>> From the ashes a fire shall be woken,
>> a light from the shadows shall spring;
>> renewed shall be blade that was broken,
>> the crownless again shall be king.”

