Hi Olumide,

Do take a quick look at the training documentation, to see if you want to have a go at training Tesseract. It would be great to have a training file with all the phonetic symbols available, if you have the time to do it - I'm sure it would be useful to quite a few other people.
Nick

On Wed, Jan 16, 2013 at 03:36:10PM -0800, [email protected] wrote:
> Thanks Sven. The problem is that no legend appears to have been provided. I
> am looking to *automatically* produce a word-to-phonemes mapping or list
> (using a program), and it's hard to determine where one phoneme ends and
> another starts. The problem is that a character "a", for instance, might be
> used in more than one phoneme, e.g. "ax", "ae", and simple greedy matching
> won't work.
>
> Now if only I could get my hands on the Abbyy Fine Reader project file ... I'd
> represent each phoneme by a unique character for a start and go from there.
>
> On Wednesday, January 16, 2013 3:20:04 PM UTC, sventech wrote:
>
> That particular dictionary has already been OCRed with Abbyy Fine Reader:
> http://archive.org/stream/everymansenglish00jone/everymansenglish00jone_djvu.txt
> Although not perfect, a little cleanup would render that text quite usable.
> --Sven
>
> On Wed, Jan 16, 2013 at 8:44 AM, Sven Pedersen <[email protected]> wrote:
>
> You would need to train tesseract to recognize those symbols. The web
> page outlines how to do that.
> --Sven
>
> On Tue, Jan 15, 2013 at 6:43 PM, <[email protected]> wrote:
>
> Is Tesseract-OCR capable of recognizing phonetic symbols? I would
> like to extract the phonetic transcriptions of the following (out
> of copyright) document
> http://archive.org/stream/everymansenglish00jone#page/2/mode/2up
>
> Regards,
>
> - Olumide
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> --
> ``All that is gold does not glitter,
> not all those who wander are lost;
> the old that is strong does not wither,
> deep roots are not reached by the frost.
> From the ashes a fire shall be woken,
> a light from the shadows shall spring;
> renewed shall be blade that was broken,
> the crownless again shall be king.”
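
On the segmentation problem raised in the thread (a character such as "a" can begin more than one phoneme, so greedy matching gets stuck), here is a minimal sketch of one possible approach: try every way of splitting a transcription into symbols from a known inventory, rather than committing to the longest match at each step. The inventory below is a toy set invented for illustration, not the dictionary's actual symbol list, and the function names are hypothetical.

```python
from functools import lru_cache

# Toy inventory, invented for illustration only; the real symbol set would
# have to come from the dictionary's (missing) legend.
INVENTORY = frozenset({"a", "ae", "ax", "e", "x", "t", "th", "hh"})


def greedy_segment(text, inventory=INVENTORY):
    """Longest-match-first segmentation; returns None when it gets stuck."""
    max_len = max(len(s) for s in inventory)
    out, i = [], 0
    while i < len(text):
        # Try the longest candidate first and commit to the first hit.
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in inventory:
                out.append(text[i:i + n])
                i += n
                break
        else:
            return None  # no symbol matches at position i
    return out


def all_segmentations(text, inventory=INVENTORY):
    """Return every way to split text into symbols from the inventory."""
    max_len = max(len(s) for s in inventory)

    @lru_cache(maxsize=None)
    def from_pos(i):
        # All segmentations of text[i:], as a tuple of tuples.
        if i == len(text):
            return ((),)
        results = []
        for n in range(1, min(max_len, len(text) - i) + 1):
            piece = text[i:i + n]
            if piece in inventory:
                results.extend((piece,) + rest for rest in from_pos(i + n))
        return tuple(results)

    return [list(seg) for seg in from_pos(0)]


if __name__ == "__main__":
    word = "thhae"
    print(greedy_segment(word))     # greedy takes "th" and then gets stuck
    print(all_segmentations(word))  # exhaustive search still finds splits
```

With this toy inventory the exhaustive search can return more than one valid split for the same string, which is the ambiguity described above; choosing between candidates would still need extra information, such as the legend or a one-character-per-phoneme encoding.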

