Hi Olumide,

Do take a quick look at the training documentation, to see if you want
to have a go at training Tesseract. It would be great to have a
training file with all the phonetic symbols available, if you
have the time to do it - I'm sure it would be useful to quite a few
other people.

Nick

On Wed, Jan 16, 2013 at 03:36:10PM -0800, [email protected] wrote:
> Thanks Sven. The problem is that no legend appears to have been provided. I
> am looking to *automatically* produce a word-to-phoneme mapping or list
> (using a program), and it's hard to determine where one phoneme ends and
> another starts. A character such as "a" might be used in more than one
> phoneme, e.g. "ax" and "ae", so simple greedy matching won't work.
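
On the segmentation point above: a minimal sketch, assuming the set of
phoneme symbols is known in advance, of splitting a transcription by
trying every symbol that matches at each position and backtracking,
rather than committing to the longest match. The function name
segmentations and the symbol "xk" are hypothetical, made up only to show
a case where longest-match-first gets stuck; they are not taken from the
dictionary.

```python
from functools import lru_cache


def segmentations(text, inventory):
    """Return every way to split `text` into symbols from `inventory`."""
    symbols = frozenset(inventory)
    # Longest symbol length bounds how far ahead we need to look.
    longest = max(map(len, symbols), default=0)

    @lru_cache(maxsize=None)
    def split_from(start):
        # Reaching the end of the text is one valid (empty) split.
        if start == len(text):
            return ((),)
        results = []
        for end in range(start + 1, min(len(text), start + longest) + 1):
            piece = text[start:end]
            if piece in symbols:
                # Keep this piece only if the remainder can also be split.
                for rest in split_from(end):
                    results.append((piece,) + rest)
        return tuple(results)

    return [list(split) for split in split_from(0)]


# Hypothetical inventory: longest-match-first would take "ax" and then
# fail on the leftover "k"; backtracking finds "a" + "xk" instead.
print(segmentations("axk", ["a", "ax", "ae", "xk"]))

# More than one result means the transcription is ambiguous without a legend.
print(segmentations("aet", ["a", "ae", "e", "t"]))
```

If a transcription comes back with more than one split, the symbol set
alone is not enough to decide, and a legend or a one-character-per-phoneme
encoding would be needed.
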
> 
> Now if only I could get my hands on the Abbyy Fine Reader project file ... I'd
> represent each phoneme by a unique character for a start and go from there.
> 
> On Wednesday, January 16, 2013 3:20:04 PM UTC, sventech wrote:
> 
>     That particular dictionary has already been OCRed with Abbyy Fine Reader:
>     http://archive.org/stream/everymansenglish00jone/everymansenglish00jone_djvu.txt
> 
>     Although not perfect, a little cleanup would render that text quite
>     usable.
>     --Sven
> 
> 
>     On Wed, Jan 16, 2013 at 8:44 AM, Sven Pedersen <[email protected]> wrote:
> 
>         You would need to train Tesseract to recognize those symbols. The web
>         page outlines how to do that.
>         --Sven
> 
> 
>         On Tue, Jan 15, 2013 at 6:43 PM, <[email protected]> wrote:
> 
>             Is Tesseract-OCR capable of recognizing phonetic symbols? I would
>             like to extract the phonetic transcriptions of the following (out
>             of copyright) document
>             http://archive.org/stream/everymansenglish00jone#page/2/mode/2up
> 
>             Regards,
> 
>             - Olumide
> 
> 
>             --
>             You received this message because you are subscribed to the Google
>             Groups "tesseract-ocr" group.
>             To post to this group, send email to [email protected]
>             To unsubscribe from this group, send email to
>             [email protected]
>             For more options, visit this group at
>             http://groups.google.com/group/tesseract-ocr?hl=en
> 
> 
> 
> 
>         --
>         ``All that is gold does not glitter,
>           not all those who wander are lost;
>         the old that is strong does not wither,
>           deep roots are not reached by the frost.
>         From the ashes a fire shall be woken,
>           a light from the shadows shall spring;
>         renewed shall be blade that was broken,
>           the crownless again shall be king.”
> 
> 
> 
> 
> 

