Unfortunately, the University of Florida Library, which had put the material
up as out of copyright, has issued a retraction and declared that the book is
in copyright, so it probably cannot be OCR'ed :((((((


On Thursday, January 17, 2013 10:55:48 AM UTC, Nick White wrote:
>
> Hi Olumide, 
>
> Do take a quick look at the training documentation, to see if you want 
> to have a go at training Tesseract. It would be great to have a 
> training file with all the phonetic symbols available, if you 
> have the time to do it - I'm sure it would be useful to quite a few 
> other people. 
>
> Nick 
>
> On Wed, Jan 16, 2013 at 03:36:10PM -0800, [email protected] wrote: 
> > Thanks Sven. The problem is that no legend appears to have been provided. I
> > am looking to *automatically* produce a word-to-phonemes mapping or list (using
> > a program), and it's hard to determine where one phoneme ends and another starts.
> > A character such as "a" might be used in more than one phoneme, e.g. "ax" and
> > "ae", so simple greedy matching won't work.
> > 
> > Now if only I could get my hands on the Abbyy Fine Reader project file ... I'd
> > represent each phoneme by a unique character for a start and go from there.
> > 
> > On Wednesday, January 16, 2013 3:20:04 PM UTC, sventech wrote: 
> > 
> >     That particular dictionary has already been OCRed with Abbyy Fine Reader:
> >     http://archive.org/stream/everymansenglish00jone/everymansenglish00jone_djvu.txt
> > 
> >     Although not perfect, a little cleanup would render that text quite usable.
> >     --Sven 
> > 
> > 
> >     On Wed, Jan 16, 2013 at 8:44 AM, Sven Pedersen <[email protected]> wrote:
> > 
> >         You would need to train tesseract to recognize those symbols. The web
> >         page outlines how to do that.
> >         --Sven 
> > 
> > 
> >         On Tue, Jan 15, 2013 at 6:43 PM, <[email protected]> wrote: 
> > 
> >             Is Tesseract-OCR capable of recognizing phonetic symbols? I would
> >             like to extract the phonetic transcriptions of the following (out
> >             of copyright) document:
> >             http://archive.org/stream/everymansenglish00jone#page/2/mode/2up
> > 
> >             Regards, 
> > 
> >             - Olumide 
> > 
> > 
> >             -- 
> >             You received this message because you are subscribed to the Google
> >             Groups "tesseract-ocr" group.
> >             To post to this group, send email to [email protected]
> >             To unsubscribe from this group, send email to 
> >             [email protected] 
> >             For more options, visit this group at 
> >             http://groups.google.com/group/tesseract-ocr?hl=en 
> > 
> > 
> > 
> > 
> >         -- 
> >         ``All that is gold does not glitter, 
> >           not all those who wander are lost; 
> >         the old that is strong does not wither, 
> >           deep roots are not reached by the frost. 
> >         From the ashes a fire shall be woken, 
> >           a light from the shadows shall spring; 
> >         renewed shall be blade that was broken, 
> >           the crownless again shall be king.” 
> > 
>
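
On the segmentation problem quoted above (one character such as "a" taking part in more than one phoneme, so greedy matching breaks down), one possible approach is to try every split instead of committing to the longest match. The sketch below is only an illustration: the inventory is a made-up set of symbols, not the legend of the dictionary, and the function names are hypothetical.

```python
from functools import lru_cache

# Hypothetical phoneme inventory, for illustration only.
# It is NOT taken from the dictionary discussed in this thread.
INVENTORY = frozenset({"a", "ae", "eh", "t"})


def greedy_segment(text, inventory=INVENTORY):
    """Longest-match-first segmentation; returns None if it gets stuck."""
    max_len = max(map(len, inventory))
    result, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in inventory:
                result.append(piece)
                i += size
                break
        else:
            # No symbol of any length matches at position i.
            return None
    return result


def all_segmentations(text, inventory=INVENTORY):
    """Return every way to split text into symbols from inventory."""
    max_len = max(map(len, inventory))

    @lru_cache(maxsize=None)
    def from_index(i):
        if i == len(text):
            return ((),)  # exactly one way to segment the empty remainder
        found = []
        for size in range(1, min(max_len, len(text) - i) + 1):
            piece = text[i:i + size]
            if piece in inventory:
                for rest in from_index(i + size):
                    found.append((piece,) + rest)
        return tuple(found)

    return [list(seg) for seg in from_index(0)]


print(greedy_segment("aeht"))     # greedy takes "ae" first and gets stuck
print(all_segmentations("aeht"))  # the exhaustive search still finds a split
```

If a transcription comes back with more than one segmentation, the inventory itself is ambiguous and some extra rule is needed to pick one. Mapping each phoneme to a unique single character, as suggested in the quoted message, would avoid the problem altogether.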

