Thanks, I appreciate the suggestions!

On Friday, November 2, 2012 1:48:45 PM UTC-4, sventech wrote:
>
> Cutting off the borders and possibly adding white borders might help.
> Normalizing out the text that bleeds through the page would also help.
> The text is clear, so you might not need to retrain.
> --Sven
>
> On Fri, Nov 2, 2012 at 10:32 AM, Devin Bean <[email protected]> wrote:
> > Hi,
> >
> > Apologies for the noob questions. Trying to get the hang of Tesseract.
> >
> > I have a number of images of Chinese genealogies that I'd love to be
> > able to run OCR on. Most of them are similar to the two images linked
> > below: wood-block fairly standard print, or, for newer images, actually
> > printed standard font.
> >
> > Wood block print: http://www.flickr.com/photos/63588871@N05/8138563082/
> > Standard font print: http://www.flickr.com/photos/63588871@N05/8147864815/
> >
> > Questions
> > - What options do I use to tell Tesseract to read top-to-bottom,
> >   left-to-right? (I'm using Tesseract 3.02)
> > - I expect that Tesseract will need to be trained for the wood block
> >   texts at least. I can edit these images so that just the central text
> >   portion remains and so that the contrast is greater between the
> >   background and the characters. I can also generate text files with
> >   the characters in the image. How do I construct training files that
> >   use images where the lines are top-to-bottom and left-to-right?
> >
> > If you have any other advice for processing images like these, I'd
> > really appreciate it.
> >
> > Thanks for your help!
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "tesseract-ocr" group.
> > To post to this group, send email to [email protected]
> > To unsubscribe from this group, send email to [email protected]
> > For more options, visit this group at
> > http://groups.google.com/group/tesseract-ocr?hl=en
>
> --
> "All that is gold does not glitter,
> not all those who wander are lost;
> the old that is strong does not wither,
> deep roots are not reached by the frost.
> From the ashes a fire shall be woken,
> a light from the shadows shall spring;
> renewed shall be blade that was broken,
> the crownless again shall be king."
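
For anyone following along, here is a rough sketch of the three preprocessing steps Sven suggests (cut off the borders, add a white border, suppress the bleed-through), written against a plain 2D list of grayscale values so it runs without any image library. The function names, the fixed margin, and the idea of handling bleed-through with a simple brightness threshold are my own assumptions for illustration, not something Tesseract provides or that Sven prescribed. Real scans would be loaded with an imaging library, and the threshold would need tuning per page.

```python
# Grayscale convention assumed here: 0 = black ink, 255 = white paper.
WHITE = 255


def crop_border(pixels, margin):
    """Drop `margin` pixels from every edge of a 2D grayscale grid."""
    height, width = len(pixels), len(pixels[0])
    if margin < 0 or 2 * margin >= height or 2 * margin >= width:
        raise ValueError("margin does not fit inside the image")
    return [row[margin:width - margin] for row in pixels[margin:height - margin]]


def suppress_bleed_through(pixels, threshold):
    """Turn every pixel at or above `threshold` into white.

    Assumption: bleed-through from the back of the page is fainter
    (lighter) than the real ink, so a brightness cutoff separates them.
    """
    return [[p if p < threshold else WHITE for p in row] for row in pixels]


def add_white_border(pixels, pad):
    """Surround the grid with `pad` pixels of white on every side."""
    width = len(pixels[0])
    blank = [WHITE] * (width + 2 * pad)
    padded = [list(blank) for _ in range(pad)]
    for row in pixels:
        padded.append([WHITE] * pad + list(row) + [WHITE] * pad)
    padded.extend(list(blank) for _ in range(pad))
    return padded


# Tiny made-up "page": a dark frame around ink (30, 20), faint
# bleed-through (180, 200), and near-white paper (240, 250).
page = [
    [0, 0, 0, 0, 0],
    [0, 30, 180, 240, 0],
    [0, 200, 20, 250, 0],
    [0, 0, 0, 0, 0],
]

cleaned = add_white_border(
    suppress_bleed_through(crop_border(page, 1), threshold=128), 1
)
```

The order matters a little: cropping first removes the dark frame so it cannot be mistaken for ink, and the white border goes on last so the characters do not touch the image edge.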

