Thanks, I appreciate the suggestions!

On Friday, November 2, 2012 1:48:45 PM UTC-4, sventech wrote:
>
> Cutting off the borders and possibly adding white borders might help. 
> Normalizing out the text that bleeds through the page would also help. 
> The text is clear, so you might not need to retrain. 
> --Sven 
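
To check I understand the cropping, white-border and bleed-through suggestions, here is a rough sketch in plain Python. Everything in it is my own assumption: the helper names are made up, the image is just a list of rows of grayscale values (0 = black, 255 = white), and the threshold of 128 is a placeholder that would need tuning per scan. A real pipeline would probably use an imaging library instead.

```python
# Hypothetical helpers, not part of Tesseract. An image here is a list of
# rows of grayscale values (0 = black, 255 = white).

def crop_border(img, margin):
    """Drop `margin` pixels from every side of the image."""
    if margin == 0:
        return [row[:] for row in img]
    return [row[margin:-margin] for row in img[margin:-margin]]

def add_white_border(img, pad):
    """Surround the image with `pad` pixels of white on every side."""
    width = len(img[0]) + 2 * pad
    padded = [[255] * width for _ in range(pad)]
    padded += [[255] * pad + row + [255] * pad for row in img]
    padded += [[255] * width for _ in range(pad)]
    return padded

def suppress_bleed_through(img, threshold=128):
    """Push faint pixels (at or above `threshold`) to white; keep darker ink.

    The threshold is an assumed value and would need tuning per scan.
    """
    return [[255 if p >= threshold else p for p in row] for row in img]

# Tiny made-up page: a black frame around two rows of "text" with faint
# bleed-through (the 180 and 200 values).
page = [
    [0,   0,   0,   0, 0],
    [0, 255, 180,  30, 0],
    [0, 200,  20, 255, 0],
    [0,   0,   0,   0, 0],
]

cleaned = add_white_border(suppress_bleed_through(crop_border(page, 1)), 1)
```

The order matters in this sketch: the dark frame is cropped first so it cannot survive the threshold step, and the white border is added last.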
>
> On Fri, Nov 2, 2012 at 10:32 AM, Devin Bean 
> <[email protected]> 
> wrote: 
> > Hi, 
> > 
> > Apologies for the noob questions. Trying to get the hang of Tesseract. 
> > 
> > I have a number of images of Chinese genealogies that I'd love to be able
> > to run OCR on. Most of them are similar to the two images linked below:
> > either fairly standard wood-block print or, for newer images, an actual
> > printed standard font.
> > 
> > Wood block print: http://www.flickr.com/photos/63588871@N05/8138563082/ 
> > Standard font print: http://www.flickr.com/photos/63588871@N05/8147864815/
> > 
> > Questions 
> > - What options do I use to tell Tesseract to read top-to-bottom, 
> > left-to-right? (I'm using Tesseract 3.02) 
> > - I expect that Tesseract will need to be trained, for the wood-block
> > texts at least. I can edit these images so that only the central text
> > portion remains and the contrast between the background and the
> > characters is greater. I can also generate text files with the
> > characters in each image. How do I construct training files that use
> > images where the lines run top-to-bottom and left-to-right?
> > 
> > If you have any other advice for processing images like these, I'd really
> > appreciate it.
> > 
> > Thanks for your help! 
> > 
> > -- 
> > You received this message because you are subscribed to the Google 
> > Groups "tesseract-ocr" group. 
> > To post to this group, send email to 
> > [email protected] 
> > To unsubscribe from this group, send email to 
> > [email protected] 
> > For more options, visit this group at 
> > http://groups.google.com/group/tesseract-ocr?hl=en 
>
>
>
> -- 
> “All that is gold does not glitter, 
>   not all those who wander are lost; 
> the old that is strong does not wither, 
>   deep roots are not reached by the frost. 
> From the ashes a fire shall be woken, 
>   a light from the shadows shall spring; 
> renewed shall be blade that was broken, 
>   the crownless again shall be king.” 
>
