Hi, apologies for the noob questions. I'm trying to get the hang of Tesseract.
I have a number of images of Chinese genealogies that I'd love to be able to run OCR on. Most of them are similar to the two images linked below: either fairly standard wood-block print or, for newer images, modern print in a standard font.

Wood-block print: http://www.flickr.com/photos/63588871@N05/8138563082/
Standard font print: http://www.flickr.com/photos/63588871@N05/8147864815/

Questions:
- What options do I use to tell Tesseract to read top-to-bottom, left-to-right? (I'm using Tesseract 3.02.)
- I expect that Tesseract will need to be trained, at least for the wood-block texts. I can edit these images so that only the central text portion remains and the contrast between the background and the characters is greater. I can also generate text files containing the characters in each image. How do I construct training files from images where the lines run top-to-bottom and left-to-right?

If you have any other advice for processing images like these, I'd really appreciate it. Thanks for your help!
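
To make the clean-up step concrete, here is a minimal sketch of what "crop to the central text, then raise the contrast" could look like. It assumes the page has already been loaded as a grayscale grid of 0-255 values (a list of rows); the function names, the threshold, and the tiny sample grid are all hypothetical, and nothing here calls Tesseract itself.

```python
def crop(pixels, top, left, bottom, right):
    """Keep rows top..bottom-1 and columns left..right-1 (the text region)."""
    return [row[left:right] for row in pixels[top:bottom]]


def stretch_contrast(pixels):
    """Linearly rescale so the darkest value becomes 0 and the lightest 255."""
    flat = [value for row in pixels for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Flat image: nothing to stretch, return an unchanged copy.
        return [row[:] for row in pixels]
    scale = 255.0 / (hi - lo)
    return [[int(round((value - lo) * scale)) for value in row] for row in pixels]


def binarize(pixels, threshold=128):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[0 if value < threshold else 255 for value in row] for row in pixels]


if __name__ == "__main__":
    # Hypothetical 4x5 page: a light border (200) around faint ink (120)
    # on a greyish paper background (180).
    page = [
        [200, 200, 200, 200, 200],
        [200, 120, 180, 120, 200],
        [200, 180, 120, 180, 200],
        [200, 200, 200, 200, 200],
    ]
    text_region = crop(page, top=1, left=1, bottom=3, right=4)
    cleaned = binarize(stretch_contrast(text_region))
    for row in cleaned:
        print(row)
```

In practice an imaging library would do the loading and saving; the sketch only shows the order of operations I have in mind before handing the result to Tesseract.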

