Chinese OCR - top-down right-left orientation and training

Devin Bean Fri, 02 Nov 2012 09:26:26 -0700

Hi,

Apologies for the noob questions. Trying to get the hang of Tesseract.


I have a number of images of Chinese genealogies that I'd love to be able 
to run OCR on. Most of them are similar to the two images linked below: 
fairly standard wood-block print or, for newer images, a standard printed 
font.

Wood block print: http://www.flickr.com/photos/63588871@N05/8138563082/
Standard font print: http://www.flickr.com/photos/63588871@N05/8147864815/

Questions
- What options do I use to tell Tesseract to read top-to-bottom, 
right-to-left? (I'm using Tesseract 3.02)
- I expect that Tesseract will need to be trained, at least for the wood 
block texts. I can edit these images so that only the central text portion 
remains and so that the contrast between the background and the characters 
is greater. I can also generate text files with the characters in the 
image. How do I construct training files from images where the lines 
run top-to-bottom and right-to-left?
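
To make the image editing I mentioned concrete, here is a rough sketch of 
the two steps (crop to the central text, then increase the contrast). It is 
plain Python on a small grayscale grid rather than a real scan. The function 
names, the 0-255 value range, and the sample numbers are my own assumptions 
for illustration; none of this is Tesseract's API.

```python
def crop(pixels, top, left, bottom, right):
    """Keep rows top..bottom-1 and columns left..right-1 of a 2-D grid."""
    return [row[left:right] for row in pixels[top:bottom]]


def stretch_contrast(pixels):
    """Linearly rescale gray values so the darkest maps to 0 and the lightest to 255."""
    lo = min(min(row) for row in pixels)
    hi = max(max(row) for row in pixels)
    if hi == lo:
        # Flat image: nothing to stretch, return an unchanged copy.
        return [list(row) for row in pixels]
    scale = 255.0 / (hi - lo)
    return [[int(round((v - lo) * scale)) for v in row] for row in pixels]


# Hypothetical 4x4 "page": a light border (200) around darker ink (90)
# on a yellowed background (180).
page = [
    [200, 200, 200, 200],
    [200,  90, 180, 200],
    [200, 180,  90, 200],
    [200, 200, 200, 200],
]

text_area = crop(page, 1, 1, 3, 3)     # drop the border
cleaned = stretch_contrast(text_area)  # ink -> 0, background -> 255
```

On a real scan the same two steps would be applied to the pixel data loaded 
by an imaging library before the result is handed to Tesseract.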

If you have any other advice for processing images like these, I'd really 
appreciate it.

Thanks for your help!

