OCR very large images - smart method to split into regions first?

walter23 Thu, 17 Nov 2011 18:49:52 -0800

I'm trying to come up with a method to OCR very large (poster-sized)
images containing lots of regular-sized text, for example 40" wide
with a 12-point font.  One big limitation is that memory is easily
exhausted by images that take up half a gigabyte or more of RAM
(40" x 30" at 300 DPI is pretty big).
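
To put rough numbers on that, here is a minimal sketch of the
arithmetic, assuming an uncompressed raster with 8 bits per channel.
The function name and the channel counts are only illustrative; the
real footprint depends on how the image library stores pixels and on
any working copies the OCR engine makes.

```python
def raster_bytes(width_in, height_in, dpi, channels=3, bits_per_channel=8):
    """Bytes needed to hold an uncompressed raster of the given physical size."""
    width_px = round(width_in * dpi)    # 40" at 300 DPI -> 12000 px
    height_px = round(height_in * dpi)  # 30" at 300 DPI -> 9000 px
    return width_px * height_px * channels * bits_per_channel // 8


# Assumed pixel formats, for comparison only.
for name, channels in (("grayscale", 1), ("RGB", 3), ("RGBA", 4)):
    size = raster_bytes(40, 30, 300, channels=channels)
    print(f"{name}: {size / 2**20:.0f} MiB")
```

A single 4-channel copy is already over 400 MiB, so one or two
intermediate copies are enough to pass the half-gigabyte mark.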


I am trying to find a smart way to automatically split the image
into contiguous regions of text without chopping text lines in half
(either horizontally or vertically).
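
One way to picture the horizontal case is a projection profile: count
the dark pixels in each row and only cut inside runs of empty rows.
The sketch below is a swapped-in illustration, not something
Tesseract provides. It assumes the per-row ink counts have already
been computed from a binarized image, and find_blank_gaps,
choose_cuts, max_band and min_gap are hypothetical names.

```python
def find_blank_gaps(row_ink, min_gap=3):
    """Return (start, end) row ranges, end exclusive, with no ink at all."""
    gaps, start = [], None
    for y, ink in enumerate(row_ink):
        if ink == 0:
            if start is None:
                start = y
        else:
            if start is not None and y - start >= min_gap:
                gaps.append((start, y))
            start = None
    if start is not None and len(row_ink) - start >= min_gap:
        gaps.append((start, len(row_ink)))
    return gaps


def choose_cuts(row_ink, max_band, min_gap=3):
    """Greedily pick cut rows at gap centers, keeping bands <= max_band if possible."""
    height = len(row_ink)
    centers = [(s + e) // 2 for s, e in find_blank_gaps(row_ink, min_gap)]
    cuts, band_start, best = [], 0, None
    for c in centers + [height]:
        # The band would grow too tall: cut at the last usable gap center.
        if c - band_start > max_band and best is not None:
            cuts.append(best)
            band_start, best = best, None
        if band_start < c < height:
            best = c
    return cuts


# Three 10-row text lines separated by 4-row blank gaps.
profile = [0] * 2 + [5] * 10 + [0] * 4 + [5] * 10 + [0] * 4 + [5] * 10 + [0] * 2
print(choose_cuts(profile, max_band=20))
```

The same idea applied to column sums would give vertical cuts,
although it only works when the page has clean white gutters and
little skew.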

One idea is to run page segmentation on a lower-resolution copy of
the image and use the resulting layout to split up the full-size
image, but looking at the layout results I see some problems with
this.
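
Mechanically, the low-resolution approach comes down to mapping each
region box back to full-resolution coordinates before cropping.  A
minimal sketch, assuming the layout step yields (left, top, right,
bottom) boxes in low-resolution pixels; scale_box and the pad
parameter are hypothetical, and the padding is there so that
rounding at low resolution does not clip the edges of characters.

```python
import math


def scale_box(box, scale, full_size, pad=0):
    """Map a low-res (left, top, right, bottom) box to full-res pixels.

    scale is full-res pixels per low-res pixel; the result is padded
    by `pad` pixels on every side and clamped to the image bounds.
    """
    left, top, right, bottom = box
    full_w, full_h = full_size
    return (
        max(0, math.floor(left * scale) - pad),
        max(0, math.floor(top * scale) - pad),
        min(full_w, math.ceil(right * scale) + pad),
        min(full_h, math.ceil(bottom * scale) + pad),
    )


# A box found on a 75 DPI copy of a 300 DPI, 12000 x 9000 px scan.
print(scale_box((10, 20, 110, 60), scale=4, full_size=(12000, 9000), pad=8))
```

Each scaled box can then be cropped and recognized separately, so
only one region needs to be held at full resolution at a time.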

Has anybody tackled this kind of problem before?  Any suggestions
for approaches to take?

Many thanks

