I'm trying to come up with a method to OCR very large (poster-sized)
images containing lots of regular-sized text, for example a 40" wide
poster set in 12 point font.  One big limitation is memory, which is
easily exhausted by images that take up half a gigabyte or more of
RAM (40x30" at 300 DPI is 12000x9000 pixels).
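For a rough sense of scale, here is a back-of-the-envelope sketch of the
uncompressed size of such a scan. It assumes 8 bits per channel and no
row padding; the function name is only illustrative, and any working
copies made during processing come on top of these figures.

```python
def raw_image_bytes(width_in, height_in, dpi, channels=1, bits_per_channel=8):
    """Uncompressed in-memory size of a scan, in bytes (no row padding assumed)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels * bits_per_channel // 8


if __name__ == "__main__":
    # 40x30 inches at 300 DPI, for a few common pixel layouts
    for name, channels in [("grey", 1), ("RGB", 3), ("RGBA", 4)]:
        size = raw_image_bytes(40, 30, 300, channels)
        print(f"{name}: {size / 2**20:.0f} MiB")
```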

I am trying to find a smart way to automatically split the image
into contiguous regions of text so that I do not chop text lines in
half (either horizontally or vertically).
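
One simple way to look for safe cut positions is a projection profile:
count the dark pixels in every row of a binarized image and only cut
inside runs of empty rows. The sketch below is a minimal illustration
under that assumption, not a Tesseract feature; `blank_runs`, `cut_rows`
and the `min_gap` threshold are hypothetical names. The same idea applies
to columns for vertical cuts.

```python
def blank_runs(ink_per_row, min_gap):
    """Return (start, end) pairs of runs of empty rows, end exclusive,
    keeping only runs that are at least min_gap rows tall."""
    runs = []
    start = None
    for i, ink in enumerate(ink_per_row):
        if ink == 0:
            if start is None:
                start = i          # a blank run begins here
        elif start is not None:
            if i - start >= min_gap:
                runs.append((start, i))
            start = None           # the blank run ended at row i
    # handle a blank run that reaches the bottom of the image
    if start is not None and len(ink_per_row) - start >= min_gap:
        runs.append((start, len(ink_per_row)))
    return runs


def cut_rows(ink_per_row, min_gap=3):
    """Pick one cut row in the middle of every sufficiently tall blank run."""
    return [(start + end) // 2 for start, end in blank_runs(ink_per_row, min_gap)]
```

Short gaps (for example the space between two lines of the same
paragraph) are skipped by raising `min_gap`, so cuts only land in wider
bands of whitespace. Skewed scans or multi-column layouts may have no
fully blank rows, in which case this simple approach finds nothing.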

One idea was to run page segmentation on a lower-resolution copy of
the image and use the resulting layout to split up the full-size
image, but looking at the layout results I see some problems with
this.
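
As a sketch of the mapping step only: assuming the low-resolution layout
pass yields boxes as (left, top, right, bottom) pixel tuples, each box
can be scaled back to full-resolution coordinates and padded a little so
that glyph edges are not clipped. The function name, the box format and
the `pad` parameter are assumptions made for illustration.

```python
import math


def scale_box(box, factor, pad, full_width, full_height):
    """Map a (left, top, right, bottom) box from the low-res image to
    full-res pixels, growing it by pad and clamping to the image bounds."""
    left, top, right, bottom = box
    return (
        max(0, math.floor(left * factor) - pad),
        max(0, math.floor(top * factor) - pad),
        min(full_width, math.ceil(right * factor) + pad),
        min(full_height, math.ceil(bottom * factor) + pad),
    )
```

Each scaled box can then be cropped from the full-size image and passed
to OCR on its own, so only one region needs to be in memory at a time.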

Has anybody tackled this kind of problem before?  Suggestions for
approaches to take?

Many thanks
