Thanks, Tom.

I did some more experimenting.

I opened a page of text from the NY Times in print mode and captured
a portion of the screen. Ocropus got the headline perfectly! I was
happy to see that. However, the rest of the article was all gibberish.
The font was reasonably large, probably a bit larger than I usually
use to read. I measured a lowercase h: it's 14 pixels high and 10
pixels wide. Maybe that's too small?

I also took an image containing both graphics and text and ran it
through ocropus. At first, it didn't get any of the text. I tripled
the image size in Photoshop (with resampling) and saved it as PNG,
but then I got a "PNG error" when trying the recognition. Not sure
what that's about. In case the image was now too large, I cropped it
to include only a portion of the original and ran that through
ocropus. This time, it was able to detect some of the text in the
image and interpret it properly. The portion that it looked at
contained both non-text graphics and text.

Since a portion of the data returned was non-gibberish, it seems we
could at least use this to detect whether or not an image contains
text. I.e., if ocropus returns a string of at least X characters in
a-z, then the image probably contains text. The gibberish seems to be
mainly symbols such as #. Knowing whether or not an image contains
text could be useful info in itself. Does ocropus want the entire
image to be just text? Does it get confused if text is present along
with graphics? Is there perhaps a way to run it such that it only
tries to identify obvious text and returns whether or not it thinks
text is present on the page? We could potentially do the programming
for that. If that's all the info that's needed, I assume it could run
a lot faster. We'd like it to be able to do a lot more, but I'm just
exploring some possibilities here.
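
To make the "at least X characters of a-z" idea concrete, here is a
rough sketch of the check I have in mind. It only looks at the string
that ocropus returns and does not call ocropus itself. The function
name, the threshold of 20 letters, and the extra letter-ratio test are
all my own assumptions for illustration, not anything ocropus
provides, and the numbers would need tuning on real output.

```python
import re


def probably_contains_text(ocr_output, min_letters=20, min_letter_ratio=0.5):
    """Guess whether OCR output came from an image that contains text.

    min_letters plays the role of "X" above; min_letter_ratio is an
    extra, assumed safeguard against long runs of symbol gibberish.
    Both defaults are placeholders, not tuned values.
    """
    # Count letters a-z, ignoring case.
    letters = len(re.findall(r"[a-z]", ocr_output.lower()))
    # Count all non-whitespace characters (letters, digits, symbols).
    non_space = len(re.findall(r"\S", ocr_output))
    if non_space == 0:
        return False
    return letters >= min_letters and letters / non_space >= min_letter_ratio


if __name__ == "__main__":
    print(probably_contains_text("Ocropus got the headline perfectly"))
    print(probably_contains_text("#@ ## %^ #~ ## @# !!"))
```

The ratio test is there because gibberish made mostly of symbols
could still contain a few stray letters; requiring that letters make
up at least half of the non-space characters is one simple way to
filter that out.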

-- 
You received this message because you are subscribed to the Google Groups 
"ocropus" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/ocropus?hl=en.
