> Hi everyone,
>
> I'm working on a project that requires recognition of numerical codes
> from within a box at the top of a page. There will be various other
> content below this box, but it is largely irrelevant. Is Tesseract
> right for me, or is it too low level? Is there something more complex,
> for instance, that might recognise that text is in a box and pull it
> out, or is this too wishful at this point in time?

If all your scans are going to be at the same resolution (so the box
always lands at the same pixel position), you could use ImageMagick or
a similar tool to crop out just the numerical code you want (as an
image) and have Tesseract OCR that.
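
As a rough sketch of that crop-then-OCR idea, something like the
following Python wrapper might do. It assumes the ImageMagick `convert`
program (called `magick` in ImageMagick 7) and the `tesseract` program
are both on your PATH. The function names, the "_box" file suffix, and
the box coordinates are made up for illustration; you would measure the
real box position on one of your own scans.

```python
import subprocess
from pathlib import Path


def crop_command(src, dst, width, height, x, y):
    """Build an ImageMagick command that cuts a fixed rectangle out of src."""
    geometry = f"{width}x{height}+{x}+{y}"
    # +repage drops the original canvas offset so the crop is a standalone image.
    return ["convert", str(src), "-crop", geometry, "+repage", str(dst)]


def ocr_command(image, output_base):
    """Build a Tesseract command; Tesseract writes its text to <output_base>.txt."""
    return ["tesseract", str(image), str(output_base)]


def read_code(scan, width, height, x, y):
    """Crop the box from the scan, OCR the crop, and return the recognised text."""
    scan = Path(scan)
    cropped = scan.with_name(scan.stem + "_box.png")  # hypothetical naming scheme
    output_base = cropped.with_suffix("")
    subprocess.run(crop_command(scan, cropped, width, height, x, y), check=True)
    subprocess.run(ocr_command(cropped, output_base), check=True)
    return Path(str(output_base) + ".txt").read_text().strip()
```

You would call it as, say, read_code("page.png", 600, 120, 50, 40) for a
600x120 pixel box whose top-left corner is at (50, 40). This only works
if the box really is in the same place on every scan; skewed or shifted
pages would need some alignment step first.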

> On a separate note, my HP scanner software will separate out pages
> based on 'separator' barcodes. Can anyone provide advice about how I
> might do this separately, e.g. have a generic separator page that is
> perhaps a series of diagonal stripes, and have Tesseract recognise it
> somehow?
>
> Thanks
>
> Henri



-- 
Michael Moore
-------------------------
Share your families' genealogy and family history books. It's easy and
free : http://bookscanned.com
