It is not clear what you need, but you could use a separator page carrying a
SYMBOL that does not otherwise occur in your text (such as the Greek letter
Theta or Phi) and have the program recognize it.
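
A minimal sketch of that check, assuming the OCR output for each page is
already available as a string. The function name, the marker character, and
the noise threshold are all hypothetical choices for illustration. Whether
Tesseract reliably recognizes a given symbol depends on the language data in
use, which is not verified here.

```python
def is_separator_text(ocr_text: str, marker: str = "Θ",
                      max_other_chars: int = 5) -> bool:
    """Return True if the OCR text of a page looks like a separator page.

    Assumption: a separator page contains the marker symbol and almost
    nothing else, so a marker inside ordinary body text does not count.
    """
    # Drop all whitespace so page layout does not affect the check.
    compact = "".join(ocr_text.split())
    if marker not in compact:
        return False
    # Tolerate a few stray characters from OCR noise, but not real text.
    others = compact.replace(marker, "")
    return len(others) <= max_other_chars
```

The threshold matters because the chosen symbol may still turn up inside a
normal page now and then; only a nearly empty page with the marker is
treated as a separator.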

Or write your own rough preprocessor to toss pages that have a certain 2D
barcode on them, such as Data Matrix.
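
Such a preprocessor could be sketched roughly as follows. The barcode (or
symbol) detection itself is left as a caller-supplied function, since no
particular decoding library is assumed here; split_on_separators and
is_separator are hypothetical names, not part of Tesseract.

```python
from typing import Callable, Iterable, List, TypeVar

Page = TypeVar("Page")


def split_on_separators(pages: Iterable[Page],
                        is_separator: Callable[[Page], bool]) -> List[List[Page]]:
    """Group pages into documents, discarding the separator pages.

    is_separator is any test the caller provides, for example one that
    looks for a Data Matrix barcode on the scanned page image.
    """
    documents: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        if is_separator(page):
            # A separator closes the current document and is itself tossed.
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    # Keep any pages that follow the last separator.
    if current:
        documents.append(current)
    return documents
```

Only the pages that survive this step would then be passed on to the OCR
engine, so separator sheets never reach recognition at all.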

Hussein Al-Hussein

From: [email protected]
To: [email protected]
Subject: Re: Is Tesseract Suitable for me and Separator pages

Generally, some scanners provide blank-page removal and barcode-based
separation, but only some OCR engines provide these functions. I assume it is
not considered good practice for the OCR engine to do this itself; if
separation is handled as a separate step, productivity will be higher.


On Mon, Jan 12, 2009 at 3:33 AM, HC <[email protected]> wrote:



Hi everyone,

I'm working on a project that requires recognition of numerical codes
from within a box at the top of a page. There will be various other
content below this box, but it is largely irrelevant. Is Tesseract
right for me, or is it too low level? Is there something more complex,
for instance, that might recognise that text is in a box and pull it
out, or is this too wishful at this point in time?

On a separate note, my HP scanner software will separate out pages
based on 'separator' barcodes. Can anyone provide advice about how I
might do this separately, e.g. have a generic separator page that is
perhaps a series of diagonal stripes, and have Tesseract recognise it
somehow?

Thanks

Henri









