It is not clear what you need, but you could have a separator page with a symbol on it that does not otherwise occur in your text (such as the Greek letter Theta or Phi) and have the program recognize it.
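The marker-symbol idea can be sketched in a few lines of Python. This is a minimal sketch that assumes the OCR step has already produced one text string per page (for example, by running Tesseract on each page image) and that the marker is the Greek letter Θ. The function name `split_on_marker` and the choice of Θ are illustrative assumptions, not part of Tesseract. In practice the marker must be a character the loaded language data can actually recognize, or it will never appear in the OCR output.

```python
from typing import List


def split_on_marker(pages: List[str], marker: str = "\u0398") -> List[List[int]]:
    """Group page indices into documents (hypothetical helper).

    Any page whose OCR text contains `marker` (default: Greek capital Theta)
    is treated as a separator page; the separator itself is dropped.
    """
    documents: List[List[int]] = []
    current: List[int] = []
    for index, text in enumerate(pages):
        if marker in text:
            # Separator found: close the current document if it has pages.
            if current:
                documents.append(current)
            current = []
        else:
            current.append(index)
    # Pages after the last separator form the final document.
    if current:
        documents.append(current)
    return documents


if __name__ == "__main__":
    ocr_pages = ["Invoice 1001", "page 2", "\u0398", "Invoice 1002", "\u0398", "Invoice 1003"]
    print(split_on_marker(ocr_pages))  # [[0, 1], [3], [5]]
```

A plain substring test is deliberately crude: an OCR engine may occasionally misread ordinary text as the marker, so a real setup might require the marker to be the only text on the page.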
Or write your own rough preprocessor that discards pages carrying a particular 2D barcode, such as a DataMatrix code.

Hussein Al-Hussein

From: [email protected]
To: [email protected]
Subject: Re: Is Tesseract Suitable for me and Separator pages

Generally, some scanners provide blank-page removal and barcode-based separation, but only a few OCR engines provide these functions. I assume it is not considered good practice for OCR engines to do this; if separation is handled as a separate step, productivity will be higher.

On Mon, Jan 12, 2009 at 3:33 AM, HC <[email protected]> wrote:

Hi everyone,

I'm working on a project that requires recognition of numerical codes from within a box at the top of a page. There will be various other content below this box, but it is largely irrelevant. Is Tesseract right for me, or is it too low-level? Is there something more complex that might, for instance, recognise that text is in a box and pull it out, or is this too wishful at this point in time?

On a separate note, my HP scanner software will separate out pages based on 'separator' barcodes. Can anyone provide advice about how I might do this separately, e.g. have a generic separator page that is perhaps a series of diagonal stripes, and have Tesseract recognise it somehow?

Thanks
Henri

You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
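A rough preprocessor of the kind suggested in this thread can be sketched in plain Python. Note that it swaps in Henri's diagonal-stripe separator page for the DataMatrix barcode, because decoding DataMatrix requires a dedicated decoder library; it also drops blank pages, as some scanners do. It assumes each page has already been binarized into a rectangular grid of 0/1 values (1 = ink). All function names and thresholds here are hypothetical and would need tuning on real scans, and a noticeably skewed scan would defeat the simple per-diagonal test.

```python
from typing import List, Sequence

# Assumed input: a binarized page, a grid with 1 = ink and 0 = background.
Page = Sequence[Sequence[int]]

# Hypothetical thresholds; tune them on real scans.
BLANK_MAX_INK = 0.005
SOLID, EMPTY = 0.9, 0.1


def ink_ratio(page: Page) -> float:
    """Fraction of all pixels on the page that are ink."""
    total = sum(len(row) for row in page)
    return sum(sum(row) for row in page) / total if total else 0.0


def is_blank(page: Page) -> bool:
    """A page with (almost) no ink is treated as blank."""
    return ink_ratio(page) <= BLANK_MAX_INK


def is_stripe_page(page: Page, min_share: float = 0.3) -> bool:
    """Rough test for diagonal stripes running along lines of constant row + col.

    On a striped page each such diagonal is almost all ink or almost all
    background. On a text page nearly every diagonal is mostly background,
    so the share of 'solid' diagonals stays near zero.
    """
    if not page:
        return False
    width = max(len(row) for row in page)
    ink = [0] * (len(page) + width - 1)
    count = [0] * (len(page) + width - 1)
    for r, row in enumerate(page):
        for c, value in enumerate(row):
            ink[r + c] += value
            count[r + c] += 1
    fractions = [i / n for i, n in zip(ink, count) if n]
    if not fractions:
        return False
    solid = sum(f >= SOLID for f in fractions) / len(fractions)
    empty = sum(f <= EMPTY for f in fractions) / len(fractions)
    # Both kinds must be common, and together they must cover almost every diagonal.
    return solid >= min_share and empty >= min_share and solid + empty >= 0.9


def pages_to_keep(pages: List[Page]) -> List[int]:
    """Indices of pages that are neither blank nor stripe separator pages."""
    return [i for i, page in enumerate(pages)
            if not is_blank(page) and not is_stripe_page(page)]


def make_stripes(rows: int, cols: int, width: int = 4) -> List[List[int]]:
    """Synthetic separator page: alternating diagonal bands `width` pixels wide."""
    return [[1 if ((r + c) // width) % 2 == 0 else 0 for c in range(cols)]
            for r in range(rows)]


if __name__ == "__main__":
    blank = [[0] * 40 for _ in range(40)]
    # A single horizontal "line of text" on row 5.
    text = [[1 if r == 5 and 5 <= c < 35 else 0 for c in range(40)] for r in range(40)]
    print(pages_to_keep([text, make_stripes(40, 40), blank, text]))  # [0, 3]
```

Only the kept pages would then be passed to Tesseract. Comparing whole diagonals, rather than matching an exact stripe template, tolerates a small offset in where the stripes start, but not rotation of the page.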

