Hi David,

You're right, that would be useful. Tesseract has a basic version of 
that, called "patterns"; search the manpage for a bit of information 
on them.

However, at present they can't be assigned per region, only as 
possible patterns for the whole OCR job. Also, they aren't 
restrictive, more "suggestive": text that doesn't match a pattern 
can still be returned.
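
Because the patterns are only suggestive, one option is to check the 
recognised text yourself afterwards. A minimal sketch in Python, 
assuming the mask notation from the original mail ('#' for a digit, 
'A' for an upper-case letter, anything else literal); the function 
names are made up for illustration and none of this is part of 
Tesseract:

```python
import re


def mask_to_regex(mask):
    """Turn a mask such as '##/AAA/####' into a compiled regex.

    Assumed notation: '#' is one ASCII digit, 'A' is one upper-case
    ASCII letter, every other character must match literally.
    """
    parts = []
    for ch in mask:
        if ch == "#":
            parts.append("[0-9]")
        elif ch == "A":
            parts.append("[A-Z]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches_mask(text, mask):
    """Return True if the whole OCR result (whitespace-trimmed) fits the mask."""
    return mask_to_regex(mask).fullmatch(text.strip()) is not None
```

A result that fails the check (for example a letter "O" read where a 
zero belongs) can then be flagged for a rescan or manual review 
rather than silently accepted.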

If you were using the API you could set only the pattern you wanted 
and recognise only the zone you're interested in, and that should 
work quite well. Give it a try if you have time, and let us know how 
it works.
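
A rough, untested sketch of a similar idea using the command line 
rather than the API, in Python. It assumes the zone has already been 
cropped into its own image, that the tesseract build accepts the 
--user-patterns and --psm options (older releases spell the latter 
-psm), and that the patterns file uses \d for a digit and \A for an 
upper-case letter, which should be checked against the documentation 
for your version. The mask notation and function names are 
hypothetical:

```python
def mask_to_tesseract_pattern(mask):
    """Convert a mask such as '##/AAA/####' to a user-patterns line.

    Assumption: the patterns file uses \\d for a digit and \\A for an
    upper-case letter; other characters are copied through literally.
    """
    out = []
    for ch in mask:
        if ch == "#":
            out.append("\\d")
        elif ch == "A":
            out.append("\\A")
        elif ch == "\\":
            out.append("\\\\")  # keep a literal backslash escaped
        else:
            out.append(ch)
    return "".join(out)


def patterns_file_text(masks):
    """One pattern per line; save this text to a file for tesseract."""
    return "".join(mask_to_tesseract_pattern(m) + "\n" for m in masks)


def build_command(zone_image, patterns_file):
    """Argument list for OCRing one pre-cropped zone as a single line."""
    return ["tesseract", zone_image, "stdout",
            "--psm", "7",
            "--user-patterns", patterns_file]
```

The argument list can be passed to subprocess.run once the patterns 
text has been written to the named file. The pattern only biases the 
result, so it is still worth validating the output.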

Nick 

On Tue, Jul 29, 2014 at 01:27:10PM -0700, David Arnold wrote:
> Hello,
> 
> for a theoretical application of advanced invoice registration/indexing,
> it would be very useful if, besides training on a specific invoice template,
> one could pass a regex filter to zone scans.
> 
> Imagine you want to retrieve the date of a receipt, which is in a zone you
> either mark by hand or which is fixed. In the context of a specific invoice
> template (training file) it might always be in the format:
> 
> 24/JUL/2014
> 
> thus using only the following broader subset:
> 
> ##/AAA/####
> 
> or the following narrower subset:
> 
> dd/MMM/YYYY
> 
> I think that if it were possible to pass such a regex to an individual scan,
> telling tesseract to use only that specific subset of characters to process
> the image zone, it would give near 100% accuracy even on dirty receipt scans
> which have seen a tropical monsoon before they were scanned...
> 
> What do you know or think about this?
> 
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email
> to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit https://groups.google.com/d/msgid/
> tesseract-ocr/418748be-e224-49ce-93b2-a8386cbbf7f5%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

