*Hello Nick,*
Thank you very much for your answer. I'm coming from the user side, and my
technical background is limited. This is why I usually understand only half
of the picture, but at least I'm getting some direction.

So I found this: http://www.openocr.net/
I've worked with Docker, so I understand the architecture a bit, and I like
that it hides away the more difficult stuff and is nicely boxed :)
If you scroll down, there is a feature list. The 4th item says:

   - Pass arguments to Tesseract such as character whitelist and page 
   segment mode

Is this what we are looking for, "page segment mode" and "character
whitelist"? If I'm piecing this together correctly, this is the API access
you talked about... But I don't feel confident enough to draw a conclusion.
I would prefer to abstract away the things I don't understand. That's fair,
isn't it? :)
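For reference, the two settings in that feature list correspond to options of Tesseract itself. The minimal sketch below only builds a command line for the plain `tesseract` CLI, not OpenOCR's own request format: `-c tessedit_char_whitelist=...` sets Tesseract's character whitelist variable, and `--psm 7` selects the "single text line" page segmentation mode, which suits a cropped date zone. The helper `tesseract_args` and the file name `date_zone.png` are made up for illustration. Older 3.x releases spell the option `-psm`, and some LSTM-based releases did not honor the whitelist, so check against the installed version.

```python
import shlex

def tesseract_args(image_path, whitelist, psm=7, lang="eng"):
    """Build a tesseract CLI call for one cropped zone (hypothetical helper).

    --psm 7 asks Tesseract to treat the image as a single text line, and
    -c tessedit_char_whitelist=... limits which characters it may output.
    """
    return [
        "tesseract", image_path, "stdout",  # write recognized text to stdout
        "-l", lang,
        "--psm", str(psm),                  # Tesseract 3.x spells this -psm
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]

# Digits, the slash and upper-case letters: enough for dates like 24/JUL/2014.
DATE_WHITELIST = "0123456789/ABCDEFGHIJKLMNOPQRSTUVWXYZ"

print(shlex.join(tesseract_args("date_zone.png", DATE_WHITELIST)))
```

The printed command can be pasted into a shell, or the list can be handed unchanged to `subprocess.run(...)`.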

Thanks, and I hope this might inspire anyone who happens to read this...


On Tuesday, July 29, 2014 15:27:10 UTC-5, David Arnold wrote:
>
> Hello,
>
> for a hypothetical application in advanced invoice registration and
> indexing, it would be very useful if, besides training a specific invoice
> template, one could pass a regex filter to zone scans.
>
> Imagine you want to retrieve the date of a receipt from a zone that you
> either mark by hand or that is fixed. Within a specific invoice template
> (training file), the date might always be in the format:
>
> 24/JUL/2014
>
> thus using only the following broader subset:
>
> ##/AAA/####
>
> or the following narrower subset:
>
> dd/MMM/YYYY
>
> I think that if it were possible to pass such a regex to an individual
> scan, telling Tesseract to use only that specific subset of characters to
> process the image zone, it would give near 100% accuracy even on dirty
> receipt scans that have seen a tropical monsoon before being scanned...
>
> What do you know about this, and what do you think?
>
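One caveat on the idea in the quoted message: a Tesseract whitelist is only a set of allowed characters, so it cannot express "two digits, then a slash". A template such as `##/AAA/####` can still be split into two parts: a whitelist handed to Tesseract, and a regular expression that checks positions on the text that comes back. This minimal sketch assumes `#` means a digit, `A` an upper-case letter and every other character is literal; that reading of the notation and the function names are illustrative assumptions, not a Tesseract feature.

```python
import re
import string

# Assumed template notation: '#' = digit, 'A' = upper-case letter,
# anything else must appear literally. Each class maps to (whitelist chars, regex).
CLASSES = {
    "#": (string.digits, "[0-9]"),
    "A": (string.ascii_uppercase, "[A-Z]"),
}

def template_to_whitelist(template):
    """Every character the template allows, without duplicates, in first-seen order."""
    chars = []
    for ch in template:
        for c in (CLASSES[ch][0] if ch in CLASSES else ch):
            if c not in chars:
                chars.append(c)
    return "".join(chars)

def template_to_regex(template):
    """A pattern that also enforces which character class sits at which position."""
    parts = [CLASSES[ch][1] if ch in CLASSES else re.escape(ch) for ch in template]
    return re.compile("".join(parts))

def check_zone_text(text, template):
    """Return the OCR text (trimmed, upper-cased) if it fits the template, else None."""
    cleaned = text.strip().upper()
    return cleaned if template_to_regex(template).fullmatch(cleaned) else None

print(template_to_whitelist("##/AAA/####"))              # 0123456789/ABCDEFGHIJKLMNOPQRSTUVWXYZ
print(check_zone_text(" 24/JUL/2014\n", "##/AAA/####"))  # 24/JUL/2014
print(check_zone_text("24/JUL/2O14", "##/AAA/####"))     # None (letter O instead of zero)
```

A rejected result could then trigger a rescan or a manual review. The narrower `dd/MMM/YYYY` form would additionally need a check against the month abbreviations (JAN to DEC), which a character whitelist cannot express either.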

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/3b8ab424-1cb8-4e4d-a438-73f0ca6489e1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
