One problem is 'registration', where you shift the scanned image left/right and up/down until it matches the original. I suspect OCRopus layout analysis could help with this. Does the scanned image need to be scaled?
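Here is a minimal sketch of that brute-force registration. It assumes both images are already binarized at the same resolution (so no scaling) and stored as lists of 0/1 rows, with 1 meaning ink. It tries every shift in a small window and keeps the one where the most ink pixels coincide. The function names and the `max_shift` parameter are made up for illustration; this is not OCRopus code. On full page scans you would more likely compute the same score with FFT-based cross-correlation, because trying every shift directly is slow.

```python
# Hypothetical sketch: find the (dx, dy) shift that best aligns a binarized
# scan with the original form image by trying every shift in a small window.
# Images are lists of rows of 0/1 ints (1 = ink), same resolution, no scaling.


def overlap_score(template, scan, dx, dy):
    """Count ink pixels that coincide when template pixel (x, y) is
    compared with scan pixel (x + dx, y + dy)."""
    score = 0
    for y in range(len(template)):
        sy = y + dy
        if not 0 <= sy < len(scan):
            continue
        for x in range(len(template[0])):
            sx = x + dx
            if 0 <= sx < len(scan[0]) and template[y][x] and scan[sy][sx]:
                score += 1
    return score


def estimate_shift(template, scan, max_shift=10):
    """Return the (dx, dy) with -max_shift <= dx, dy <= max_shift that
    maximizes the overlap score (first one found wins on ties)."""
    best, best_score = (0, 0), -1
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = overlap_score(template, scan, dx, dy)
            if score > best_score:
                best, best_score = (dx, dy), score
    return best


if __name__ == "__main__":
    # Toy example: an 'L'-shaped anchor, and a "scan" of it moved
    # 2 pixels right and 1 pixel down.
    template = [[0] * 8 for _ in range(8)]
    for i in range(1, 5):
        template[i][1] = 1  # vertical stroke
        template[4][i] = 1  # horizontal stroke
    scan = [[0] * 8 for _ in range(8)]
    for y in range(8):
        for x in range(8):
            if template[y][x]:
                scan[y + 1][x + 2] = 1
    print(estimate_shift(template, scan, max_shift=3))  # (2, 1)
```

The result means form pixel `(x, y)` lines up with scan pixel `(x + dx, y + dy)`, so stored form coordinates can be corrected by adding the offset.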
If you do not care whether there are checkmarks or X's, then this is no longer an OCR project. You may do well to script something using PerlMagick or a GIMP script. You mentioned 'written numbers'. That is a difficult OCR problem, unless you can capture the stylus strokes as they are made, the way a smartphone touchscreen does.

On Tuesday, October 7, 2014 1:34:23 PM UTC-4, kingIZZZY wrote:
>
> BH
>
> Hello,
>
> I am an experienced programmer, but absolute newbie to OCR / document
> analysis / all computer optical recognition.
>
> *Desired Effect* (workflow I'm trying to program)
>
>    - Dynamically generate a form intended to be printed and filled out IRL
>    - Scan the completed form and obtain its data
>
> *Type of Data*
>
>    - *Highest Priority*: Check boxes, filled in by pen / pencil / marker etc., marked with check-mark, X, diagonal strike, etc.
>    - Optional: Written Numbers, circled options
>
> *Theoretical Coding Solution*
>
>    - When generating a form, store layout / coordinate information of form elements
>    - Place recognizable anchors (rotated 'L's or '+' symbols) at the corners of the printed page to define a general known rectangular area
>    - Print a bar-code or numeric identifier at pre-defined coordinates in the rectangle area
>    - Obtain data out of form elements using layout/format information & coordinates previously stored for this identified form
>
> *Bottom line*: Is this possible? *How to do this*? What do I need to
> learn in order to get to a point where I know how to use OCRopus (or other
> libraries) to achieve these results?
>
> ------------------------------
>
> Related Links (describe some technical aspects & bits of theoretical
> solutions, but no practical road-map of how to actualize this)
>
>    - http://stackoverflow.com/questions/15227243/what-is-the-proper-way-to-test-if-checkbox-is-ticked-on-scanned-document
>    - https://groups.google.com/forum/#!searchin/tesseract-ocr/checkbox/tesseract-ocr/kvyILJMuuCI/iJeQc0ga-OkJ
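Following the reply's suggestion to script the checkbox case rather than treat it as OCR, and the poster's plan of storing coordinates and printing corner anchors, here is a minimal Python sketch of the last step. It assumes the scan is binarized to 0/1 rows, is shifted and scaled but not rotated, and that the two corner anchors have already been located. It also shows one answer to the scaling question: the scale factor falls out of the distance between the anchors. `Box`, `make_mapper`, `ink_ratio`, `read_checkboxes`, the 15% margin and the 0.1 threshold are all hypothetical choices, not anything from OCRopus.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A checkbox as stored when the form was generated (hypothetical layout)."""
    name: str
    x: float  # left edge, in form units measured from the same origin as the anchors
    y: float  # top edge
    w: float  # width
    h: float  # height


def make_mapper(form_tl, form_br, scan_tl, scan_br):
    """Build a form -> scan coordinate mapping from two corner anchors.

    form_tl/form_br: anchor centres in form units (known at generation time).
    scan_tl/scan_br: where those anchors were found in the scan, in pixels.
    Assumes the scan is shifted and scaled but not rotated.
    """
    sx = (scan_br[0] - scan_tl[0]) / (form_br[0] - form_tl[0])
    sy = (scan_br[1] - scan_tl[1]) / (form_br[1] - form_tl[1])

    def to_scan(x, y):
        return (scan_tl[0] + (x - form_tl[0]) * sx,
                scan_tl[1] + (y - form_tl[1]) * sy)

    return to_scan


def ink_ratio(image, box, to_scan, margin=0.15):
    """Fraction of ink pixels (1s) inside the box.

    A margin on each side is skipped so the printed box outline is not
    counted as a mark.
    """
    x0, y0 = to_scan(box.x + box.w * margin, box.y + box.h * margin)
    x1, y1 = to_scan(box.x + box.w * (1 - margin), box.y + box.h * (1 - margin))
    # Clamp to the image so a badly placed box cannot index out of range.
    rows = range(max(0, round(y0)), min(len(image), round(y1)))
    cols = range(max(0, round(x0)), min(len(image[0]), round(x1)))
    total = len(rows) * len(cols)
    if total == 0:
        return 0.0
    ink = sum(image[r][c] for r in rows for c in cols)
    return ink / total


def read_checkboxes(image, boxes, to_scan, threshold=0.1):
    """Map each box name to True if it looks marked (threshold needs tuning)."""
    return {b.name: ink_ratio(image, b, to_scan) >= threshold for b in boxes}
```

Because it only measures how much ink is inside each box, a check mark, an X and a diagonal strike all count as "marked", which fits the case where you do not care which kind of mark was used. The threshold would need tuning against real scans (stray pencil marks, smudges).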
