Karsten Hilbert wrote:
> On Mon, Feb 19, 2007 at 07:04:35AM +1100, Tim Churches wrote:
>> No, we need the data in computable form
> OK, that kills the easy solution. Or it might not. If you
> don't blend both sources of information (background image
> and user input) but rather keep them separate and blend on
> display/printing you'd still have the computable user input.
> The drawback is that it lacks any metadata (apart from which
> form it belongs to), as all the metadata would be encoded in
> the *location* of what the user typed. Which in itself just
> *might* lend itself to an OCR-like solution where a mask
> image is overlaid onto the data, thereby adding metadata to
> it.
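[The mask idea above could be sketched in a few lines of Python. Everything here is illustrative: the field names, the rectangle coordinates, and the idea of representing the "mask image" as a dictionary of named regions rather than an actual bitmap.]

```python
# A sketch of the quoted idea: the metadata lives in *where* the user
# typed, so a per-form mask of named regions can restore it.
# Field names and coordinates below are hypothetical.
FORM_MASK = {
    "patient_name": (50, 40, 300, 60),    # (left, top, right, bottom)
    "date_of_birth": (50, 80, 200, 100),
}

def label_entry(x, y, mask=FORM_MASK):
    """Return the field name whose region contains the point (x, y)."""
    for field, (left, top, right, bottom) in mask.items():
        if left <= x <= right and top <= y <= bottom:
            return field
    return None  # typed outside any known field

print(label_entry(120, 50))  # -> patient_name
print(label_entry(120, 90))  # -> date_of_birth
```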
Hmm, that's a nice idea. It would be interesting to use PIL (the Python Imaging Library) to do the form subtraction you mention, leaving just the handwritten entries, and then present that to Tesseract OCR (which is in C and could no doubt be wrapped as a Python library) to see how it performs. That's several days or a week of fiddling and it may prove fruitless, but if it worked...

Re Tesseract, see also:
http://www.groklaw.net/article.php?story=20061210115516438

Ah, so many ideas, so little time.

Tim C
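[The PIL form subtraction discussed above could be sketched as below. Real scans would be loaded with `Image.open()`; here synthetic images stand in for the blank form and the filled-in copy, and the threshold value is an arbitrary guess at scanner-noise rejection.]

```python
from PIL import Image, ImageChops

# Stand-in for a scan of the blank form: a white page with one
# printed horizontal rule at y=50.
blank = Image.new("L", (200, 100), 255)
for x in range(200):
    blank.putpixel((x, 50), 0)

# Stand-in for the filled-in form: the same rule, plus a
# "handwritten" mark at (100, 30).
filled = blank.copy()
filled.putpixel((100, 30), 0)

# Pixel-wise difference cancels everything the two images share,
# leaving only the user's marks (bright on a black background).
diff = ImageChops.difference(filled, blank)

# Invert and threshold so the marks come out dark on white, the way
# an OCR engine expects its input; 32 is an arbitrary noise cutoff.
entries = diff.point(lambda p: 0 if p > 32 else 255)

print(entries.getpixel((100, 30)))  # the handwritten mark survives: 0
print(entries.getpixel((100, 50)))  # the printed rule is gone: 255
```

The `entries` image could then be saved and handed to Tesseract for recognition.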
