Karsten Hilbert wrote:
> On Mon, Feb 19, 2007 at 07:04:35AM +1100, Tim Churches wrote:
>> No, we need the data in computable form
> OK, that kills the easy solution. Or it might not. If you
> don't blend both sources of information (background image
> and user input) but rather keep them separate and blend on
> display/printing you'd still have the computable user input.
> The drawback is that it lacks any metadata (apart from which
> form it belongs to) as all the metadata would be encoded in
> the *location* of what the user typed. Which in itself just
> *might* lend itself to an OCR-like solution where a mask
> image is overlaid onto the data thereby adding metadata to
> it.

Hmm, that's a nice idea. It would be interesting to use PIL (the Python
Imaging Library) to do the form subtraction that you mention, leaving just
the handwritten entries, and then present that to Tesseract OCR (which
is in C and could be wrapped as a Python library, I'm sure), and see how
it performs. That's several days or a week of fiddling, though, and it
may prove fruitless. But if it worked...
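The subtraction step could be prototyped along these lines. This is only a sketch, assuming Pillow (the maintained successor to PIL) is installed; the function name, filenames, and threshold value are my own inventions, not anything from an existing tool:

```python
# Sketch: subtract a blank form image from a filled-in scan, keeping
# only the pixels that differ (i.e. the handwritten entries).
# Assumes both images are the same size and already aligned.
from PIL import Image, ImageChops

def isolate_user_input(filled, blank, threshold=40):
    """Return a black-on-white image containing only the handwriting.

    `threshold` is a hypothetical tuning knob: pixel differences
    smaller than this are treated as scanner noise and discarded.
    """
    filled_g = filled.convert("L")   # greyscale
    blank_g = blank.convert("L")
    # Per-pixel absolute difference: pixels belonging to the printed
    # form appear in both images and so go to 0 here.
    diff = ImageChops.difference(filled_g, blank_g)
    # Binarise: strong differences become black ink on white paper,
    # which is the sort of input OCR engines expect.
    return diff.point(lambda p: 0 if p > threshold else 255)

if __name__ == "__main__":
    # Hypothetical filenames for illustration only.
    result = isolate_user_input(Image.open("filled_form.png"),
                                Image.open("blank_form.png"))
    result.save("handwriting_only.png")
```

The output image could then be handed to Tesseract (e.g. by shelling out to the `tesseract` binary) to see whether the isolated handwriting is recognisable at all. Registration of the two scans is the hard part this sketch glosses over.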

Re Tesseract, see also:
http://www.groklaw.net/article.php?story=20061210115516438

Ah, so many ideas, so little time.

Tim C
