Tim Churches wrote:
> Karsten Hilbert wrote:
>> Well, the path of least resistance here is to scan it and
>> use it as a background image in some text editor or other so
>> that what you type appears to be written into the fields
>> while it is (technically) written on top of the background
>> image. We then save the result as any other old document
>> tied into the medical record.
>
> No, we need the data in computable form for epidemiological (aggregate)
> analysis - images of numbers and characters must be converted to ASCII or
> Unicode bytes. There is a commercial product, Teleform, which does this
> reasonably well - see
> http://www.cardiff.com/products/teleform/index.html - and we may just
> provide an interface which can load data which has been scanned off
> hand-written forms using that, but gee, an open source solution would be
> nice. Suggestions very welcome.

A few months ago Google released Tesseract OCR, an OCR engine developed in the 1990s by Hewlett-Packard. Apparently it was state-of-the-art in 1995, but that was over a decade ago, and it has not been developed since. There don't seem to be any other open source OCR engines around that are being actively developed or that are anything more than demos or proofs-of-concept. And Teleform seems to have the OCR-from-paper-forms market almost to itself.

I think we'll have to build a batch input interface that Teleform can be plugged into - I think it exports to XML, or at the very least to CSV files. But if anyone can suggest an alternative for turning data recorded on paper forms into data (as opposed to raster image) files, we'd love to hear of it.

Tim C
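
For what it's worth, here is a rough sketch of what the batch input side might look like if the forms software hands us CSV. Everything specific in it is an assumption for illustration only: the column names (form_id, onset_date, age_years, postcode), the ISO date format and the age range check are made up, and a real export would dictate its own layout. The idea is simply to convert each row to typed values and to set aside rows that fail, since recognised handwriting will contain misreads that need manual review.

```python
import csv
import io
from datetime import date, datetime

# Hypothetical column layout; a real export would define its own field names.
REQUIRED_FIELDS = ("form_id", "onset_date", "age_years", "postcode")


def parse_row(row):
    """Convert one CSV row (a dict of strings) to typed values, or raise ValueError."""
    missing = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    # int() raises ValueError on misreads such as the letter O in place of zero.
    age = int(row["age_years"].strip())
    if not 0 <= age <= 130:
        raise ValueError("implausible age: %d" % age)
    return {
        "form_id": row["form_id"].strip(),
        # Assumes ISO dates (YYYY-MM-DD); strptime raises ValueError otherwise.
        "onset_date": datetime.strptime(row["onset_date"].strip(), "%Y-%m-%d").date(),
        "age_years": age,
        "postcode": row["postcode"].strip(),
    }


def load_batch(stream):
    """Read a whole CSV export; return (accepted records, rejected rows with reasons)."""
    accepted, rejected = [], []
    # Data rows start on line 2 because line 1 is the header
    # (this numbering assumes no field contains an embedded newline).
    for line_no, row in enumerate(csv.DictReader(stream), start=2):
        try:
            accepted.append(parse_row(row))
        except ValueError as err:
            rejected.append((line_no, str(err)))
    return accepted, rejected
```

Calling load_batch() on an open file (or an io.StringIO for testing) gives back the clean records for analysis plus a list of (line number, reason) pairs for the rows that someone needs to check against the paper form.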
