I have recently published a small C library, which might be of some use to folks on this list.
gomr is a GPLv3 library which provides basic Optical Mark Recognition (OMR) and Code 3of9 barcode reading (as well as some scan cleaning features). It was developed to score student 'bubble' sheets, and has been refined over the last few years for speed and accuracy. To date, more than 4 million images have been processed, with a very low error rate. gomr can process more than 2000 pages per minute on a single (albeit fast) CPU.

The code relies chiefly on features of the sheet in order to correct scanning errors, so these might not be useful in your case:

* rotation is identified by trying to 'strike' lines across the page at a variety of angles, finding the angle which produces the most entirely white lines. The searching algorithm uses interlacing and caching for speed. Only white backgrounds are handled for now; perhaps a future version will support black backgrounds as well.
* horizontal/vertical offset and upside-down scans are identified by locating a prominent barcode, and centering that with the caller's defaults.
* the 'bubbles' are assumed to come in large blocks, with reasonable internal and external whitespace.

There are also some more general functions:

* remove speckles
* remove trash stripes
* make GIF thumbnails

Speed is of utmost importance in the commercial operation of gomr, so:

* only low-resolution binary data is required.
* gomr will open zlib-compressed images in RAM.
* rotation is corrected using a 'double-shear' algorithm.
* the barcode algo is split into two parts: a fast 'finder' and a 'reader'.
* etc...

The code works well in our use, but will most likely NOT work for you without modification. It's GPL, so you can do that yourself, or you can contract with us, but make sure you understand and follow the license!

www.thebility.com/gomr/

allan

--
"The truth is an offense, but not a sin"
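
For anyone who wants a feel for the 'strike lines across the page' rotation search described above, here is a minimal sketch of the idea. It is not taken from gomr: the function names (line_y, count_white_lines, best_skew), the one-byte-per-pixel layout (0 = white, nonzero = black) and the integer 'skew' parameter (how many rows a line drops between the left and right edges) are all assumptions made for illustration, and the interlacing and caching mentioned above are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Row reached at column x by a line that starts at (0, y0) and ends d rows
 * lower at the right edge. Integer rounding, symmetric for negative d. */
static int line_y(int y0, int d, int x, int w)
{
    int num = d * x, den = w - 1, half = den / 2;
    int off = (num >= 0) ? (num + half) / den : -((-num + half) / den);
    return y0 + off;
}

/* Count the struck lines of skew d that touch no black pixel.
 * img is w*h bytes, row-major, 0 = white, nonzero = black (assumed layout). */
static int count_white_lines(const unsigned char *img, int w, int h, int d)
{
    assert(img != NULL && w > 1 && h > 0);
    int y_first = (d < 0) ? -d : 0;     /* keep the whole line on the page */
    int y_last  = (d > 0) ? h - d : h;  /* exclusive upper bound */
    int white = 0;
    for (int y0 = y_first; y0 < y_last; y0++) {
        int x = 0;
        /* walk along the line, stopping at the first black pixel */
        while (x < w && img[(size_t)line_y(y0, d, x, w) * w + x] == 0)
            x++;
        if (x == w)
            white++;
    }
    return white;
}

/* Try every skew in [-max_d, max_d] and return the one that yields the most
 * entirely white lines. Ties go to the most negative skew tried. */
static int best_skew(const unsigned char *img, int w, int h, int max_d)
{
    int best_d = 0, best_count = -1;
    for (int d = -max_d; d <= max_d; d++) {
        int c = count_white_lines(img, w, h, d);
        if (c > best_count) {
            best_count = c;
            best_d = d;
        }
    }
    return best_d;
}
```

A caller would pass a low-resolution binary page and a small max_d; the returned skew corresponds to an angle of atan(d / (w - 1)). Note that steeper skews have fewer candidate lines that stay on the page, so the raw count leans slightly toward small angles; a real implementation might normalize for that.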
