> From: Glen Batchelor
> I hope that your PO number is visually unique, because
> it will be difficult to locate and match by text in
> any OCR unless it's:
>
> 1) in the same physical place on every invoice
> 2) always starts or ends with the same unique string
>
> Many newer OCR+PDF convertor software apps support
> barcode detection, but the best I've made use of it is
> to have the barcode show up better in the final PDF.
> I've not used it to route paperwork, yet.
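
To illustrate Glen's second point, here is a minimal sketch of matching a PO number in OCR output by a fixed leading string. The "PO-" prefix and six-digit format are purely hypothetical conventions chosen for the example, as are the names PO_PATTERN and find_po_numbers; substitute whatever marker your own PO numbers really carry.

```python
import re

# Hypothetical convention: every PO number is "PO-" followed by six digits.
PO_PATTERN = re.compile(r"\bPO-(\d{6})\b")

def find_po_numbers(ocr_text: str) -> list[str]:
    """Return every PO number found in the OCR text, in order of appearance."""
    return [m.group(0) for m in PO_PATTERN.finditer(ocr_text)]

# Made-up OCR output for one invoice.
sample = "Invoice 88231\nYour ref: PO-004217\nTotal due: 1,250.00"
print(find_po_numbers(sample))
```

Without a marker like that, a bare run of digits is indistinguishable from an invoice number, a phone number, or a total, which is why the unique string matters.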

The responses so far explain my confusion. There are plain-text email invoices, emails with attachments in various formats, fax, hard copy, EDI, XML according to OASIS or proprietary standards, web services... Data acquisition is different for each medium, so the notion of one software package doing it all seemed a bit magical.

Ignoring the electronic EDI/XML types, I think every other document format can be physically scanned if it isn't already digital and then, as others suggest, OCR'd. But the software needs to be "trainable" if the process is to be automated: every vendor formats documents differently, so you will find the invoice number in a different place on each.

As to a solution that does the optical scanning part, http://kofax.com/ has various offerings. We researched something like this for a client, and the best development tools we found were on this site: http://www.leadtools.com/home2/VertMkts/LTProdOvrvw.htm. Since those are components, a solution would need to be written around them. If a packaged product would cost you tens of thousands of dollars (and that's not unusual in this area), you might prefer a less expensive DIY solution. While I offered to write a solution using LeadTools, our client never took it past the investigation stage, so I can't comment as to whether making or buying is more effective.

One thing came out of that research loud and clear: don't skimp on the scanning tools. A low-quality or low-resolution scanner on the front end will create a need for lots of manual intervention to resolve errors later. Get good equipment up front so the OCR software has good bits to work with. And before purchasing a solution, run a trial with a wide variety of your own documents so you can be sure it is fast enough and accurate enough for your purpose.
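
To make the "trainable" idea and the trial suggestion a little more concrete, here is a rough sketch: one extraction rule per vendor, plus a tiny harness that reports accuracy and elapsed time over a set of sample documents. Everything in it is an assumption for illustration (the vendor names, the label patterns, the sample text, and the function names); it says nothing about how Kofax, LeadTools, or any other product works internally.

```python
import re
import time

# Hypothetical per-vendor rules: each vendor labels its invoice number differently.
VENDOR_RULES = {
    "acme": re.compile(r"Invoice\s+No\.?\s*:?\s*(\w+)", re.IGNORECASE),
    "globex": re.compile(r"INV#\s*(\d+)"),
}

def extract_invoice_number(vendor, ocr_text):
    """Return the invoice number for a known vendor, or None if nothing matches."""
    rule = VENDOR_RULES.get(vendor)
    if rule is None:
        return None
    match = rule.search(ocr_text)
    return match.group(1) if match else None

def run_trial(samples):
    """samples: list of (vendor, ocr_text, expected). Returns (accuracy, seconds)."""
    if not samples:
        raise ValueError("need at least one sample document")
    start = time.perf_counter()
    correct = sum(
        1 for vendor, text, expected in samples
        if extract_invoice_number(vendor, text) == expected
    )
    elapsed = time.perf_counter() - start
    return correct / len(samples), elapsed

# Made-up OCR output; the third entry simulates a misread ("1NV#" for "INV#").
samples = [
    ("acme", "ACME Corp\nInvoice No: A1234\nTotal 10.00", "A1234"),
    ("globex", "Globex\nINV# 99871\nDue in 30 days", "99871"),
    ("globex", "Globex\n1NV# 99872\nDue in 30 days", "99872"),
]
accuracy, seconds = run_trial(samples)
print(f"accuracy={accuracy:.0%} time={seconds:.4f}s")
```

The misread in the third sample is the kind of error a poor scan produces, and each one becomes a manual lookup. A real trial would use a few hundred of your own scanned documents rather than three strings.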

As to document management with indexing once you have some metadata, there are many products like http://docuxplorer.com/ which can be integrated with Universe, and http://www.1mage.com/ specializes in the MV market.

It may help to separate out the tasks in order to create a better definition of what you're looking for. These include data acquisition, digital scanning, data scanning, indexing, and retrieval. Any solution should provide an API, which should make the indexing and retrieval part with Universe fairly trivial compared to the rest of the package.

I hope that helps.

Tony Gravagno
Nebula Research and Development
TG@ remove.pleaseNebula-RnD.com
Nebula R&D sells mv.NET and other Pick/MultiValue products worldwide,
and provides related development and training services
(Nebula R&D does not sell any of the offerings mentioned and has no
affiliation with any of the companies.)
-------
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/