On Monday 03 November 2008 07:00:04 m. allan noah wrote:
> On Mon, Nov 3, 2008 at 4:44 AM, Rod De Beer <REDeBeer at dla.gov.za> wrote:
> > hi Allan
> >
> > Thanks so much for your reply !!!!!
> >
> > A little more info -:
> >
> > i am going with the idea of dropping the windows based scanning system,
> > loading Linux on my servers and workstations, and looking for a linux
> > based high volume scan interface where i can save the documents as
> > multipage PDF and slot them as blobs in the database with the metadata,
> > using Mysql as the DB, and i develop my system around that ????
>
> I personally don't know of any such system, but perhaps someone else
> will. Writing something like that would not be very hard; gscan2pdf
> already does the front half of it, and it is perl, so very easy to
> extend to a backend db.
>
> > My scanner requirements are page size up to and including A3, black and
> > white, 200dpi, multipage PDF (embedded metadata), and volumes as
> > mentioned below. At the moment i am leasing about 200 Canon DR-9080C's.
> > YES !! the Fujitsu does look like a strong option for outright purchase.
>
> oh, the canon backend i am working on is to support the DR-9080C, so
> you might be able to continue your lease :)
>
> allan
> --
> "The truth is an offense, but not a sin"
Scanning 5000 pages/day/scanner works out to around 10 ppm over an 8-hour work day (5000 pages / 480 minutes ≈ 10.4 ppm). Many low-end scanners can exceed 10 ppm for A4 at 200 dpi black and white, so document prep and archiving will probably take at least as much time as the actual scanning. It's certainly faster to scan one 100-page document and upload it to a database than to scan 100 one-page documents and upload them to a db with individual filenames and index field data.

A search on www.sourceforge.net for "document archiving scanning" brings up a number of open-source solutions such as OpenLSD, Maarch Archiving DMS, and Maxview Document Management. There are also a number of document management systems (DMS) such as KnowledgeTree (a PHP 4 based web application) which could be used to store scanned documents in a db backend.

I myself have used archiveindex's Repository program to store scanned documents in a db. Repository is a C-based CGI program which stores document path+filenames in a hierarchical BerkeleyDB 4 file. The scanned documents are uploaded (and renamed to a hex representation of the system time) to user-defined storage areas with size quotas (each storage area can be set up to hold at most, e.g., one CD or one DVD worth of files). The web app presents the user with a hierarchical storage scheme with folder icons and thumbnails of documents (pdf, jpeg, dvi, ...). There are also scripts which can be run from the command line (or from other programs) to do e.g. batch queries or batch uploads/downloads. Scanned documents are automatically tokenized and indexed, and there is an optional OCR program (Windows only). The web app provides access to your db from any browser (though you can restrict access to your webserver however you like).

Some drawbacks of the program are:
1) not open source (your data is locked into a proprietary format) -- the greater your financial investment in scanning, the riskier the proprietary lock-in feels.
2) 32-bit binaries, with dependencies on BerkeleyDB 4.0, netpbm, and certain system utilities (e.g. the keylock program parses the output of 'ifconfig eth0 | grep HWaddr' and won't work with sys-apps/net-tools-1.60-r13 or greater).

It should be fairly straightforward to implement something similar using modern open-source tools such as the Catalyst framework or Ruby on Rails for the web-based frontend and MySQL or PostgreSQL for the backend.
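The blob-plus-metadata workflow Rod describes (scan to multipage PDF, store the PDF and its index fields in one table) can be sketched in a few lines. Just as a sketch: sqlite3 from the Python standard library stands in for MySQL/PostgreSQL here, and the table layout, column names, and function names are my own invention, not anything from Repository or gscan2pdf.

```python
# Store a multipage PDF as a blob with metadata in a SQL table,
# then fetch it back by row id. Schema and names are illustrative.
import hashlib
import sqlite3

def open_archive(path=":memory:"):
    """Open the db and create the document table if needed."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS documents (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            scanned_on TEXT NOT NULL,   -- ISO date from the scan job
            sha256 TEXT NOT NULL,       -- integrity check for the blob
            pdf BLOB NOT NULL
        )""")
    return conn

def store_document(conn, title, scanned_on, pdf_bytes):
    """Insert one scanned document; returns its row id."""
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    cur = conn.execute(
        "INSERT INTO documents (title, scanned_on, sha256, pdf)"
        " VALUES (?, ?, ?, ?)",
        (title, scanned_on, digest, sqlite3.Binary(pdf_bytes)))
    conn.commit()
    return cur.lastrowid

def fetch_document(conn, doc_id):
    """Return (title, scanned_on, pdf_bytes) for a stored document."""
    title, scanned_on, digest, pdf = conn.execute(
        "SELECT title, scanned_on, sha256, pdf FROM documents WHERE id = ?",
        (doc_id,)).fetchone()
    # Verify the blob survived the round trip intact.
    assert hashlib.sha256(pdf).hexdigest() == digest
    return title, scanned_on, bytes(pdf)

if __name__ == "__main__":
    conn = open_archive()
    fake_pdf = b"%PDF-1.4 ... (bytes from the scan frontend) ... %%EOF"
    doc_id = store_document(conn, "invoice batch", "2008-11-03", fake_pdf)
    print(fetch_document(conn, doc_id)[0])  # prints "invoice batch"
```

Swapping in MySQL would mostly mean changing the connect call and placeholder syntax; the store-checksum-with-blob idea carries over unchanged, and a frontend like gscan2pdf would supply the real PDF bytes.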