Re: [CODE4LIB] Recommend book scanner?
Printed test sheets: http://www.diytrade.com/china/4/products/1707979/IEEE_Resolution_Chart.html?r=0 or http://www.aig-imaging.com/mm5/merchant.mvc?Screen=PRODStore_Code=AIIPIProduct_Code=QA-60Category_Code=Video-Scanner-Resolution-Charts At 04:54 PM 5/2/2009 -0700, st...@archive.org wrote: On 5/1/09 8:27 PM, Lars Aronsson wrote: Does anybody have a printed test sheet that we can scan or photo, and then compare the resulting digital images? It should have lines at various densities and areas of different colours, just like an old TV test image. Can you buy such calibration sheets? archive.org scans typically include a color card target image near the back (or front) of the book, e.g. http://www-steve.us.archive.org/public/data/eg/birdsthateverych00doub/birdsthateverych00doub_jp2/birdsthateverych00doub_0371.jp2 typical specs for our scanning rig (scribe) are roughly: 1 8x8x5' scribe structure 2 Canon EOS 5Ds 2 light boxes 1 orthogonal glass platen and cradle 1 foot pedal, pulley system Linux PC LAMP stack custom web-based UI gphoto, imagemagick, leptonica, rsync fast internet we scan over 1,000 books a day with about 100 scribes like this. /st...@archive.org
Re: [CODE4LIB] Recommend book scanner?
The National Archives has the guideline which describes target that you can use for scanning comparison. There are other targets used in other books/articles. I suggest that you check the National Archives' guidelines. http://www.archives.gov/preservation/technical/guidelines.html -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Lars Aronsson Sent: Friday, May 01, 2009 8:27 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Recommend book scanner? Mike Taylor wrote: Or not. Cheap cameras may well produce JPEGs that contain eight million pixels, but that doesn't mean that they are using all or even much of that resolution. Does anybody have a printed test sheet that we can scan or photo, and then compare the resulting digital images? It should have lines at various densities and areas of different colours, just like an old TV test image. Can you buy such calibration sheets? We could make it a standard routine, to always shoot such a sheet at the beginning of any captured book, to give the reader an idea of the digitization quality of the used equipment. They are called technical target in figure 14, page 149, of Lisa L. Fox (ed.), Preservation Microfilming, 2nd ed. (1996), ISBN 0-8389-0653-2. The example there is manufactured by AP International, http://www.a-p-international.com/ However, their price list is $100-400 per package of 50 sheets. I wouldn't pay more for the calibration targets than for the camera, if I could avoid it. -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se Project Runeberg - free Nordic literature - http://runeberg.org/
Re: [CODE4LIB] Recommend book scanner?
Joe Atzberger writes: If you want real 300 dpi images, at anything like the quality you get from a flatbed scanner, then you're going to need cameras much more expensive than $100. Or just wait, say, about 3 years. Well, maybe. I guess not, though: the factor limiting image quality is not the electronics, which tend to fall rapidly in price, but the lens, which does not. Still, fingers crossed. _/|____ /o ) \/ Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk )_v__/\ It seems to me absurd to doubt that a man may be an ardent Theist and an evolutionist -- Charles Darwin, letter to John Fordyce, 7th May 1879.
Re: [CODE4LIB] Recommend book scanner?
On 5/1/09 8:27 PM, Lars Aronsson wrote: Does anybody have a printed test sheet that we can scan or photo, and then compare the resulting digital images? It should have lines at various densities and areas of different colours, just like an old TV test image. Can you buy such calibration sheets?
archive.org scans typically include a color card target image near the back (or front) of the book, e.g. http://www-steve.us.archive.org/public/data/eg/birdsthateverych00doub/birdsthateverych00doub_jp2/birdsthateverych00doub_0371.jp2
typical specs for our scanning rig (scribe) are roughly:
1 8x8x5' scribe structure
2 Canon EOS 5Ds
2 light boxes
1 orthogonal glass platen and cradle
1 foot pedal, pulley system
Linux PC
LAMP stack
custom web-based UI
gphoto, imagemagick, leptonica, rsync
fast internet
we scan over 1,000 books a day with about 100 scribes like this.
/st...@archive.org
Re: [CODE4LIB] Recommend book scanner?
st...@archive.org wrote: archive.org scans typically include a color card target image near the back (or front) of the book, e.g. That's great. But where do you buy these target cards? And are they useful for testing small compact cameras? An important difference between the bkrpr.org (book ripper) and the Scribe is that cheap cameras with cheap lenses and uneven lighting are used. That means we need to calibrate (or test) the resolution and colors in different corners of the image. With the Scribe, you can assume that these are even across the whole image. My standard Ubuntu and Firefox failed to view the Jpeg-2000 image. 1 8x8x5' scribe structure 2 Canon EOS 5Ds These are design decisions from 2003 or so. Today you would at least use the Digital Rebel XSi (EOS 450D) which gives 12 megapixels at a fraction of the price of the EOS 5D. I also think (?) the Scribe uses a 100 mm lens, which puts the camera at more than 1 meter away from the glass. A 50 mm lens would cut that distance (and the size of the whole machine) in half. But all such design revisions only bring us to 2007; they fall flat compared to the 2009 book ripper's radical use of the PowerShot A590 compact camera, mounted 12 inches away from the book, all inside a single plexiglass cube. we scan over 1,000 books a day with about 100 scribes like this. Are any of these in Europe? Is there a plan to convince European libraries to join the Internet Archive's system? -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org/
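The claim that a 50 mm lens would halve the camera distance can be checked with the idealized thin-lens equation. The following is only a rough sketch: the 36 mm sensor width assumes a full-frame body such as the EOS 5D, the 400 mm field of view is a hypothetical figure chosen for illustration, and real lenses will deviate from the thin-lens model.

```python
def working_distance_mm(focal_length_mm, sensor_long_mm, subject_long_mm):
    """Lens-to-subject distance from the thin-lens equation.

    With magnification m = sensor size / subject size, the object
    distance is u = f * (1 + 1/m). This is an idealization; it ignores
    the physical construction of real lenses.
    """
    magnification = sensor_long_mm / subject_long_mm
    return focal_length_mm * (1 + 1 / magnification)


# Assumed values (not from the thread): 36 mm sensor, 400 mm field of view.
SENSOR_MM = 36.0
FIELD_MM = 400.0

for focal in (100.0, 50.0):
    distance = working_distance_mm(focal, SENSOR_MM, FIELD_MM)
    print(f"{focal:.0f} mm lens: about {distance / 1000:.2f} m from the page")
```

For a fixed field of view the distance scales linearly with focal length, which is what makes the "half the size" estimate plausible, at least under these assumptions.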
Re: [CODE4LIB] Recommend book scanner?
On the other hand, there are projects like bkrpr [2] and [3], home-brew scanning stations built for marginally more than the cost of a pair of $100 cameras. Cameras around $100 are very low quality. You could get nowhere near the dpi recommended for materials that need to be OCRed. Not only would the quality of the images from such cameras be low, but the OCR (even with the best software) would probably have many errors. For someone scanning items at home this might be ok, but for archival quality, I would not recommend cameras. If you are grant funded and the grant provider requires a certain level of quality, you need to make sure the scanning mechanism you use can scan at that quality. On Thu, Apr 30, 2009 at 11:49 AM, Erik Hetzner erik.hetz...@ucop.edu wrote: At Wed, 29 Apr 2009 13:32:08 -0400, Christine Schwartz wrote: We are looking into buying a book scanner which we'll probably use for archival papers as well--probably something in the $1,000.00 range. Any advice? Most organizations, or at least the big ones, Internet Archive and Google, seem to be using a design based on 2 fixed cameras rather than a traditional scanner-type device. Is this what you had in mind? Unfortunately none of these products are cheap. Internet Archive’s Scribe machine cost upwards of $15k (3 years ago), [1] mostly because it has two very expensive cameras. Google’s data is unavailable. A company called Kirtas also sells what look like very expensive machines of a similar design. On the other hand, there are projects like bkrpr [2] and [3], home-brew scanning stations built for marginally more than the cost of a pair of $100 cameras. I think that these are a real possibility for smaller organizations. The maturity of the software and workflow is problematic, but with Google’s Ocropus OCR software [4] freely available as the heart of a scanning workflow, the possibility is there.
Both bkrpr and [3] have software currently available, although in the case of bkrpr at least the software is in the very early stages of development. best, Erik Hetzner 1. http://redjar.org/jared/blog/archives/2006/02/10/more-details-on-open-archives-scribe-book-scanner-project/ 2. http://bkrpr.org/doku.php 3. http://www.instructables.com/id/DIY-High-Speed-Book-Scanner-from-Trash-and-Cheap-C/ 4. http://code.google.com/p/ocropus/ ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
Re: [CODE4LIB] Recommend book scanner?
Amanda P wrote: Cameras around $100 dollars are very low quality. You could get no where near the dpi recommended for materials that need to be OCRed. The quality of images from cameras would be not only low, but the OCR (even with the best software) would probably have many errors. For someone scanning items at home this might be ok, but for archival quality, I would not recommend cameras. If you are grant funded and the grant provider requires a certain level of quality, you need to make sure the scanning mechanism you use can scan at that quality. To capture an image 8.5 x 11 at 300 dpi, you need roughly 8.4 megapixels, which is well within the capabilities of an inexpensive pocket camera. (If you need 600 dpi, then you're in the 33.6 megapixel range.) As to whether the quality will be sufficient, this would depend on the goals and requirements of the project, but 300 dpi should be enough to get good OCR results for normal-sized text. Our very old version of PrimeOCR recommends 300 dpi, and suggests that 400 dpi may provide substantially better quality for text sizes smaller than 8 point, while 200 dpi will be sufficient for text 12 points and up. At 300 and 400 dpi on 19th Century small-print, variable quality texts, we are generally getting good to very good recognition: the quality of the original document itself is the limiting factor. More modern documents (and OCR software) should produce even better results. The cameras used by the Internet Archive are only 12 megapixels, though they are of substantially higher quality than a Canon PowerShot. Some applications require very high quality images, and cheap cameras might not be able to deliver the goods, but if you just want to make sure the text of your documents is digitally preserved and/or available to read online, you don't really need all that much in the way of hardware. 
Using a pocket camera and a stand to digitize more than a few pages is going to be slow, clumsy and painful, but for many applications, the end result may be entirely acceptable. -William
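The megapixel figures quoted above follow from simple arithmetic, sketched here for reference. The function name is made up for illustration, and the calculation assumes the page exactly fills the frame, which a real camera setup will not quite achieve.

```python
def required_megapixels(width_in, height_in, dpi):
    """Megapixels needed to cover a page of the given size (inches) at dpi.

    Assumes the page fills the frame exactly, with no wasted border.
    """
    width_px = width_in * dpi
    height_px = height_in * dpi
    return width_px * height_px / 1_000_000


# Letter-size page at the resolutions discussed in the thread.
for dpi in (200, 300, 400, 600):
    mp = required_megapixels(8.5, 11, dpi)
    print(f"{dpi} dpi: {mp:.2f} megapixels")
```

Because the pixel count grows with the square of the dpi, doubling from 300 to 600 dpi quadruples the requirement.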
Re: [CODE4LIB] Recommend book scanner?
At Fri, 1 May 2009 09:51:19 -0500, Amanda P wrote: On the other hand, there are projects like bkrpr [2] and [3], home-brew scanning stations built for marginally more than the cost of a pair of $100 cameras. Cameras around $100 are very low quality. You could get nowhere near the dpi recommended for materials that need to be OCRed. Not only would the quality of the images from such cameras be low, but the OCR (even with the best software) would probably have many errors. For someone scanning items at home this might be ok, but for archival quality, I would not recommend cameras. If you are grant funded and the grant provider requires a certain level of quality, you need to make sure the scanning mechanism you use can scan at that quality. I know very little about digital cameras, so I hope I get this right. According to Wikipedia, Google uses (or used) an 11MP camera (Elphel 323). You can get a 12MP camera for about $200. With a 12MP camera you should easily be able to get 300 DPI images of book pages and letter size archival documents. With a $100 camera you can get more or less 300 DPI images of book pages. * The problems I have always seen with OCR had much more to do with alignment and artifacts than with DPI. 300 DPI is fine for OCR as far as my (limited) experience goes - as long as you have quality images. If your intention is to scan items for preservation, then, yes, you want higher quality - but I can’t imagine any setup for archival quality costing anywhere near as little as $1000. If you just want to make scans available with full-text OCR, these setups seem worth looking at - especially if the software workflow can be improved. best, Erik * 12 MP seems to equal 4256 x 2848 pixels. To take a ‘scan’ (photo) of a page at 300 DPI, that page could be as large as 14.19 x 9.49 inches (pixels divided by 300). As long as you can get the camera close enough to the page to not waste much space, you will be getting close to 300 DPI for pages of size 8.5 x 11 or less.
;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
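The footnote's pixel arithmetic can be turned around to estimate the best-case DPI for a given page. This is a sketch only: the function names are illustrative, the page is assumed to be aligned with the sensor's long side, and the result is an upper bound on sampling density, not a measure of what the lens actually resolves.

```python
def max_page_at_dpi(sensor_px, dpi):
    """Largest page (inches) a sensor can cover at the requested dpi."""
    return tuple(px / dpi for px in sensor_px)


def effective_dpi(sensor_px, page_in):
    """Best-case sampling density when the page fills as much of the frame
    as its aspect ratio allows, long side aligned with long side."""
    long_px, short_px = max(sensor_px), min(sensor_px)
    long_in, short_in = max(page_in), min(page_in)
    # The tighter of the two dimensions limits the usable density.
    return min(long_px / long_in, short_px / short_in)


SENSOR = (4256, 2848)   # the 12 MP figure from the footnote
LETTER = (8.5, 11.0)    # inches

width, height = max_page_at_dpi(SENSOR, 300)
print(f"Largest page at 300 dpi: {width:.2f} x {height:.2f} inches")
print(f"Best-case dpi for a letter page: {effective_dpi(SENSOR, LETTER):.0f}")
```

Any border left around the page lowers the figure further, which is why getting the camera close enough matters.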
Re: [CODE4LIB] Recommend book scanner?
On Fri, May 1, 2009 at 5:39 PM, Mike Taylor m...@indexdata.com wrote: If you want real 300 dpi images, at anything like the quality you get from a flatbed scanner, then you're going to need cameras much more expensive than $100. Or just wait, say, about 3 years.
Re: [CODE4LIB] Recommend book scanner?
That is right. In addition, for certain kinds of printing (gold seals, for example), a digital camera delivers better results than a scanner. -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind Sent: Friday, May 01, 2009 2:38 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Recommend book scanner? Yeah, I don't think people use cameras instead of flatbed scanners because they produce superior results, or are cheaper: They use them because they're _faster_ for large-scale digitization, and also make it possible to capture pages from rare/fragile materials with less damage to the materials. (Flatbeds are not good on bindings, if you want to get a good image). If these things don't apply, is there any reason not to use a flatbed scanner? Not that I know of? Jonathan Randy Stern wrote: My understanding is that a flatbed or sheetfed document scanner that produces 300 dpi will produce much better OCR results than a cheap digital camera that produces 300 dpi. The reasons have to do with the resolution and distortion of the resulting image, where resolution is defined as the number of line pairs per mm that can be resolved (for example when scanning a test chart) - in other words the details that will show up for character images, and distortion is image aberration that can appear at the edges of the page image areas, particularly when illumination is not even. A scanner has much more even illumination. At 11:21 AM 5/1/2009 -0700, Erik Hetzner wrote: At Fri, 1 May 2009 09:51:19 -0500, Amanda P wrote: On the other hand, there are projects like bkrpr [2] and [3], home-brew scanning stations built for marginally more than the cost of a pair of $100 cameras. Cameras around $100 are very low quality. You could get nowhere near the dpi recommended for materials that need to be OCRed. Not only would the quality of the images from such cameras be low, but the OCR (even with the best software) would probably have many errors.
For someone scanning items at home this might be ok, but for archival quality, I would not recommend cameras. If you are grant funded and the grant provider requires a certain level of quality, you need to make sure the scanning mechanism you use can scan at that quality. I know very little about digital cameras, so I hope I get this right. According to Wikipedia, Google uses (or used) an 11MP camera (Elphel 323). You can get a 12MP camera for about $200. With a 12MP camera you should easily be able to get 300 DPI images of book pages and letter size archival documents. For a $100 camera you can get more or less 300 DPI images of book pages. * The problems I have always seen with OCR had much to do with alignment and artifacts than with DPI. 300 DPI is fine for OCR as far as my (limited) experience goes - as long as you have quality images. If your intention is to scan items for preservation, then, yes, you want higher quality - but I can’t imagine any setup for archival quality costing anywhere near $1000. If you just want to make scans full text OCR available, these setups seem worth looking at - especially if the software workflow can be improved. best, Erik * 12 MP seems to equal 4256 x 2848 pixels. To take a ‘scan’ (photo) of a page at 300 DPI, that page would need to be 14.18 x 9.49 (dividing pixels / 300). As long as you can get the camera close enough to the image to not waste much space you will be getting in the close to 300 DPI range for images of size 8.5 x 11 or less. ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
Re: [CODE4LIB] Recommend book scanner?
My understanding is that a flatbed or sheetfed document scanner that produces 300 dpi will produce much better OCR results than a cheap digital camera that produces 300 dpi. The reasons have to do with the resolution and distortion of the resulting image, where resolution is defined as the number of line pairs per mm that can be resolved (for example when scanning a test chart) - in other words the details that will show up for character images, and distortion is image aberration that can appear at the edges of the page image areas, particularly when illumination is not even. A scanner has much more even illumination. At 11:21 AM 5/1/2009 -0700, Erik Hetzner wrote: At Fri, 1 May 2009 09:51:19 -0500, Amanda P wrote: On the other hand, there are projects like bkrpr [2] and [3], home-brew scanning stations built for marginally more than the cost of a pair of $100 cameras. Cameras around $100 are very low quality. You could get nowhere near the dpi recommended for materials that need to be OCRed. Not only would the quality of the images from such cameras be low, but the OCR (even with the best software) would probably have many errors. For someone scanning items at home this might be ok, but for archival quality, I would not recommend cameras. If you are grant funded and the grant provider requires a certain level of quality, you need to make sure the scanning mechanism you use can scan at that quality. I know very little about digital cameras, so I hope I get this right. According to Wikipedia, Google uses (or used) an 11MP camera (Elphel 323). You can get a 12MP camera for about $200. With a 12MP camera you should easily be able to get 300 DPI images of book pages and letter size archival documents. With a $100 camera you can get more or less 300 DPI images of book pages. * The problems I have always seen with OCR had much more to do with alignment and artifacts than with DPI. 300 DPI is fine for OCR as far as my (limited) experience goes - as long as you have quality images.
If your intention is to scan items for preservation, then, yes, you want higher quality - but I can’t imagine any setup for archival quality costing anywhere near $1000. If you just want to make scans full text OCR available, these setups seem worth looking at - especially if the software workflow can be improved. best, Erik * 12 MP seems to equal 4256 x 2848 pixels. To take a ‘scan’ (photo) of a page at 300 DPI, that page would need to be 14.18 x 9.49 (dividing pixels / 300). As long as you can get the camera close enough to the image to not waste much space you will be getting in the close to 300 DPI range for images of size 8.5 x 11 or less. ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
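Randy Stern's definition of resolution in line pairs per mm can be related to dpi through the sampling limit: one line pair needs at least two pixels. The sketch below computes that ceiling; the function names are illustrative, and a real camera or scanner will resolve fewer line pairs than this bound because of lens blur and uneven illumination.

```python
MM_PER_INCH = 25.4


def nyquist_lp_per_mm(dpi):
    """Upper bound on resolvable line pairs per mm at a given dpi.

    One line pair (a dark line plus a light line) needs at least two
    pixels, so the ceiling is half the pixel density per mm.
    """
    return dpi / (2 * MM_PER_INCH)


def dpi_for_lp_per_mm(lp_per_mm):
    """Minimum dpi that could, in principle, resolve lp_per_mm."""
    return 2 * MM_PER_INCH * lp_per_mm


for dpi in (150, 300, 400, 600):
    print(f"{dpi} dpi: at most {nyquist_lp_per_mm(dpi):.1f} line pairs per mm")
```

Comparing the line pairs actually distinguishable on a scanned test chart against this ceiling gives a rough idea of how much of the nominal dpi a device really delivers.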
Re: [CODE4LIB] Recommend book scanner?
William Wueppelmann writes: Cameras around $100 are very low quality. You could get nowhere near the dpi recommended for materials that need to be OCRed. Not only would the quality of the images from such cameras be low, but the OCR (even with the best software) would probably have many errors. For someone scanning items at home this might be ok, but for archival quality, I would not recommend cameras. If you are grant funded and the grant provider requires a certain level of quality, you need to make sure the scanning mechanism you use can scan at that quality. To capture an image 8.5 x 11 at 300 dpi, you need roughly 8.4 megapixels, which is well within the capabilities of an inexpensive pocket camera. Or not. Cheap cameras may well produce JPEGs that contain eight million pixels, but that doesn't mean that they are using all or even much of that resolution. In my experience, most cheap cameras are producing way more data than their lenses can actually feed them, so that you can halve the resolution or more without losing any actual information. Such cameras will, in effect, give you a 150 dpi scan -- even if that scan is expressed as a 300 dpi image. If you want real 300 dpi images, at anything like the quality you get from a flatbed scanner, then you're going to need cameras much more expensive than $100. _/|____ /o ) \/ Mike Taylor m...@indexdata.com http://www.miketaylor.org.uk )_v__/\ I think it should either be unrestricted garnishing, or a single Olympic standard mayonnaise -- Monty Python.
Re: [CODE4LIB] Recommend book scanner?
Yeah, I don't think people use cameras instead of flatbed scanners because they produce superior results, or are cheaper: They use them because they're _faster_ for large-scale digitization, and also make it possible to capture pages from rare/fragile materials with less damage to the materials. (Flatbeds are not good on bindings, if you want to get a good image). If these things don't apply, is there any reason not to use a flatbed scanner? Not that I know of? Jonathan Randy Stern wrote: My understanding is that a flatbed or sheetfed document scanner that produces 300 dpi will produce much better OCR results than a cheap digital camera that produces 300 dpi. The reasons have to do with the resolution and distortion of the resulting image, where resolution is defined as the number of line pairs per mm can be resolved (for example when scanning a test chart) - in other words the details that will show up for character images, and distortion is image aberration that can appear at the edges of the page image areas, particularly when illumination is not even. A scanner has much more even illumination. At 11:21 AM 5/1/2009 -0700, Erik Hetzner wrote: At Fri, 1 May 2009 09:51:19 -0500, Amanda P wrote: On the other hand, there are projects like bkrpr [2] and [3], home-brew scanning stations build for marginally more than the cost of a pair of $100 cameras. Cameras around $100 dollars are very low quality. You could get no where near the dpi recommended for materials that need to be OCRed. The quality of images from cameras would be not only low, but the OCR (even with the best software) would probably have many errors. For someone scanning items at home this might be ok, but for archival quality, I would not recommend cameras. If you are grant funded and the grant provider requires a certain level of quality, you need to make sure the scanning mechanism you use can scan at that quality. I know very little about digital cameras, so I hope I get this right. 
According to Wikipedia, Google uses (or used) an 11MP camera (Elphel 323). You can get a 12MP camera for about $200. With a 12MP camera you should easily be able to get 300 DPI images of book pages and letter size archival documents. For a $100 camera you can get more or less 300 DPI images of book pages. * The problems I have always seen with OCR had much to do with alignment and artifacts than with DPI. 300 DPI is fine for OCR as far as my (limited) experience goes - as long as you have quality images. If your intention is to scan items for preservation, then, yes, you want higher quality - but I can’t imagine any setup for archival quality costing anywhere near $1000. If you just want to make scans full text OCR available, these setups seem worth looking at - especially if the software workflow can be improved. best, Erik * 12 MP seems to equal 4256 x 2848 pixels. To take a ‘scan’ (photo) of a page at 300 DPI, that page would need to be 14.18 x 9.49 (dividing pixels / 300). As long as you can get the camera close enough to the image to not waste much space you will be getting in the close to 300 DPI range for images of size 8.5 x 11 or less. ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
Re: [CODE4LIB] Recommend book scanner?
Mike Taylor wrote: Or not. Cheap cameras may well produce JPEGs that contain eight million pixels, but that doesn't mean that they are using all or even much of that resolution. Does anybody have a printed test sheet that we can scan or photo, and then compare the resulting digital images? It should have lines at various densities and areas of different colours, just like an old TV test image. Can you buy such calibration sheets? We could make it a standard routine, to always shoot such a sheet at the beginning of any captured book, to give the reader an idea of the digitization quality of the used equipment. They are called technical target in figure 14, page 149, of Lisa L. Fox (ed.), Preservation Microfilming, 2nd ed. (1996), ISBN 0-8389-0653-2. The example there is manufactured by AP International, http://www.a-p-international.com/ However, their price list is $100-400 per package of 50 sheets. I wouldn't pay more for the calibration targets than for the camera, if I could avoid it. -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se Project Runeberg - free Nordic literature - http://runeberg.org/
Re: [CODE4LIB] Recommend book scanner?
At Wed, 29 Apr 2009 13:32:08 -0400, Christine Schwartz wrote: We are looking into buying a book scanner which we'll probably use for archival papers as well--probably something in the $1,000.00 range. Any advice? Most organizations, or at least the big ones, Internet Archive and Google, seem to be using a design based on 2 fixed cameras rather than a traditional scanner-type device. Is this what you had in mind? Unfortunately none of these products are cheap. Internet Archive’s Scribe machine cost upwards of $15k (3 years ago), [1] mostly because it has two very expensive cameras. Google’s data is unavailable. A company called Kirtas also sells what look like very expensive machines of a similar design. On the other hand, there are projects like bkrpr [2] and [3], home-brew scanning stations built for marginally more than the cost of a pair of $100 cameras. I think that these are a real possibility for smaller organizations. The maturity of the software and workflow is problematic, but with Google’s Ocropus OCR software [4] freely available as the heart of a scanning workflow, the possibility is there. Both bkrpr and [3] have software currently available, although in the case of bkrpr at least the software is in the very early stages of development. best, Erik Hetzner 1. http://redjar.org/jared/blog/archives/2006/02/10/more-details-on-open-archives-scribe-book-scanner-project/ 2. http://bkrpr.org/doku.php 3. http://www.instructables.com/id/DIY-High-Speed-Book-Scanner-from-Trash-and-Cheap-C/ 4. http://code.google.com/p/ocropus/ ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
Re: [CODE4LIB] Recommend book scanner?
How good are the two-camera apparatuses for scanning things other than books? The thing about the Google and Kirtas scanners is that they are not particularly recommended for dealing with fragile books or otherwise special collections materials. The University of Virginia Library is still using the single camera method for books as well as manuscripts, photographs, slides, coins, etc. It's a Hasselblad camera with a 45 megapixel Phase One digital back, but significantly out of the $1000 range. I haven't dealt with all the camera hardware you could find, but I have never seen a professional digitization hardware/software suite for as low as a thousand bucks. You could check out i2s's copibook series ( http://www.i2s-bookscanner.com/produits.asp?gamme=1003sX_Menu_selectedID=leftV_1003_MOD), but I have no idea how much they cost; they don't say. Erik's idea of building something custom is an option, but you might not necessarily get consistent quality and production rate. Have you considered partnering with Princeton University's digitization labs? The UVA Health Sciences Library occasionally borrows/trades/buys resources from the university library's digitization services (the health system and university are technically two different entities). For all I know, someone from Princeton University is on this list; I don't know what their resources are and don't presume to speak for them. That's just my idea. Ethan Gruber On Thu, Apr 30, 2009 at 12:49 PM, Erik Hetzner erik.hetz...@ucop.eduwrote: At Wed, 29 Apr 2009 13:32:08 -0400, Christine Schwartz wrote: We are looking into buying a book scanner which we'll probably use for archival papers as well--probably something in the $1,000.00 range. Any advice? Most organizations, or at least the big ones, Internet Archive and Google, seem to be using a design based on 2 fixed cameras rather than a tradition scanner type device. Is this what you had in mind? Unfortunately none of these products are cheap. 
Internet Archive’s Scribe machine cost upwards (3 years ago) of $15k, [1] mostly because it has two very expensive cameras. Google’s data is unavailable. A company called Kirtas also sells what look like very expensive machines of a similar design. On the other hand, there are projects like bkrpr [2] and [3], home-brew scanning stations build for marginally more than the cost of a pair of $100 cameras. I think that these are a real possibility for smaller organizations. The maturity of the software and workflow is problematic, but with Google’s Ocropus OCR software [4] freely available as the heart of a scanning workflow, the possibility is there. Both bkrpr and [3] have software currently available, although in the case of bkrpr at least the software is in the very early stages of development. best, Erik Hetzner 1. http://redjar.org/jared/blog/archives/2006/02/10/more-details-on-open-archives-scribe-book-scanner-project/ 2. http://bkrpr.org/doku.php 3. http://www.instructables.com/id/DIY-High-Speed-Book-Scanner-from-Trash-and-Cheap-C/ 4. http://code.google.com/p/ocropus/ ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
Re: [CODE4LIB] Recommend book scanner?
Erik Hetzner wrote: At Wed, 29 Apr 2009 13:32:08 -0400, Christine Schwartz wrote: We are looking into buying a book scanner which we'll probably use for archival papers as well--probably something in the $1,000.00 range. Any advice? Most organizations, or at least the big ones, Internet Archive and Google, seem to be using a design based on 2 fixed cameras rather than a tradition scanner type device. Is this what you had in mind? This is probably the type of machine that will be needed for books if they need to remain bound throughout the scanning process. For looseleaf materials or for books that can be disbound and are in good condition, you can get inexpensive duplex sheet feeder scanners for a few hundred dollars that might be good enough. Unfortunately none of these products are cheap. Internet Archive’s Scribe machine cost upwards (3 years ago) of $15k, [1] mostly because it has two very expensive cameras. Google’s data is unavailable. A company called Kirtas also sells what look like very expensive machines of a similar design. $15K seems pretty cheap for that kind of scanner; most that I've seen run from the tens of thousands well into the hundreds, depending on the model and features. I don't remember precisely what IA's Scribe stations cost, but I think they were more in the range of $40-60K CAD; it would probably be cheaper in the US, but not that much cheaper, and I suspect that IA gets some sort of bulk discount for buying them by the truckload. 
The main issues to consider are:
- Type of material: is it fragile or not; is it rare; can you afford to damage or destroy a copy during the scanning process; can the items be disbound; what is the minimum and maximum size of item to be scanned; if books are to remain bound, are the bindings tight or are the margins; paper thickness; existence of damage, water spotting, show through, and other defects
- Scanning resolution required
- Image output (color/greyscale/black and white) and output format (TIFF, JPEG2000, PDF, JPEG).
- Throughput requirement. (How much stuff do you have: dozens/hundreds/thousands/millions of pages, and how quickly do you need to get it done: days/weeks/months/years?)
- How much technical work can/are you willing to do yourself? Can you invest in substantial post-processing, or do you need to be able to press Go on the scanner and produce a more or less finished product? If so, what sort of metadata, OCR, etc. requirements do you have, if any, in addition to getting the basic image?
For some projects, there are suitable desktop scanners available for very little money, and in some cases, using a decent (7 megapixel or higher) digital camera in conjunction with a stand and maybe an image editor like Photoshop (or something free like Irfanview) to crop and deskew afterwards might work just fine, but in other cases, a much more elaborate setup might be needed.
-- William Wueppelmann Systems Librarian/Programmer Canadiana.org http://www.canadiana.org