Re: [CODE4LIB] Recommend book scanner?

2009-05-04 Thread Randy Stern

Printed test sheets:

http://www.diytrade.com/china/4/products/1707979/IEEE_Resolution_Chart.html?r=0

or

http://www.aig-imaging.com/mm5/merchant.mvc?Screen=PROD&Store_Code=AIIPI&Product_Code=QA-60&Category_Code=Video-Scanner-Resolution-Charts

At 04:54 PM 5/2/2009 -0700, st...@archive.org wrote:

On 5/1/09 8:27 PM, Lars Aronsson wrote:
Does anybody have a printed test sheet that we can scan or photo, and 
then compare the resulting digital images?  It should have lines at 
various densities and areas of different colours, just like an old TV 
test image.  Can you buy such calibration sheets?


archive.org scans typically include a color card target
image near the back (or front) of the book, e.g.

http://www-steve.us.archive.org/public/data/eg/birdsthateverych00doub/birdsthateverych00doub_jp2/birdsthateverych00doub_0371.jp2

typical specs for our scanning rig (scribe) are roughly:

  1 8x8x5' scribe structure
  2 Canon EOS 5Ds
  2 light boxes
  1 orthogonal glass platen and cradle
  1 foot pedal, pulley system
  Linux PC
    LAMP stack
    custom web-based UI
    gphoto, imagemagick, leptonica, rsync
  fast internet


we scan over 1,000 books a day with about 100 scribes like this.


/st...@archive.org


Re: [CODE4LIB] Recommend book scanner?

2009-05-04 Thread Han, Yan
The National Archives has guidelines that describe targets you can use
for scanning comparison. Other targets are described in other books and
articles. I suggest that you check the National Archives' guidelines:
http://www.archives.gov/preservation/technical/guidelines.html 

-----Original Message-----
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Lars Aronsson
Sent: Friday, May 01, 2009 8:27 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Recommend book scanner?

Mike Taylor wrote:

 Or not.  Cheap cameras may well produce JPEGs that contain eight 
 million pixels, but that doesn't mean that they are using all or 
 even much of that resolution.

Does anybody have a printed test sheet that we can scan or photo, 
and then compare the resulting digital images?  It should have 
lines at various densities and areas of different colours, just 
like an old TV test image.  Can you buy such calibration sheets?

We could make it a standard routine, to always shoot such a sheet 
at the beginning of any captured book, to give the reader an idea 
of the digitization quality of the equipment used.

They are called "technical targets" in figure 14, page 149, of
Lisa L. Fox (ed.), Preservation Microfilming, 2nd ed. (1996), 
ISBN 0-8389-0653-2.  The example there is manufactured by AP 
International, http://www.a-p-international.com/

However, their price list is $100-400 per package of 50 sheets.
I wouldn't pay more for the calibration targets than for the
camera, if I could avoid it.


-- 
  Lars Aronsson (l...@aronsson.se)
  Aronsson Datateknik - http://aronsson.se

  Project Runeberg - free Nordic literature - http://runeberg.org/


Re: [CODE4LIB] Recommend book scanner?

2009-05-02 Thread Mike Taylor
Joe Atzberger writes:
   If you want real 300 dpi images, at anything like the quality you
   get from a flatbed scanner, then you're going to need cameras
   much more expensive than $100.
 
  Or just wait, say, about 3 years.

Well, maybe.  I guess not, though: the factor limiting image quality
is not the electronics, which tend to fall rapidly in price, but the
lens, which does not.  Still, fingers crossed.

 _/|____
/o ) \/  Mike Taylor  m...@indexdata.com  http://www.miketaylor.org.uk
)_v__/\  It seems to me absurd to doubt that a man may be an ardent Theist
         and an evolutionist -- Charles Darwin, letter to John Fordyce,
         7th May 1879.


Re: [CODE4LIB] Recommend book scanner?

2009-05-02 Thread st...@archive.org

On 5/1/09 8:27 PM, Lars Aronsson wrote:
Does anybody have a printed test sheet that we can scan or photo, 
and then compare the resulting digital images?  It should have 
lines at various densities and areas of different colours, just 
like an old TV test image.  Can you buy such calibration sheets?


archive.org scans typically include a color card target
image near the back (or front) of the book, e.g.

http://www-steve.us.archive.org/public/data/eg/birdsthateverych00doub/birdsthateverych00doub_jp2/birdsthateverych00doub_0371.jp2

typical specs for our scanning rig (scribe) are roughly:

  1 8x8x5' scribe structure
  2 Canon EOS 5Ds
  2 light boxes
  1 orthogonal glass platen and cradle
  1 foot pedal, pulley system
  Linux PC
    LAMP stack
    custom web-based UI
    gphoto, imagemagick, leptonica, rsync
  fast internet


we scan over 1,000 books a day with about 100 scribes like this.


/st...@archive.org


Re: [CODE4LIB] Recommend book scanner?

2009-05-02 Thread Lars Aronsson
st...@archive.org wrote:

 archive.org scans typically include a color card target
 image near the back (or front) of the book, e.g.

That's great.  But where do you buy these target cards?  And are 
they useful for testing small compact cameras?  An important 
difference between the bkrpr.org (book ripper) design and the Scribe 
is that the former uses cheap cameras with cheap lenses and uneven 
lighting.  That means we need to calibrate (or test) the resolution 
and colors in different corners of the image.  With the Scribe, you 
can assume that these are even across the whole image.

My standard Ubuntu and Firefox failed to view the JPEG 2000 image.

   1 8x8x5' scribe structure
   2 Canon EOS 5Ds

These are design decisions from 2003 or so. Today you would at 
least use the Digital Rebel XSi (EOS 450D) which gives 12 
megapixels at a fraction of the price of the EOS 5D.

I also think (?) the Scribe uses a 100 mm lens, which puts the 
camera at more than 1 meter away from the glass.  A 50 mm lens 
would cut that distance (and the size of the whole machine) in 
half.
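
A rough thin-lens sketch of why halving the focal length halves the camera distance. The 36 mm sensor width corresponds to a full-frame body such as the EOS 5D; the 300 mm field width is a made-up figure for illustration, and real lenses only approximate the thin-lens formula.

```python
def working_distance_mm(focal_mm, sensor_mm, field_mm):
    """Lens-to-subject distance from the thin-lens equation.

    With magnification m = sensor_mm / field_mm, the thin-lens equation
    1/f = 1/u + 1/v (with v = m * u) gives u = f * (1 + 1/m).
    """
    magnification = sensor_mm / field_mm
    return focal_mm * (1 + 1 / magnification)


SENSOR_MM = 36.0   # long side of a full-frame sensor
FIELD_MM = 300.0   # hypothetical width of the area to capture

for focal in (100, 50):
    distance = working_distance_mm(focal, SENSOR_MM, FIELD_MM)
    print(f"{focal} mm lens: about {distance:.0f} mm from the page")
```

Because the distance is proportional to the focal length at a fixed magnification, the 50 mm figure is exactly half the 100 mm figure in this model. A compact camera has a much smaller sensor, so its focal lengths are not directly comparable.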

But all such design revisions only bring us to 2007; they fall 
flat compared to the 2009 book ripper's radical use of the 
PowerShot A590 compact camera, mounted 12 inches away from the 
book, all inside a single plexiglass cube.

 we scan over 1,000 books a day with about 100 scribes like this.

Are any of these in Europe?  Is there a plan to convince European 
libraries to join the Internet Archive's system?


-- 
  Lars Aronsson (l...@aronsson.se)
  Project Runeberg - free Nordic literature - http://runeberg.org/


Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread Amanda P
On the other hand, there are projects like bkrpr [2] and [3],
home-brew scanning stations built for marginally more than the cost of
a pair of $100 cameras.

Cameras around $100 are very low quality. You could get nowhere near the
dpi recommended for materials that need to be OCRed. Not only would the
image quality be low, but the OCR (even with the best software) would
probably have many errors. For someone scanning items at home this might
be OK, but for archival quality, I would not recommend cameras. If you are
grant funded and the grant provider requires a certain level of quality,
you need to make sure the scanning mechanism you use can scan at that
quality.






Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread William Wueppelmann

Amanda P wrote:

Cameras around $100 are very low quality. You could get nowhere near the
dpi recommended for materials that need to be OCRed. Not only would the
image quality be low, but the OCR (even with the best software) would
probably have many errors. For someone scanning items at home this might
be OK, but for archival quality, I would not recommend cameras. If you are
grant funded and the grant provider requires a certain level of quality,
you need to make sure the scanning mechanism you use can scan at that
quality.


To capture an image 8.5 x 11 at 300 dpi, you need roughly 8.4 
megapixels, which is well within the capabilities of an inexpensive 
pocket camera. (If you need 600 dpi, then you're in the 33.7 megapixel 
range.) As to whether the quality will be sufficient, this would depend 
on the goals and requirements of the project, but 300 dpi should be 
enough to get good OCR results for normal-sized text. Our very old 
version of PrimeOCR recommends 300 dpi, and suggests that 400 dpi may 
provide substantially better quality for text sizes smaller than 8 
point, while 200 dpi will be sufficient for text 12 points and up. At 
300 and 400 dpi on 19th Century small-print, variable quality texts, we 
are generally getting good to very good recognition: the quality of the 
original document itself is the limiting factor. More modern documents 
(and OCR software) should produce even better results. The cameras used 
by the Internet Archive are only 12 megapixels, though they are of 
substantially higher quality than a Canon PowerShot.
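
The megapixel figures above follow from multiplying the page dimensions by the target dpi. A minimal sketch, assuming a US letter page (8.5 x 11 inches) that exactly fills the frame; the function name is only illustrative:

```python
def megapixels_needed(width_in, height_in, dpi):
    """Pixel count, in millions, needed to cover a page at a given dpi."""
    return (width_in * dpi) * (height_in * dpi) / 1_000_000


# Resolutions mentioned in the PrimeOCR recommendations, plus 600 dpi.
for dpi in (200, 300, 400, 600):
    print(f"{dpi} dpi: {megapixels_needed(8.5, 11, dpi):.1f} MP")
```

Doubling the dpi quadruples the pixel count, which is why 600 dpi needs four times as many pixels as 300 dpi. These are lower bounds: any part of the frame not covered by the page is wasted.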


Some applications require very high quality images, and cheap cameras 
might not be able to deliver the goods, but if you just want to make 
sure the text of your documents is digitally preserved and/or available 
to read online, you don't really need all that much in the way of 
hardware. Using a pocket camera and a stand to digitize more than a few 
pages is going to be slow, clumsy and painful, but for many 
applications, the end result may be entirely acceptable.


-William


Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread Erik Hetzner
At Fri, 1 May 2009 09:51:19 -0500,
Amanda P wrote:
 
 On the other hand, there are projects like bkrpr [2] and [3],
 home-brew scanning stations built for marginally more than the cost of
 a pair of $100 cameras.
 
 Cameras around $100 are very low quality. You could get nowhere near the
 dpi recommended for materials that need to be OCRed. Not only would the
 image quality be low, but the OCR (even with the best software) would
 probably have many errors. For someone scanning items at home this might
 be OK, but for archival quality, I would not recommend cameras. If you are
 grant funded and the grant provider requires a certain level of quality,
 you need to make sure the scanning mechanism you use can scan at that
 quality.

I know very little about digital cameras, so I hope I get this right.

According to Wikipedia, Google uses (or used) an 11MP camera (Elphel
323). You can get a 12MP camera for about $200.

With a 12MP camera you should easily be able to get 300 DPI images of
book pages and letter-size archival documents. With a $100 camera you
can get more or less 300 DPI images of book pages. *

The problems I have always seen with OCR had much more to do with
alignment and artifacts than with DPI. 300 DPI is fine for OCR as far as
my (limited) experience goes - as long as you have quality images.

If your intention is to scan items for preservation, then, yes, you
want higher quality - but I can’t imagine any setup for archival
quality costing anywhere near as little as $1000. If you just want to
make scans and full-text OCR available, these setups seem worth looking
at - especially if the software and workflow can be improved.

best,
Erik

* 12 MP seems to equal 4256 x 2848 pixels. At 300 DPI, that covers an
area of 14.19 x 9.49 inches (pixels divided by 300). As long as you can
get the camera close enough that the page fills most of the frame, you
will get close to 300 DPI for pages of 8.5 x 11 inches or less.
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3
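
The footnote's arithmetic can be turned around to ask what dpi a given sensor delivers for a given page. A minimal sketch, assuming the page fills the frame as well as its shape allows; the 3264 x 2448 size is an assumed, typical 8 MP frame rather than a figure from this thread:

```python
def effective_dpi(sensor_px, page_in):
    """Best-case sampling rate when the page is fitted inside the frame.

    The long sides are matched to each other, and the tighter of the
    two axes limits the result.
    """
    long_px, short_px = max(sensor_px), min(sensor_px)
    long_in, short_in = max(page_in), min(page_in)
    return min(long_px / long_in, short_px / short_in)


LETTER_IN = (8.5, 11)

for name, sensor in (("12 MP (4256 x 2848)", (4256, 2848)),
                     ("8 MP (3264 x 2448), assumed", (3264, 2448))):
    print(f"{name}: about {effective_dpi(sensor, LETTER_IN):.0f} dpi on letter")
```

This only counts pixels per inch of paper. It says nothing about whether the lens can deliver detail at that scale.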




Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread Joe Atzberger
On Fri, May 1, 2009 at 5:39 PM, Mike Taylor m...@indexdata.com wrote:


 If you want real 300 dpi images, at anything like the quality you get
 from a flatbed scanner, then you're going to need cameras much more
 expensive than $100.


Or just wait, say, about 3 years.


Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread Han, Yan
That is right. 
In addition, for certain kinds of printing (a gold seal, for example), a
digital camera delivers better results than a scanner.

-----Original Message-----
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of 
Jonathan Rochkind
Sent: Friday, May 01, 2009 2:38 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Recommend book scanner?

Yeah, I don't think people use cameras instead of flatbed scanners 
because they produce superior results, or are cheaper: They use them 
because they're _faster_ for large-scale digitization, and also make it 
possible to capture pages from rare/fragile materials with less damage 
to the materials. (Flatbeds are not good on bindings, if you want to get 
a good image).

If these things don't apply, is there any reason not to use a flatbed 
scanner? Not that I know of?

Jonathan



Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread Randy Stern
My understanding is that a flatbed or sheetfed document scanner that 
produces 300 dpi will produce much better OCR results than a cheap digital 
camera that produces 300 dpi. The reasons have to do with the resolution 
and distortion of the resulting image. Resolution is defined as the number 
of line pairs per mm that can be resolved (for example when scanning a 
test chart) - in other words, the detail that will show up in character 
images. Distortion is image aberration that can appear at the edges of 
the page image, particularly when illumination is not even. A scanner 
has much more even illumination.
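
To relate the two units mentioned above: a line pair (one dark line plus one light gap) needs at least two pixels, so a sampling rate in dpi puts a ceiling on the line pairs per mm that an image can hold. A minimal sketch of that conversion; the function names are only illustrative, and a real device will resolve less than this ceiling:

```python
MM_PER_INCH = 25.4


def max_lp_per_mm(dpi):
    """Upper bound on resolvable line pairs per mm at a given sampling rate."""
    pixels_per_mm = dpi / MM_PER_INCH
    return pixels_per_mm / 2  # two pixels per line pair


def dpi_for_lp_per_mm(lp_per_mm):
    """Minimum sampling rate needed to record a given number of line pairs per mm."""
    return lp_per_mm * 2 * MM_PER_INCH


print(f"300 dpi can hold at most {max_lp_per_mm(300):.1f} lp/mm")
```

Reading the finest pattern that is still distinguishable on a photographed test chart, then passing that value to `dpi_for_lp_per_mm`, gives a rough "true dpi" for a camera setup to compare against its nominal dpi.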




Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread Mike Taylor
William Wueppelmann writes:
   Cameras around $100 are very low quality. You could get nowhere
   near the dpi recommended for materials that need to be OCRed. Not
   only would the image quality be low, but the OCR (even with the
   best software) would probably have many errors. For someone
   scanning items at home this might be OK, but for archival quality,
   I would not recommend cameras. If you are grant funded and the
   grant provider requires a certain level of quality, you need to
   make sure the scanning mechanism you use can scan at that quality.
  
  To capture an image 8.5 x 11 at 300 dpi, you need roughly 8.4
  megapixels, which is well within the capabilities of an inexpensive
  pocket camera.

Or not.  Cheap cameras may well produce JPEGs that contain eight
million pixels, but that doesn't mean that they are using all or even
much of that resolution.  In my experience, most cheap cameras are
producing way more data than their lenses can actually feed them, so
that you can halve the resolution or more without losing any actual
information.  Such cameras will, in effect, give you a 150 dpi scan --
even if that scan is expressed as a 300 dpi image.

If you want real 300 dpi images, at anything like the quality you get
from a flatbed scanner, then you're going to need cameras much more
expensive than $100.
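
A toy illustration of the "halve the resolution without losing information" point, using one row of pixel values. It assumes idealized nearest-neighbour resampling and made-up pixel rows, so it only shows the idea, not how a real lens or JPEG encoder behaves:

```python
def halve_and_restore(row):
    """Keep every second pixel, then duplicate each kept pixel to restore the length."""
    kept = row[::2]
    return [pixel for pixel in kept for _ in range(2)][:len(row)]


def survives_halving(row):
    """True if the row carries no detail finer than two pixels."""
    return halve_and_restore(row) == row


# Sharp capture: detail at the single-pixel scale (alternating lines).
sharp = [0, 255] * 4
# Soft capture: same pixel count, but the finest detail is two pixels wide.
soft = [0, 0, 255, 255] * 2

print("sharp row survives halving:", survives_halving(sharp))
print("soft row survives halving:", survives_halving(soft))
```

The soft row has as many pixels as the sharp one, yet half of them are redundant. That is the sense in which a nominal 300 dpi image can behave like a 150 dpi scan.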

 _/|____
/o ) \/  Mike Taylor  m...@indexdata.com  http://www.miketaylor.org.uk
)_v__/\  I think it should either be unrestricted garnishing, or a single
         Olympic standard mayonnaise -- Monty Python.


Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread Jonathan Rochkind
Yeah, I don't think people use cameras instead of flatbed scanners 
because they produce superior results, or are cheaper: They use them 
because they're _faster_ for large-scale digitization, and also make it 
possible to capture pages from rare/fragile materials with less damage 
to the materials. (Flatbeds are not good on bindings, if you want to get 
a good image).


If these things don't apply, is there any reason not to use a flatbed 
scanner? Not that I know of?


Jonathan



Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread Lars Aronsson
Mike Taylor wrote:

 Or not.  Cheap cameras may well produce JPEGs that contain eight 
 million pixels, but that doesn't mean that they are using all or 
 even much of that resolution.

Does anybody have a printed test sheet that we can scan or photo, 
and then compare the resulting digital images?  It should have 
lines at various densities and areas of different colours, just 
like an old TV test image.  Can you buy such calibration sheets?

We could make it a standard routine, to always shoot such a sheet 
at the beginning of any captured book, to give the reader an idea 
of the digitization quality of the equipment used.

They are called "technical targets" in figure 14, page 149, of
Lisa L. Fox (ed.), Preservation Microfilming, 2nd ed. (1996), 
ISBN 0-8389-0653-2.  The example there is manufactured by AP 
International, http://www.a-p-international.com/

However, their price list is $100-400 per package of 50 sheets.
I wouldn't pay more for the calibration targets than for the
camera, if I could avoid it.


-- 
  Lars Aronsson (l...@aronsson.se)
  Aronsson Datateknik - http://aronsson.se

  Project Runeberg - free Nordic literature - http://runeberg.org/


Re: [CODE4LIB] Recommend book scanner?

2009-04-30 Thread Erik Hetzner
At Wed, 29 Apr 2009 13:32:08 -0400,
Christine Schwartz wrote:
 
 We are looking into buying a book scanner which we'll probably use for
 archival papers as well--probably something in the $1,000.00 range.
 
 Any advice?

Most organizations, or at least the big ones, Internet Archive and
Google, seem to be using a design based on 2 fixed cameras rather than
a traditional scanner-type device. Is this what you had in mind?

Unfortunately none of these products are cheap. Internet Archive’s
Scribe machine cost upwards of $15k (3 years ago), [1] mostly because
it has two very expensive cameras. Google’s data is unavailable. A
company called Kirtas also sells what look like very expensive
machines of a similar design.

On the other hand, there are projects like bkrpr [2] and [3],
home-brew scanning stations built for marginally more than the cost of
a pair of $100 cameras. I think that these are a real possibility for
smaller organizations. The immaturity of the software and workflow is
a problem, but with Google’s Ocropus OCR software [4] freely
available as the heart of a scanning workflow, the possibility is
there. Both bkrpr and [3] have software currently available, although
in the case of bkrpr at least the software is in the very early stages
of development.

best,
Erik Hetzner

1. http://redjar.org/jared/blog/archives/2006/02/10/more-details-on-open-archives-scribe-book-scanner-project/
2. http://bkrpr.org/doku.php
3. http://www.instructables.com/id/DIY-High-Speed-Book-Scanner-from-Trash-and-Cheap-C/
4. http://code.google.com/p/ocropus/
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] Recommend book scanner?

2009-04-30 Thread Ethan Gruber
How good are the two-camera apparatuses for scanning things other than
books?  The thing about the Google and Kirtas scanners is that they are not
particularly recommended for dealing with fragile books or otherwise special
collections materials.  The University of Virginia Library is still using
the single camera method for books as well as manuscripts, photographs,
slides, coins, etc.  It's a Hasselblad camera with a 45 megapixel Phase One
digital back, but significantly out of the $1000 range.  I haven't dealt
with all the camera hardware you could find, but I have never seen a
professional digitization hardware/software suite for as low as a thousand
bucks.  You could check out i2s's copibook series (
http://www.i2s-bookscanner.com/produits.asp?gamme=1003&sX_Menu_selectedID=leftV_1003_MOD),
but I have no idea how much they cost; they don't say.  Erik's idea of
building something custom is an option, but you might not necessarily get
consistent quality and production rate.

Have you considered partnering with Princeton University's digitization
labs?  The UVA Health Sciences Library occasionally borrows/trades/buys
resources from the university library's digitization services (the health
system and university are technically two different entities).  For all I
know, someone from Princeton University is on this list; I don't know what
their resources are and don't presume to speak for them.  That's just my
idea.

Ethan Gruber





Re: [CODE4LIB] Recommend book scanner?

2009-04-30 Thread William Wueppelmann

Erik Hetzner wrote:

At Wed, 29 Apr 2009 13:32:08 -0400,
Christine Schwartz wrote:

We are looking into buying a book scanner which we'll probably use for
archival papers as well--probably something in the $1,000.00 range.

Any advice?


Most organizations, or at least the big ones, Internet Archive and
Google, seem to be using a design based on 2 fixed cameras rather than
a traditional scanner-type device. Is this what you had in mind?


This is probably the type of machine that will be needed for books if 
they need to remain bound throughout the scanning process. For looseleaf 
materials or for books that can be disbound and are in good condition, 
you can get inexpensive duplex sheet feeder scanners for a few hundred 
dollars that might be good enough.



Unfortunately none of these products are cheap. Internet Archive’s
Scribe machine cost upwards of $15k (3 years ago), [1] mostly because
it has two very expensive cameras. Google’s data is unavailable. A
company called Kirtas also sells what look like very expensive
machines of a similar design.


$15K seems pretty cheap for that kind of scanner; most that I've seen 
run from the tens of thousands well into the hundreds, depending on the 
model and features. I don't remember precisely what IA's Scribe stations 
cost, but I think they were more in the range of $40-60K CAD; it would 
probably be cheaper in the US, but not that much cheaper, and I suspect 
that IA gets some sort of bulk discount for buying them by the truckload.


The main issues to consider are:

- Type of material: is it fragile or not; is it rare; can you afford to 
damage or destroy a copy during the scanning process; can the items be 
disbound; what is the minimum and maximum size of item to be scanned; if 
books are to remain bound, are the bindings tight or the margins narrow; 
paper thickness; existence of damage, water spotting, show through, and 
other defects


- Scanning resolution required

- Image output (color/greyscale/black and white) and output format 
(TIFF, JPEG2000, PDF, JPEG).


- Throughput requirement. (How much stuff do you have: 
dozens/hundreds/thousands/millions of pages, and how quickly do you need 
to get it done: days/weeks/months/years?)


- How much technical work can/are you willing to do yourself? Can you 
invest in substantial post-processing, or do you need to be able to 
press Go on the scanner and produce a more or less finished product? 
If so, what sort of metadata, OCR, etc. requirements do you have, if 
any, in addition to getting the basic image?
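
For the throughput question in the list above, a back-of-the-envelope estimate can help size the equipment. This is only a sketch: the function name and every number in the example are hypothetical placeholders, to be replaced with rates measured on your own setup.

```python
import math


def working_days(total_pages, pages_per_hour, hours_per_day, stations=1):
    """Whole working days needed to capture total_pages, ignoring setup and rework."""
    pages_per_day = pages_per_hour * hours_per_day * stations
    return math.ceil(total_pages / pages_per_day)


# Hypothetical project: 100,000 pages, 300 pages/hour, 6 scanning hours a day.
print(working_days(100_000, 300, 6), "days with one station")
print(working_days(100_000, 300, 6, stations=2), "days with two stations")
```

The estimate covers image capture only. Post-processing, OCR, metadata and quality checks need their own time on top.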


For some projects, there are suitable desktop scanners available for 
very little money, and in some cases, using a decent (7 megapixel or 
higher) digital camera in conjunction with a stand and maybe an image 
editor like Photoshop (or something free like Irfanview) to crop and 
deskew afterwards might work just fine, but in other cases, a much more 
elaborate setup might be needed.


--
William Wueppelmann
Systems Librarian/Programmer
Canadiana.org
http://www.canadiana.org