[CODE4LIB] Mashed Library UK 2009 - registration now open

2009-05-01 Thread David Pattern
Hope this might be of interest to some of you.  I'm not sure how feasible it'll 
be to stream and/or video the event, but we're currently looking into it.

regards
Dave Pattern
University of Huddersfield

-

Mashed Library UK 2009: Mash Oop North!
Date: Tuesday 7th July 2009
Time: 10.00am until late afternoon
Venue: University of Huddersfield, Huddersfield, HD1 3DH
Web site: http://mashlib09.wordpress.com
Fee: £15 (ex. VAT)
Speakers: Tony Hirst, Mike Ellis, Brendan Dawes, Richard Wallis and more
Primary sponsor: Talis

The first Mashed Library UK event, organised by Owen Stephens, was held at 
Birkbeck College in November 2008 with the aim of bringing together interested 
people and doing interesting stuff with libraries and technology.  Further 
details about the 2008 event are available here: http://mashedlibrary.ning.com

The University of Huddersfield is proud to be hosting the second event, dubbed 
"Mash Oop North!", which is being sponsored by Talis.  The event will take 
place in Huddersfield on July 7th.

Mashed Library is aimed at librarians, library developers and library techies 
who want to learn more about Web 2.0 & 3.0, Library 2.0, creating mash-ups and 
generally doing interesting/cool/useful things with data.  In particular, we 
expect the event to generate the following outcomes for all attendees:

1) Awareness of the latest developments in library technology
2) Application of Web 2.0 technologies in a library context
3) Community building and networking
4) Learning new skills and developing existing ones

The event is primarily an unconference, so attendees will be encouraged to 
participate throughout the day.  Further information is available on the event 
blog: http://mashlib09.wordpress.com

A small token registration fee of £15 is the only charge for the event.  Places 
are limited to around 60 delegates, so we would advise booking early to avoid 
disappointment!



Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-01 Thread Mike Taylor
Ray Denenberg, Library of Congress writes:
  Thanks, Ross. For SRU, this is an opportune time to reconcile these
  differences.  Opportune, because we are approaching standardization
  of SRU/CQL within OASIS, and there will be a number of areas that
  need to change.

Agreed.  Looking at the situation as it stands, it really does seem
insane that we've ended up with these three or four different URIs
describing each of the data formats; and if we with our library
background can't get this right, what hope does the rest of the world
have?  Because OpenURL 1.0 seems to have been more widely implemented
than SRU (though much less so than OpenURL 0.1), I think it would be
less painful to change SRU to use OpenURL's data-format URIs than
vice versa; good implementations will of course recognise both old and
new URIs.

  Some observations.
  
  1. the 'ofi' namespace of 'info' has the advantage that the name,
  'ofi', isn't necessarily tied to a community or application (I
  suppose one could claim that the acronym 'ofi' means "OpenURL
  something-starting-with-'f' Identifiers", but it doesn't say
  so anywhere that I can find.)  However, the namespace itself (if
  not the name) is tied to OpenURL: "Namespace of Registry
  Identifiers used by the NISO OpenURL Framework Registry."  That
  seems like a simple problem to fix.  (Changing that title would not
  cause any technical problems.)
  
  2. In contrast, with the srw namespace, the actual name is
  "srw". So at least in name, it is tied to an application.

Agreed -- another reason to prefer the OpenURL standard's URIs.

  3. On the other side, the srw namespace has the distinct advantage
  of built-in extensibility.  For the URI:
  info:srw/schema/1/onix-v2.0, the "1" is an authority.  There are
  (currently) 15 such authorities, they are listed in the (second)
  table at http://www.loc.gov/standards/sru/resources/infoURI.html
  
  Authority 1 is the SRU maintenance agency, and the objects
  registered under that authority are, more-or-less, public. But
  objects can be defined under the other authorities with no
  registration process required.
  
  4.  ofi does not offer this sort of extensibility.

But SRU's extensibility mechanism has always been clumsy -- the
assignment of integer identifiers for sub-namespaces has the distinct
whiff of an OID hangover.  In these enlightened days, we use our
domains for namespace partitioning, as with HTTP URLs.

I'd like to see the info:ofi URI specification extended to allow this
kind of thing:
info:ofi/ext:miketaylor.org.uk:whateverTheHeckIWantToPutHere
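
For illustration only, here is a minimal sketch (Python, with a
hypothetical helper name) of how a client might pull the domain and
local name back out of such an extension URI, assuming the syntax
suggested above:

  import re

  # Hypothetical pattern for the proposed extension syntax:
  #   info:ofi/ext:<domain>:<local-name>
  # This illustrates the proposal; it is not part of any registry spec.
  EXT_URI = re.compile(r"^info:ofi/ext:(?P<domain>[A-Za-z0-9.-]+):(?P<name>.+)$")

  def parse_ext_uri(uri):
      """Return (domain, local_name) for an extension URI, or None."""
      m = EXT_URI.match(uri)
      return (m.group("domain"), m.group("name")) if m else None

  # parse_ext_uri("info:ofi/ext:miketaylor.org.uk:whateverTheHeckIWantToPutHere")
  # -> ("miketaylor.org.uk", "whateverTheHeckIWantToPutHere")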

  So, if we were going to unify these two systems (and I can't speak
  for the SRU community and commit to doing so yet) the extensibility
  offered by the srw approach would be an absolute requirement.  If
  it could somehow be built in to ofi, then I would not be opposed to
  migrating the srw identifiers.  Another approach would be to
  register an entirely new 'info:' URI namespace and migrate all of
  these identifiers to the new namespace.

Oh, gosh, no, introducing yet ANOTHER set of identifiers is really not
the answer! :-)

 _/|_    ___________________________________________________________________
/o ) \/  Mike Taylor    m...@indexdata.com    http://www.miketaylor.org.uk
)_v__/\  "Conclusion: is left to the reader (see Table 2).
         Acknowledgements: I wrote this paper for money" -- A. A. Chastel,
         _A critical analysis of the explanation of red-shifts by a new
         field_, A&A 53, 67 (1976)


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-01 Thread Mike Taylor
Jonathan Rochkind writes:
  Crosswalk is exactly the wrong answer for this. Two very small
  overlapping communities of mostly library developers can surely agree
  on using the same identifiers, and then we make things easier for
  US.  We don't need to solve the entire universe of problems. Solve
  the simple problem in front of you in the simplest way that could
  possibly work and still leave room for future expansion and
  improvement. From that, we learn how to solve the big problems,
  when we're ready. Overreach and try to solve the huge problem
  including every possible use case, many of which don't apply to you
  but SOMEDAY MIGHT... and you end up with the kind of
  over-abstracted over-engineered
  too-complicated-to-actually-catch-on solutions that... we in the
  library community normally end up with.

I strongly, STRONGLY agree with this.  It's exactly what I was about
to write myself, in response to Peter's message, until I saw that
Jonathan had saved me the trouble :-)  Let's solve the problem that's
in front of us right now: bring SRU into harmony with OpenURL in this
respect, and the very act of doing so will lend extra legitimacy to
the agreed-on identifiers, which will then be more strongly positioned
as The Right Identifiers for other initiatives to use.

 _/|_    ___________________________________________________________________
/o ) \/  Mike Taylor    m...@indexdata.com    http://www.miketaylor.org.uk
)_v__/\  "You cannot really appreciate Dilbert unless you've read it in
         the original Klingon." -- Klingon Programming Mantra


Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread Amanda P
On the other hand, there are projects like bkrpr [2] and [3],
home-brew scanning stations built for marginally more than the cost of
a pair of $100 cameras.

Cameras around $100 are very low quality. You could get nowhere
near the DPI recommended for materials that need to be OCRed. Not only
would the image quality be low, but the OCR (even with the best
software) would probably have many errors. For someone scanning items at
home this might be ok, but for archival quality, I would not recommend
cameras. If you are grant funded and the grant provider requires a certain
level of quality, you need to make sure the scanning mechanism you use can
scan at that quality.



On Thu, Apr 30, 2009 at 11:49 AM, Erik Hetzner erik.hetz...@ucop.edu wrote:

 At Wed, 29 Apr 2009 13:32:08 -0400,
 Christine Schwartz wrote:
 
  We are looking into buying a book scanner which we'll probably use for
  archival papers as well--probably something in the $1,000.00 range.
 
  Any advice?

 Most organizations, or at least the big ones, Internet Archive and
 Google, seem to be using a design based on 2 fixed cameras rather than
 a traditional scanner-type device. Is this what you had in mind?

 Unfortunately none of these products are cheap. Internet Archive’s
 Scribe machine cost upwards of $15k (3 years ago), [1] mostly because
 it has two very expensive cameras. Google’s data is unavailable. A
 company called Kirtas also sells what look like very expensive
 machines of a similar design.

 On the other hand, there are projects like bkrpr [2] and [3],
 home-brew scanning stations built for marginally more than the cost of
 a pair of $100 cameras. I think that these are a real possibility for
 smaller organizations. The maturity of the software and workflow is
 problematic, but with Google’s Ocropus OCR software [4] freely
 available as the heart of a scanning workflow, the possibility is
 there. Both bkrpr and [3] have software currently available, although
 in the case of bkrpr at least the software is in the very early stages
 of development.

 best,
 Erik Hetzner

 1. 
 http://redjar.org/jared/blog/archives/2006/02/10/more-details-on-open-archives-scribe-book-scanner-project/
 
 2. http://bkrpr.org/doku.php
 3. 
 http://www.instructables.com/id/DIY-High-Speed-Book-Scanner-from-Trash-and-Cheap-C/
 
 4. http://code.google.com/p/ocropus/

 ;; Erik Hetzner, California Digital Library
 ;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread William Wueppelmann

Amanda P wrote:

Cameras around $100 are very low quality. You could get nowhere
near the DPI recommended for materials that need to be OCRed. Not only
would the image quality be low, but the OCR (even with the best
software) would probably have many errors. For someone scanning items at
home this might be ok, but for archival quality, I would not recommend
cameras. If you are grant funded and the grant provider requires a certain
level of quality, you need to make sure the scanning mechanism you use can
scan at that quality.


To capture an image 8.5" x 11" at 300 dpi, you need roughly 8.4 
megapixels, which is well within the capabilities of an inexpensive 
pocket camera. (If you need 600 dpi, then you're in the 33.6 megapixel 
range.) As to whether the quality will be sufficient, this would depend 
on the goals and requirements of the project, but 300 dpi should be 
enough to get good OCR results for normal-sized text. Our very old 
version of PrimeOCR recommends 300 dpi, and suggests that 400 dpi may 
provide substantially better quality for text sizes smaller than 8 
point, while 200 dpi will be sufficient for text 12 points and up. At 
300 and 400 dpi on 19th-century small-print, variable-quality texts, we 
are generally getting good to very good recognition: the quality of the 
original document itself is the limiting factor. More modern documents 
(and OCR software) should produce even better results. The cameras used 
by the Internet Archive are only 12 megapixels, though they are of 
substantially higher quality than a Canon PowerShot.
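
As a back-of-the-envelope check, the arithmetic is just pixels =
inches x dpi on each axis. A quick sketch in Python:

  # Megapixels needed to capture a page at a given resolution.
  def megapixels(width_in, height_in, dpi):
      return (width_in * dpi) * (height_in * dpi) / 1e6

  print(megapixels(8.5, 11, 300))  # ~8.4 MP, as above
  print(megapixels(8.5, 11, 600))  # ~33.7 MP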


Some applications require very high quality images, and cheap cameras 
might not be able to deliver the goods, but if you just want to make 
sure the text of your documents is digitally preserved and/or available 
to read online, you don't really need all that much in the way of 
hardware. Using a pocket camera and a stand to digitize more than a few 
pages is going to be slow, clumsy and painful, but for many 
applications, the end result may be entirely acceptable.


-William


Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread Erik Hetzner
At Fri, 1 May 2009 09:51:19 -0500,
Amanda P wrote:
 
 On the other hand, there are projects like bkrpr [2] and [3],
 home-brew scanning stations built for marginally more than the cost of
 a pair of $100 cameras.
 
 Cameras around $100 are very low quality. You could get nowhere
 near the DPI recommended for materials that need to be OCRed. Not only
 would the image quality be low, but the OCR (even with the best
 software) would probably have many errors. For someone scanning items at
 home this might be ok, but for archival quality, I would not recommend
 cameras. If you are grant funded and the grant provider requires a certain
 level of quality, you need to make sure the scanning mechanism you use can
 scan at that quality.

I know very little about digital cameras, so I hope I get this right.

According to Wikipedia, Google uses (or used) an 11MP camera (Elphel
323). You can get a 12MP camera for about $200.

With a 12MP camera you should easily be able to get 300 DPI images of
book pages and letter size archival documents. For a $100 camera you
can get more or less 300 DPI images of book pages. *

The problems I have always seen with OCR had more to do with alignment
and artifacts than with DPI. 300 DPI is fine for OCR as far as my
(limited) experience goes - as long as you have quality images.

If your intention is to scan items for preservation, then, yes, you
want higher quality - but I can’t imagine any setup for archival
quality costing anywhere near $1000. If you just want to make scans &
full text OCR available, these setups seem worth looking at -
especially if the software & workflow can be improved.

best,
Erik

* 12 MP seems to equal 4256 x 2848 pixels. To take a ‘scan’ (photo) of
a page at 300 DPI, that page would need to be 14.18" x 9.49" (dividing
pixels / 300). As long as you can get the camera close enough to the
page to avoid wasting much of the frame, you will be getting close to
300 DPI for pages of size 8.5" x 11" or less. 
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-01 Thread Peter Noerr
I am pleased to disagree to various levels of 'strongly' (if we can agree on a 
definition for it :-).

Ross earlier gave a sample of a 'crosswalk' for my MARC problem. What he 
supplied

-snip
We could have something like:
<http://purl.org/DataFormat/marcxml>
  skos:prefLabel "MARC21 XML" ;
  skos:notation "info:srw/schema/1/marcxml-v1.1" ;
  skos:notation "info:ofi/fmt:xml:xsd:MARC21" ;
  skos:notation "http://www.loc.gov/MARC21/slim" ;
  skos:broader <http://purl.org/DataFormat/marc> ;
  skos:description "..." .

Or maybe those skos:notations should be owl:sameAs -- anyway, that's not really 
the point.  The point is that all of these various identifiers would be valid, 
but we'd have a real way of knowing what they actually mean.  Maybe this is 
what you mean by a crosswalk.
--end

Is exactly what I meant by a crosswalk. Basically a translating dictionary 
which allows any entity (system or person) to relate the various identifiers.

I would love to see a single unified set of identifiers; my life as a wrangler 
of record semantics would be sooo much easier. But I don't see it happening. 

That does not mean we should not try. Even a unification in our space (and if 
not in the library/information space, then where? as Mike said) reduces the 
larger problem. However I don't believe it is a scalable solution (which may 
not matter if all of a group of users agree; then why not leave them to it?) as, 
at any time, one group/organisation/person/system could introduce a new scheme, 
and a world view which relies on unified semantics would no longer be viable.

Which means until global unification on an object (better, a (large) set of 
objects) is achieved it will be necessary to have the translating dictionary 
and systems which know how to use it. Unification reduces Ray's list of 15 
alternative URIs to 14 or 13 or whatever. As long as that number is > 1, 
translation will be necessary. (I will leave aside discussions of massive 
record bloat, continual system re-writes, the politics of whose view prevails, 
the unhelpfulness of compromises for joint solutions, and so on.)
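
To make the point concrete, such a dictionary can start out very small
indeed. A sketch in Python, using the aliases from Ross's MARCXML
example and his proposed PURL as the canonical form:

  # Minimal translating dictionary: map every known alias for a format
  # to one canonical URI, so systems can compare identifiers reliably.
  FORMAT_ALIASES = {
      "info:srw/schema/1/marcxml-v1.1": "http://purl.org/DataFormat/marcxml",
      "info:ofi/fmt:xml:xsd:MARC21":    "http://purl.org/DataFormat/marcxml",
      "http://www.loc.gov/MARC21/slim": "http://purl.org/DataFormat/marcxml",
  }

  def canonical_format(uri):
      # Unknown URIs pass through unchanged; the caller decides what to do.
      return FORMAT_ALIASES.get(uri, uri)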

Peter

 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Mike Taylor
 Sent: Friday, May 01, 2009 02:36
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule
 Them All
 
 Jonathan Rochkind writes:
   Crosswalk is exactly the wrong answer for this. Two very small
   overlapping communities of mostly library developers can surely agree
   on using the same identifiers, and then we make things easier for
   US.  We don't need to solve the entire universe of problems. Solve
   the simple problem in front of you in the simplest way that could
   possibly work and still leave room for future expansion and
   improvement. From that, we learn how to solve the big problems,
   when we're ready. Overreach and try to solve the huge problem
   including every possible use case, many of which don't apply to you
   but SOMEDAY MIGHT... and you end up with the kind of
   over-abstracted over-engineered
   too-complicated-to-actually-catch-on solutions that... we in the
   library community normally end up with.
 
 I strongly, STRONGLY agree with this.  It's exactly what I was about
 to write myself, in response to Peter's message, until I saw that
 Jonathan had saved me the trouble :-)  Let's solve the problem that's
 in front of us right now: bring SRU into harmony with OpenURL in this
 respect, and the very act of doing so will lend extra legitimacy to
 the agreed-on identifiers, which will then be more strongly positioned
 as The Right Identifiers for other initiatives to use.
 
  _/|_    _________________________________________________________________
 /o ) \/  Mike Taylor    m...@indexdata.com    http://www.miketaylor.org.uk
 )_v__/\  "You cannot really appreciate Dilbert unless you've read it in
          the original Klingon." -- Klingon Programming Mantra


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-01 Thread Ross Singer
Ideally, though, if we have some buy in and extend this outside our
communities, future identifiers *should* have fewer variations, since
people can find the appropriate URI for the format and use that.

I readily admit that this is wishful thinking, but so be it.  I do
think that modeling it as SKOS/RDF at least would make it attractive
to the Linked Data/Semweb crowd who are likely the sorts of people
that would be interested in seeing URIs, anyway.

I mean, the worst that can happen is that nobody cares, right?

-Ross.

On Fri, May 1, 2009 at 3:41 PM, Peter Noerr pno...@museglobal.com wrote:
 I am pleased to disagree to various levels of 'strongly' (if we can agree on 
 a definition for it :-).

 Ross earlier gave a sample of a 'crosswalk' for my MARC problem. What he 
 supplied

 -snip
 We could have something like:
 <http://purl.org/DataFormat/marcxml>
   skos:prefLabel "MARC21 XML" ;
   skos:notation "info:srw/schema/1/marcxml-v1.1" ;
   skos:notation "info:ofi/fmt:xml:xsd:MARC21" ;
   skos:notation "http://www.loc.gov/MARC21/slim" ;
   skos:broader <http://purl.org/DataFormat/marc> ;
   skos:description "..." .

 Or maybe those skos:notations should be owl:sameAs -- anyway, that's not 
 really the point.  The point is that all of these various identifiers would 
 be valid, but we'd have a real way of knowing what they actually mean.  Maybe 
 this is what you mean by a crosswalk.
 --end

 Is exactly what I meant by a crosswalk. Basically a translating dictionary 
 which allows any entity (system or person) to relate the various identifiers.

 I would love to see a single unified set of identifiers; my life as a 
 wrangler of record semantics would be sooo much easier. But I don't see it 
 happening.

 That does not mean we should not try. Even a unification in our space (and 
 if not in the library/information space, then where? as Mike said) reduces 
 the larger problem. However I don't believe it is a scalable solution (which 
 may not matter if all of a group of users agree; then why not leave them to 
 it?) as, at any time, one group/organisation/person/system could introduce a 
 new scheme, and a world view which relies on unified semantics would no 
 longer be viable.

 Which means until global unification on an object (better, a (large) set of 
 objects) is achieved it will be necessary to have the translating dictionary 
 and systems which know how to use it. Unification reduces Ray's list of 15 
 alternative URIs to 14 or 13 or whatever. As long as that number is > 1, 
 translation will be necessary. (I will leave aside discussions of massive 
 record bloat, continual system re-writes, the politics of whose view 
 prevails, the unhelpfulness of compromises for joint solutions, and so on.)

 Peter

 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Mike Taylor
 Sent: Friday, May 01, 2009 02:36
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule
 Them All

 Jonathan Rochkind writes:
   Crosswalk is exactly the wrong answer for this. Two very small
   overlapping communities of mostly library developers can surely agree
   on using the same identifiers, and then we make things easier for
   US.  We don't need to solve the entire universe of problems. Solve
   the simple problem in front of you in the simplest way that could
   possibly work and still leave room for future expansion and
   improvement. From that, we learn how to solve the big problems,
   when we're ready. Overreach and try to solve the huge problem
   including every possible use case, many of which don't apply to you
   but SOMEDAY MIGHT... and you end up with the kind of
   over-abstracted over-engineered
   too-complicated-to-actually-catch-on solutions that... we in the
   library community normally end up with.

 I strongly, STRONGLY agree with this.  It's exactly what I was about
 to write myself, in response to Peter's message, until I saw that
 Jonathan had saved me the trouble :-)  Let's solve the problem that's
 in front of us right now: bring SRU into harmony with OpenURL in this
 respect, and the very act of doing so will lend extra legitimacy to
 the agreed-on identifiers, which will then be more strongly positioned
 as The Right Identifiers for other initiatives to use.

  _/|_    _________________________________________________________________
 /o ) \/  Mike Taylor    m...@indexdata.com    http://www.miketaylor.org.uk
 )_v__/\  "You cannot really appreciate Dilbert unless you've read it in
          the original Klingon." -- Klingon Programming Mantra



Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-01 Thread Peter Noerr
I agree with Ross wholeheartedly. Particularly in the use of an RDF based 
mechanism to describe, and then have systems act on, the semantics of these 
uniquely identified objects. Semantics (as in Web) has been exercising my 
thoughts recently, and the problems we have here are writ large over all that 
the SW people are trying to achieve. Perhaps we can help...

Peter 

 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Ross Singer
 Sent: Friday, May 01, 2009 13:40
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule
 Them All
 
 Ideally, though, if we have some buy in and extend this outside our
 communities, future identifiers *should* have fewer variations, since
 people can find the appropriate URI for the format and use that.
 
 I readily admit that this is wishful thinking, but so be it.  I do
 think that modeling it as SKOS/RDF at least would make it attractive
 to the Linked Data/Semweb crowd who are likely the sorts of people
 that would be interested in seeing URIs, anyway.
 
 I mean, the worst that can happen is that nobody cares, right?
 
 -Ross.
 
 On Fri, May 1, 2009 at 3:41 PM, Peter Noerr pno...@museglobal.com wrote:
  I am pleased to disagree to various levels of 'strongly' (if we can agree
 on a definition for it :-).
 
  Ross earlier gave a sample of a 'crosswalk' for my MARC problem. What he
 supplied
 
  -snip
  We could have something like:
  <http://purl.org/DataFormat/marcxml>
    skos:prefLabel "MARC21 XML" ;
    skos:notation "info:srw/schema/1/marcxml-v1.1" ;
    skos:notation "info:ofi/fmt:xml:xsd:MARC21" ;
    skos:notation "http://www.loc.gov/MARC21/slim" ;
    skos:broader <http://purl.org/DataFormat/marc> ;
    skos:description "..." .
 
  Or maybe those skos:notations should be owl:sameAs -- anyway, that's not
 really the point.  The point is that all of these various identifiers would
 be valid, but we'd have a real way of knowing what they actually mean.
  Maybe this is what you mean by a crosswalk.
  --end
 
  Is exactly what I meant by a crosswalk. Basically a translating
 dictionary which allows any entity (system or person) to relate the various
 identifiers.
 
  I would love to see a single unified set of identifiers; my life as a
 wrangler of record semantics would be sooo much easier. But I don't see it
 happening.
 
  That does not mean we should not try. Even a unification in our space
 (and if not in the library/information space, then where? as Mike said)
 reduces the larger problem. However I don't believe it is a scalable
 solution (which may not matter if all of a group of users agree; then why
 not leave them to it?) as, at any time, one group/organisation/person/system
 could introduce a new scheme, and a world view which relies on unified
 semantics would no longer be viable.
 
  Which means until global unification on an object (better, a (large) set
 of objects) is achieved it will be necessary to have the translating
 dictionary and systems which know how to use it. Unification reduces Ray's
 list of 15 alternative URIs to 14 or 13 or whatever. As long as that number
 is > 1, translation will be necessary. (I will leave aside discussions of
 massive record bloat, continual system re-writes, the politics of whose
 view prevails, the unhelpfulness of compromises for joint solutions, and so
 on.)
 
  Peter
 
  -Original Message-
  From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
  Mike Taylor
  Sent: Friday, May 01, 2009 02:36
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to
 Rule
  Them All
 
  Jonathan Rochkind writes:
    Crosswalk is exactly the wrong answer for this. Two very small
    overlapping communities of mostly library developers can surely agree
    on using the same identifiers, and then we make things easier for
    US.  We don't need to solve the entire universe of problems. Solve
    the simple problem in front of you in the simplest way that could
    possibly work and still leave room for future expansion and
    improvement. From that, we learn how to solve the big problems,
    when we're ready. Overreach and try to solve the huge problem
    including every possible use case, many of which don't apply to you
    but SOMEDAY MIGHT... and you end up with the kind of
    over-abstracted over-engineered
    too-complicated-to-actually-catch-on solutions that... we in the
    library community normally end up with.
 
  I strongly, STRONGLY agree with this.  It's exactly what I was about
  to write myself, in response to Peter's message, until I saw that
  Jonathan had saved me the trouble :-)  Let's solve the problem that's
  in front of us right now: bring SRU into harmony with OpenURL in this
  respect, and the very act of doing so will lend extra legitimacy to
  the agreed-on identifiers, which will then be more strongly positioned
  as The Right Identifiers 

Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread Joe Atzberger
On Fri, May 1, 2009 at 5:39 PM, Mike Taylor m...@indexdata.com wrote:


 If you want real 300 dpi images, at anything like the quality you get
 from a flatbed scanner, then you're going to need cameras much more
 expensive than $100.


Or just wait, say, about 3 years.


Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread Han, Yan
That is right. 
In addition, for certain printing (e.g. gold seals), a digital camera delivers 
better results than a scanner. 

-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of 
Jonathan Rochkind
Sent: Friday, May 01, 2009 2:38 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Recommend book scanner?

Yeah, I don't think people use cameras instead of flatbed scanners 
because they produce superior results, or are cheaper: They use them 
because they're _faster_ for large-scale digitization, and also make it 
possible to capture pages from rare/fragile materials with less damage 
to the materials. (Flatbeds are not good on bindings, if you want to get 
a good image).

If these things don't apply, is there any reason not to use a flatbed 
scanner? Not that I know of?

Jonathan

Randy Stern wrote:
 My understanding is that a flatbed or sheetfed document scanner that 
 produces 300 dpi will produce much better OCR results than a cheap digital 
 camera that produces 300 dpi. The reasons have to do with the resolution 
 and distortion of the resulting image, where resolution is defined as the 
 number of line pairs per mm that can be resolved (for example, when scanning a 
 test chart) - in other words, the details that will show up for character 
 images - and distortion is image aberration that can appear at the edges of 
 the page image areas, particularly when illumination is not even. A scanner 
 has much more even illumination.

 At 11:21 AM 5/1/2009 -0700, Erik Hetzner wrote:
   
 At Fri, 1 May 2009 09:51:19 -0500,
 Amanda P wrote:

  On the other hand, there are projects like bkrpr [2] and [3],
  home-brew scanning stations built for marginally more than the cost of
  a pair of $100 cameras.

 Cameras around $100 are very low quality. You could get nowhere
 near the DPI recommended for materials that need to be OCRed. Not only
 would the image quality be low, but the OCR (even with the best
 software) would probably have many errors. For someone scanning items at
 home this might be ok, but for archival quality, I would not recommend
 cameras. If you are grant funded and the grant provider requires a certain
 level of quality, you need to make sure the scanning mechanism you use can
 scan at that quality.
   
 I know very little about digital cameras, so I hope I get this right.

 According to Wikipedia, Google uses (or used) an 11MP camera (Elphel
 323). You can get a 12MP camera for about $200.

 With a 12MP camera you should easily be able to get 300 DPI images of
 book pages and letter size archival documents. For a $100 camera you
 can get more or less 300 DPI images of book pages. *

  The problems I have always seen with OCR had more to do with alignment
  and artifacts than with DPI. 300 DPI is fine for OCR as far as my
  (limited) experience goes - as long as you have quality images.

  If your intention is to scan items for preservation, then, yes, you
  want higher quality - but I can’t imagine any setup for archival
  quality costing anywhere near $1000. If you just want to make scans &
  full text OCR available, these setups seem worth looking at -
  especially if the software & workflow can be improved.

 best,
 Erik

 * 12 MP seems to equal 4256 x 2848 pixels. To take a ‘scan’ (photo) of
 a page at 300 DPI, that page would need to be 14.18" x 9.49" (dividing
 pixels / 300). As long as you can get the camera close enough to the
 page to avoid wasting much of the frame, you will be getting close to
 300 DPI for pages of size 8.5" x 11" or less.
 ;; Erik Hetzner, California Digital Library
 ;; gnupg key id: 1024D/01DB07E3
 

   


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-01 Thread Jonathan Rochkind
From my perspective, all we're talking about is using the same URI to 
refer to the same format(s) across the library community standards this 
community generally can control.


That will make things much easier for developers, especially but not 
only when building software that interacts with more than one of these 
standards (as client or server).


Now, once you've done that, you've ALSO set the stage for that kind of 
RDF scenario, among other RDF scenarios. I agree with Mike that that 
particular scenario is unlikely, but once you set the stage for RDF 
experimentation like that, if folks are interested in experimenting (and 
many in our community are), maybe something more attractively useful 
will come out of it.


Or maybe not. Either way, you've made things easier and more 
inter-operable just by using the same set of URIs across multiple 
standards to refer to the same thing. So, yeah, I'd still focus on that, 
rather than any kind of 'cross walk', RDF or not. It's the actual use 
case in front of us, in which the benefit will definitely be worth the 
effort (if the effort is kept manageable by avoiding trying to solve the 
entire universe of problems at once).


Jonathan

Mike Taylor wrote:

So what are we talking about here?  A situation where an SRU server
receives a request for response records to be delivered in a
particular format, it doesn't recognise the format URI, so it goes and
looks it up in an RDF database and discovers that it's equivalent to a
URI that it does know?  Hmm ... it's crazy, but it might just work.

I bet no-one does it, though.

 _/|_    ___________________________________________________________________
/o ) \/  Mike Taylor    m...@indexdata.com    http://www.miketaylor.org.uk
)_v__/\  "Someday, I'll show you around monster-free Tokyo" -- dialogue
         from Gamera: Guardian of the Universe




Peter Noerr writes:
  I agree with Ross wholeheartedly. Particularly in the use of an RDF based 
mechanism to describe, and then have systems act on, the semantics of these 
uniquely identified objects. Semantics (as in Web) has been exercising my thoughts 
recently, and the problems we have here are writ large over all that the SW people 
are trying to achieve. Perhaps we can help...
  
  Peter 
  
   -Original Message-

   From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
   Ross Singer
   Sent: Friday, May 01, 2009 13:40
   To: CODE4LIB@LISTSERV.ND.EDU
   Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule
   Them All
   
   Ideally, though, if we have some buy in and extend this outside our

   communities, future identifiers *should* have fewer variations, since
   people can find the appropriate URI for the format and use that.
   
   I readily admit that this is wishful thinking, but so be it.  I do

   think that modeling it as SKOS/RDF at least would make it attractive
   to the Linked Data/Semweb crowd who are likely the sorts of people
   that would be interested in seeing URIs, anyway.
   
   I mean, the worst that can happen is that nobody cares, right?
   
   -Ross.
   
   On Fri, May 1, 2009 at 3:41 PM, Peter Noerr pno...@museglobal.com wrote:

I am pleased to disagree to various levels of 'strongly' (if we can agree
   on a definition for it :-).
   
Ross earlier gave a sample of a 'crosswalk' for my MARC problem. What he
   supplied
   
-snip
We could have something like:
<http://purl.org/DataFormat/marcxml>
  skos:prefLabel "MARC21 XML" ;
  skos:notation "info:srw/schema/1/marcxml-v1.1" ;
  skos:notation "info:ofi/fmt:xml:xsd:MARC21" ;
  skos:notation "http://www.loc.gov/MARC21/slim" ;
  skos:broader <http://purl.org/DataFormat/marc> ;
  skos:description "..." .
   
Or maybe those skos:notations should be owl:sameAs -- anyway, that's not
   really the point.  The point is that all of these various identifiers would
   be valid, but we'd have a real way of knowing what they actually mean.
Maybe this is what you mean by a crosswalk.
--end
   
Is exactly what I meant by a crosswalk. Basically a translating
   dictionary which allows any entity (system or person) to relate the various
   identifiers.
   
I would love to see a single unified set of identifiers; my life as a
   wrangler of record semantics would be sooo much easier. But I don't see it
   happening.
   
That does not mean we should not try. Even a unification in our space
   (and if not in the library/information space, then where? as Mike said)
   reduces the larger problem. However I don't believe it is a scalable
   solution (which may not matter if all of a group of users agree; then why
   not leave them to it?) as, at any time, one group/organisation/person/system
   could introduce a new scheme, and a world view which relies on unified
   semantics would no longer be viable.
   
Which means until global unification on an object (better a (large) set
   of objects) is 

Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread Randy Stern
My understanding is that a flatbed or sheetfed document scanner that 
produces 300 dpi will produce much better OCR results than a cheap digital 
camera that produces 300 dpi. The reasons have to do with the resolution 
and distortion of the resulting image, where resolution is defined as the 
number of line pairs per mm that can be resolved (for example, when scanning a 
test chart) - in other words, the details that will show up for character 
images - and distortion is image aberration that can appear at the edges of 
the page image areas, particularly when illumination is not even. A scanner 
has much more even illumination.


At 11:21 AM 5/1/2009 -0700, Erik Hetzner wrote:

At Fri, 1 May 2009 09:51:19 -0500,
Amanda P wrote:

 On the other hand, there are projects like bkrpr [2] and [3],
 home-brew scanning stations built for marginally more than the cost of
 a pair of $100 cameras.

 Cameras around $100 are very low quality. You could get nowhere
 near the DPI recommended for materials that need to be OCRed. Not only
 would the image quality be low, but the OCR (even with the best
 software) would probably have many errors. For someone scanning items at
 home this might be ok, but for archival quality, I would not recommend
 cameras. If you are grant funded and the grant provider requires a certain
 level of quality, you need to make sure the scanning mechanism you use can
 scan at that quality.

I know very little about digital cameras, so I hope I get this right.

According to Wikipedia, Google uses (or used) an 11MP camera (Elphel
323). You can get a 12MP camera for about $200.

With a 12MP camera you should easily be able to get 300 DPI images of
book pages and letter size archival documents. For a $100 camera you
can get more or less 300 DPI images of book pages. *

The problems I have always seen with OCR had more to do with alignment
and artifacts than with DPI. 300 DPI is fine for OCR as far as my
(limited) experience goes - as long as you have quality images.

If your intention is to scan items for preservation, then, yes, you
want higher quality - but I can’t imagine any setup for archival
quality costing anywhere near $1000. If you just want to make scans &
full text OCR available, these setups seem worth looking at -
especially if the software & workflow can be improved.

best,
Erik

* 12 MP seems to equal 4256 x 2848 pixels. To take a ‘scan’ (photo) of
a page at 300 DPI, that page would need to be 14.18" x 9.49" (dividing
pixels / 300). As long as you can get the camera close enough to the
page to avoid wasting much of the frame, you will be getting close to
300 DPI for pages of size 8.5" x 11" or less.
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-01 Thread Mike Taylor
So what are we talking about here?  A situation where an SRU server
receives a request for response records to be delivered in a
particular format, it doesn't recognise the format URI, so it goes and
looks it up in an RDF database and discovers that it's equivalent to a
URI that it does know?  Hmm ... it's crazy, but it might just work.

I bet no-one does it, though.
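
For what it's worth, the lookup itself would be tiny. Here's a minimal
sketch in Python with rdflib, assuming a registry published as SKOS
along the lines of Ross's example; the registry URL is hypothetical:

  from rdflib import Graph, Literal, Namespace

  SKOS = Namespace("http://www.w3.org/2004/02/skos/core#")

  g = Graph()
  # Hypothetical location for the format registry's RDF dump.
  g.parse("http://example.org/format-registry.ttl", format="turtle")

  def equivalent_ids(unknown_uri):
      # Find the concept that lists this identifier as a notation;
      # its other notations all name the same format.
      for concept in g.subjects(SKOS.notation, Literal(unknown_uri)):
          return [str(n) for n in g.objects(concept, SKOS.notation)]
      return []

Whether anyone would bother to wire that into an SRU server is, of
course, the question.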

 _/|_    ___________________________________________________________________
/o ) \/  Mike Taylor    m...@indexdata.com    http://www.miketaylor.org.uk
)_v__/\  "Someday, I'll show you around monster-free Tokyo" -- dialogue
         from Gamera: Guardian of the Universe




Peter Noerr writes:
  I agree with Ross wholeheartedly. Particularly in the use of an RDF based 
  mechanism to describe, and then have systems act on, the semantics of these 
  uniquely identified objects. Semantics (as in Web) has been exercising my 
  thoughts recently, and the problems we have here are writ large over all 
  that the SW people are trying to achieve. Perhaps we can help...
  
  Peter 
  
   -Original Message-
   From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
   Ross Singer
   Sent: Friday, May 01, 2009 13:40
   To: CODE4LIB@LISTSERV.ND.EDU
   Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule
   Them All
   
   Ideally, though, if we have some buy in and extend this outside our
   communities, future identifiers *should* have fewer variations, since
   people can find the appropriate URI for the format and use that.
   
   I readily admit that this is wishful thinking, but so be it.  I do
   think that modeling it as SKOS/RDF at least would make it attractive
   to the Linked Data/Semweb crowd who are likely the sorts of people
   that would be interested in seeing URIs, anyway.
   
   I mean, the worst that can happen is that nobody cares, right?
   
   -Ross.
   
   On Fri, May 1, 2009 at 3:41 PM, Peter Noerr pno...@museglobal.com wrote:
I am pleased to disagree to various levels of 'strongly' (if we can agree
   on a definition for it :-).
   
Ross earlier gave a sample of a 'crosswalk' for my MARC problem. What he
   supplied
   
-snip
We could have something like:
<http://purl.org/DataFormat/marcxml>
  skos:prefLabel "MARC21 XML" ;
  skos:notation "info:srw/schema/1/marcxml-v1.1" ;
  skos:notation "info:ofi/fmt:xml:xsd:MARC21" ;
  skos:notation "http://www.loc.gov/MARC21/slim" ;
  skos:broader <http://purl.org/DataFormat/marc> ;
  skos:description "..." .
   
Or maybe those skos:notations should be owl:sameAs -- anyway, that's not
   really the point.  The point is that all of these various identifiers would
   be valid, but we'd have a real way of knowing what they actually mean.
    Maybe this is what you mean by a crosswalk.
--end
   
Is exactly what I meant by a crosswalk. Basically a translating
   dictionary which allows any entity (system or person) to relate the various
   identifiers.
   
I would love to see a single unified set of identifiers; my life as a
   wrangler of record semantics would be sooo much easier. But I don't see it
   happening.
   
That does not mean we should not try. Even a unification in our space
   (and if not in the library/information space, then where? as Mike said)
   reduces the larger problem. However I don't believe it is a scalable
   solution (which may not matter if all of a group of users agree; then why
   not leave them to it?) as, at any time, one group/organisation/person/system
   could introduce a new scheme, and a world view which relies on unified
   semantics would no longer be viable.
   
Which means until global unification on an object (better, a (large) set
   of objects) is achieved it will be necessary to have the translating
   dictionary and systems which know how to use it. Unification reduces Ray's
   list of 15 alternative URIs to 14 or 13 or whatever. As long as that number
   is > 1, translation will be necessary. (I will leave aside discussions of
   massive record bloat, continual system re-writes, the politics of whose
   view prevails, the unhelpfulness of compromises for joint solutions, and so
   on.)
   
Peter
   
-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Mike Taylor
Sent: Friday, May 01, 2009 02:36
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to
   Rule
Them All
   
Jonathan Rochkind writes:
  Crosswalk is exactly the wrong answer for this. Two very small
  overlapping communities of mostly library developers can surely agree
  on using the same identifiers, and then we make things easier for
  US.  We don't need to solve the entire universe of problems. Solve
  the simple problem in front of you in the simplest way that could
  possibly work and still leave room for future expansion and
  

Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread Mike Taylor
William Wueppelmann writes:
   Cameras around $100 are very low quality. You could get
   nowhere near the DPI recommended for materials that need to be
   OCRed. Not only would the image quality be low, but the OCR (even
   with the best software) would probably have many errors. For someone
   scanning items at home this might be ok, but for archival quality, I
   would not recommend cameras. If you are grant funded and the grant
   provider requires a certain level of quality, you need to make sure
   the scanning mechanism you use can scan at that quality.
  
  To capture an image 8.5 x 11 at 300 dpi, you need roughly 8.4
  megapixels, which is well within the capabilities of an inexpensive
  pocket camera.

Or not.  Cheap cameras may well produce JPEGs that contain eight
million pixels, but that doesn't mean that they are using all or even
much of that resolution.  In my experience, most cheap cameras are
producing way more data than their lenses can actually feed them, so
that you can halve the resolution or more without losing any actual
information.  Such cameras will, in effect, give you a 150 dpi scan --
even if that scan is expressed as a 300 dpi image.

If you want real 300 dpi images, at anything like the quality you get
from a flatbed scanner, then you're going to need cameras much more
expensive than $100.

 _/|_    ___________________________________________________________________
/o ) \/  Mike Taylor    m...@indexdata.com    http://www.miketaylor.org.uk
)_v__/\  "I think it should either be unrestricted garnishing, or a single
         Olympic standard mayonaisse" -- Monty Python.


Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread Jonathan Rochkind
Yeah, I don't think people use cameras instead of flatbed scanners 
because they produce superior results, or are cheaper: They use them 
because they're _faster_ for large-scale digitization, and also make it 
possible to capture pages from rare/fragile materials with less damage 
to the materials. (Flatbeds are not good on bindings, if you want to get 
a good image).


If these things don't apply, is there any reason not to use a flatbed 
scanner? Not that I know of?


Jonathan

Randy Stern wrote:
My understanding is that a flatbed or sheetfed document scanner that 
produces 300 dpi will produce much better OCR results than a cheap digital 
camera that produces 300 dpi. The reasons have to do with the resolution 
and distortion of the resulting image, where resolution is defined as the 
number of line pairs per mm that can be resolved (for example, when scanning a 
test chart) - in other words, the details that will show up for character 
images - and distortion is image aberration that can appear at the edges of 
the page image areas, particularly when illumination is not even. A scanner 
has much more even illumination.


At 11:21 AM 5/1/2009 -0700, Erik Hetzner wrote:
  

At Fri, 1 May 2009 09:51:19 -0500,
Amanda P wrote:

 On the other hand, there are projects like bkrpr [2] and [3],
 home-brew scanning stations built for marginally more than the cost of
 a pair of $100 cameras.

 Cameras around $100 are very low quality. You could get nowhere
 near the DPI recommended for materials that need to be OCRed. Not only
 would the image quality be low, but the OCR (even with the best
 software) would probably have many errors. For someone scanning items at
 home this might be ok, but for archival quality, I would not recommend
 cameras. If you are grant funded and the grant provider requires a certain
 level of quality, you need to make sure the scanning mechanism you use can
 scan at that quality.

I know very little about digital cameras, so I hope I get this right.

According to Wikipedia, Google uses (or used) an 11MP camera (Elphel
323). You can get a 12MP camera for about $200.

With a 12MP camera you should easily be able to get 300 DPI images of
book pages and letter size archival documents. For a $100 camera you
can get more or less 300 DPI images of book pages. *

The problems I have always seen with OCR had more to do with alignment
and artifacts than with DPI. 300 DPI is fine for OCR as far as my
(limited) experience goes - as long as you have quality images.

If your intention is to scan items for preservation, then, yes, you
want higher quality - but I can’t imagine any setup for archival
quality costing anywhere near $1000. If you just want to make scans &
full text OCR available, these setups seem worth looking at -
especially if the software & workflow can be improved.

best,
Erik

* 12 MP seems to equal 4256 x 2848 pixels. To take a ‘scan’ (photo) of
a page at 300 DPI, that page would need to be 14.18" x 9.49" (dividing
pixels / 300). As long as you can get the camera close enough to the
page to avoid wasting much of the frame, you will be getting close to
300 DPI for pages of size 8.5" x 11" or less.
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3



  


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-01 Thread Ross Singer
I agree that most software probably won't do it.  But the data will be
there and free and relatively easy to integrate if one wanted to.

In a lot of ways, Jonathan, it's got Umlaut written all over it.

Now to get to Jonathan's point -- yes, I think the primary goal still
needs to be working towards bringing use of identifiers for a given
thing to a single variant.  However, we would obviously have to know
what the options are in order to figure out what that one is -- while
we're doing that, why not enter the different options into the
registry and document them in some way (such as, who uses this
variant?).  Voila, we have a crosswalk.

Of course, the downside is that we technically also have a new URI
for this resource (since the skos:Concept would need to have a URI),
but we could probably hand wave that away as the id for the registry
concept, not the data format.

So -- we seem to have some agreement here?

-Ross.

On Fri, May 1, 2009 at 5:53 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 From my perspective, all we're talking about is using the same URI to refer
 to the same format(s) across the library community standards this community
 generally can control.

 That will make things much easier for developers, especially but not only
 when building software that interacts with more than one of these standards
 (as client or server).

 Now, once you've done that, you've ALSO set the stage for that kind of RDF
 scenario, among other RDF scenarios. I agree with Mike that that particular
 scenario is unlikely, but once you set the stage for RDF experimentation
 like that, if folks are interested in experimenting (and many in our
 community are), maybe something more attractively useful will come out of
 it.

 Or maybe not. Either way, you've made things easier and more inter-operable
 just by using the same set of URIs across multiple standards to refer to the
 same thing. So, yeah, I'd still focus on that, rather than any kind of
 'cross walk', RDF or not. It's the actual use case in front of us, in which
 the benefit will definitely be worth the effort (if the effort is kept
 manageable by avoiding trying to solve the entire universe of problems at
 once).

 Jonathan

 Mike Taylor wrote:

 So what are we talking about here?  A situation where an SRU server
 receives a request for response records to be delivered in a
 particular format, it doesn't recognise the format URI, so it goes and
 looks it up in an RDF database and discovers that it's equivalent to a
 URI that it does know?  Hmm ... it's crazy, but it might just work.

 I bet no-one does it, though.

  _/|_    _________________________________________________________________
 /o ) \/  Mike Taylor    m...@indexdata.com    http://www.miketaylor.org.uk
 )_v__/\  "Someday, I'll show you around monster-free Tokyo" -- dialogue
          from Gamera: Guardian of the Universe




 Peter Noerr writes:
   I agree with Ross wholeheartedly. Particularly in the use of an RDF
 based mechanism to describe, and then have systems act on, the semantics of
 these uniquely identified objects. Semantics (as in Web) has been exercising
 my thoughts recently, and the problems we have here are writ large over all
 that the SW people are trying to achieve. Perhaps we can help...
    Peter

    -Original Message-
    From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf
 Of
    Ross Singer
    Sent: Friday, May 01, 2009 13:40
    To: CODE4LIB@LISTSERV.ND.EDU
    Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to
 Rule
    Them All
       Ideally, though, if we have some buy in and extend this outside
 our
    communities, future identifiers *should* have fewer variations, since
    people can find the appropriate URI for the format and use that.
       I readily admit that this is wishful thinking, but so be it.  I
 do
    think that modeling it as SKOS/RDF at least would make it attractive
    to the Linked Data/Semweb crowd who are likely the sorts of people
    that would be interested in seeing URIs, anyway.
       I mean, the worst that can happen is that nobody cares, right?
       -Ross.
       On Fri, May 1, 2009 at 3:41 PM, Peter Noerr
 pno...@museglobal.com wrote:
      I am pleased to disagree to various levels of 'strongly' (if we can
  agree
     on a definition for it :-).
    
      Ross earlier gave a sample of a 'crosswalk' for my MARC problem.
  What he
     supplied
    
     -snip
     We could have something like:
      <http://purl.org/DataFormat/marcxml>
        skos:prefLabel "MARC21 XML" ;
        skos:notation "info:srw/schema/1/marcxml-v1.1" ;
        skos:notation "info:ofi/fmt:xml:xsd:MARC21" ;
        skos:notation "http://www.loc.gov/MARC21/slim" ;
        skos:broader <http://purl.org/DataFormat/marc> ;
        skos:description "..." .
    
     Or maybe those skos:notations should be owl:sameAs -- anyway,
 that's not
    really the point.  The point is that all of these various identifiers
 would
    be valid, but we'd 

Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread Lars Aronsson
Mike Taylor wrote:

 Or not.  Cheap cameras may well produce JPEGs that contain eight 
 million pixels, but that doesn't mean that they are using all or 
 even much of that resolution.

Does anybody have a printed test sheet that we can scan or photograph, 
and then compare the resulting digital images?  It should have 
lines at various densities and areas of different colours, just 
like an old TV test image.  Can you buy such calibration sheets?

We could make it a standard routine to always shoot such a sheet 
at the beginning of any captured book, to give the reader an idea 
of the digitization quality of the equipment used.

They are called "technical targets" in figure 14, page 149, of
Lisa L. Fox (ed.), Preservation Microfilming, 2nd ed. (1996), 
ISBN 0-8389-0653-2.  The example there is manufactured by AP 
International, http://www.a-p-international.com/

However, their price list is $100-400 per package of 50 sheets.
I wouldn't pay more for the calibration targets than for the
camera, if I could avoid it.


-- 
  Lars Aronsson (l...@aronsson.se)
  Aronsson Datateknik - http://aronsson.se

  Project Runeberg - free Nordic literature - http://runeberg.org/