Re: [CODE4LIB] Open, public standards v. pay per view standards and usage

2009-07-14 Thread William Wueppelmann
That might not be the best analogy. The most commonly-cited reason for 
Beta losing out to VHS seems to be the initial limitation of Beta to 
1-hour tapes, which wasn't enough to record a movie from TV, or to play 
back a rented one without switching tapes partway through. By the time 
Beta increased its tape length, VHS had basically caught up from a 
quality standpoint, and its market share had reached the tipping point 

I'm not entirely sure that TCP/IP and the other IETF RFCs became 
established because of restrictions placed on OSI. I was under the 
impression that OSI was also insanely complicated and that the IETF 
standards were much cheaper to implement from a technical standpoint. 
And, from a product standpoint, in the mid-90s, there were still a lot 
of bets being placed on closed online services like AOL, MSN, and 


David Fiander wrote:


Well, the obvious commercial example, sort of, is that old favourite:
Beta (for which Sony charged a license fee and controlled who could
produce media) vs VHS (for which there was either no fee or a much
lower one, and no oversight of media producers).

On Mon, Jul 13, 2009 at 12:28 PM, Andrew wrote:

Have a look at the ongoing battles between MPEG4 and Ogg for the browser
video space. I don't know about your second criterion for b), however - not many
people are using Ogg (yet).


On 13-Jul-09, at 12:22 PM, Walter Lewis wrote:

Are there any blindingly obvious examples of instances where
 a) a standards group produced a standard published by a body which
charged for access to it
 b) an alternative standards group produced a competing standard that was
openly accessible
and the work of group a) was rendered totally irrelevant because most
non-commercial work ignored it in favour of b)?

My instinct is to quote the battle between OSI (ISO) and TCP/IP (IETF
RFCs).  Does that strike others as appropriate?

Any examples closer to the library world?

Walter Lewis

Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread William Wueppelmann

Amanda P wrote:

Cameras around $100 are very low quality. You could get nowhere
near the dpi recommended for materials that need to be OCRed. Not only would
the image quality be low, but the OCR (even with the best
software) would probably have many errors. For someone scanning items at
home this might be ok, but for archival quality, I would not recommend
cameras. If you are grant funded and the grant provider requires a certain
level of quality, you need to make sure the scanning mechanism you use can
scan at that quality.

To capture an 8.5 x 11 inch page at 300 dpi, you need roughly 8.4 
megapixels (2550 x 3300 pixels), which is well within the capabilities 
of an inexpensive pocket camera. (If you need 600 dpi, then you're in 
the 33.7 megapixel range.) As to whether the quality will be sufficient, 
this would depend 
on the goals and requirements of the project, but 300 dpi should be 
enough to get good OCR results for normal-sized text. Our very old 
version of PrimeOCR recommends 300 dpi, and suggests that 400 dpi may 
provide substantially better quality for text sizes smaller than 8 
point, while 200 dpi will be sufficient for text 12 points and up. At 
300 and 400 dpi on 19th-century small-print, variable-quality texts, we 
are generally getting good to very good recognition: the quality of the 
original document itself is the limiting factor. More modern documents 
(and OCR software) should produce even better results. The cameras used 
by the Internet Archive are only 12 megapixels, though they are of 
substantially higher quality than a Canon PowerShot.
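
The arithmetic above can be checked with a short sketch. It assumes the page fills the frame exactly and ignores lens and sensor quality, so the results are a lower bound on the pixel count rather than a guarantee of usable output. The function names are made up for illustration, and recommended_dpi only paraphrases the PrimeOCR rule of thumb quoted above (normal-sized text is taken to mean 8 to just under 12 points).

```python
def megapixels_needed(width_in, height_in, dpi):
    """Megapixels required to cover a page of the given size (inches) at dpi."""
    width_px = width_in * dpi
    height_px = height_in * dpi
    return width_px * height_px / 1_000_000


def recommended_dpi(point_size):
    """Rule of thumb paraphrased from the PrimeOCR guidance in the message."""
    if point_size < 8:
        return 400  # small print may benefit from higher resolution
    if point_size >= 12:
        return 200  # large text is reported to need less
    return 300      # assumed default for normal-sized text


if __name__ == "__main__":
    # US letter page at several common scanning resolutions
    for dpi in (200, 300, 400, 600):
        mp = megapixels_needed(8.5, 11, dpi)
        print(f"{dpi} dpi -> {mp:.2f} megapixels")
```

Because the requirement grows with the square of the resolution, doubling from 300 to 600 dpi quadruples the pixel count.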

Some applications require very high quality images, and cheap cameras 
might not be able to deliver the goods, but if you just want to make 
sure the text of your documents is digitally preserved and/or available 
to read online, you don't really need all that much in the way of 
hardware. Using a pocket camera and a stand to digitize more than a few 
pages is going to be slow, clumsy and painful, but for many 
applications, the end result may be entirely acceptable.


Re: [CODE4LIB] Recommend book scanner?

2009-04-30 Thread William Wueppelmann

Erik Hetzner wrote:

At Wed, 29 Apr 2009 13:32:08 -0400,
Christine Schwartz wrote:

We are looking into buying a book scanner which we'll probably use for
archival papers as well--probably something in the $1,000.00 range.

Any advice?

Most organizations, or at least the big ones, Internet Archive and
Google, seem to be using a design based on 2 fixed cameras rather than
a traditional scanner-type device. Is this what you had in mind?

This is probably the type of machine that will be needed for books if 
they need to remain bound throughout the scanning process. For looseleaf 
materials or for books that can be disbound and are in good condition, 
you can get inexpensive duplex sheet feeder scanners for a few hundred 
dollars that might be good enough.

Unfortunately, none of these products are cheap. Internet Archive’s
Scribe machine cost upwards of $15k (3 years ago), [1] mostly because
it has two very expensive cameras. Google’s data is unavailable. A
company called Kirtas also sells what look like very expensive
machines of a similar design.

$15K seems pretty cheap for that kind of scanner; most that I've seen 
run from the tens of thousands well into the hundreds of thousands, 
depending on the model and features. I don't remember precisely what 
IA's Scribe stations cost, but I think they were more in the range of 
$40-60K CAD; they would probably be cheaper in the US, but not that 
much cheaper, and I suspect 
that IA gets some sort of bulk discount for buying them by the truckload.

The main issues to consider are:

- Type of material: is it fragile or not; is it rare; can you afford to 
damage or destroy a copy during the scanning process; can the items be 
disbound; what is the minimum and maximum size of item to be scanned; if 
books are to remain bound, are the bindings tight or are the margins narrow; 
paper thickness; existence of damage, water spotting, show through, and 
other defects

- Scanning resolution required

- Image output (color/greyscale/black and white) and output format 

- Throughput requirement. (How much stuff do you have: 
dozens/hundreds/thousands/millions of pages, and how quickly do you need 
to get it done: days/weeks/months/years?)

- How much technical work can you, or are you willing to, do yourself? Can you 
invest in substantial post-processing, or do you need to be able to 
press Go on the scanner and produce a more or less finished product? 
If so, what sort of metadata, OCR, etc. requirements do you have, if 
any, in addition to getting the basic image?
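
For the throughput question in the list above, a rough estimate can be sketched as follows. Every figure in it (page count, pages per hour, hours per day) is a hypothetical placeholder to be replaced with your own measurements, and the function name is made up for illustration.

```python
import math


def working_days_needed(total_pages, pages_per_hour, hours_per_day):
    """Estimate whole working days to scan total_pages at a steady rate."""
    pages_per_day = pages_per_hour * hours_per_day
    return math.ceil(total_pages / pages_per_day)


if __name__ == "__main__":
    # Hypothetical rates only; time a test batch on your own equipment.
    setups = (("camera and stand", 60), ("sheet-feed scanner", 600))
    for label, rate in setups:
        days = working_days_needed(
            total_pages=50_000, pages_per_hour=rate, hours_per_day=6
        )
        print(f"{label}: about {days} working days")
```

The estimate ignores setup, quality control and post-processing time, which can easily dominate for fragile or bound material.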

For some projects, there are suitable desktop scanners available for 
very little money, and in some cases, using a decent (7 megapixel or 
higher) digital camera in conjunction with a stand and maybe an image 
editor like Photoshop (or something free like Irfanview) to crop and 
deskew afterwards might work just fine, but in other cases, a much more 
elaborate setup might be needed.
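
To see what a given camera delivers on a given page, the calculation can be run in the other direction. This is a minimal sketch assuming the page is framed as tightly as the sensor shape allows; the 3072 x 2304 pixel figure is an assumed example of a roughly 7 megapixel 4:3 sensor, not a measurement of any particular camera, and the function name is made up for illustration.

```python
def effective_dpi(sensor_w_px, sensor_h_px, page_w_in, page_h_in):
    """Best-case dpi when the whole page must fit inside the frame.

    The long side of the sensor is matched to the long side of the page;
    the tighter of the two dimensions limits the resolution.
    """
    long_px, short_px = max(sensor_w_px, sensor_h_px), min(sensor_w_px, sensor_h_px)
    long_in, short_in = max(page_w_in, page_h_in), min(page_w_in, page_h_in)
    return min(long_px / long_in, short_px / short_in)


if __name__ == "__main__":
    # Assumed ~7 megapixel sensor photographing a US letter page
    dpi = effective_dpi(3072, 2304, 8.5, 11)
    print(f"Effective resolution: about {dpi:.0f} dpi")
```

In practice the result will be somewhat lower, since some margin around the page is needed for cropping and deskewing.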

William Wueppelmann
Systems Librarian/Programmer