On 2012-05-04 00:51, Karen Coyle wrote:
> The difficulty seems to arise in the process of scanning. For the
> purposes of scanning, each physical volume becomes a scanned file.

At any serious scale (e.g. Google or the Internet Archive), I think book scanning needs to be organized as multiple workstations, each taking its share of a day's batch of books. That means the 10 or 20 volumes of an encyclopedia will be scanned by different people, each generating a job that goes through OCR and postprocessing, so each volume needs its own metadata record.

However, on Google I often find that volumes 2 and 5 are all that has been scanned. And at the Internet Archive I sometimes find that everything except volumes 2 and 7 has been scanned. So there is more chaos than necessary.

When we try to use scanned books for reference and for proofreading the text, we must hunt down the individual parts from different sources. The prime example must be the German branch of Wikisource, which is trying to find all 143 parts of the Weimar edition (1887-1919) of Goethe's collected works:
http://de.wikisource.org/wiki/Goethe#Sophien-_oder_Weimarer_Ausgabe_.28WA.29

Now, the structure shown on that wiki page is something that should go into OpenLibrary.org, because it is open bibliographic data (all of Wikisource is free and open).

--
Lars Aronsson ([email protected])
Project Runeberg - free Nordic literature - http://runeberg.org/
