On Mon, Mar 18, 2013 at 1:02 PM, Karen Coyle wrote:
> On 3/18/13 7:57 AM, David Fiander wrote:
>> I might argue that different scans of the same print edition (or even of
>> the same copy of an edition) are different electronic editions, since
>> they may have different scanning errors, and almost definitely have
>> different ocr texts.
>
> Yes, they are different, but I would think of them as different "copies"
> -- in the same way that a copy of a book with hand-written notes by Karl
> Marx is a different copy from another one in another library. "Edition"
> refers to a set of things that were published at the same time from the
> same (now virtual) plates. In codex times even those could vary some,
> but in modern times we can assume that they began life virtually
> identical. The scans should be considered individual and one should not
> assume that they are exact substitutes for each other in all ways.
I agree that the scans are not separate editions. It's also not a given that a scan has a 1:1 relationship with its OCR text. I'm pretty sure that Google re-OCRs things as its OCR improves, and there's no reason one couldn't have a raw scan, a color-corrected scan, three different OCR'd texts, and a proofread OCR text, all associated with each other through relationships. Over time, I'm sure our handling of these relationships will mature.

> What drives me nuts about the Google digitizations is that they often
> combine pages from different digitization "events" -- if a page from one
> event is blurry, they'll substitute a page from another one, and thus
> usually from a different library. This makes the Google digitized
> versions basically useless as preservation copies.

I'm sure Google has the raw scans intact. You're just seeing a presentation artifact.

> Note that some
> Project Gutenberg editions, especially early ones, did not indicate
> which print edition they were based on, so again those have issues as
> sources for research.

PG has had all kinds of weird rules over the years. They were adamantly opposed to ISO 8859-1 for years after it was in widespread use (before Unicode). Lack of edition identification was an explicit policy in those days as well. They were proudly "edition free" or, rather, were creating their own unique PG editions. I don't know if this is still the case. They did eventually abandon their 7-bit US-ASCII-only stance.

Tom
