On 10/27/12 12:06 PM, Ben Companjen wrote:
> Hi all,
>
> Since I received my e-book reader a couple of weeks ago, I have been
> looking at out-of-copyright books to load. The few books that I
> downloaded as EPUB from the OL / Internet Archive contain many OCR
> errors. Rather than correcting these by hand just for myself (as OL/IA
> doesn't provide an obvious way to let me upload a more correct
> version), I remembered that there is a web place where people gather
> to improve texts for e-book readers and re-discovered Project
> Gutenberg [1].
>
> Community members involved with Project Gutenberg produce e-book
> versions of out-of-copyright books, which can then be downloaded from
> the website. But whereas OL EPUBs can be linked to a specific edition,
> the PG EPUBs are mostly "reconstructed" from the text and harder to
> link to a paper edition.
>
> Hence my following questions:
> Do people agree that Project Gutenberg editions be seen as separate editions?

Yes, definitely. I also think that a corrected OL edition should be stored separately from its original un-corrected OCR. The reason is that at some point it may be desirable to go back and see what was there before the correction. Ideally, there could be versioning and forking, much like software.

> Do people agree the release date given by the project is the publish date?

The release date of the digital edition is a publish date, but I think that it isn't sufficient. If the text is derived from a physical book, then the date of the book is also needed. I also would like to see "original" dates where known -- that is the original publication date of the text. Otherwise, Moby Dick and Origin of Species end up being presented as 21st century texts, which really messes up the cultural and scientific context.

> Do people agree that there is some sense in PG editions' formats being
> something like "E-book" or "Electronic resource"

They are electronic resources, but if they are plain text I have a hard time seeing them as "ebooks" -- to me, ebook implies something more structured than plain text. (Title pages, navigable chapters, etc.) I know not everyone sees it that way.

> Why are there only (19 | less than 19 | 281) of the 40000+ editions
> [2] in OL? These 19 seem to be linked to IA items, coming from
> "European libraries", although not all seem to be really published by
> PG (e.g. [3]). In the latest data dump, there are 281 editions with at
> least one PG identifier, but they are not listed under publisher PG.
> Are there people around who know about connecting or importing the PG
> catalogue?

I believe that the PG books are not in the OL/IA workflow for a reason, although I don't recall the reason. It may have to do with the availability of bibliographic data?

Note, though, that from what I understand there is no new development happening on OL at the moment and I don't know if it will be taken up again. There seems to be no staff dedicated to the project.
So it's unlikely that any new data types will be added.

kc

> Are there other known publishers named Project Gutenberg?
>
> (Feel free to answer a subset of these questions :) )
>
> Ben
>
> [1] http://www.gutenberg.org
> [2] http://openlibrary.org/publishers/Project_Gutenberg
> [3] http://openlibrary.org/books/OL20478553M/The_Lady_of_the_Lake
> _______________________________________________
> Ol-discuss mailing list
> Ol-discuss@archive.org
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
> To unsubscribe from this mailing list, send email to
> ol-discuss-unsubscr...@archive.org
>

-- 
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet