On 16 April 2012 03:43, Karen Coyle <[email protected]> wrote:
>
> On 4/15/12 3:26 PM, Ben Companjen wrote:
>
>>
>> And on import, most of the punctuation marks like [, ], : and / can be
>> stripped I think. There are 376566 records with "[microform] :" in the
>> latest datadump, whereas there were 376323 in January's datadump. See
>> most variants (with proposed normalization "Microform") in this huge
>> table: http://companjen.name/ol/editions_formats-2012-01-31.html
>
> The issue with punctuation is a huge one -- MARC includes the
> punctuation in the data, UniMarc derives punctuation from the fields.
> The really insane thing with MARC is that you have to include the
> punctuation in the subfield BEFORE the thing it punctuates. This is so
> absurd... yet it is commonplace in library cataloging.
>
> There are tricks to removing punctuation, for example:
>
> This is a book title.
> This is a book title with etc.
>
> I can try to find some rules that have been used in the past (or you can
> join the code4lib list, code4lib.org, and ask there).
>

Well, I hereby officially offer my "dictionary" of formats to normalized
formats for use by anyone, including ImportBot, as a first step :)

>
>> Is it true that Paperback and Hardcover are not on the MARC list of
>> GMDs or in RDA's lists of content/carrier/material types?
>
> Yes, it is true. These are not included in any of the lists.
>
>> I guess these are "concepts" under "text", but since there are 3M+
>> Paperbacks and 1.5M Hardcovers in OL, I was a little surprised to not
>> find them.
>
> Those terms probably come from Amazon records (since for a bookseller
> that is an important indication of price and shipping costs). For
> libraries, the distinction is not considered important. The idea being
> that if a person wishes to read a book they will not care whether the
> library copy is hardback, paperback, trade paperback, but they will care
> about eBook and audio book versions.

The designers of OL must have thought having the distinction is important
too, as Paperback and Hardcover are the example inputs. I thought
hardcover editions may be more durable, so I'd say for curators of books
it may be interesting to know too. And for collectors, who, like me, use
Open Library as a reference catalog :)

>
>> Does Open Library say anywhere that paperbacks and hardcovers should
>> be separate editions? I consider them different, but I get the feeling
>> newly published authors who add their own books don't (seem to) care,
>> or maybe just don't know.
>
> Open Library has no cataloging rules. If the paperback and hardback come
> in on different records, as they do from amazon, those will be
> considered separate. But there is nothing to say what is "right" in that
> regard.

Right.

>
> kc

Ben

> --
> Karen Coyle
> [email protected] http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet
> _______________________________________________
> Ol-tech mailing list
> [email protected]
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
> To unsubscribe from this mailing list, send email to
> [email protected]
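
The punctuation stripping and the format "dictionary" discussed in this thread could be combined roughly as follows. This is only a minimal sketch, not ImportBot's actual code: the function name `normalize_format` and the table `NORMALIZED_FORMATS` are hypothetical, and only the "[microform] :" to "Microform" mapping comes from the thread. The other entries are illustrative assumptions.

```python
import re

# Hypothetical excerpt of a format "dictionary". Only the microform entry
# is taken from the thread; the other two are illustrative assumptions.
NORMALIZED_FORMATS = {
    "microform": "Microform",
    "paperback": "Paperback",
    "hardcover": "Hardcover",
}

# The punctuation marks named in the thread: [, ], : and /
_PUNCTUATION = re.compile(r"[\[\]:/]")


def normalize_format(raw):
    """Strip [, ], : and / from a raw format value, then look it up.

    Returns the normalized label, or the cleaned value unchanged when
    the dictionary has no entry for it.
    """
    cleaned = _PUNCTUATION.sub(" ", raw)
    cleaned = " ".join(cleaned.split())  # collapse runs of whitespace
    return NORMALIZED_FORMATS.get(cleaned.lower(), cleaned)


if __name__ == "__main__":
    print(normalize_format("[microform] :"))  # prints "Microform"
```

Values without a dictionary entry come back cleaned but otherwise unchanged, so an import step could log them for later review rather than guess at a normalization.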
