Ooof! I took a look at some of these, and mostly they are badly input $h subfields from the 245 field; the ones I saw were from Talis. (That Talis data will haunt us forever -- very dirty.) I think that a pull-down would be a good idea for manual input. From relatively good MARC records the list of terms should be quite short, although a little creative input does take place. The valid terms in MARC (which are called General Material Designations) are:
http://www.uproc.lib.mi.us/cat/gmd.htm

Unfortunately, libraries do catalog the paperback and hardcopy on the same record, even though those terms are not acceptable GMDs. Such records *do*, however, result in more than one ISBN coming in on a single record.

kc

On 4/11/12 4:00 PM, Ben Companjen wrote:
> Hi all,
>
> In the most recent dump file, I found 7467 different values for the
> (physical) format field in the editions. Every variation in
> capitalization, punctuation, spacing etc. is counted.
>
> Using Google Refine and its clustering functions I brought that number
> down to about 4800, and saw that if my proposed changes were to be
> executed, about 1 million records would be changed. The majority of
> these changes involve variations of "microform".
>
> I was wondering how others see this large number of "formats". Is it
> worth trying to "fix" them? Has anyone ever tried?
> Some of the strange formats come from manual input; these are typos,
> spam and wrong inputs like ISBNs. Could more detailed instructions
> help prevent these? How about an autocomplete input field, like the
> one for language?
>
> Part of the strange input comes from MARC records, like the ones from
> Library of Congress and Talis. Is it possible for the ImportBot to
> leave formats like ":" out, or autocorrect it?
>
> For example:
> <http://openlibrary.org/query.json?type=/type/edition&physical_format=[Italian]%20/>
> <http://openlibrary.org/query.json?type=/type/edition&physical_format=[chinese].>
> <http://openlibrary.org/query.json?type=/type/edition&physical_format=:>
> <http://openlibrary.org/query.json?type=/type/edition&physical_format=Paperback%20and%20Hardcover>
> (lazy people...)
>
> I already corrected "[microwave]", "Both" and some other values.
>
> Regards,
>
> Ben
> _______________________________________________
> Ol-tech mailing list
> [email protected]
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
> To unsubscribe from this mailing list, send email to
> [email protected]

-- 
Karen Coyle
[email protected] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
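
For anyone who wants to experiment with the clustering Ben describes, here is a minimal sketch in Python. It is only loosely modelled on the "fingerprint" key-collision idea used by Google Refine and is not Refine's actual implementation, nor anything the ImportBot does today. The function names `fingerprint` and `cluster_formats` are hypothetical, and the "microform" sample values are made up for illustration; the other samples are taken from the query URLs quoted above. Values whose fingerprint is empty (such as ":") are simply left out, which is one possible answer to the question of dropping punctuation-only formats.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Return a normalized key: accents stripped, lowercased,
    punctuation removed, tokens de-duplicated and sorted."""
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Replace anything that is not a word character or whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster_formats(values):
    """Group raw physical_format strings by fingerprint.
    Values with an empty fingerprint (e.g. ":") are skipped."""
    clusters = defaultdict(list)
    for raw in values:
        key = fingerprint(raw)
        if key:
            clusters[key].append(raw)
    return dict(clusters)


if __name__ == "__main__":
    samples = [
        "[Italian] /",              # from the examples above
        "[chinese].",               # from the examples above
        ":",                        # from the examples above
        "Paperback and Hardcover",  # from the examples above
        "Microform",                # hypothetical variants
        "microform.",
        "[Microform]",
    ]
    for key, members in cluster_formats(samples).items():
        print(key, "<-", members)
```

Note that this only merges spelling variants; it does not check the result against the GMD list, so a value like "Paperback and Hardcover" would still survive as its own cluster and need a separate decision.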
