Ooof! I took a look at some of these and mostly they are badly input $h 
subfields from the 245 field, and the ones I saw were from Talis. (That 
Talis data will haunt us forever -- very dirty.) I think that a 
pull-down would be a good idea for manual input. From relatively good 
MARC records the list of terms should be quite short although a little 
creative input does take place. The valid terms in MARC (which are 
called General Material Designations) are:

http://www.uproc.lib.mi.us/cat/gmd.htm

Unfortunately, libraries do catalog the paperback and hardcover on the 
same record. "Paperback" and "Hardcover" are not acceptable GMDs, though. 
Such records *do*, however, result in more than one ISBN coming in on a 
single record.
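
For the cleanup side, here is a minimal Python sketch of what normalizing 
those $h values might look like. The term list is only an illustrative 
subset of the GMDs, and the function name and the punctuation it strips 
are assumptions, not anything ImportBot is known to do today.

```python
import re

# Illustrative subset of general material designations, NOT the full list.
KNOWN_GMDS = {
    "microform",
    "sound recording",
    "videorecording",
    "electronic resource",
    "map",
    "music",
    "text",
}


def normalize_format(raw):
    """Return a cleaned GMD found in KNOWN_GMDS, or None if unrecognized.

    Hypothetical helper: lowercases the value, then drops the square
    brackets and trailing ISBD punctuation that leak in from 245 $h.
    """
    cleaned = re.sub(r"[\[\]/:;=.,]", " ", raw.lower())
    cleaned = " ".join(cleaned.split())  # collapse runs of whitespace
    return cleaned if cleaned in KNOWN_GMDS else None


for value in ["[Microform] /", "[Sound recording] :", ":", "Paperback and Hardcover"]:
    print(repr(value), "->", normalize_format(value))
```

Anything that comes back as None (a bare ":", a language name, 
"Paperback and Hardcover") would be left for review rather than guessed at.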

kc

On 4/11/12 4:00 PM, Ben Companjen wrote:
> Hi all,
>
> In the most recent dump file, I found 7467 different values for the
> (physical) format field in the editions. Every variation in
> capitalization, punctuation, spacing etc. is counted.
>
> Using Google Refine and its clustering functions I brought that number
> down to about 4800, and saw that if my proposed changes were to be
> executed, about 1 million records would be changed. The majority of
> these changes involve variations of "microform".
>
> I was wondering how others see this large number of "formats". Is it
> worth trying to "fix" them? Has anyone ever tried?
> Some of the strange formats come from manual input; these are typos,
> spam and wrong inputs like ISBNs. Could more detailed instructions
> help prevent these? How about an autocomplete input field, like the
> one for language?
>
> Part of the strange input comes from MARC records, like the ones from
> Library of Congress and Talis. Is it possible for the ImportBot to
> leave formats like ":" out, or autocorrect it?
>
> For example:
> <http://openlibrary.org/query.json?type=/type/edition&physical_format=[Italian]%20/>
> <http://openlibrary.org/query.json?type=/type/edition&physical_format=[chinese].>
> <http://openlibrary.org/query.json?type=/type/edition&physical_format=:>
> <http://openlibrary.org/query.json?type=/type/edition&physical_format=Paperback%20and%20Hardcover>
> (lazy people...)
>
> I already corrected "[microwave]", "Both" and some other values.
>
> Regards,
>
> Ben
> _______________________________________________
> Ol-tech mailing list
> [email protected]
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
> To unsubscribe from this mailing list, send email to 
> [email protected]

-- 
Karen Coyle
[email protected] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet