Ben, where did strings like "amazon.co.jp" come from? Did you grab the domain names, or were these all text strings found in the field?
kc

On 2/22/12 4:30 AM, Ben Companjen wrote:
> Hi all,
>
> Last night I ran a script to count the identifiers found in Edition
> records in the dump of January 31st.
>
> It counted 173 identifiers, including ISBN 10 and 13, ocaid, oclc
> numbers and all the variations of the identifiers in the list in the
> edit form. There is a lot of junk in this list (starting with "1sbn",
> "Select", "isbn", "isbn13"..), but more effort is needed to find the
> records that contain the junk and clean it up. It appears that it
> contains classifications too - just like the edit form does?
>
> The CSV list is at https://gist.github.com/1884546 - the second column
> contains the total number of occurrences of the id (counting all the
> instances in each record), the third column is the number of records
> that contain the id.
>
> Regards,
>
> Ben
> _______________________________________________
> Ol-tech mailing list
> [email protected]
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
> To unsubscribe from this mailing list, send email to
> [email protected]

--
Karen Coyle
[email protected]
http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
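
For anyone following along, here is a rough sketch of the kind of count described in the quoted message. It is not Ben's script. It assumes each dump line is tab-separated with the record JSON in the fifth column, that Edition records keep extra ids in an "identifiers" mapping of name to list of values, and that ISBN 10/13, ocaid and oclc numbers live in top-level fields. The function name count_identifiers and the TOP_LEVEL_IDS tuple are made up for illustration.

```python
import json
from collections import Counter

# Assumption: these ids are stored as top-level fields on Edition records.
TOP_LEVEL_IDS = ("isbn_10", "isbn_13", "ocaid", "oclc_numbers")


def count_identifiers(lines):
    """Return two Counters keyed by identifier name:
    total occurrences, and number of records containing that name."""
    totals = Counter()
    records = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        # Assumed dump layout: type, key, revision, last_modified, JSON.
        if len(parts) < 5 or parts[0] != "/type/edition":
            continue
        record = json.loads(parts[4])

        # Gather (name, value) pairs from both places ids may be stored.
        found = {}
        identifiers = record.get("identifiers", {})
        if isinstance(identifiers, dict):
            found.update(identifiers)
        for name in TOP_LEVEL_IDS:
            if name in record:
                found[name] = record[name]

        for name, value in found.items():
            # A list counts once per entry; a bare string counts once.
            n = len(value) if isinstance(value, list) else 1
            if n == 0:
                continue
            totals[name] += n
            records[name] += 1
    return totals, records
```

The two counters correspond to the second and third CSV columns: totals counts every instance in every record, records counts each record once per id name. Junk names such as "1sbn" or "Select" would show up as keys here exactly as they were typed into the records.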
