On 22-Feb-2012, at 6:00 PM, Ben Companjen wrote:

> Hi all,
>
> Last night I ran a script to count the identifiers found in Edition
> records in the dump of January 31st.
>
> It counted 173 identifiers, including ISBN 10 and 13, ocaid, oclc
> numbers and all the variations of the identifiers in the list in the
> edit form. There is a lot of junk in this list (starting with "1sbn",
> "Select", "isbn", "isbn13"..), but more effort is needed to find the
> records that contain the junk and clean it up. It appears that it
> contains classifications too - just like the edit form does?
>
> The CSV list is at https://gist.github.com/1884546 - the second column
> contains the total number of occurrences of the id (counting all the
> instances in each record), the third column is the number of records
> that contain the id.
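For anyone following along, the two counts described above (total occurrences and number of records per identifier) can be sketched roughly as follows. This is only an illustration, not the script that produced the gist. The list of top-level fields, the `identifiers` dict layout, the tab-separated dump format with JSON in the last column, and the function names are all assumptions.

```python
import json
from collections import Counter

# Assumed top-level identifier fields; the actual script may have used another list.
TOP_LEVEL_ID_FIELDS = ("isbn_10", "isbn_13", "ocaid", "oclc_numbers", "lccn")


def iter_identifiers(record):
    """Yield (name, list_of_values) pairs for one edition record (a dict)."""
    for name in TOP_LEVEL_ID_FIELDS:
        if name in record:
            value = record[name]
            # Some fields hold a single string, others a list of strings.
            yield name, value if isinstance(value, list) else [value]
    # Assumption: other identifiers live in a dict of name -> list of values.
    for name, values in (record.get("identifiers") or {}).items():
        yield name, values if isinstance(values, list) else [values]


def count_identifiers(records):
    """Return (total occurrences, number of records) per identifier name."""
    total = Counter()
    per_record = Counter()
    for record in records:
        seen = set()
        for name, values in iter_identifiers(record):
            if not values:
                continue  # an empty list does not count as an occurrence
            total[name] += len(values)
            seen.add(name)
        per_record.update(seen)  # each name counted once per record
    return total, per_record


def read_editions(path):
    """Assumes one record per line, tab-separated, with JSON in the last column."""
    with open(path, encoding="utf-8") as dump:
        for line in dump:
            yield json.loads(line.rstrip("\n").split("\t")[-1])
```

With a hypothetical file name, usage would look like `total, per_record = count_identifiers(read_editions("ol_dump_editions.txt"))`. For any identifier that appears at most once per record, the two counts come out equal.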
Hi Ben,

It is very interesting to see the identifier statistics. We initially let everyone add new identifiers, and the list grew without any order. We removed that ability after realizing it was getting out of control.

It would be nice if someone could write a bot to fix the existing identifiers. Would you be interested in writing one?

I've sorted the identifiers by the total-occurrences count:
https://gist.github.com/1885956#file_edition_identifiers_sorted_2012_01_31.csv

What do you mean by "record occurrences"? Is that the number of records in which the identifier is used? In that case, the record count for "ocaid" looks wrong. We allow only one ocaid per edition, so it should be exactly the same as the total-occurrences count. Can you check it once again?

Anand
_______________________________________________
Ol-tech mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
To unsubscribe from this mailing list, send email to [email protected]
