On 22-Feb-2012, at 6:00 PM, Ben Companjen wrote:

> Hi all,
> 
> Last night I ran a script to count the identifiers found in Edition
> records in the dump of January 31st.
> 
> It counted 173 identifiers, including ISBN 10 and 13, ocaid, oclc
> numbers and all the variations of the identifiers in the list in the
> edit form. There is a lot of junk in this list (starting with "1sbn",
> "Select", "isbn", "isbn13"..), but more effort is needed to find the
> records that contain the junk and clean it up. It appears that it
> contains classifications too - just like the edit form does?
> 
> The CSV list is at https://gist.github.com/1884546 - the second column
> contains the total number of occurrences of the id (counting all the
> instances in each record), the third column is the number of records
> that contain the id.
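The two count columns Ben describes (every instance of an identifier, and the number of records containing it at least once) can be computed in a single pass. Below is a minimal sketch that assumes each Edition record has already been parsed into a dict with an `identifiers` mapping plus top-level fields such as `isbn_10` and `ocaid`. The field list `TOP_LEVEL_ID_FIELDS` and both function names are hypothetical; the actual script's list is not shown in the thread.

```python
from collections import Counter
from typing import Iterable, Mapping

# Hypothetical set of top-level fields treated as identifiers (assumption).
TOP_LEVEL_ID_FIELDS = ("isbn_10", "isbn_13", "oclc_numbers", "lccn", "ocaid")


def _as_list(value) -> list:
    """Normalize a single value (e.g. an ocaid string) to a list."""
    return value if isinstance(value, list) else [value]


def identifier_values(record: Mapping) -> dict:
    """Collect one record's identifier values, keyed by identifier name."""
    found: dict = {}
    for field in TOP_LEVEL_ID_FIELDS:
        value = record.get(field)
        if value:
            found.setdefault(field, []).extend(_as_list(value))
    for name, values in record.get("identifiers", {}).items():
        if values:
            found.setdefault(name, []).extend(_as_list(values))
    return found


def count_identifiers(records: Iterable[Mapping]) -> dict:
    """Return {name: (total occurrences, records containing the id)}."""
    total = Counter()         # every instance, summed over all records
    records_with = Counter()  # each record counted once per identifier name
    for record in records:
        for name, values in identifier_values(record).items():
            total[name] += len(values)
            records_with[name] += 1
    return {name: (total[name], records_with[name]) for name in total}
```

For example, two records carrying `isbn_10` values `["a", "b"]` and `["c"]` yield `(3, 2)` for `isbn_10`: three instances spread over two records.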

Hi Ben,

It's very interesting to see the identifier stats. We initially let everyone
add new identifiers, and the list grew without any order. We removed the
ability to add new identifiers after realizing that it was getting out of
control.

It would be nice if someone could write a bot to fix the existing identifiers.
Would you be interested in writing one?

I've sorted the identifiers by their total-occurrences count.

https://gist.github.com/1885956#file_edition_identifiers_sorted_2012_01_31.csv
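Sorting the list this way only needs the second column. Here is a small sketch, assuming the CSV has three columns (identifier, total occurrences, record occurrences) and no header row; the function name `sort_by_total` is hypothetical.

```python
import csv
import io


def sort_by_total(csv_text: str) -> list:
    """Parse (name, total, records) rows and sort by total occurrences, descending."""
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:  # skip blank lines
            continue
        name, total, records = row
        rows.append((name, int(total), int(records)))
    # Highest total first; ties broken by name so the listing is stable.
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows
```

Converting the counts to `int` before sorting matters: sorted as strings, "9" would come before "173".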

What do you mean by "record occurrences"? Is that the number of records that
use this identifier? If so, the number for "ocaid" looks wrong. We only allow
one ocaid per edition, so it should be exactly the same as the
total-occurrences count. Can you check it again?
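The "one ocaid per edition" rule gives a simple consistency check: for any single-valued identifier, the total-occurrences count must equal the record count, and any record holding more than one value is a candidate for cleanup. The sketch below assumes the counts are available as `{name: (total, records)}` and that records are dicts with a `key` field, a top-level `ocaid`, and an optional `identifiers` mapping. Both function names are hypothetical, and where a second ocaid might hide (here, inside `identifiers`) is an assumption.

```python
def check_single_valued(counts: dict, single_valued=("ocaid",)) -> dict:
    """Return single-valued identifiers whose total and record counts disagree."""
    return {
        name: counts[name]
        for name in single_valued
        if name in counts and counts[name][0] != counts[name][1]
    }


def records_with_multiple(records, name: str = "ocaid") -> list:
    """List keys of records carrying more than one value for `name`."""
    offenders = []
    for record in records:
        values = []
        top = record.get(name)
        if top:
            values.extend(top if isinstance(top, list) else [top])
        # Assumption: a duplicate may also sit in the identifiers mapping.
        values.extend(record.get("identifiers", {}).get(name, []))
        if len(values) > 1:
            offenders.append(record.get("key"))
    return offenders
```

An empty result from `check_single_valued` means the counts are consistent. A non-empty one points to the records that `records_with_multiple` can then list for a cleanup bot.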

Anand

_______________________________________________
Ol-tech mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
To unsubscribe from this mailing list, send email to 
[email protected]