Thanks for the interesting summary, Tom. This is only part of the data: it only shows the imports from the first 1 or 2 years of the project. For recent imports, we've been using the source records field. Combining the two would give more accurate results.
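
A rough sketch of how the two could be combined into one per-source tally. The record layout here is an assumption for illustration: `legacy_source` is a hypothetical name for the older import field, the `source_records` entries are assumed to look like "<source>:<identifier>", and the sample editions are made up. Each edition is counted at most once per source.

```python
from collections import Counter


def source_name(source_record):
    # Assumption: entries look like "<source>:<identifier>".
    # Adjust this helper to whatever the real entries look like.
    return source_record.split(":", 1)[0]


def count_by_source(editions):
    """Tally editions per source, merging the old and new source fields."""
    counts = Counter()
    for edition in editions:
        sources = set()
        # Older imports: a single source name (hypothetical field name).
        legacy = edition.get("legacy_source")
        if legacy:
            sources.add(legacy)
        # Recent imports: a list of source record strings.
        for record in edition.get("source_records", []):
            sources.add(source_name(record))
        # A set ensures an edition is counted once per source,
        # even if both fields name the same source.
        counts.update(sources)
    return counts


# Made-up sample data, not taken from the dump.
editions = [
    {"key": "/books/OL1M", "legacy_source": "amazon"},
    {"key": "/books/OL2M", "source_records": ["amazon:0001", "ia:example"]},
    {"key": "/books/OL3M", "legacy_source": "ia", "source_records": ["ia:example2"]},
]
counts = count_by_source(editions)
```

Note that an edition with records from several sources is counted under each of them, so the per-source counts can add up to more than the number of distinct editions.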
Also, one thing to remember is that there could be repetitions. There is a good chance that 2 records from different sources are mapped to the same edition.

Anand

On 08-Mar-2013, at 10:31 AM, Tom Morris wrote:

> Here's a quick analysis of the dump that Anand kindly made available:
>
>    %  Cum %   # Records  Source                                 Notes
>  31%    31%   7,029,035  marc_records_scriblio_net              Library of Congress records via Scriblio and Plymouth State University
>  27%    58%   6,182,687  amazon                                 Amazon web crawl
>  13%    72%   3,021,901  talis_openlibrary_contribution         Talis contribution
>  11%    83%   2,536,583  marc_university_of_toronto
>   4%    87%     889,009  marc_oregon_summit_records             Consortium of several libraries
>   3%    89%     605,925  marc_miami_univ_ohio
>   2%    92%     533,279  marc_loc_updates                       Library of Congress update service 2010-2012
>   2%    94%     465,994  ia                                     Internet Archive scanning projects
>   1%    95%     334,779  marc_laurentian
>   1%    97%     320,925  bcl_marc                               Boston College
>   1%    98%     224,762  marc_western_washington_univ
>   1%    99%     203,740  SanFranPLnn                            San Francisco public libraries
>   1%    99%     138,028  marc_binghamton_univ
>   0%    99%      62,248  hollis_marc
>   0%   100%      48,714  bpl_marc                               Boston Public Library (also contributes through scanning project)
>   0%   100%      31,771  wfm_bk_marc
>   0%   100%      14,635  marc_ithaca_college
>   0%   100%       8,193  marc_cca
>   0%   100%       7,998  CollingswoodLibraryMarcDump10-27-2008
>   0%   100%       5,338  unc_catalog_marc
> 100%          22,665,544
>
> Vetting the top 10 sources would get us over 95% and the top 15 would cover
> 99% of the records. Of course, by the same token, we could lose millions of
> records if one of these heavy hitters had a provenance which turned out to be
> unreliable.
>
> Tom
>
> _______________________________________________
> Ol-tech mailing list
> [email protected]
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
> To unsubscribe from this mailing list, send email to
> [email protected]
