Re: [ol-discuss] Want to merge authors? Try Shirley Institute/Wira Joint Conference (1977 Manchester, Eng.)
Hi Anand, all,

You no longer need to work on that API to merge the 1+ Shirley Institute/Wira Joint Conference (1977 Manchester, Eng.) authors, because I just finished the job. The author page [1] will now eventually list 1+ copies of the exact same work/edition, so some automation to merge works and editions (in general) would be welcome. (Yes, I know this has been on wishlists for a long time.)

Ben

[1] http://openlibrary.org/authors/OL4602513A/Shirley_Institute_Wira_Joint_Conference_%281977_Manchester_Eng.%29

On 10 May 2012 04:28, Anand Chitipothu an...@archive.org wrote:

On 09-May-2012, at 6:35 PM, Ben Companjen wrote:

Hi,

Although I found 341 duplicates of President Clinton to be a lot yesterday, there is still the author that goes by the name Shirley Institute/Wira Joint Conference (1977 Manchester, Eng.). There are a whopping 10,047 authors with that name! Merging those manually is only for those who desperately need an extremely boring task :)

Looking at the subject and book titles in the search results, I think one MARC record was imported many times without duplicate detection, so merging the authors would still leave some 1 duplicate works/editions.

Any idea how to best solve this?

I can work out an API. Will that help?

Anand

___
Ol-discuss mailing list
Ol-discuss@archive.org
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
To unsubscribe from this mailing list, send email to ol-discuss-unsubscr...@archive.org
Re: [ol-discuss] Want to merge authors? Try Shirley Institute/Wira Joint Conference (1977 Manchester, Eng.)
On 10 May 2012 04:28, Anand Chitipothu an...@archive.org wrote:

On 09-May-2012, at 6:35 PM, Ben Companjen wrote:

Hi,

Although I found 341 duplicates of President Clinton to be a lot yesterday, there is still the author that goes by the name Shirley Institute/Wira Joint Conference (1977 Manchester, Eng.). There are a whopping 10,047 authors with that name! Merging those manually is only for those who desperately need an extremely boring task :)

Looking at the subject and book titles in the search results, I think one MARC record was imported many times without duplicate detection, so merging the authors would still leave some 1 duplicate works/editions.

Any idea how to best solve this?

I can work out an API. Will that help?

That is probably the best way to go.

It looks like the few authors that show up at the top of the search results with multiple (= 2) works were incorrectly used as author of those second works. I just corrected a few that, according to the MARC records, should go under a Russian scientist and British Columbia's Ministry of Finance.

I think there is only one author (the conference), and only one edition of one work (The future of natural fibres), imported from one MARC record [1] from the Miami University of Ohio ('Let's put the item's price in, let's see, the 020 field. There, that looks nice.' ;) ).

OL4602513A is the first key when the keys are sorted in ascending order. Perhaps this can help determine which records (author, but also work/edition) to merge or even delete?

Ben

[1] http://openlibrary.org/show-records/marc_miami_univ_ohio/allbibs0016.out:7058864:1146
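The idea of keeping the lowest key as the record to merge into can be sketched as follows. This is only an illustration, not an Open Library API: the helper names `key_number` and `pick_master` are hypothetical, and the list of keys is a small sample taken from the keys mentioned in this thread. The keys are compared by their numeric part, because plain string sorting would put OL1000A before OL999A.

```python
import re

# Matches author keys such as "OL4602513A" or "/authors/OL4602513A".
AUTHOR_KEY = re.compile(r"OL(\d+)A$")


def key_number(key: str) -> int:
    """Return the numeric part of an author key, e.g. 4602513 for OL4602513A."""
    match = AUTHOR_KEY.search(key)
    if match is None:
        raise ValueError(f"not an author key: {key!r}")
    return int(match.group(1))


def pick_master(keys):
    """Split duplicate keys into (master, others); master is the lowest key."""
    ordered = sorted(keys, key=key_number)
    return ordered[0], ordered[1:]


# Sample of duplicate keys mentioned in the thread (not the full 10,047).
duplicates = ["OL4602524A", "OL4602513A", "OL4602522A", "OL4602523A"]
master, to_merge = pick_master(duplicates)
print(master)    # OL4602513A
print(to_merge)  # ['OL4602522A', 'OL4602523A', 'OL4602524A']
```

Whether the lowest key is really the right record to keep would still have to be checked against the works and editions attached to each author.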
Re: [ol-discuss] Want to merge authors? Try Shirley Institute/Wira Joint Conference (1977 Manchester, Eng.)
On 10 May 2012 05:08, Karen Coyle kco...@kcoyle.net wrote:

OK, now I get that too. No idea what I did different before but... nevermind.

Maybe you used the search box on the top right of the page, instead of the search box you see after you click Authors left of the OL logo, and chose one of the authors in the facet on the right of the results page. Then you would see only the works/editions by that one author of 10047 authors by the same name.

This has got to be a bug. The same item has been entered who knows how many times, and at least some of the IDs are consecutive.

There have been issues like [1] about the import API creating duplicates. It seems that all these duplicate Shirley ... Eng.) authors were created years ago, so it may have been solved already.

Ben

[1] https://github.com/internetarchive/openlibrary/issues/42
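To check how many of the duplicate IDs are consecutive, something like the following sketch could be run over the numeric parts of the author keys. The function name `consecutive_runs` is made up for this example, and the input is just the handful of IDs Karen listed (OL4602522A to OL4602524A, OL4791619A and OL4791620A), not the full set.

```python
from itertools import groupby


def consecutive_runs(numbers):
    """Group integers into (start, end) runs of consecutive values."""
    ordered = sorted(set(numbers))
    runs = []
    # Within a run of consecutive values, value minus position is constant.
    for _, group in groupby(enumerate(ordered), key=lambda pair: pair[1] - pair[0]):
        values = [value for _, value in group]
        runs.append((values[0], values[-1]))
    return runs


ids = [4602522, 4602523, 4602524, 4791619, 4791620]
print(consecutive_runs(ids))  # [(4602522, 4602524), (4791619, 4791620)]
```

Long runs would suggest that the duplicates were created in a single import batch, although that remains a guess until the edit history is checked.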
Re: [ol-discuss] Want to merge authors? Try Shirley Institute/Wira Joint Conference (1977 Manchester, Eng.)
On 09-May-2012, at 6:35 PM, Ben Companjen wrote:

Hi,

Although I found 341 duplicates of President Clinton to be a lot yesterday, there is still the author that goes by the name Shirley Institute/Wira Joint Conference (1977 Manchester, Eng.). There are a whopping 10,047 authors with that name! Merging those manually is only for those who desperately need an extremely boring task :)

Looking at the subject and book titles in the search results, I think one MARC record was imported many times without duplicate detection, so merging the authors would still leave some 1 duplicate works/editions.

Any idea how to best solve this?

I can work out an API. Will that help?

Anand
Re: [ol-discuss] Want to merge authors? Try Shirley Institute/Wira Joint Conference (1977 Manchester, Eng.)
OK, now I get that too. No idea what I did different before but... nevermind.

This has got to be a bug. The same item has been entered who knows how many times, and at least some of the IDs are consecutive.

http://openlibrary.org/authors/OL4602522A
http://openlibrary.org/authors/OL4602523A
http://openlibrary.org/authors/OL4602524A

Ditto the European law one.

http://openlibrary.org/authors/OL4791619A
http://openlibrary.org/authors/OL4791620A
etc.

I think there was another case like this before.

kc

On 5/9/12 2:57 PM, Ben Companjen wrote:

I am, yes. I loaded the ~6.9 million author records from April's dump into MySQL, did a GROUP BY slug (where slug is the author name in lower case, without spaces and punctuation) and got

shirleyinstitute/wirajointconference1977manchestereng: 10047.

I then searched for Shirley institute 1977 as an author on the website and got 10,047 hits. And I still do: http://openlibrary.org/search/authors?q=shirley+institute+1977

Second in the list of slugs is

colloquyoneuropeanlaw1981messinaitaly: 2368
http://openlibrary.org/search/authors?q=colloquy+1981+messina

Ben

On 9 May 2012 23:44, Karen Coyle kco...@kcoyle.net wrote:

This is rather odd. When I look up Shirley institute as an author and find the 1977 joint conference I get 2 work titles, each that has only 1 edition. Ben, are you working with the dump?

kc

On 5/9/12 6:05 AM, Ben Companjen wrote:

Hi,

Although I found 341 duplicates of President Clinton to be a lot yesterday, there is still the author that goes by the name Shirley Institute/Wira Joint Conference (1977 Manchester, Eng.). There are a whopping 10,047 authors with that name! Merging those manually is only for those who desperately need an extremely boring task :)

Looking at the subject and book titles in the search results, I think one MARC record was imported many times without duplicate detection, so merging the authors would still leave some 1 duplicate works/editions.

Any idea how to best solve this?

Ben

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
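The slug grouping Ben describes can be approximated in a few lines of Python. This is a sketch and not the MySQL query he actually ran: `slugify` and `duplicate_counts` are names invented here, the sample names are illustrative, and keeping the "/" in the slug is an assumption made only because the slug quoted in the thread still contains it.

```python
from collections import Counter


def slugify(name: str) -> str:
    """Lower-case a name and drop spaces and punctuation.

    The "/" is kept (an assumption) so that the result matches the slug
    quoted in the thread.
    """
    return "".join(ch for ch in name.lower() if ch.isalnum() or ch == "/")


def duplicate_counts(names, minimum=2):
    """Return (slug, count) pairs for slugs occurring at least `minimum` times."""
    counts = Counter(slugify(name) for name in names)
    return [(slug, n) for slug, n in counts.most_common() if n >= minimum]


# Illustrative input; in practice this would be the name field of every
# author record in the dump.
names = [
    "Shirley Institute/Wira Joint Conference (1977 Manchester, Eng.)",
    "Shirley Institute/Wira Joint Conference (1977 Manchester, Eng.)",
    "shirley institute/wira joint conference 1977 manchester eng",
    "Hypothetical Author",
]
print(duplicate_counts(names))
# [('shirleyinstitute/wirajointconference1977manchestereng', 3)]
```

A slug this coarse will also lump together different people who happen to share a name, so the counts are a starting point for review rather than a merge list.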