Re: [ol-discuss] Want to merge authors? Try Shirley Institute/Wira Joint Conference (1977 Manchester, Eng.)

2012-10-26 Thread Ben Companjen
Hi Anand, all,

You no longer need to work on that API to merge the 10,000+ Shirley
Institute/Wira Joint Conference (1977 Manchester, Eng.) authors,
because I just finished the job.

Now the author page [1] will eventually list 10,000+ copies of the
exact same work/edition. So some automation to merge works and
editions (in general) would be welcome. (Yes, I know this has been on
wishlists for a long time.)

Ben

[1] http://openlibrary.org/authors/OL4602513A/Shirley_Institute_Wira_Joint_Conference_%281977_Manchester_Eng.%29

On 10 May 2012 04:28, Anand Chitipothu an...@archive.org wrote:

 On 09-May-2012, at 6:35 PM, Ben Companjen wrote:

 Hi,

 Although I thought 341 duplicates of President Clinton was a lot yesterday,
 there is still the author that goes by the name Shirley
 Institute/Wira Joint Conference (1977 Manchester, Eng.). There are a
 whopping 10,047 authors with that name! Merging those manually is only
 for those who desperately need an extremely boring task :)

 Looking at the subject and book titles in the search results, I think
 one MARC record was imported many times without duplicate detection,
 so merging the authors would still leave some 10,000 duplicate
 works/editions.

 Any idea how to best solve this?

 I can work out an API. Will that help?

 Anand
 ___
 Ol-discuss mailing list
 Ol-discuss@archive.org
 http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
 To unsubscribe from this mailing list, send email to 
 ol-discuss-unsubscr...@archive.org


Re: [ol-discuss] Want to merge authors? Try Shirley Institute/Wira Joint Conference (1977 Manchester, Eng.)

2012-05-10 Thread Ben Companjen
On 10 May 2012 04:28, Anand Chitipothu an...@archive.org wrote:

 On 09-May-2012, at 6:35 PM, Ben Companjen wrote:

 Hi,

 Although I thought 341 duplicates of President Clinton was a lot yesterday,
 there is still the author that goes by the name Shirley
 Institute/Wira Joint Conference (1977 Manchester, Eng.). There are a
 whopping 10,047 authors with that name! Merging those manually is only
 for those who desperately need an extremely boring task :)

 Looking at the subject and book titles in the search results, I think
 one MARC record was imported many times without duplicate detection,
 so merging the authors would still leave some 10,000 duplicate
 works/editions.

 Any idea how to best solve this?

 I can work out an API. Will that help?

That is probably the best way to go.

It looks like the few authors that show up at the top of the search
results with multiple (>= 2) works were incorrectly used as the author of
those second works. I just corrected a few that, according to the MARC
records, should go under a Russian scientist and British Columbia's
Ministry of Finance.
I think there is only one author (the conference), and only one
edition of one work (The future of natural fibres), imported from one
MARC record [1] from the Miami University of Ohio ('Let's put the
item's price in, let's see, the 020 field. There, that looks nice.' ;) ).

OL4602513A is the first key when the keys are sorted in ascending order.

Perhaps this can help determine what records (author, but also
work/edition) to merge or even delete?
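
A minimal sketch of how the lowest key could be chosen as the record to
keep, with the rest queued for merging. The function names are
hypothetical and nothing here calls an Open Library API; the only
assumption is that author keys have the form OL<number>A, as in the URLs
in this thread. Comparing the numeric part matters because plain string
sorting would put a shorter key such as OL999A after OL4602513A.

```python
import re

# Assumed key shape: "OL", digits, then "A" (e.g. OL4602513A).
_AUTHOR_KEY = re.compile(r"^OL(\d+)A$")


def author_key_number(key):
    """Return the numeric part of an author key as an int."""
    match = _AUTHOR_KEY.match(key)
    if match is None:
        raise ValueError("not an author key: %r" % (key,))
    return int(match.group(1))


def pick_master(keys):
    """Return the key with the smallest numeric part."""
    return min(keys, key=author_key_number)


def merge_plan(keys):
    """Return (master, duplicates), duplicates sorted in ascending order."""
    master = pick_master(keys)
    duplicates = sorted((k for k in keys if k != master), key=author_key_number)
    return master, duplicates


if __name__ == "__main__":
    keys = ["OL4602523A", "OL4602513A", "OL4602522A"]
    master, duplicates = merge_plan(keys)
    print("keep:", master)
    print("merge into it:", duplicates)
```

Whether the lowest key really is the best record to keep (rather than,
say, the one with the most complete data) is a separate question that
this sketch does not answer.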

Ben

[1] http://openlibrary.org/show-records/marc_miami_univ_ohio/allbibs0016.out:7058864:1146

 Anand


Re: [ol-discuss] Want to merge authors? Try Shirley Institute/Wira Joint Conference (1977 Manchester, Eng.)

2012-05-10 Thread Ben Companjen
On 10 May 2012 05:08, Karen Coyle kco...@kcoyle.net wrote:
 OK, now I get that too. No idea what I did differently before but...
 never mind.

Maybe you used the search box on the top right of the page, instead of
the search box you see after you click Authors left of the OL logo,
and chose one of the authors in the facet on the right of the results
page. Then you would see only the works/editions by that one author of
10047 authors by the same name.


 This has got to be a bug. The same item has been entered who knows how many
 times, and at least some of the IDs are consecutive.

There have been issues like [1] about the import API creating
duplicates. It seems that all these duplicate Shirley ... Eng.)
authors were created years ago, so the bug may have been fixed already.

Ben

[1] https://github.com/internetarchive/openlibrary/issues/42


Re: [ol-discuss] Want to merge authors? Try Shirley Institute/Wira Joint Conference (1977 Manchester, Eng.)

2012-05-09 Thread Anand Chitipothu

On 09-May-2012, at 6:35 PM, Ben Companjen wrote:

 Hi,
 
 Although I thought 341 duplicates of President Clinton was a lot yesterday,
 there is still the author that goes by the name Shirley
 Institute/Wira Joint Conference (1977 Manchester, Eng.). There are a
 whopping 10,047 authors with that name! Merging those manually is only
 for those who desperately need an extremely boring task :)
 
 Looking at the subject and book titles in the search results, I think
 one MARC record was imported many times without duplicate detection,
 so merging the authors would still leave some 10,000 duplicate
 works/editions.
 
 Any idea how to best solve this?

I can work out an API. Will that help?

Anand


Re: [ol-discuss] Want to merge authors? Try Shirley Institute/Wira Joint Conference (1977 Manchester, Eng.)

2012-05-09 Thread Karen Coyle
OK, now I get that too. No idea what I did differently before but...
never mind.

This has got to be a bug. The same item has been entered who knows how 
many times, and at least some of the IDs are consecutive.

http://openlibrary.org/authors/OL4602522A
http://openlibrary.org/authors/OL4602523A
http://openlibrary.org/authors/OL4602524A


Ditto the European law one.

http://openlibrary.org/authors/OL4791619A
http://openlibrary.org/authors/OL4791620A
etc.

I think there was another case like this before.

kc

On 5/9/12 2:57 PM, Ben Companjen wrote:
 I am, yes. I loaded the ~6.9 million author records from April's dump
 into MySQL, did a GROUP BY slug (where slug is the author name in
 lower case, without spaces and punctuation) and got
 shirleyinstitute/wirajointconference1977manchestereng: 10047.

 I then searched for Shirley institute 1977 as an author on the website
 and got 10,047 hits. And I still do:
 http://openlibrary.org/search/authors?q=shirley+institute+1977

 Second in the list of slugs is colloquyoneuropeanlaw1981messinaitaly: 2368
 http://openlibrary.org/search/authors?q=colloquy+1981+messina
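
The slug grouping described above can be sketched without a database.
This is only an illustration under stated assumptions: the exact
normalization used for the MySQL column is not given, so the slug()
function below is a guess that keeps letters, digits and "/" (the quoted
slug still contains the slash) and drops everything else. The function
names are hypothetical.

```python
from collections import Counter


def slug(name):
    """Lower-case a name and keep only letters, digits and '/'.

    Assumption: this mirrors the normalization described in the thread;
    it reproduces the slug quoted there for the Shirley Institute name.
    """
    return "".join(ch for ch in name.lower() if ch.isalnum() or ch == "/")


def duplicate_slugs(names, minimum=2):
    """Return (slug, count) pairs shared by at least `minimum` names,
    most frequent first (the equivalent of GROUP BY slug with a count)."""
    counts = Counter(slug(n) for n in names)
    return [(s, c) for s, c in counts.most_common() if c >= minimum]


if __name__ == "__main__":
    shirley = "Shirley Institute/Wira Joint Conference (1977 Manchester, Eng.)"
    # Toy input; the real input would be the author names from a dump.
    names = [shirley, shirley, shirley, "Some Other Author"]
    for s, c in duplicate_slugs(names):
        print("%s: %d" % (s, c))
```

A slug this aggressive will also merge names that differ only in
punctuation or case, so a high count is a hint to look closer, not proof
that the records are duplicates.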

 Ben

 On 9 May 2012 23:44, Karen Coyle kco...@kcoyle.net wrote:
 This is rather odd. When I look up Shirley institute as an author and
 find the 1977 joint conference I get 2 work titles, each of which has only 1
 edition. Ben, are you working with the dump?

 kc

 On 5/9/12 6:05 AM, Ben Companjen wrote:
 Hi,

 Although I thought 341 duplicates of President Clinton was a lot yesterday,
 there is still the author that goes by the name Shirley
 Institute/Wira Joint Conference (1977 Manchester, Eng.). There are a
 whopping 10,047 authors with that name! Merging those manually is only
 for those who desperately need an extremely boring task :)

 Looking at the subject and book titles in the search results, I think
 one MARC record was imported many times without duplicate detection,
 so merging the authors would still leave some 10,000 duplicate
 works/editions.

 Any idea how to best solve this?

 Ben


 --
 Karen Coyle
 kco...@kcoyle.net http://kcoyle.net
 ph: 1-510-540-7596
 m: 1-510-435-8234
 skype: kcoylenet


-- 
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet