Re: [CODE4LIB] viaf and the levenshtein algorithm
Yes, me too. I take a great interest in that area of R&D and look forward to learning more.

Colin Wilder

> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of McAulay, Lisa
> Sent: Tuesday, June 07, 2016 12:49 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] viaf and the levenshtein algorithm
>
> Hi Eric,
>
> I really enjoyed this message. Thanks for sharing!
>
> Best,
> Lisa
Re: [CODE4LIB] viaf and the levenshtein algorithm
Congrats Eric Lease Morgan. A real eye opener for me.

Best wishes

On Tue, Jun 7, 2016 at 10:18 PM, McAulay, Lisa wrote:

> Hi Eric,
>
> I really enjoyed this message. Thanks for sharing!
>
> Best,
> Lisa

--
Dr. Parthasarathi Mukhopadhyay
Associate Professor, Department of Library and Information Science,
University of Kalyani, Kalyani - 741 235 (WB), India
Re: [CODE4LIB] viaf and the levenshtein algorithm
Hi Eric,

I really enjoyed this message. Thanks for sharing!

Best,
Lisa

> On Jun 7, 2016, at 2:49 AM, Eric Lease Morgan wrote:
>
> In the past few weeks I have had some interesting experiences with WorldCat, VIAF, and the Levenshtein algorithm. [1, 2]
>
> In short, I was given a set of authority records with the goal of associating each name with a VIAF identifier. To accomplish this goal I first created a rudimentary database - an easily parsed list of MARC 1xx fields. I then looped through the database, and searched VIAF via the AutoSuggest interface looking for one-to-one matches. If found, I updated my database with the VIAF identifier. The AutoSuggest interface was fast but only able to associate 20% of my names with identifiers. (Moreover, I don’t know how it works; AutoSuggest is a “black box” technology.)
>
> I then looped through the database again, but this time I queried VIAF using the SRU interface. Searches often returned many hits, not just one-to-one matches, but through the use of the Levenshtein algorithm I was able to intelligently select items from the search results and update my database accordingly. [3] Through the use of the SRU/Levenshtein combination, I was able to associate another 50-55 percent of my names with identifiers.
>
> Now that I have close to 75% of my names associated with VIAF identifiers, I can update my authority list’s MARC 024 fields, in turn, I can then provide enhanced services against my catalog as well as pave the way for linked data implementations.
>
> Sometimes our library automation tasks can use a bit more computer science. Librarianship isn’t all about service and the humanities. Librarianship is an arscient discipline. [4]
>
> [1] VIAF Finder - http://infomotions.com/blog/2016/05/viaf-finder/
> [2] Almost perfection - http://infomotions.com/blog/2016/06/levenshtein/
> [3] Levenshtein - https://en.wikipedia.org/wiki/Levenshtein_distance
> [4] arscience - http://infomotions.com/blog/2008/07/arscience/
>
> --
> Eric Lease Morgan
[CODE4LIB] viaf and the levenshtein algorithm
In the past few weeks I have had some interesting experiences with WorldCat, VIAF, and the Levenshtein algorithm. [1, 2]

In short, I was given a set of authority records with the goal of associating each name with a VIAF identifier. To accomplish this goal I first created a rudimentary database: an easily parsed list of MARC 1xx fields. I then looped through the database and searched VIAF via the AutoSuggest interface, looking for one-to-one matches. When a match was found, I updated my database with the VIAF identifier. The AutoSuggest interface was fast but only able to associate 20% of my names with identifiers. (Moreover, I don’t know how it works; AutoSuggest is a “black box” technology.)

I then looped through the database again, but this time I queried VIAF using the SRU interface. Searches often returned many hits, not just one-to-one matches, but through the use of the Levenshtein algorithm I was able to intelligently select items from the search results and update my database accordingly. [3] Through the use of the SRU/Levenshtein combination, I was able to associate another 50-55 percent of my names with identifiers.

Now that I have close to 75% of my names associated with VIAF identifiers, I can update my authority list’s MARC 024 fields. In turn, I can then provide enhanced services against my catalog as well as pave the way for linked data implementations.

Sometimes our library automation tasks can use a bit more computer science. Librarianship isn’t all about service and the humanities. Librarianship is an arscient discipline. [4]

[1] VIAF Finder - http://infomotions.com/blog/2016/05/viaf-finder/
[2] Almost perfection - http://infomotions.com/blog/2016/06/levenshtein/
[3] Levenshtein - https://en.wikipedia.org/wiki/Levenshtein_distance
[4] arscience - http://infomotions.com/blog/2008/07/arscience/

--
Eric Lease Morgan
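
For readers who want to see the idea in code, the selection step described above (many SRU hits, pick the closest heading by edit distance) might be sketched roughly as follows. This is only an illustration in Python, not the code behind VIAF Finder. The function names, the normalization rule, the 0.2 cutoff, and the sample names and identifiers are all assumptions made up for the example, and the search results are a hard-coded list rather than a live VIAF query.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def normalize(name: str) -> str:
    """Assumed normalization: lowercase, punctuation to spaces, squeeze spaces."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in name)
    return " ".join(cleaned.split())


def best_match(name, candidates, max_ratio=0.2):
    """Return the identifier of the closest candidate heading, or None.

    candidates is a list of (identifier, heading) pairs, standing in for
    the hits returned by a search. max_ratio is a hypothetical cutoff:
    distance divided by the longer string's length must not exceed it.
    """
    target = normalize(name)
    best_id, best_ratio = None, None
    for identifier, heading in candidates:
        other = normalize(heading)
        distance = levenshtein(target, other)
        ratio = distance / max(len(target), len(other), 1)
        if best_ratio is None or ratio < best_ratio:
            best_id, best_ratio = identifier, ratio
    if best_ratio is not None and best_ratio <= max_ratio:
        return best_id
    return None


# Made-up headings and placeholder identifiers, for illustration only.
hits = [
    ("id-1", "Doe, John, 1900-1980"),
    ("id-2", "Doe, Jane, 1900-1980."),
    ("id-3", "Smith, Jane"),
]
print(best_match("Doe, Jane, 1900-1980", hits))
```

Dividing the distance by the length of the longer string is one way to keep short and long names comparable; a fixed distance limit would be another reasonable choice, and the right cutoff would need to be tuned against real data.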