Re: [CODE4LIB] viaf and the levenshtein algorithm

2016-06-09 Thread WILDER, COLIN
Yes, me too. I take a great interest in that area of R&D and look forward to 
learning more. 
Colin Wilder


> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
> Of McAulay, Lisa
> Sent: Tuesday, June 07, 2016 12:49 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] viaf and the levenshtein algorithm
> 
> Hi Eric,
> 
> I really enjoyed this message. Thanks for sharing!
> 
> Best,
> Lisa


Re: [CODE4LIB] viaf and the levenshtein algorithm

2016-06-07 Thread P. S. Mukhopadhyay
Congrats Eric Lease Morgan. A real eye opener for me.

Best wishes

On Tue, Jun 7, 2016 at 10:18 PM, McAulay, Lisa 
wrote:

> Hi Eric,
>
> I really enjoyed this message. Thanks for sharing!
>
> Best,
> Lisa



-- 
---
Dr. Parthasarathi Mukhopadhyay
Associate Professor, Department of Library and Information Science,
University of Kalyani,
Kalyani - 741 235 (WB), India
---


Re: [CODE4LIB] viaf and the levenshtein algorithm

2016-06-07 Thread McAulay, Lisa
Hi Eric,

I really enjoyed this message. Thanks for sharing!

Best,
Lisa



[CODE4LIB] viaf and the levenshtein algorithm

2016-06-07 Thread Eric Lease Morgan
In the past few weeks I have had some interesting experiences with WorldCat, 
VIAF, and the Levenshtein algorithm. [1, 2]

In short, I was given a set of authority records with the goal of associating 
each name with a VIAF identifier. To accomplish this goal I first created a 
rudimentary database — an easily parsed list of MARC 1xx fields. I then looped 
through the database, and searched VIAF via the AutoSuggest interface looking 
for one-to-one matches. When a match was found, I updated my database with 
the VIAF identifier. The AutoSuggest interface was fast but able to associate only 20% 
of my names with identifiers. (Moreover, I don’t know how it works; AutoSuggest 
is a “black box” technology.)
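
As an illustration only, the first pass might be sketched in Python as below. This is not the code described in [1]. The endpoint URL and the JSON shape (a "result" list whose entries carry a "viafid" key) are assumptions to verify against the current VIAF documentation, and "one-to-one" is interpreted here as "exactly one distinct identifier suggested". The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; confirm against the current VIAF documentation.
AUTOSUGGEST_URL = "https://viaf.org/viaf/AutoSuggest"


def autosuggest_url(name):
    """Build an AutoSuggest query URL for a name taken from a 1xx field."""
    return AUTOSUGGEST_URL + "?" + urllib.parse.urlencode({"query": name})


def fetch_suggestions(name):
    """Call the service and return the parsed JSON (needs network access)."""
    with urllib.request.urlopen(autosuggest_url(name), timeout=30) as response:
        return json.load(response)


def one_to_one_match(payload):
    """Return a VIAF identifier only if exactly one distinct id is suggested.

    Assumes payload["result"] is a list of dicts with a "viafid" key,
    or None/absent when nothing was found.
    """
    results = payload.get("result") or []
    ids = {entry["viafid"] for entry in results if entry.get("viafid")}
    return ids.pop() if len(ids) == 1 else None
```

Keeping the matching rule separate from the network call makes it possible to test the rule against saved responses without querying the service.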

I then looped through the database again, but this time I queried VIAF using 
the SRU interface. Searches often returned many hits, not just one-to-one 
matches, but through the use of the Levenshtein algorithm I was able to 
intelligently select items from the search results and update my database 
accordingly. [3] Through the use of the SRU/Levenshtein combination, I was able 
to associate another 50-55 percent of my names with identifiers.
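
To make the selection step concrete, here is a minimal Python sketch of choosing among several hits by edit distance. It is not the selection rule described in [2]: the normalization, the max_distance threshold of 3, the refusal to pick between tied candidates, and the names normalize and best_match are all assumptions made for illustration. Parsing the SRU response into (heading, identifier) pairs is left out.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                     # deletion
                current[j - 1] + 1,                  # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def normalize(name):
    """Lowercase, drop commas and periods, and collapse whitespace."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def best_match(name, candidates, max_distance=3):
    """Pick the identifier whose heading is closest to name.

    candidates is a list of (heading, viaf_id) pairs taken from a search
    result. Returns None if nothing is close enough or the best score is
    shared by two different identifiers.
    """
    target = normalize(name)
    scored = sorted(
        (levenshtein(target, normalize(heading)), viaf_id)
        for heading, viaf_id in candidates
    )
    if not scored or scored[0][0] > max_distance:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0] and scored[1][1] != scored[0][1]:
        return None  # ambiguous: leave for manual review
    return scored[0][1]
```

A small threshold favors precision over recall. Names left unmatched can be reviewed by hand, which is generally cheaper than undoing a wrong identifier.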

Now that I have close to 75% of my names associated with VIAF identifiers, I 
can update my authority list’s MARC 024 fields. In turn, I can provide 
enhanced services against my catalog and pave the way for linked data 
implementations.
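
For illustration, adding the identifier to a record could look like the Python sketch below. It assumes records are held as lines of MarcEdit-style mnemonic text, that the 024 uses first indicator 7 with "viaf" as the source code in subfield 2, and that subfield a carries the bare identifier rather than a URI; local practice may differ on each point. The helper names, the sample record, and the identifier 12345 are hypothetical.

```python
def viaf_024(viaf_id):
    """Format a 024 field: first indicator 7, second blank, source in $2."""
    return "=024  7\\$a" + viaf_id + "$2viaf"


def add_viaf_024(record_lines, viaf_id):
    """Return a copy of the record with the 024 added in tag order.

    The record is returned unchanged if the same field is already present.
    """
    field = viaf_024(viaf_id)
    updated = list(record_lines)
    if field in updated:
        return updated
    for index, line in enumerate(updated):
        tag = line[1:4]
        if tag.isdigit() and tag > "024":
            updated.insert(index, field)  # place before the first higher tag
            return updated
    updated.append(field)
    return updated


# Hypothetical record and identifier, for illustration only.
record = [
    "=LDR  00000nz  a2200000n  4500",
    "=001  sample0001",
    "=100  1\\$aTwain, Mark,$d1835-1910",
]
for line in add_viaf_024(record, "12345"):
    print(line)
```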

Sometimes our library automation tasks can use a bit more computer science. 
Librarianship isn’t all about service and the humanities. Librarianship is an 
arscient discipline. [4]

[1] VIAF Finder - http://infomotions.com/blog/2016/05/viaf-finder/
[2] Almost perfection - http://infomotions.com/blog/2016/06/levenshtein/
[3] Levenshtein - https://en.wikipedia.org/wiki/Levenshtein_distance
[4] arscience - http://infomotions.com/blog/2008/07/arscience/

—
Eric Lease Morgan