Re: [CODE4LIB] dict protocol

2008-03-31 Thread Eric Lease Morgan

On Mar 31, 2008, at 10:45 AM, Tim Shearer wrote:


Given the likely need to map back from an alternate name (string search in
the definition?) to the auth name (maybe the most common use for such a
service?), I think this route might be on the inefficient side.



I'm not quite sure how this will turn out either, but for a good time
I extracted bunches more authorities (subject, personal, corporate,
and geographic) from the FRED data, stuffed them into my DICT server,
and installed a CGI front-end. Try the following URLs, and they will
return huge "tag clouds" whose links could point to definitions or
searches against indexes:

  * blues - http://tinyurl.com/yt2db7
  * lancaster - http://tinyurl.com/yw5hdr
  * librarianship - http://tinyurl.com/2baoxg

In the end, I will probably use some other indexer to do this work,
but this suite of DICT tools has a pretty low barrier to implementation.

--
Eric Lease Morgan


Re: [CODE4LIB] dict protocol

2008-03-31 Thread Tim Shearer

Hi Eric,

Given the likely need to map back from an alternate name (string search in
the definition?) to the auth name (maybe the most common use for such a
service?), I think this route might be on the inefficient side.

I've been wondering about names as handles, with a CrossRef-like middleman
piece, but I haven't done anything about such ideas.

-t


[CODE4LIB] dict protocol

2008-03-31 Thread Eric Lease Morgan


Over the weekend I had fun with the DICT protocol, a DICT server, a
DICT client, and the creation of dictionaries for the aforementioned.

The DICT protocol seems to be a simple client/server protocol for
searching remote content and returning "definitions" of the query.
[1] I was initially drawn to the protocol for its content.
Specifically, I wanted a dictionary because I thought it would be
useful in a "next generation" library catalog application. The server
was trivial to install because it is available via yum. Since it is a
protocol, there are a number of clients and libraries available.
There's also bunches o' data to be had, albeit a bit dated. Some of
it includes: a 1913 dictionary, version 2.0 of WordNet, the CIA World
Fact Book (2000), Moby's Thesaurus, a gazetteer, and quite a number
of English-to-other-language dictionaries.

What's interesting is that DICT data is not limited to
"dictionaries", as the Fact Book exemplifies. The data really only has
two fields: headword (key) and note (definition). After thinking
about it, I thought authority lists would be a pretty good candidate
for DICT. The headword would be the term, and the definition would be
the See From and See Also listings.
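
A minimal sketch of that headword/definition mapping, in Python rather
than the shell and XSLT of the attached scripts. The record layout (dicts
with "heading", "see_from", and "see_also" keys), the sample records, and
the function names are all assumptions made for illustration. The
colon-delimited output is meant to resemble the jargon-style input that
dictfmt is understood to accept with its -j option; check the dictfmt
man page before relying on it.

```python
def to_entry(record):
    """Render one authority record as a (headword, definition) pair."""
    lines = []
    for name in record.get("see_from", []):
        lines.append("See From: " + name)
    for name in record.get("see_also", []):
        lines.append("See Also: " + name)
    return record["heading"], "\n".join(lines)


def to_jargon_format(records):
    """Emit ':headword:' lines, each followed by an indented definition.

    Assumption: headings contain no colons; a real converter would
    need to escape or reject them.
    """
    chunks = []
    for record in records:
        headword, definition = to_entry(record)
        body = "\n".join("  " + line for line in definition.splitlines())
        chunks.append(":" + headword + ":\n" + body)
    return "\n".join(chunks) + "\n"


# Made-up sample records, for illustration only (not taken from FRED).
SAMPLE = [
    {"heading": "Blues (Music)", "see_from": ["Blues songs"], "see_also": ["Jazz"]},
    {"heading": "Librarianship", "see_from": [], "see_also": ["Library science"]},
]

if __name__ == "__main__":
    print(to_jargon_format(SAMPLE), end="")
```

The See From names end up only inside the definition text, which is
where a headword-only search will not find them.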

Off on an adventure, I downloaded subject authorities from FRED. [2]
I used a shell script to loop through my data (subjects2dictd,
attached) which employed XSLT to parse the MARCXML
(subjects2dict.xsl, attached) and then ran various dict* utilities.
The end result is a "dictionary" query-able with your favorite DICT
client. From a Linux shell, try:

  dict -h 208.81.177.118 -d subjects -s substring blues
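
For readers without a dict client, the same query can be approximated
over a raw socket. This is a sketch under assumptions, not a full client:
it follows the MATCH command as described in RFC 2229 (default port 2628,
a "152" status line, one `database "headword"` pair per line, a lone
period as terminator), does no escaping of quotes in the search word, and
ignores most error codes. The function names are hypothetical.

```python
import socket


def parse_match_response(text):
    """Extract (database, headword) pairs from the text of a MATCH reply."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("152"):
        return []  # e.g. "552 no match", or an error status
    matches = []
    for line in lines[1:]:
        if line == ".":  # a lone period ends the list
            break
        database, _, rest = line.partition(" ")
        matches.append((database, rest.strip('"')))
    return matches


def match(host, database, strategy, word, port=2628, timeout=10):
    """Send one MATCH command to a DICT server and return the parsed pairs."""
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            sock.makefile("rw", encoding="utf-8", newline="") as stream:
        stream.readline()  # discard the 220 greeting banner
        stream.write(f'MATCH {database} {strategy} "{word}"\r\n')
        stream.flush()
        status = stream.readline()
        reply = [status]
        if status.startswith("152"):
            for line in stream:
                reply.append(line)
                if line.rstrip("\r\n") == ".":
                    break
        stream.write("QUIT\r\n")
        stream.flush()
    return parse_match_response("".join(reply))
```

Calling match("208.81.177.118", "subjects", "substring", "blues") would
correspond to the dict command above, provided the server is still
reachable and supports the substring strategy.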

While I think this is pretty kewl, I wonder whether or not DICT is
the correct approach. Maybe I should use a more robust, full-text
indexer for this problem? After all, DICT servers only look at the
headword when searching, not the definitions. On the other hand, DICT
was *pretty* easy to get up and running, and authority lists are a
type of dictionary.
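
One possible way around the headword-only limitation, short of a
full-text indexer, is a small reverse index that maps every alternate
(See From) name back to its authorized heading. The sketch below is
illustrative only: the record layout, the sample records, and the
function names are assumptions, and nothing here is part of DICT or of
the attached scripts.

```python
from collections import defaultdict


def build_reverse_index(records):
    """Map each heading and See From name (case-folded) to its heading(s)."""
    index = defaultdict(set)
    for record in records:
        names = [record["heading"], *record.get("see_from", [])]
        for name in names:
            index[name.casefold()].add(record["heading"])
    return index


def lookup(index, name):
    """Return the authorized headings for a name, sorted; empty if unknown."""
    return sorted(index.get(name.casefold(), ()))


# Made-up sample records, for illustration only.
RECORDS = [
    {"heading": "Blues (Music)", "see_from": ["Blues songs", "Blues music"]},
    {"heading": "Library science", "see_from": ["Librarianship"]},
]

if __name__ == "__main__":
    index = build_reverse_index(RECORDS)
    print(lookup(index, "librarianship"))
```

The same idea could be pushed into the DICT data itself by adding each
alternate name as its own headword whose definition points at the
authorized heading, at the cost of a larger dictionary.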

[1] http://www.dict.org
[2] http://www.ibiblio.org/fred2.0/authorities/

--
Eric Lease Morgan
University Libraries of Notre Dame

Attachments: subjects2dictd, subjects2dict.xsl (binary data)