Re: [CODE4LIB] dict protocol
On Mar 31, 2008, at 10:45 AM, Tim Shearer wrote:

> Given the likely need to map back from an alternate name (string
> search in the definition?) to the auth name (maybe the most common
> use for such a service?), I think this route might be on the
> inefficient side.

I'm not quite sure how this will turn out either, but for a good time I extracted bunches more authorities (subject, personal, corporate, and geographic) from the FRED data, stuffed them into my DICT server, and installed a CGI front-end. Try the following URLs, and they will return huge "tag clouds" whose links could point to definitions or searches against indexes:

  * blues - http://tinyurl.com/yt2db7
  * lancaster - http://tinyurl.com/yw5hdr
  * librarianship - http://tinyurl.com/2baoxg

In the end, I will probably use some other indexer to do this work, but this suite of DICT tools has a pretty low barrier to implementation.

-- 
Eric Lease Morgan
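For readers curious how a front-end like this might get its list of terms, here is a minimal sketch in Python. It is not the CGI script described above. It assumes the server answers a MATCH command in the way RFC 2229 describes: a "152" status line, then one line per match holding the database name and the quoted headword, then a lone "." to end the list. The function name and the sample response are made up for illustration.

```python
import shlex


def parse_match_response(lines):
    """Collect (database, headword) pairs from a DICT MATCH response."""
    matches = []
    in_list = False
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("152 "):
            # "152 n matches found": the list of matches follows
            in_list = True
        elif in_list and line == ".":
            # a lone dot ends the list
            in_list = False
        elif in_list:
            # each match looks like: database "headword"
            database, headword = shlex.split(line)[:2]
            matches.append((database, headword))
    return matches


# Made-up response, for illustration only
sample = [
    "152 2 matches found",
    'subjects "Blues (Music)"',
    'subjects "Rhythm and blues music"',
    ".",
    "250 ok",
]

headwords = [word for _, word in parse_match_response(sample)]
print(headwords)
```

Each headword could then be wrapped in a link to a DEFINE request or to a search against some other index.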
Re: [CODE4LIB] dict protocol
Hi Eric,

Given the likely need to map back from an alternate name (string search in the definition?) to the auth name (maybe the most common use for such a service?), I think this route might be on the inefficient side.

I've been wondering about names as handles, with a crossref-like middleman piece. But not doing anything about such ideas.

-t
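One way to picture the "map back" problem is a reverse index built ahead of time, so a lookup by alternate name does not need a string search through every definition. The sketch below is in Python and is only an illustration: the function name is hypothetical, the records are made up, and it assumes the authority data has already been reduced to a mapping of authorized heading to alternate names.

```python
from collections import defaultdict


def build_reverse_index(authorities):
    """Map each lower-cased name to the authorized headings that use it."""
    index = defaultdict(set)
    for heading, alternates in authorities.items():
        # the authorized form should find itself too
        index[heading.lower()].add(heading)
        for alt in alternates:
            index[alt.lower()].add(heading)
    return index


# Made-up records, for illustration only
authorities = {
    "Blues (Music)": ["Blues songs", "Blues music"],
    "Library science": ["Librarianship"],
}

index = build_reverse_index(authorities)
print(sorted(index["librarianship"]))
```

A set is used for the values because the same alternate name can point at more than one authorized heading.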
[CODE4LIB] dict protocol
Over the weekend I had fun with the DICT protocol, a DICT server, a DICT client, and the creation of dictionaries for the aforementioned.

The DICT protocol seems to be a simple client/server protocol for searching remote content and returning "definitions" of the query. [1] I was initially drawn to the protocol for its content. Specifically, I wanted a dictionary because I thought it would be useful in a "next generation" library catalog application. The server was trivial to install because it is available via yum. Since it is a protocol, there are a number of clients and libraries available. There's also bunches o' data to be had, albeit a bit dated. Some of it includes: a 1913 dictionary, version 2.0 of WordNet, the CIA World Fact Book (2000), Moby's Thesaurus, a gazetteer, and quite a number of English-to-other-language dictionaries.

What's interesting is that DICT protocol data is not limited to "dictionaries", as the Fact Book exemplifies. The data really only has two fields: headword (key) and note (definition). After thinking about it, I thought authority lists would be a pretty good candidate for DICT. The headword would be the term, and the definition would be the See From and See Also listings.

Off on an adventure, I downloaded subject authorities from FRED. [2] I used a shell script to loop through my data (subjects2dictd, attached), which employed XSLT to parse the MARCXML (subjects2dict.xsl, attached) and then ran various dict* utilities. The end result is a "dictionary" query-able with your favorite DICT client. From a Linux shell, try:

  dict -h 208.81.177.118 -d subjects -s substring blues

While I think this is pretty kewl, I wonder whether or not DICT is the correct approach. Maybe I should use a more robust, full-text indexer for this problem? After all, DICT servers only look at the headword when searching, not the definitions. On the other hand, DICT was *pretty* easy to get up and running, and authority lists are a type of dictionary.
[1] http://www.dict.org
[2] http://www.ibiblio.org/fred2.0/authorities/

-- 
Eric Lease Morgan
University Libraries of Notre Dame
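The headword/definition idea above can be sketched without XSLT. The Python below is not the attached subjects2dictd or subjects2dict.xsl, which are not reproduced here. It is a rough illustration that assumes MARCXML authority records in the standard MARC21 slim namespace, with the heading in a 1XX field, See From tracings in 4XX fields, and See Also tracings in 5XX fields. The function names, the choice of subfields, and the sample record are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_NS}
# Assumption: keep the heading ($a) and subdivisions ($v, $x, $y, $z) only
KEEP = set("avxyz")


def field_text(datafield):
    """Join the kept subfields of one datafield into a display string."""
    parts = [
        sf.text.strip()
        for sf in datafield.findall("marc:subfield", NS)
        if sf.get("code") in KEEP and sf.text
    ]
    return " -- ".join(parts)


def records_to_entries(marcxml):
    """Return (headword, definition) pairs, one per authority record."""
    root = ET.fromstring(marcxml)
    entries = []
    for record in root.iter("{%s}record" % MARC_NS):
        heading, see_from, see_also = None, [], []
        for df in record.findall("marc:datafield", NS):
            tag = df.get("tag", "")
            if tag.startswith("1"):      # 1XX: the authorized heading
                heading = field_text(df)
            elif tag.startswith("4"):    # 4XX: See From tracings
                see_from.append(field_text(df))
            elif tag.startswith("5"):    # 5XX: See Also tracings
                see_also.append(field_text(df))
        if heading:
            lines = ["See From: " + t for t in see_from]
            lines += ["See Also: " + t for t in see_also]
            entries.append((heading, "\n".join(lines)))
    return entries


# Made-up record, for illustration only
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="150" ind1=" " ind2=" "><subfield code="a">Blues (Music)</subfield></datafield>
    <datafield tag="450" ind1=" " ind2=" "><subfield code="a">Blues songs</subfield></datafield>
    <datafield tag="550" ind1=" " ind2=" "><subfield code="a">Folk music</subfield></datafield>
  </record>
</collection>"""

entries = records_to_entries(SAMPLE)
for headword, definition in entries:
    print(headword)
    print(definition)
```

The resulting pairs would still need to be written out in whatever input format the dict* utilities expect before a DICT server could load them.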