Patrick,

Finally had a chance to look at the NP-completeness keyword search. There are several reasons why you got the result set you did. As Jason said, "np" (for no paging, I guess) is in the 300 of both the Toulouse-Lautrec and the Jazz book, and the word "complete" is in the 245 (title) of both. Neither of these is a good MARC record, and I will merge or overlay them tomorrow or the next day to get better ones. As Jason says, it is not really a valid record -- for example, the correct form for the 300 is 1 v. (unpaged). But n.p. is currently a valid abbreviation in the 260 field when there is no publisher on the piece. I don't know if the 260 is in the keyword index. I don't really think the 300 field should be.
The remaining two in the result set have NP in the 050 (Library of Congress Classification number) and "complete" in the 520 (summary, etc. of the work). I may need to merge these 2 records. I'll do a little more research on it tomorrow.

Elaine

PS Having the cataloger sanity check would be a good thing. But we would need to do it on a lot of catalogers, me included, first.

J. Elaine Hardy
Library Services Manager - Collections & Reference
Georgia Public Library Service, A Unit of the University System of Georgia
1800 Century Place, Suite 150
Atlanta, Ga. 30345-4304
404.235-7128
404.235-7201, fax
[EMAIL PROTECTED]
www.georgialibraries.org

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Etheridge, Jason - Gmail
Sent: Friday, September 14, 2007 5:34 PM
To: [email protected]
Subject: Re: [OPEN-ILS-DEV] Introduction and Question

On 9/14/07, Patrick Durusau <[EMAIL PROTECTED]> wrote:
> BTW, I am still curious about the "relevance" algorithm that returned
> jazz music for the search term (without quotes) np-completeness. Or does
> the system not react well to hyphens in names unless surrounded by
> quotes? Not real sure why it would parse a hyphen but I have seen odder
> things. (Noting that when I surrounded it with quotes "np-completeness"
> I got zero hits, not jazz.)

Hi Patrick,

I believe when you quote a search term, it searches for that "exact" string, with no stemming or other interpretation. Without the quotes, I believe EG will strip out punctuation, so you'd basically be doing a search for np and completeness, or some stemmed variants. So your first hit there found a "np" in a 300 field, and "complete" in the 245. Hrmm, is that a valid record?

For the cases where we do encounter messed up records, I imagine we could codify some cataloger sanity checking and not index certain things that look like garbage, but I don't think it'll ever be perfect.

Here's a wiki document explaining some relevance ranking stuff, though I don't know if it's still accurate:
http://open-ils.org/dokuwiki/doku.php?id=scratchpad:opac_demo

The "metarecords" it talks about are the FRBR-like groupings you get if you choose Group Formats and Editions in the Advanced Search.

> PS: One more question: Are there plans to add synonym support to further
> confuse users with search results? ;-) I would think it would be an
> advanced search option.

I know they're planning multiple thesaurus support, but I think that might manifest in the "Did you mean/Are you looking for/spellcheck" feature (another kettle of fish that needs work), and/or in the authority-based sidebars. I can't imagine "loosening" search results just to inflate the number of hits. I'd rather get zero hits and then a lot of suggestions.

-- Jason
http://esilibrary.com/
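
For anyone following the thread, here is a rough Python sketch of the behaviour Jason describes above: an unquoted term has its punctuation stripped and is stemmed before matching across indexed fields, while a quoted term is matched as an exact string. This is not Evergreen's actual code. The toy records, the list of indexed tags, the punctuation rule (hyphens become spaces, other punctuation is dropped) and the crude stemmer are all assumptions made up for illustration.

```python
import re

# Invented toy records: MARC tag -> field text (not real catalog data).
RECORDS = {
    "jazz": {"245": "The complete jazz guide", "300": "n.p. : ill."},
    "theory": {"245": "Computers and intractability", "300": "338 p."},
}

# Assumption: both the title (245) and physical description (300) are indexed.
INDEXED_TAGS = ("245", "300")


def tokenize(text):
    """Lowercase, turn hyphens into spaces, drop other punctuation, split."""
    text = text.lower().replace("-", " ")
    return re.sub(r"[^a-z0-9\s]", "", text).split()


def stem(token):
    """Very crude suffix stripper, standing in for a real stemmer."""
    for suffix in ("ness", "ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def search(query):
    """Return ids of records matching the query under the rules above."""
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        # Quoted: exact substring match, no stemming or punctuation stripping.
        phrase = query[1:-1].lower()
        return [
            rec_id
            for rec_id, fields in RECORDS.items()
            if any(phrase in fields.get(tag, "").lower() for tag in INDEXED_TAGS)
        ]
    # Unquoted: every stemmed query term must appear somewhere in the record.
    terms = {stem(t) for t in tokenize(query)}
    hits = []
    for rec_id, fields in RECORDS.items():
        bag = {stem(t) for tag in INDEXED_TAGS for t in tokenize(fields.get(tag, ""))}
        if terms <= bag:
            hits.append(rec_id)
    return hits


print(search("np-completeness"))
print(search('"np-completeness"'))
```

With these toy records the unquoted search matches the jazz record, because "n.p." in the 300 collapses to "np" and "completeness" stems to "complete" in the 245, and the quoted search matches nothing. Removing "300" from INDEXED_TAGS makes the spurious hit go away, which is the effect Elaine is suggesting.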
