On Feb 13, 2008 4:30 PM, Patrick Durusau <[EMAIL PROTECTED]> wrote:
> Hello,
>
> When I first joined this list I had a question about the search
> algorithm that was never quite answered. A problem with it has come up
> again.
>
> I search for "Apple fruit" and got 18 "hits."
>
> To the immediate left I have a listing of relevant subjects, the first
> one of which is "Apples." Followed by "Fruit trees", "Fruit", then
> "Frontier and pioneer life" and then "Overland journeys to the Pacific".
>
> Oh, but it gets better.
>
> Guess what is returned if you select "Apples?" Well partner, it isn't
> Dewey 583.73 Apples.

The subject sidebar entry links create new searches, so you're in
effect broadening your search from "subject:apple fruit" to
"subject:apples".

>
> No, it helpfully returns 568 "hits" which starts off with Apple
> Computers, includes Appling Country census results and the tenth item is
> an apple cookbook.

Among other things, the search infrastructure will stem any unadorned
terms that you enter, which turns "apples" into "apple". "Appling" becomes

>
> Does that strike anyone besides myself as rather odd behavior for a
> search engine? Or perhaps I should say, a library search engine?
>
> Well, but opinions are going to vary on that score aren't they?
>
> My real question is: Where is the relevance behavior for Evergreen set
> such that I can alter it?
>

That depends on the version. You were testing on the production PINES
servers (it seems, as I replicated your searches and result counts
there), which are currently on 1.2.1.2 (soon to be 1.2.1.3). There are
weighting values that you can apply in 1.2 that control how much a
particular searched field is worth. So, for instance, topical subjects
could be weighted higher than corporate name subjects, which would make
the Granny Smiths float to the top, above the ][e handbooks.

> That gets us past all the normative questions and to one that is purely
> technical. I want to *alter* the relevance behavior of Evergreen
> searches. Where is that done?
>

There are many different things that can be done to change the way
Evergreen performs searches. One could replace, or augment, the
Snowball stemmer that is used by default with a dictionary stemmer (or
a non-stemming dictionary). One could turn off stemming altogether and
require exact word matches. In future versions (as the plan stands, 1.4
to some degree and 2.0 to a larger degree) one will be able to adjust
the relevancy bonuses given under certain circumstances.
For instance, title searches give a higher rank when the searched words
are in the same order in both the field and the query. Author searches
give a large bonus when the first word of both the field and the query
match exactly. Bonuses are given all around when phrases match. And,
obviously, a normalized full-query-and-field match gets a very large
bonus.

One way to effectively turn off stemming today is to quote words and
phrases, which forces a space- and case-normalized direct match for the
quoted sections of text in the query.

Does that answer some of your questions?

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: [EMAIL PROTECTED]
 | web: http://www.esilibrary.com
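
To make the relevancy bonuses described in the reply concrete, here is a
minimal Python sketch. It is not Evergreen's code. The function names,
the multiplier values, and the idea of combining bonuses by
multiplication are all assumptions made for illustration. Only the four
kinds of bonus (word order for titles, first word for authors, phrase
match, full normalized match) come from the message itself.

```python
import re

# Hypothetical multipliers, chosen only for illustration.
# They are NOT Evergreen's real values or setting names.
ORDER_BONUS = 1.5        # title: query words appear in the same order
FIRST_WORD_BONUS = 3.0   # author: first word of field and query match
PHRASE_BONUS = 2.0       # any class: query appears as a contiguous phrase
FULL_MATCH_BONUS = 5.0   # any class: normalized query equals the field


def normalize(text):
    """Lower-case the text and reduce it to single-space-separated words."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def in_order(query_words, field_words):
    """True if the query words occur in the field in the same relative order."""
    remaining = iter(field_words)
    return all(word in remaining for word in query_words)


def score(query, field, search_class):
    """Return a toy relevance score for one field against one query."""
    q, f = normalize(query), normalize(field)
    q_words, f_words = q.split(), f.split()
    # No match at all unless every query word is present in the field.
    if not q_words or not set(q_words) <= set(f_words):
        return 0.0
    result = 1.0
    if search_class == "title" and in_order(q_words, f_words):
        result *= ORDER_BONUS
    if search_class == "author" and f_words[0] == q_words[0]:
        result *= FIRST_WORD_BONUS
    if f" {q} " in f" {f} ":
        result *= PHRASE_BONUS
    if q == f:
        result *= FULL_MATCH_BONUS
    return result
```

With these assumed values, a title search for "apple cookbook" against
"The Apple Cookbook" earns the order and phrase bonuses, while the
reversed query "cookbook apple" matches the same record with no bonus.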
