Hi all,

Since going live with tpac in late May, two of the MassLNC consortia have discovered problems with some of the operators on the advanced search page. In both cases, the operators work differently than they did in jspac, and we believe they lead to unexpected search results. I wanted to share our experience with the rest of the community to see if there is any interest in changing the way these operators work.

The first problem is with the "does not contain" option. If I do an advanced search for contains "martin luther" and does not contain "king," the search is conducted as (martin luther && keyword:-"king"). In jspac, a similar search would not have surrounded king in quotation marks. For Evergreen systems using the default indexes, this change doesn't seem to cause much of a problem. However, there are many systems that have added indexes to the keyword search so that an index like proper title can be weighted more heavily in the relevance ranking than other indexes. In those systems, the addition of the quotation marks to this search query does a terrible job of excluding a search term from the query. As an example, see the following "does not contain" search where we try to exclude the term "king" from the search:

http://bark.cwmars.org/eg/opac/results?bool=and&qtype=keyword&contains=contains&query=martin+luther&bool=and&qtype=keyword&contains=nocontains&query=king&bool=and&qtype=keyword&contains=contains&query=&sort=&locg=1&pubdate=is&date1=&date2=&_adv=1

The same search with the quotation marks removed shows a big improvement:

http://bark.cwmars.org/eg/opac/results?fi%3Aitem_type=&query=%28martin+luther+%26%26+keyword%3A-king%29&qtype=keyword&locg=1&_adv=1&page=0&sort=
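
For anyone who wants to compare the two query strings side by side, here is a minimal Python sketch. The helper build_exclusion_query is hypothetical and is not how tpac actually builds its queries; it only reproduces the two strings discussed above and checks the unquoted one against the query parameter in the second URL.

```python
from urllib.parse import parse_qs, urlsplit


def build_exclusion_query(contains, excluded, quote_excluded, qtype="keyword"):
    """Hypothetical helper: assemble a 'contains X, does not contain Y' query."""
    # The only difference between the two behaviours is whether the
    # excluded term is wrapped in quotation marks.
    term = f'"{excluded}"' if quote_excluded else excluded
    return f"({contains} && {qtype}:-{term})"


with_quotes = build_exclusion_query("martin luther", "king", quote_excluded=True)
without_quotes = build_exclusion_query("martin luther", "king", quote_excluded=False)

# The second URL from this message, split only to fit on the page.
url = (
    "http://bark.cwmars.org/eg/opac/results?fi%3Aitem_type="
    "&query=%28martin+luther+%26%26+keyword%3A-king%29"
    "&qtype=keyword&locg=1&_adv=1&page=0&sort="
)
query_in_url = parse_qs(urlsplit(url).query)["query"][0]

print(with_quotes)     # (martin luther && keyword:-"king")
print(without_quotes)  # (martin luther && keyword:-king)
print(query_in_url == without_quotes)  # True
```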

I filed a bug on this issue several weeks ago at https://bugs.launchpad.net/evergreen/+bug/1019360, and Mike Rylander suggested that I send out a message to the general list to see if there is any objection to removing the quotation marks from the query when the "does not contain" option is used.

We have also encountered problems with the "Matches Exactly" option. In jspac, this option surrounded the search terms in quotation marks, essentially making it a phrase search. In tpac, there is now a "contains phrase" search that does the same thing. The "matches exactly" option now uses left- and right-anchored searching so that a "matches exactly" search for "great expectations" will conduct the search as ^great expectations$. In our testing, we have found that this search string yields the same number of search results as a simple "contains" search. The "Matches Exactly" search isn't really doing anything special for this search.
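
As a rough illustration of the difference between these options, the Python sketch below renders the same terms each way. The function render_terms and its option names are made-up labels, not tpac's code; the output strings are the ones described above.

```python
def render_terms(terms, option):
    """Hypothetical helper: show how each advanced search option wraps the terms."""
    if option == "contains":
        return terms              # plain search, terms passed through
    if option == "phrase":
        return f'"{terms}"'       # "contains phrase": quoted phrase search
    if option == "exact":
        return f"^{terms}$"       # "matches exactly": left and right anchors
    raise ValueError(f"unknown option: {option}")


for option in ("contains", "phrase", "exact"):
    print(option, "->", render_terms("great expectations", option))
```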

After asking some questions in IRC, Dan Scott suggested that surrounding the search query in quotation marks may be more successful - "^great expectations$" does indeed lead to expected results. However, this option for exact matches is very strict. To find the record for The Assistant by Robert Walser - http://bark.cwmars.org/eg/opac/record/2451793 - we needed to include the forward slash in the title search, so that the final search statement was "^the assistant /$" .

In this case, I was inclined to recommend that the quotation marks be added to the search string, but there is another inherent problem with the "Matches Exactly" search. For a system using the default indexes, I don't see how a "Matches Exactly" search could ever successfully yield results from a keyword search. If I'm understanding it correctly (and my testing has verified this understanding), this search string must exactly match the entire string in an index. In the default setup, the keyword index includes every indexed term from the record, and a user would never enter all of those search terms in the correct order. Ironically, this search query does have some success in our own catalogs for precisely the same reason that the "does not contain" search failed. We have other indexes included as part of our keyword search, and there is a possibility that one of those indexes will contain the exact terms being searched.
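
The behavior described above can be mimicked with a toy model in Python. This is only a sketch of my understanding, not Evergreen's search code: matches_exactly is a hypothetical function, and the two index values are invented, heavily simplified stand-ins for what a title index and a combined keyword index might hold for one record.

```python
def matches_exactly(search, index_value):
    """Toy model: the search must equal the whole indexed string."""
    return search.strip().lower() == index_value.strip().lower()


# Invented, simplified index values for a single record.
title_index = "the assistant /"
keyword_index = "the assistant / robert walser"

print(matches_exactly("the assistant /", title_index))    # True
print(matches_exactly("the assistant", title_index))      # False: slash missing
print(matches_exactly("the assistant /", keyword_index))  # False: index holds more
```

In this model, the search only succeeds when a separate index happens to contain exactly the terms entered, which is consistent with what we see in our own catalogs.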

Given this information, we are considering removing the "Matches Exactly" search locally, since it isn't working in its current iteration and would continue to produce unexpected behavior even if it were changed to include the quotation marks. However, I also wanted to send this information along to the community since it will most likely lead to unexpected results elsewhere. I'll also be filing a Launchpad bug with this information shortly.

In our discussions, we were thinking a "Starts With" search that left-anchors the search term (e.g. "^the assistant") might be more useful than the "Matches Exactly" search, since a user would not need to remember subtitles or include forward slashes. In some initial testing, it also seems to work better in a keyword search. I may be trying a local implementation of this and will share my results if I have any luck.
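
Here is a small Python sketch of the idea, under the assumption (not confirmed against the Evergreen code) that a left-anchored search only needs the indexed string to begin with the search terms. Both functions are hypothetical, and the index value is a simplified stand-in for the title of the record mentioned earlier.

```python
def starts_with(search, index_value):
    """Toy model of a left-anchored search: the index must begin with the terms."""
    return index_value.strip().lower().startswith(search.strip().lower())


def matches_exactly(search, index_value):
    """Toy model of left- and right-anchored search: the whole string must match."""
    return search.strip().lower() == index_value.strip().lower()


# Simplified stand-in for the indexed title, including the trailing slash.
title_index = "the assistant /"

print(starts_with("the assistant", title_index))      # True
print(matches_exactly("the assistant", title_index))  # False
```

The user no longer has to supply the trailing slash, which is the main advantage we were hoping for.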

Kathy

--
Kathy Lussier
Project Coordinator
Massachusetts Library Network Cooperative
(508) 343-0128
[email protected]
Twitter: http://www.twitter.com/kmlussier
