#915: WebSearch: use index-time word breaking information during search time as
well
-------------------------+-----------------
Reporter: simko | Owner:
Type: enhancement | Status: new
Priority: major | Milestone:
Component: WebSearch | Version:
Keywords: |
-------------------------+-----------------
On the demo site, searching for "spectrum." produces a warning:
{{{
No exact match found for spectrum., using spectrum instead...
}}}
followed by two hits.
Since the dot is stripped from indexed terms at index time (see
CFG_BIBINDEX_CHARS_ALPHANUMERIC_SEPARATORS,
CFG_BIBINDEX_CHARS_PUNCTUATION and friends), the search engine should not
need to look for the dotted version at search time.
The purpose of this ticket is to take advantage of
CFG_BIBINDEX_CHARS_PUNCTUATION and friends at search time as well. That
is, if a character is stripped at indexing time, strip it at search time
too when looking for words (but not for phrases or regexps). We can amend
search_unit_in_bibwords so that incoming search terms are washed in the
same way as terms are washed during the indexing process.
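A minimal sketch of the idea, not the actual patch. The helper name
wash_word_term and its kind argument are hypothetical, and the punctuation
pattern below is a placeholder; the real value would be read from the
Invenio configuration. The index-time tokenizer may also split on
separators rather than just delete characters, which this sketch ignores.

```python
import re

# Placeholder pattern (assumption): the real value comes from the
# Invenio configuration variable CFG_BIBINDEX_CHARS_PUNCTUATION.
CFG_BIBINDEX_CHARS_PUNCTUATION = r'[.,:;?!"]'

_re_punctuation = re.compile(CFG_BIBINDEX_CHARS_PUNCTUATION)


def wash_word_term(term, kind="word"):
    """Strip index-time punctuation from a search term.

    Hypothetical helper: only word searches are washed; phrase and
    regexp searches are returned untouched, as the ticket asks.
    """
    if kind != "word":
        return term
    return _re_punctuation.sub("", term).strip()


print(wash_word_term("spectrum."))            # word search: dot removed
print(wash_word_term("spectrum.", "regexp"))  # regexp search: unchanged
```

With something like this called at the start of search_unit_in_bibwords,
a word query for "spectrum." would look up "spectrum" directly, so the
"No exact match found" fallback would not be triggered for it.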
Note that this may also affect stemming, stopwords and the like, but
another ticket already covers centralising the indexing configuration, so
further improvements can be dealt with there. See ticket:852.
--
Ticket URL: <http://invenio-software.org/ticket/915>
Invenio <http://invenio-software.org>