#494: WebSearch: drop hyphens for author searches
------------------------+---------------------
Reporter: simko | Owner: simko
Type: defect | Status: in_work
Priority: major | Milestone: v1.0
Component: WebSearch | Version:
Resolution: | Keywords: INSPIRE
------------------------+---------------------
Comment (by simko):
Analysis showed that:
With `CFG_BIBINDEX_AUTHOR_WORD_INDEX_EXCLUDE_FIRST_NAMES` set to True,
which is the Invenio default, names like //Campbell-Wilson, D// kept
the hyphenated version in the word index, and the hyphen was removed
only for the phrase index, which is good behaviour.
With `CFG_BIBINDEX_AUTHOR_WORD_INDEX_EXCLUDE_FIRST_NAMES` set to False,
which is the INSPIRE default, the hyphenated version was removed from
the word index, giving no hits when users type `campbell-wilson`
without a first name.
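To illustrate the miss, here is a toy model of a word index. It is not
the BibIndex tokenizer: `build_word_index`, `search_word` and the
`keep_hyphenated` flag are hypothetical names, first names are ignored
entirely, and the assumption that the split parts `campbell` and
`wilson` stay in the index is mine.

```python
def build_word_index(records, keep_hyphenated):
    """Map lowercased last-name words to record ids (toy model only)."""
    index = {}
    for recid, author in records.items():
        # Take everything before the first comma as the last name.
        lastname = author.partition(",")[0].strip().lower()
        # Assumption: the hyphen-separated parts are always indexed.
        terms = set(lastname.split("-"))
        if keep_hyphenated:
            # Also index the last name with its hyphen intact.
            terms.add(lastname)
        for term in terms:
            index.setdefault(term, set()).add(recid)
    return index


def search_word(index, term):
    """Return the record ids stored under a single word term."""
    return index.get(term.lower(), set())


records = {1: "Campbell-Wilson, D"}
without_hyphen = build_word_index(records, keep_hyphenated=False)
with_hyphen = build_word_index(records, keep_hyphenated=True)

print(search_word(without_hyphen, "campbell-wilson"))
print(search_word(with_hyphen, "campbell-wilson"))
```

In this model the first lookup finds nothing because the hyphenated
term was never stored, while the second one finds record 1.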
So the patch to fix the Campbell-Wilson searching issue on the
INSPIRE site is actually to //add// hyphens to the word queries,
contrary to what the ticket summary says.
Note 1: complete removal of hyphens in the author index may be
possible, but it would have to be handled on the fuzzy indexing side
as well, and it will have to wait until we have implicit quoting
around author search terms in place, see ticket:113. Otherwise we
cannot search for `campbell wilson` as a single term.
Note 2: the fuzzy tokenizer does not remove hyphens from values that
have no first name, such as `Campbell-Wilson`, so this term goes into
the phrase index as such. Though this is rare in the database, we
should probably remove the hyphen, i.e. treat this input value as
having an implicit trailing comma.
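A minimal sketch of the implicit-comma idea from Note 2. The helper
names `split_author` and `phrase_index_lastname` are hypothetical and
not part of Invenio, and replacing the hyphen with a space (rather
than deleting it) is an assumption.

```python
def split_author(value):
    """Split 'Lastname, Firstnames' into (lastname, firstnames).

    A value without a comma is treated as if it ended with one, so the
    whole value is the last name and the first-name part is empty.
    """
    lastname, _, firstnames = value.partition(",")
    return lastname.strip(), firstnames.strip()


def phrase_index_lastname(value):
    """Return the last name with hyphens replaced by single spaces."""
    lastname, _firstnames = split_author(value)
    # Assumption: the hyphen becomes a space instead of being dropped.
    return " ".join(lastname.replace("-", " ").split())


print(phrase_index_lastname("Campbell-Wilson, D"))
print(phrase_index_lastname("Campbell-Wilson"))
```

With this treatment both input values yield the same last-name form,
whether or not a first name is present.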
--
Ticket URL: <http://invenio-software.org/ticket/494#comment:3>
Invenio <http://invenio-software.org>