#494: WebSearch: drop hyphens for author searches
------------------------+---------------------
  Reporter:  simko      |      Owner:  simko
      Type:  defect     |     Status:  in_work
  Priority:  major      |  Milestone:  v1.0
 Component:  WebSearch  |    Version:
Resolution:             |   Keywords:  INSPIRE
------------------------+---------------------

Comment (by simko):

 Analysis showed that:

 With `CFG_BIBINDEX_AUTHOR_WORD_INDEX_EXCLUDE_FIRST_NAMES` set to True,
 which is the Invenio default, names like //Campbell-Wilson, D// kept
 the hyphenated form in the word index, and the hyphen was removed only
 in the phrase index, which is good behaviour.

 With `CFG_BIBINDEX_AUTHOR_WORD_INDEX_EXCLUDE_FIRST_NAMES` set to False,
 which is the INSPIRE default, the hyphenated form was removed from the
 word index, so users got no hits when typing `campbell-wilson` without
 a first name.

 So the patch that fixes the Campbell-Wilson search issue on the
 INSPIRE site actually has to //add// hyphens to the word queries,
 contrary to what the ticket summary says.
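
 To illustrate the failure mode and the direction of the fix, here is a
 toy sketch in Python.  It is not Invenio code: the tokenizer, the
 `keep_hyphenated` flag and the sample records are all hypothetical, and
 the real BibIndex tokenizers may split names differently.  It only shows
 that a lookup for `campbell-wilson` can hit the word index only if the
 hyphenated token was stored there.

```python
import re
from collections import defaultdict


def tokenize_author_words(name, keep_hyphenated):
    """Hypothetical word tokenizer, NOT the BibIndex one.

    Lowercases the name, splits it on whitespace, commas and dots, and
    splits hyphenated chunks into their parts.  The hyphenated chunk itself
    is kept as an extra token only when keep_hyphenated is True.
    """
    tokens = set()
    for chunk in re.split(r"[\s,.]+", name.lower()):
        if not chunk:
            continue
        if "-" in chunk:
            tokens.update(part for part in chunk.split("-") if part)
            if keep_hyphenated:
                tokens.add(chunk)
        else:
            tokens.add(chunk)
    return tokens


def build_word_index(records, keep_hyphenated):
    """Build a toy inverted index: token -> set of record IDs."""
    index = defaultdict(set)
    for recid, name in records.items():
        for token in tokenize_author_words(name, keep_hyphenated):
            index[token].add(recid)
    return index


# Made-up sample data for illustration only.
records = {1: "Campbell-Wilson, D", 2: "Wilson, A"}

index_without = build_word_index(records, keep_hyphenated=False)
index_with = build_word_index(records, keep_hyphenated=True)

# The user types the hyphenated surname without a first name.
hits_without = index_without.get("campbell-wilson", set())
hits_with = index_with.get("campbell-wilson", set())
```

 In this sketch `hits_without` is empty while `hits_with` contains
 record 1, which mirrors the "no hits" symptom described above.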

 Note 1: complete removal of hyphens in the author index may be
 possible, but this would have to be handled on the fuzzy indexing side
 as well, and it will have to wait until implicit quoting around author
 search terms is in place, see ticket:113.  Otherwise we cannot search
 for `campbell wilson` as a single term.

 Note 2: the fuzzy tokenizer does not remove hyphens from values that
 have no first name, such as `Campbell-Wilson`, so this term goes into
 the phrase index as is.  Though this is rare in the database, we should
 probably remove the hyphen, i.e. treat this input value as having an
 implicit trailing comma.
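
 The "implicit trailing comma" idea from Note 2 could look roughly like
 the following Python sketch.  The function name is hypothetical, and
 replacing the hyphen with a space (rather than dropping it) is an
 assumption made here for illustration; it is not taken from the fuzzy
 tokenizer itself.

```python
def normalize_author_phrase(value):
    """Hypothetical phrase normalizer, NOT the Invenio fuzzy tokenizer.

    A value without a comma is treated as "Surname," (implicit trailing
    comma), so the surname branch handles it like any other name.
    Assumption: hyphens in the surname are replaced by a space.
    """
    value = value.strip()
    if "," not in value:
        value += ","  # implicit trailing comma: the whole value is a surname
    surname, _, first_names = value.partition(",")
    # Replace hyphens in the surname only and collapse repeated spaces.
    surname = " ".join(surname.replace("-", " ").split())
    first_names = first_names.strip()
    return f"{surname}, {first_names}" if first_names else surname


bare = normalize_author_phrase("Campbell-Wilson")
full = normalize_author_phrase("Campbell-Wilson, D")
```

 Under these assumptions both inputs go through the same surname
 handling, so the bare `Campbell-Wilson` no longer reaches the phrase
 index with its hyphen intact.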

-- 
Ticket URL: <http://invenio-software.org/ticket/494#comment:3>
Invenio <http://invenio-software.org>
