Thadguidry created this task.
Thadguidry added projects: Wikidata, CirrusSearch, Elasticsearch.
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: Discovery-Search.

TASK DESCRIPTION
  (Lydia asked that I write this up, just in case)
  
  I thought that the "," comma was already dropped by the Elasticsearch standard 
tokenizer and would therefore be excluded from simple search.
  But there seems to be an overriding decision to configure the default this 
way on Wikidata. Perhaps the word_delimiter filter is being used, and used 
incorrectly?
  
  > Avoid using the word_delimiter filter with tokenizers that remove 
punctuation, such as the standard tokenizer. This could prevent the 
word_delimiter filter from splitting tokens correctly. It can also interfere 
with the filter’s configurable parameters, such as catenate_all or 
preserve_original. We recommend using the keyword or whitespace tokenizer 
instead.
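
  To make the quoted advice concrete, here is a minimal Python sketch of how 
two kinds of tokenizer might treat the label in question. Both functions are 
hypothetical, simplified stand-ins written for illustration; they are not the 
Elasticsearch implementations, and nothing here is a claim about how Wikidata 
is actually configured.

```python
import re


def punctuation_stripping_tokenize(text):
    """Rough stand-in for a tokenizer that removes punctuation.

    Assumption: keeps only runs of word characters, lowercased. The real
    Elasticsearch standard tokenizer follows Unicode text segmentation
    rules and is more involved than this.
    """
    return re.findall(r"\w+", text.lower())


def whitespace_tokenize(text):
    """Rough stand-in for a tokenizer that splits on whitespace only,
    so punctuation stays attached to the neighbouring word."""
    return text.lower().split()


label = "Foot Locker, Inc."
print(punctuation_stripping_tokenize(label))  # ['foot', 'locker', 'inc']
print(whitespace_tokenize(label))             # ['foot', 'locker,', 'inc.']
```

  Under these assumptions, a query token "locker" equals the first tokenizer's 
output but not the "locker," token that the second one produces, which is the 
kind of mismatch that would make the comma matter.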
  
  As the screenshot below shows, I was searching for entities containing all 
3 words, but the entity was displayed only when I included the comma. Without 
the comma, the entity was not found.
  
  F34615713: search_dropdown_screenshot.png 
<https://phabricator.wikimedia.org/F34615713>
  
  I noticed that the string "foot locker inc" does not show the entity in the 
dropdown, while "foot locker, inc." (with the comma) does.
  By default, an exact match should happen only if the user wraps the query in 
double quotes, such as
  
    "Foot Locker, Inc."
  
  In my example screenshot I am not doing that, so I expected that any U+002C 
comma in the search string would be left out of the search query.
  (I have since added the full legal name to that entity's alias field to 
improve searchability, but I would still like to know why U+002C comma is not 
being excluded.)
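
  The expectation described above can be sketched in Python as follows. The 
function name strip_commas_outside_quotes is hypothetical and only 
illustrates the behaviour I expected; it does not describe what CirrusSearch 
or Wikidata actually do with the query.

```python
import re


def strip_commas_outside_quotes(query):
    """Remove U+002C commas from a query, except inside "double-quoted" phrases.

    Illustrative assumption: a phrase is any text between a pair of double
    quotes. An unbalanced quote is treated as ordinary text.
    """
    # Splitting on a capturing group keeps the quoted phrases in the result,
    # always at the odd indices.
    parts = re.split(r'("[^"]*")', query)
    cleaned = [
        part if index % 2 == 1 else part.replace("\u002c", "")
        for index, part in enumerate(parts)
    ]
    return "".join(cleaned)


print(strip_commas_outside_quotes('foot locker, inc.'))    # foot locker inc.
print(strip_commas_outside_quotes('"Foot Locker, Inc."'))  # "Foot Locker, Inc."
```

  With this rule, the unquoted query loses its comma before matching, while 
the quoted phrase is passed through unchanged as an exact-match request.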
  
  Why was it decided to include U+002C comma in simple search?
  Must we use Advanced Search on Wikidata or the API to run simple searches 
that are not exact-match phrases? That seems counter-intuitive and the reverse 
of most users' expectations.

TASK DETAIL
  https://phabricator.wikimedia.org/T289428

To: Thadguidry
Cc: Aklapper, Thadguidry, Invadibot, MPhamWMF, maantietaja, Wilmanbeno, CBogen, 
Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Gryllida, 
jayvdb, Mbch331, jeremyb