On 20/12/2015 22:19, John Erling Blad wrote:
I tried this on a search for "Sør-Aurdal" (a municipality in Norway),
dropped the dash and wrote "sørau" and got a hit on "Søraust-Svalbard
naturreservat" among other things. The topmost hit was "søraurdøl", which
is a demonym for someone from Sør-Aurdal. It seems to me that a spelling
error is compensated with a fuzzy search for long(est?) words, but that
implies nearly completing the word if there is a spelling error.

Thank you, this is exactly the kind of feedback we were looking for when we deployed this feature as a beta feature.

In this case, the first thing to note is that "søraurdøl" [1] is a redirect to "Sør-Aurdal" [2]. The completion suggester won't display multiple suggestions that have the same target page. Internally it receives both "søraurdøl" and "Sør-Aurdal", but because both lead to the page "Sør-Aurdal" it has to decide which one to display. It chooses "søraurdøl" because the query "sørau" is a perfect prefix hit for it. You can see when the algorithm prefers "Sør-Aurdal" by continuing to type:
"søraud" => "søraurdøl" (still a perfect prefix)
"sørauda" => "Sør-Aurdal" (here neither is a perfect prefix, so the suggester displays the canonical page "Sør-Aurdal")

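The decision described above can be sketched roughly as follows. This is only an illustration, not the suggester's actual code: the function names are hypothetical, a "perfect prefix" is assumed to mean a case-insensitive prefix of the title as written (dash included), and a canonical title that is itself a perfect prefix is assumed to win.

```python
def is_perfect_prefix(query: str, title: str) -> bool:
    # Assumption: only case is folded; punctuation such as "-" must match.
    return title.casefold().startswith(query.casefold())


def pick_suggestion(query: str, canonical: str, redirects: list[str]) -> str:
    """Choose the single title to display for one target page."""
    # Assumption: the canonical title wins when it is itself a perfect prefix hit.
    if is_perfect_prefix(query, canonical):
        return canonical
    # Otherwise a redirect that is a perfect prefix hit is preferred.
    for redirect in redirects:
        if is_perfect_prefix(query, redirect):
            return redirect
    # No perfect prefix at all (fuzzy match only): fall back to the canonical page.
    return canonical


print(pick_suggestion("sørau", "Sør-Aurdal", ["søraurdøl"]))
print(pick_suggestion("sørauda", "Sør-Aurdal", ["søraurdøl"]))
```

With these assumptions the first call returns the redirect "søraurdøl" and the second returns the canonical title "Sør-Aurdal".
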
There are many knobs we could adjust to display better suggestions. Here I can see two of them:

1. At index time the suggester groups redirects that are very similar to the canonical title. On enwiki the redirect "Albert Enstein" is grouped with its canonical page "Albert Einstein", so "Albert Enstein" is never proposed to the suggester, which therefore never has to choose between "Albert Enstein" and "Albert Einstein": it always displays "Albert Einstein". This technique lets us display proper suggestions even if the user types something quite far off, like "alberensten". Here the suggester benefits from popular pages for which editors have manually curated redirects covering common typos. Unfortunately such arbitrary decisions also have drawbacks. A counterexample is "life a": on enwiki this query suggests "Life insurance" instead of "Life assurance" because the redirect "Life assurance" has been wrongly grouped with "Life insurance". This is not completely wrong, since both suggestions lead to the same page, but it's not perfect. So we could fix the "sørau" problem by increasing the tolerance of this grouping step, but that would also increase the number of cases like "life assurance".
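
To make the trade-off of this grouping step concrete, here is a small sketch. It assumes similarity is measured with plain Levenshtein edit distance against a `max_distance` tolerance; the measure actually used by the suggester is not described here, and both function names are made up for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def split_redirects(canonical: str, redirects: list[str], max_distance: int):
    """Return (grouped, separate): grouped redirects are folded into the
    canonical title and never shown as suggestions of their own."""
    grouped, separate = [], []
    for redirect in redirects:
        distance = edit_distance(redirect.casefold(), canonical.casefold())
        (grouped if distance <= max_distance else separate).append(redirect)
    return grouped, separate


for tolerance in (1, 2):
    print(tolerance,
          split_redirects("Albert Einstein", ["Albert Enstein"], tolerance),
          split_redirects("Life insurance", ["Life assurance"], tolerance))
```

In this sketch a tolerance of 1 folds "Albert Enstein" into "Albert Einstein" but keeps "Life assurance" as its own suggestion, while a tolerance of 2 folds both. A looser tolerance catches more typos but also swallows legitimate alternative titles.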

2. Change the decision at query time
We could also change the decision and always prefer canonical pages over redirects, even if the canonical page is not a perfect prefix hit. I'm not aware of a counterexample here, but since our ranking algorithm is far from perfect we preferred to favor perfect prefix hits for now. In the coming months we should be able to include pageview statistics in the formula. We hope such metrics will improve the ranking and allow us to revisit this decision.

As you can see, the suggester makes arbitrary (and sometimes risky) decisions that can be wrong, which is the whole purpose of having this feature in beta. Depending on feedback like yours, we may review and adjust various parameters of the algorithm.

Thank you!

David.

[1] (Redirected from Søraurdøl): https://no.wikipedia.org/w/index.php?title=S%C3%B8raurd%C3%B8l&redirect=no
[2] https://no.wikipedia.org/w/api.php?action=query&list=backlinks&bltitle=S%C3%B8r-Aurdal&blfilterredir=redirects
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
