Smalyshev updated the task description.

CHANGES TO TASK DESCRIPTION
After talking with @dcausse, we decided that setting up two custom analyzers (one stemmed and one non-stemmed) for every description language is wasteful, since not all of them are useful for the Wikibase use case. We want to create stemmed analyzers only for the languages where they are useful, and use just the plain (non-stemmed) analyzer for the others.

Here is the list of languages for which we have a "non-trivial" configuration for the stemming (`text`) analyzer:

```
ar
bg
ca
ckb
cs
da
de
el
en
en-ca
en-gb
es
eu
fa
fi
fr
ga
gl
hi
hu
hy
id
it
ja
ko
lt
lv
nb
nl
nn
pt
pt-br
ro
ru
simple
sv
th
tr
```

"Non-trivial" here includes having a named analyzer type (e.g. 'bulgarian') or specialized filters or tokenizers.

Note that we are only concerned with whether the `text` analyzer adds value over the `plain` analyzer, since we are keeping the `plain` one anyway, and only in the context of common Wikibase/Wikidata usage on descriptions.
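
As a rough illustration of the selection rule described above, here is a minimal Python sketch. It is not the actual CirrusSearch/Wikibase implementation; the names `STEMMED_LANGUAGES` and `description_analyzers` are hypothetical, and the set simply copies the language codes listed above.

```python
# Hypothetical sketch: decide which analyzers to configure for a
# description field in a given language. Not actual CirrusSearch code.

# Language codes copied from the list in this task description.
STEMMED_LANGUAGES = frozenset({
    "ar", "bg", "ca", "ckb", "cs", "da", "de", "el", "en", "en-ca",
    "en-gb", "es", "eu", "fa", "fi", "fr", "ga", "gl", "hi", "hu",
    "hy", "id", "it", "ja", "ko", "lt", "lv", "nb", "nl", "nn",
    "pt", "pt-br", "ro", "ru", "simple", "sv", "th", "tr",
})


def description_analyzers(lang_code: str) -> list[str]:
    """Return the analyzer kinds to set up for one description language.

    The `plain` (non-stemmed) analyzer is always kept; the stemmed
    `text` analyzer is added only for languages in the list.
    """
    analyzers = ["plain"]
    if lang_code in STEMMED_LANGUAGES:
        analyzers.append("text")
    return analyzers
```

With this sketch, a listed code such as `bg` would get both analyzers, while any code that is not in the list would get only `plain`.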

TASK DETAIL
https://phabricator.wikimedia.org/T180169


To: Smalyshev
Cc: TJones, Aklapper, EBernhardson, Lydia_Pintscher, hoo, aude, Smalyshev, dcausse, Lahi, GoranSMilovanovic, QZanden, EBjune, Avner, debt, Gehel, Jdrewniak, FloNight, Wikidata-bugs, Mbch331