TJones renamed this task from "8dcaaaaaaa" to "Lexical Data user scenario: Exporting Lexical Data to create morphology and stemming for search engines".
TJones raised the priority of this task from "High" to "Needs Triage".
TJones added a subscriber: Aklapper.
TJones removed projects: TCB-Team, Mail, New-Editor-Experiences, Language-2018-Apr-June, KartoEditor, JADE, Hashtags, Gamepress, Tamil-Sites, Connected-Open-Heritage-Batch-uploads (RAÄ-KMB_1_2017-02), CheckUser.
TJones updated the task description. (Show Details)

CHANGES TO TASK DESCRIPTION
The English language has very simple morphology, and this makes it relatively easy to build search engines that can find different forms of a word with no effort from the end user.

Many other languages have complex morphology, with declensions, conjugations, clitics, agglutination, etc. Some search engines can plug in morphology and stemming support for particular languages, but support for each language must be developed and maintained independently.

Once Wikidata's Lexical Data can produce all declined forms of a word, its output can be reused to build stemming engines that work in both directions: to find the base form (or forms) of a word from a declined form, and to find the declined forms from a base form. Wikibase should provide APIs that make such usage as easy as possible.
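
As a rough illustration of the two lookup directions, here is a minimal Python sketch. It assumes the lexical data has already been exported as (base form, declined form) pairs. The sample pairs, the function names, and the export format are all hypothetical and do not correspond to any existing Wikibase API.

```python
from collections import defaultdict

# Hypothetical export: (base form, declined form) pairs.
# A real export would come from Lexical Data; these are hand-written samples.
SAMPLE_PAIRS = [
    ("walk", "walk"),
    ("walk", "walks"),
    ("walk", "walked"),
    ("walk", "walking"),
    ("saw", "saw"),
    ("saw", "saws"),
    ("see", "see"),
    ("see", "saw"),
    ("see", "seen"),
]


def build_indexes(pairs):
    """Build one index per lookup direction from (base, form) pairs."""
    forms_by_base = defaultdict(set)
    bases_by_form = defaultdict(set)
    for base, form in pairs:
        forms_by_base[base].add(form)
        bases_by_form[form].add(base)
    return forms_by_base, bases_by_form


def bases_of(form, bases_by_form):
    """Return every base form that a given form may belong to."""
    return sorted(bases_by_form.get(form, ()))


def forms_of(base, forms_by_base):
    """Return every known form of a given base form."""
    return sorted(forms_by_base.get(base, ()))


forms_by_base, bases_by_form = build_indexes(SAMPLE_PAIRS)
```

In this sample, looking up "saw" returns two base forms ("saw" and "see"), which is why the lookup from a declined form has to allow for more than one result. A search engine could use the first index to expand a query and the second to normalize indexed text.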

Notes:
- Like other subtasks of T186421, this is not a particular bug, but an idea for how Lexical Data can be useful in the long term. I am filing it in the hope that knowing the possible user scenarios will be useful to Wikibase developers when they are making decisions about developing the infrastructure, and to Wikidata community members when they are proposing properties, developing bots, and so on.
- This is comparable to T186429 and T186420, but for search engines.
- I'm subscribing @TJones and @Smalyshev, who know far more about stemming engines than I do.

TASK DETAIL
https://phabricator.wikimedia.org/T195447

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: TJones
Cc: Aklapper, Amire80, Smalyshev, TJones, Mringgaard, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, Wikidata-bugs, aude, Darkdadaah, Mbch331, AndyTan, Zylc, 1978Gage2001, herron, Chicocvenancio, alanajjar, Tbscho, Lea_WMDE, Mattias_Ostmar-WMSE, JJMC89, Jseddon, Ryuch, Mkdw, RuyP, JEumerus, Trizek-WMF, KasiaWMDE, 0x010C, srodlund, Luke081515, grin, Bsadowski1, mys_721tx, Snowolf, Huji, Gryllida, jayvdb, Tobi_WMDE_SW, revi, scfc, He7d3r, Romaine, Jay8g, Glaisher, Krenair, chasemp
_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs

Reply via email to