Manuel created this task.
Manuel added projects: Wikidata, Epic, Wikidata Analytics (Kanban).
TASK DESCRIPTION
As a Wiktionary user, I want to know which are the most common words
("entries") missing from a specific Wiktionary project.
Scope
-----
- Identify the original CSV for the "I miss you ..." table in
https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/Wiktionary/
- (Re)create a data process that generates the table daily (daily for now so
that we can evaluate the resource investment and usage)
- Some entries need to be filtered out ("Main_Page" and "main_Page")
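A minimal sketch of the filtering and CSV-writing step from the scope above. The column names and the function name write_filtered_csv are hypothetical, chosen only for illustration; the layout of the original CSV still has to be identified.

```python
import csv
import io

# Titles the task asks to filter out.
EXCLUDED_TITLES = {"Main_Page", "main_Page"}


def write_filtered_csv(rows, out):
    """Write (entry, count) rows to `out` as CSV, skipping excluded titles.

    Returns the number of data rows written.
    """
    writer = csv.writer(out, lineterminator="\n")
    # Hypothetical header; the original CSV's columns are not known here.
    writer.writerow(["entry", "n_wiktionaries"])
    kept = 0
    for entry, count in rows:
        if entry in EXCLUDED_TITLES:
            continue
        writer.writerow([entry, count])
        kept += 1
    return kept


buffer = io.StringIO()
write_filtered_csv([("Main_Page", 150), ("tree", 42), ("main_Page", 7)], buffer)
print(buffer.getvalue())
```

The filter is an exact match on the two titles named in this task. Whether other spellings or localized main pages also need to be excluded is an open question.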
Context
-------
**Wiktionaries** describe words coming from their own languages as well as
other languages. Pages on Wiktionaries are called "entries". Example: en:pain
<https://en.wiktionary.org/wiki/pain>.
The **Cognate extension** provides automatic links between two pages of
different language versions of Wiktionary that have the same title (including a
few normalization rules). For example, fr:tree
<https://fr.wiktionary.org/wiki/tree> and en:tree
<https://en.wiktionary.org/wiki/tree> are linked to each other. These links
then show up as automatic interwiki links.
There was also a **Wiktionary Cognate dashboard** that helped the community
analyze the data of the extension.
This community tool included an **"I miss you..." table/dashboard**.
- Users could select a particular Wiktionary from a drop-down menu. A table
then showed the top 1,000 entries (page titles) found in other Wiktionaries
that are absent from the selected project.
- The idea was to give the editors of a language version some ideas about
which new pages to create on their home wiki. Someone editing French
Wiktionary, for example, would be interested in the words (whatever the
language) that already have a page on many other Wiktionaries but not on the
French one. Those are probably the most interesting and useful pages to
create, which is why users want a list of the entries that already exist in
many languages but not in theirs.
- The data was originally updated every 6 hours.
https://meta.wikimedia.org/wiki/Wiktionary_Cognate_Dashboard#I_Miss_You_tab
This is just for context; this task is only about implementing the data
process to create public CSVs.
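The selection logic of the "I miss you..." table described above could be sketched roughly as follows. This is an illustration only, not the original dashboard code: the input format (pairs of project and page title), the function name missing_entries, and the project codes in the sample are assumptions. It also ignores Cognate's title normalization rules and does not apply the title filter listed under Scope.

```python
from collections import defaultdict


def missing_entries(pages, target, limit=1000):
    """Rank titles absent from `target` by how many Wiktionaries have them.

    `pages` is an iterable of (project, title) pairs (assumed input format).
    Returns at most `limit` (title, number_of_projects) tuples.
    """
    projects_by_title = defaultdict(set)
    for project, title in pages:
        projects_by_title[title].add(project)

    counts = [
        (title, len(projects))
        for title, projects in projects_by_title.items()
        if target not in projects
    ]
    # Most widespread first; ties broken alphabetically for a stable order.
    counts.sort(key=lambda item: (-item[1], item[0]))
    return counts[:limit]


# Tiny made-up sample, not real data.
sample = [
    ("enwiktionary", "tree"),
    ("dewiktionary", "tree"),
    ("enwiktionary", "pain"),
    ("frwiktionary", "pain"),
    ("dewiktionary", "pain"),
]
print(missing_entries(sample, "frwiktionary"))  # [('tree', 2)]
```

In a real data process the same grouping would more likely run as a query against the Cognate tables than in memory. The sketch only shows the ranking rule: count the projects per title, drop the titles the selected project already has, and keep the top 1,000.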
Notes
-----
- Some technical details of the original work were documented in this task:
{T166487#4425588 <https://phabricator.wikimedia.org/T166487#4425588>}
Acceptance criteria
-------------------
[ ] We know which CSV was the source for the "I miss you ..." table
[ ] A data process is generating the respective CSV daily
[ ] Some entries are filtered out ("Main_Page" and "main_Page")
[ ] The CSV is published in
https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/Wiktionary/
again
TASK DETAIL
https://phabricator.wikimedia.org/T360296
To: Manuel
Cc: Aklapper, Pamputt, AndrewTavis_WMDE, JeanFred, Lydia_Pintscher, MarcoSwart,
Manuel, me, Danny_Benjafield_WMDE, Astuthiodit_1, BeautifulBold, Suran38,
karapayneWMDE, Invadibot, maantietaja, Peteosx1x, NavinRizwi, ItamarWMDE,
Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting,
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Dinoguy1000,
Mbch331