AndrewTavis_WMDE added a comment.
Looking into this more, I'm currently not sure how the original connection to
the Cognate extension data was made. I see no inputs from a source database in
the Wiktionary Cognate dashboard code. The server loads data from the published
datasets directory for the UI to display, but there's no indication of how the
data got there in the first place. For instance, `mostPopularEntries.csv`
appears only once in the code, where it's read in, whereas I'd expect a step
that also writes it there. Maybe there's a generation step that isn't included
in the Wiktionary Cognate dashboard code on GitHub, which is what I have
locally. There might also be code on the server that's doing all this 🤔
I found the following Phab task talking about Cognate tables:
{https://phabricator.wikimedia.org/T162252}. This leads to the following
documentation: <https://wikitech.wikimedia.org/wiki/WMDE/Cognate>. Based on
this, I've run the following queries for some baseline exploration of the
Cognate data that's available via MariaDB using wmfdata-python:
Queries run with
----------------
import wmfdata as wmf

# QUERY is one of the SQL statements shown in the sections below.
df = wmf.mariadb.run(
    commands=QUERY,
    dbs="cognate_wiktionary",
    use_x1=True,  # connect to the given database on the ExtensionStorage replica
)
All tables
----------
SHOW TABLES;
| Tables_in_cognate_wiktionary |
| ---------------------------- |
| cognate_pages |
| cognate_sites |
| cognate_titles |
cognate_pages
-------------
SELECT
    *
FROM
    cognate_pages
LIMIT
    5
| cgpa_site | cgpa_namespace | cgpa_title |
| ----------- | -------------- | ------------ |
| 2.50397e+18 | 0 | -9.22337e+18 |
| 8.71187e+18 | 0 | -9.22337e+18 |
| 6.77301e+18 | 0 | -9.22337e+18 |
| 8.12084e+18 | 0 | -9.22337e+18 |
| 8.71187e+18 | 0 | -9.22337e+18 |
cognate_sites
-------------
SELECT
    *
FROM
    cognate_sites
LIMIT
    5
| cgsi_key | cgsi_dbname | cgsi_interwiki |
| -------------------- | ------------- | -------------- |
| -9070280448546609211 | cawiktionary | ca |
| -8834749551276028540 | nahwiktionary | nah |
| -8821737830943167491 | kuwiktionary | ku |
| -8705824589415612322 | towiktionary | to |
| -8329989933404253437 | wowiktionary | wo |
cognate_titles
--------------
SELECT
    *
FROM
    cognate_titles
LIMIT
    5
| cgti_raw | cgti_raw_key | cgti_normalized_key |
| --------- | -------------------- | -------------------- |
| выясняешь | -9223371534148352930 | -9223371534148352930 |
| అడుసు | -9223370618425054874 | -9223370618425054874 |
| skiftat | -9223370043901259262 | -9223370043901259262 |
| arreá | -9223369858987257508 | -9223369858987257508 |
| σιτικά | -9223369370128554895 | -9223369370128554895 |
These tables seem to me like the place to start. I'd need to find someone who
has a better idea of what's actually in them, but at first glance we're looking
at IDs that link Wiktionaries to the strings (page titles) within them. Queries
across these tables could then be used to recreate "I Miss You", "Compare" and
"Most Popular".
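As a rough illustration of what such a cross-table query might look like, here
is a minimal sketch of a "Most Popular"-style count: for each normalized title,
how many Wiktionaries have a main-namespace page for it. The join keys are an
assumption read off the column names (cgpa_title matching cgti_raw_key, and
cgpa_site matching cgsi_key) and have not been confirmed against the Cognate
schema. The `demo_most_popular` helper and its toy rows are hypothetical and
exist only so the SQL can be tried locally with SQLite, without access to the
x1 replica.

```python
import sqlite3

# Assumed join keys (inferred from column names, not confirmed):
#   cognate_pages.cgpa_title -> cognate_titles.cgti_raw_key
#   cognate_pages.cgpa_site  -> cognate_sites.cgsi_key
MOST_POPULAR_QUERY = """
SELECT
    t.cgti_normalized_key AS normalized_key,
    MIN(t.cgti_raw) AS example_title,
    COUNT(DISTINCT p.cgpa_site) AS n_wiktionaries
FROM
    cognate_pages AS p
    JOIN cognate_titles AS t ON p.cgpa_title = t.cgti_raw_key
WHERE
    p.cgpa_namespace = 0
GROUP BY
    t.cgti_normalized_key
ORDER BY
    n_wiktionaries DESC,
    example_title
LIMIT
    10
"""


def demo_most_popular():
    """Run the query against made-up rows in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cognate_sites (
            cgsi_key INTEGER, cgsi_dbname TEXT, cgsi_interwiki TEXT
        );
        CREATE TABLE cognate_titles (
            cgti_raw TEXT, cgti_raw_key INTEGER, cgti_normalized_key INTEGER
        );
        CREATE TABLE cognate_pages (
            cgpa_site INTEGER, cgpa_namespace INTEGER, cgpa_title INTEGER
        );
        """
    )
    # Toy data: small integers stand in for the real 64-bit hash keys.
    conn.executemany(
        "INSERT INTO cognate_sites VALUES (?, ?, ?)",
        [(1, "cawiktionary", "ca"), (2, "kuwiktionary", "ku")],
    )
    conn.executemany(
        "INSERT INTO cognate_titles VALUES (?, ?, ?)",
        [("skiftat", 10, 10), ("arreá", 20, 20)],
    )
    conn.executemany(
        "INSERT INTO cognate_pages VALUES (?, ?, ?)",
        [(1, 0, 10), (2, 0, 10), (1, 0, 20)],
    )
    rows = conn.execute(MOST_POPULAR_QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo_most_popular():
        print(row)
```

If the assumed keys hold, the same `MOST_POPULAR_QUERY` string could be passed
as `commands=` to the `wmf.mariadb.run` call shown earlier. A "Missing Entries"
query would presumably start from the same join and keep only titles that have
no page on the target Wiktionary.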
> Side note: can we rename "I Miss You"...? "Missing Entries" would be much
> better, in my opinion.
TASK DETAIL
https://phabricator.wikimedia.org/T358254