AndrewTavis_WMDE added a comment.

  Looking into this more, I'm currently not sure how the original connection to 
the Cognate extension data was made. I'm seeing no inputs from a source 
database in the Wiktionary Cognate dashboard code. The server loads data from 
the published datasets directory for the UI to display, but there's no 
indication of how the data got there in the first place. For instance, 
`mostPopularEntries.csv` appears once in the code, where it's read in, but I'd 
expect there to also be a step where it's written there. Maybe there's a 
generation step that's not included in the Wiktionary Cognate dashboard code 
on GitHub, which is what I have locally. There might also be code on the 
server that's doing all this 🤔
  
  I found the following Phab task about the Cognate tables: 
<https://phabricator.wikimedia.org/T162252>. This leads to the following 
documentation: <https://wikitech.wikimedia.org/wiki/WMDE/Cognate>. Based on 
this, I've run the following queries for some baseline exploration of the 
Cognate data that's available via MariaDB using wmfdata-python:
  
  Queries run with
  ----------------
  
    import wmfdata as wmf

    # QUERY is one of the SQL statements below, passed as a string.
    df = wmf.mariadb.run(
        commands=QUERY,
        dbs="cognate_wiktionary",
        # Connect to the given database on the ExtensionStorage (x1) replica.
        use_x1=True,
    )
  
  
  
  All tables
  ----------
  
    SHOW TABLES;
  
  
  
  | Tables_in_cognate_wiktionary |
  | ---------------------------- |
  | cognate_pages                |
  | cognate_sites                |
  | cognate_titles               |
  
  
  
  cognate_pages
  -------------
  
    SELECT
        *
    
    FROM
        cognate_pages
    
    LIMIT
        5
  
  
  
  | cgpa_site   | cgpa_namespace | cgpa_title   |
  | ----------- | -------------- | ------------ |
  | 2.50397e+18 | 0              | -9.22337e+18 |
  | 8.71187e+18 | 0              | -9.22337e+18 |
  | 6.77301e+18 | 0              | -9.22337e+18 |
  | 8.12084e+18 | 0              | -9.22337e+18 |
  | 8.71187e+18 | 0              | -9.22337e+18 |
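  
  A side observation on the output above: the `cgpa_site` and `cgpa_title` 
values are printed in scientific notation, whereas the keys in the other 
tables are printed as full integers. Below is a minimal sketch of why that 
matters, assuming (unverified) that these two columns came back as 
floating-point values rather than 64-bit integers. The two keys are copied 
from the `cognate_titles` output further down; the variable names are only 
illustrative.

```python
# Two distinct 64-bit keys, copied from the cognate_titles output.
key_a = -9223371534148352930
key_b = -9223370618425054874

# A 64-bit float has a 53-bit mantissa, so a round trip through float
# does not give the original integer key back.
round_trip_a = int(float(key_a))

# At six significant digits (the precision used in the table above),
# both keys are rendered as the same string.
shown_a = f"{float(key_a):g}"
shown_b = f"{float(key_b):g}"

print("round trip preserved key_a:", round_trip_a == key_a)
print("key_a shown as:", shown_a)
print("key_b shown as:", shown_b)
```

  If that is what's happening, joins on these columns would need the exact 
values. Casting the keys to strings in the query, e.g. 
`CAST(cgpa_title AS CHAR)`, is one possible workaround, though it is untested 
here.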
  
  
  
  cognate_sites
  -------------
  
    SELECT
        *
    
    FROM
        cognate_sites
    
    LIMIT
        5
  
  
  
  | cgsi_key             | cgsi_dbname   | cgsi_interwiki |
  | -------------------- | ------------- | -------------- |
  | -9070280448546609211 | cawiktionary  | ca             |
  | -8834749551276028540 | nahwiktionary | nah            |
  | -8821737830943167491 | kuwiktionary  | ku             |
  | -8705824589415612322 | towiktionary  | to             |
  | -8329989933404253437 | wowiktionary  | wo             |
  
  
  
  cognate_titles
  --------------
  
    SELECT
        *
    
    FROM
        cognate_titles
    
    LIMIT
        5
  
  
  
  | cgti_raw  | cgti_raw_key         | cgti_normalized_key  |
  | --------- | -------------------- | -------------------- |
  | выясняешь | -9223371534148352930 | -9223371534148352930 |
  | అడుసు     | -9223370618425054874 | -9223370618425054874 |
  | skiftat   | -9223370043901259262 | -9223370043901259262 |
  | arreá     | -9223369858987257508 | -9223369858987257508 |
  | σιτικά    | -9223369370128554895 | -9223369370128554895 |
  
  These tables seem to me like the place to start. I'd need to find someone 
who has a better idea of what's actually in them, but at first glance we're 
looking at IDs that link Wiktionaries to the strings within them. Queries 
across these tables could then be used to recreate "I Miss You", "Compare" and 
"Most Popular".
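  
  A rough sketch of what such cross-table queries could look like. It assumes 
that `cgpa_site` joins to `cgsi_key` and `cgpa_title` joins to `cgti_raw_key`, 
which the column names suggest but which still needs confirming. It runs 
against an in-memory SQLite database with invented toy rows (small integer 
keys, made-up page memberships), not against the real replica, and the query 
names `MOST_POPULAR` and `MISSING_IN_SITE` are hypothetical.

```python
import sqlite3

# Toy stand-in for cognate_wiktionary. Every row is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cognate_sites (cgsi_key INTEGER, cgsi_dbname TEXT, cgsi_interwiki TEXT);
CREATE TABLE cognate_titles (cgti_raw TEXT, cgti_raw_key INTEGER, cgti_normalized_key INTEGER);
CREATE TABLE cognate_pages (cgpa_site INTEGER, cgpa_namespace INTEGER, cgpa_title INTEGER);
INSERT INTO cognate_sites VALUES
    (1, 'cawiktionary', 'ca'), (2, 'kuwiktionary', 'ku'), (3, 'wowiktionary', 'wo');
INSERT INTO cognate_titles VALUES ('skiftat', 10, 10), ('arreá', 20, 20);
INSERT INTO cognate_pages VALUES (1, 0, 10), (2, 0, 10), (3, 0, 10), (1, 0, 20);
""")

# "Most Popular": main-namespace titles ranked by how many Wiktionaries have them.
MOST_POPULAR = """
SELECT MIN(t.cgti_raw) AS entry, COUNT(DISTINCT p.cgpa_site) AS n_wiktionaries
FROM cognate_pages AS p
JOIN cognate_titles AS t ON t.cgti_raw_key = p.cgpa_title
WHERE p.cgpa_namespace = 0
GROUP BY t.cgti_normalized_key
ORDER BY n_wiktionaries DESC, entry
"""

# "I Miss You": titles that exist on some Wiktionary but not on the given one.
MISSING_IN_SITE = """
SELECT DISTINCT t.cgti_raw AS entry
FROM cognate_pages AS p
JOIN cognate_titles AS t ON t.cgti_raw_key = p.cgpa_title
WHERE p.cgpa_namespace = 0
  AND t.cgti_normalized_key NOT IN (
      SELECT t2.cgti_normalized_key
      FROM cognate_pages AS p2
      JOIN cognate_sites AS s2 ON s2.cgsi_key = p2.cgpa_site
      JOIN cognate_titles AS t2 ON t2.cgti_raw_key = p2.cgpa_title
      WHERE s2.cgsi_dbname = ? AND p2.cgpa_namespace = 0
  )
ORDER BY entry
"""

most_popular = conn.execute(MOST_POPULAR).fetchall()
missing_in_ku = conn.execute(MISSING_IN_SITE, ("kuwiktionary",)).fetchall()

print(most_popular)
print(missing_in_ku)
```

  Grouping on `cgti_normalized_key` rather than on the raw title is a guess at 
how the dashboard treats title variants. "Compare" would presumably be the 
same anti-join restricted to a pair of sites.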
  
  > Side note: can we rename "I Miss You"... ? "Missing Entries" would be much 
better, in my opinion.

TASK DETAIL
  https://phabricator.wikimedia.org/T358254

To: AndrewTavis_WMDE
Cc: ECohen_WMDE, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, 
Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, 
Michael, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331