[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm

2024-05-14 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Hey @brouberol 👋 Just getting back from two weeks off today :) I'll check into this and get back to you all! Thanks for the ping! TASK DETAIL https://phabricator.wikimedia.org/T358311 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/

[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs

2024-05-14 Thread AndrewTavis_WMDE
AndrewTavis_WMDE claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1

[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: June 2024)

2024-05-14 Thread AndrewTavis_WMDE
AndrewTavis_WMDE renamed this task from "[Analytics] Monthly repeating tasks (next: May 2024)" to "[Analytics] Monthly repeating tasks (next: June 2024)". AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL

[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: June 2024)

2024-05-14 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Sheet updated with the numbers for April. Higher number of user agents, but lower IPs (but then IPs still much higher than Feb). TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] T364965: stat1007 migration output check

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Wikidata Analytics (Kanban), Wikidata, Wikidata Dev Team. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Context --- Recently WMF has been migrating from legacy stat servers that are being

[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE renamed this task from "stat1007 migration output check" to "stat1007 to stat1011 migration pipeline output check". TASK DETAIL https://phabricator.wikimedia.org/T364965 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/e

[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1

[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the task status from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benja

[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Quick note that the word used by @BTullis was `disabled` instead of `removed` for the stat1007 timers, so apologies if this caused some confusion. I figure not, but just wanted to be clear :) @BTullis, would you be able to check the journal for them and

[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Confirming that data's still coming in as well. @BTullis, what should we do about statistics/manifests/wmde/wdcm.pp <https://github.com/wikimedia/operations-puppet/blob/production/modules/statistics/manifests/wmde/wdcm.pp>? Remove the file? An

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-05-16 Thread AndrewTavis_WMDE
AndrewTavis_WMDE claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T358311 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: brouberol, JAllemandou, MoritzMuehlenhoff, Manuel, Aklapper, AndrewTavis_WMDE

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-05-17 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Note that MR#700 <https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/700> has been opened that has the work for this :) TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs

2024-05-17 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Note that MR#700 <https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/700> has been opened that has the work for this :) TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-05-17 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. ⚠️ Currently WIP ⚠️ === Going through the files sent by @JAllemandou above <https://phabricator.wikimedia.org/T358311#9648470>. This message will be saved as I go so that I don't lose my progress 😊 If I do find some

[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE

[Wikidata-bugs] [Maniphest] T365457: Bring in all Purdue Porgram PRs and upload Mismatch Finder mismatches

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Wikidata Analytics (Kanban), Wikidata. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Making this task as a means of saving that there is still work to be done to close out the Purdue Data Mine program

[Wikidata-bugs] [Maniphest] T365457: Bring in all Purdue Porgram PRs and upload Mismatch Finder mismatches

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T365457 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Manuel, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Hey @Arian_Bozorg 👋 Yes, we do still need to check this out. I was thinking that @Lucas_Werkmeister_WMDE and I could discuss this when we chat about what else is needed in T364965: stat1007 to stat1011 migration pipeline output check <ht

[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Ah looking at this, I'm realizing I restated myself as the work that's left in T364965: stat1007 to stat1011 migration pipeline output check <https://phabricator.wikimedia.org/T364965> is a duplicate of what we want to do here :) TAS

[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. So basically removing the wdcm.pp related file on GitHub and its Puppet workflows will close both tasks :) TASK DETAIL https://phabricator.wikimedia.org/T351072 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Ok then! So the check of the files above is complete as shown by its status. General summaries of each stat machine and HDFS are provided under the subsections above. `stat1005` has some files that @Manuel may find interesting given that they'r

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-05-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Hi @Manuel - sending along a summary of what I'll be getting for you: == stat1004 == Jul 25 2020 Analytics Jun 23 2020 Experiments Jul 25 2020 wdUsagePerPage == stat1005 == All non data

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-05-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Hi @Manuel, checking further as it's still not clear what you'd like. The double except is confusing. I'll only transfer files from `stat1005`, and could you answer the following questions: 1. Do you want **data files** (.csv, .tsv, etc)

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-05-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. I'm realizing also that I don't have admin rights and thus can't move files to your directory. I'll copy these files over to my directory, download them and send you a link to a zipped directory on Google Drive once we have the abov

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-05-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Thank you, @BTullis! Ya I wasn't happy with the solution either. Appreciate your willingness to help! TASK DETAIL https://phabricator.wikimedia.org/T358311 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check

2024-05-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. @BTullis, checking in on this as your help in T358311 <https://phabricator.wikimedia.org/T358311> reminded me as it's all related to the same user. Would you be able to remove the `statistics/manifests/wmde/wdcm.pp` file and any related processes

[Wikidata-bugs] [Maniphest] T365457: Bring in all Purdue Porgram PRs and upload Mismatch Finder mismatches

2024-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T365457 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Manuel, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check

2024-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Thanks for taking care of this, @Lucas_Werkmeister_WMDE! We'll be able to close both this and T351072 <https://phabricator.wikimedia.org/T351072> after Tuesday next week if/when the Puppet change is deployed :) TASK DET

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360296 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, ECohen_WMDE, Aklapper, Pamputt, AndrewTavis_WMDE, JeanFred

[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs

2024-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the task status from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, AndrewTavis_WMDE, D

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the task status from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WM

[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure

2024-05-28 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-05-28 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. I've been asking around about the data source and connecting the tables and have yet to get concrete answers. Based on general assumptions of the names of the tables/columns though, the path forward for getting missing entries for a Wiktionary will

[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check

2024-05-29 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. None of the files listed in your comment above <https://phabricator.wikimedia.org/T364965#9838579> look like things we should worry about, @Lucas_Werkmeister_WMDE. Similarly that there's a different commit for this, as to my knowledge `stat10

[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)

2024-05-29 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Moving this to verification given the work in T364965 <https://phabricator.wikimedia.org/T364965>. Thanks for all of this, @Lucas_Werkmeister_WMDE! Maybe we can resolve this and leave T364965 <https://phabricator.wikimedia.org/T364965> until `

[Wikidata-bugs] [Maniphest] T321666: Wiktionary Cognate Dashboard is not accessible [timeboxed 0.5 days]

2024-05-29 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Hi @Bicolino34 👋 Thanks for reaching out :) We are still working on tasks related to this dashboard - at least bringing back some of the data processes. TASK DETAIL https://phabricator.wikimedia.org/T321666 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check

2024-05-29 Thread AndrewTavis_WMDE
AndrewTavis_WMDE closed this task as "Resolved". AndrewTavis_WMDE claimed this task. AndrewTavis_WMDE added a comment. Sounds good to me! :) Thanks for the help here, @Lucas_Werkmeister_WMDE and @BTullis! TASK DETAIL https://phabricator.wikimedia.org/T364965 EMAIL PREFERENC

[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)

2024-05-29 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Perfect, @Lucas_Werkmeister_WMDE! Glad to have this all cleared up :) TASK DETAIL https://phabricator.wikimedia.org/T351072 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Arian_Bozorg

[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check

2024-05-29 Thread AndrewTavis_WMDE
AndrewTavis_WMDE closed subtask T351072: Remove the WDCM clone (stats1007) as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T364965 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Lucas_Werkmeister_WMD

[Wikidata-bugs] [Maniphest] T351070: [EPIC] Clean up Wikidata Grafana cronjobs

2024-05-29 Thread AndrewTavis_WMDE
AndrewTavis_WMDE closed subtask T351072: Remove the WDCM clone (stats1007) as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T351070 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Micha

[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)

2024-05-29 Thread AndrewTavis_WMDE
AndrewTavis_WMDE closed this task as "Resolved". AndrewTavis_WMDE claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T351072 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Arian_Bozorg, karapayneWMDE

[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: July 2024)

2024-06-03 Thread AndrewTavis_WMDE
AndrewTavis_WMDE renamed this task from "[Analytics] Monthly repeating tasks (next: June 2024)" to "[Analytics] Monthly repeating tasks (next: July 2024)". AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL

[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: July 2024)

2024-06-03 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Table has been updated with the new data from the most recent DAG run. Lots more user agents - almost a 3x increase. Noting this for now as maybe grounds for further investigation later, but IPs are also increasing (just not by as much). Note that we

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-06-03 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. wmde/analytics/hql/airflow_jobs/wiktionary_cognate <https://gitlab.wikimedia.org/repos/wmde/analytics/-/tree/main/hql/airflow_jobs/wiktionary_cognate?ref_type=heads> on GitLab now has all the needed queries for missing entries, most popular entri

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-06-03 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360296 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, ECohen_WMDE, Aklapper, Pamputt, AndrewTavis_WMDE, JeanFred

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-06-03 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360296 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, ECohen_WMDE, Aklapper, Pamputt, AndrewTavis_WMDE, JeanFred

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-06-03 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360296 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, ECohen_WMDE, Aklapper, Pamputt, AndrewTavis_WMDE, JeanFred

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-06-04 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Thanks so much for the support here, @BTullis! I'll update the epic <https://phabricator.wikimedia.org/T356618> with this being done. So close to being finished with all this :) TASK DETAIL https://phabricator.wikimedia.org/T358311 EMAIL

[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure

2024-06-04 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE

[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure

2024-06-04 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-06-04 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. There's now a draft for the DAGs <https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/725/diffs#96f15bf21ce9c18b6638c53402e35a2654aeeff6> open on GitLab. There's still lots to do as WMF wants to sync on suggestio

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the task status from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T360296 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, ECohen_WMDE, Aklapper, Pampu

[Wikidata-bugs] [Maniphest] T332899: [EPIC] Migrate selected R-based Wikidata products

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the status of subtask T360296: [Analytics] Implement data process to identify missing Wiktionary entries from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T332899 EMAIL PREFERENCES https://phabricator.wikimedi

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Quick note on this, in discussion, something to check as well would be those user agents that were present in May 2024, but were not active in April 2024 :) TASK DETAIL https://phabricator.wikimedia.org/T366621 EMAIL PREFERENCES https
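
The check suggested in this comment (user agents present in May 2024 but not active in April 2024) amounts to a set difference between the two months. A minimal sketch, assuming each month's user agents are available as plain lists of strings; the function name and the sample values are hypothetical and do not come from the task or the actual queries:

```python
def new_user_agents(current_month, previous_month):
    """Return the user agents seen in current_month but absent from previous_month."""
    # Set difference drops everything that was already active the month before;
    # sorting makes the result stable and easy to compare between runs.
    return sorted(set(current_month) - set(previous_month))


# Hypothetical sample data, for illustration only.
april_2024 = ["bot-a/1.0", "script-b/2.3"]
may_2024 = ["bot-a/1.0", "tool-c/0.9", "script-b/2.3", "app-d/4.1"]

print(new_user_agents(may_2024, april_2024))
```

On real request logs the same comparison would more likely be written as an anti-join in the query itself; the list version above only illustrates the logic.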

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T366621 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Lydia_Pintscher, Manuel, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Note, work that will unblock this task is being done in T364045: [Bug?] Can't find wikidatawiki on wmf.mediawiki_wikitext_history <https://phabricator.wikimedia.org/T364045>. TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL PREFEREN

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Hi @MarcoSwart, sorry for changing the status without explanation. Was in a meeting and we were moving things around, but obviously context should have been added. This is stalled for now as we're waiting for WMF to advise us on the best way forwa

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Items that contain a sitelink to one of the Wikimedia projects over time

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the task status from "Stalled" to "Open". TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WM

[Wikidata-bugs] [Maniphest] T343019: [EPIC] Segments of Wikidata's data over time [up to milestone 3]

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the status of subtask T362849: [Analytics] Items that contain a sitelink to one of the Wikimedia projects over time from "Stalled" to "Open". TASK DETAIL https://phabricator.wikimedia.org/T343019 EMAIL PREFERENCES https://phabricator.wikimedi

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Items that contain a sitelink to one of the Wikimedia projects over time

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Unstalled as the table has been created :) TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel

[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the task status from "Stalled" to "Open". TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, AndrewTavis_WMDE, D

[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Unstalled as the plan for the data export has been approved in T365699 <https://phabricator.wikimedia.org/T365699> :) TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings

[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations

2024-06-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the task status from "Stalled" to "Open". TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benja

[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations

2024-06-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Status is open as T364045 <https://phabricator.wikimedia.org/T364045> has been resolved :) TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavi

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T366621 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Lydia_Pintscher, Manuel, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Checking on the numbers here really quick: the request is for the top `1000` user agents and then a sample of `1000` user agents, but the total is `1221`. Would an ordered list of all of them make more sense as we're talking about a sample of 82%? There r

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Base queries for all of this are ready :) Let me know on the above and I'll finalize them. Re how to send the files: my suggestion would be that I put them into my `stat1010` and then @Manuel can migrate them to his. From there I'll delete my

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. > Would it be possible to send us a spreadsheet (and schedule it for deletion after 90 days)? I'd prefer to transfer via the servers if possible given the comment here <https://phabricator.wikimedia.org/T358311#9820450> from WMF Engineer

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. I can also prepare a notebook with quick functions to load and explore the data, if that would make the option I suggested a bit easier. TASK DETAIL https://phabricator.wikimedia.org/T366621 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T366621 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Lydia_Pintscher, Manuel, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T366621 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Lydia_Pintscher, Manuel, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T366621 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Lydia_Pintscher, Manuel, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-06-11 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Talked further with WMF about this just now. One basic question for the end users: would it make it more convenient for you all if the exported datasets were per Wiktionary? There are two options here, with missing entries being used as an example: 1

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-11 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. @Manuel, my assumption was that you could help any non-analytics PMs or go through the results with them as you have the needed access. Using Google for PII is not something we're supposed to do if it can be avoided, but I have no experience with

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-06-11 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Hi @MarcoSwart 👋 Thanks for the communication here :) I guess I'm a bit confused by how the other one would be used. You're roughly talking about: | word_that_is_missing_from_a_wiktionary | number_of_wiktionaries_that_do_have_it | | MOST_MI

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-11 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T366621 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Lydia_Pintscher, Manuel, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-11 Thread AndrewTavis_WMDE
AndrewTavis_WMDE moved this task from In progress to Product verification on the Wikidata Analytics (Kanban) board. AndrewTavis_WMDE added a comment. @Manuel and @Lydia_Pintscher, just shared a folder with the two CSVs on Wolke. Let me know if there's anything else needed, and I will

[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure

2024-06-12 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE

[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure

2024-06-12 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a project: Epic. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE, me, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T332899: [EPIC] Save our R-based Wikidata products

2023-04-20 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T332899 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Manuel, AndrewTavis_WMDE Cc: Pamputt, Aklapper, Manuel, Astuthiodit_1, karapayneWMDE, Invadibot

[Wikidata-bugs] [Maniphest] T332899: [EPIC] Save our R-based Wikidata products

2023-04-20 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T332899 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Manuel, AndrewTavis_WMDE Cc: Pamputt, Aklapper, Manuel, Astuthiodit_1, karapayneWMDE, Invadibot

[Wikidata-bugs] [Maniphest] T332899: [EPIC] Save our R-based Wikidata products

2023-04-20 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T332899 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Manuel, AndrewTavis_WMDE Cc: Pamputt, Aklapper, Manuel, Astuthiodit_1, karapayneWMDE, Invadibot

[Wikidata-bugs] [Maniphest] T332899: [EPIC] Save our R-based Wikidata products

2023-04-20 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Added a link to the task description pointing people to the following doc where I've been documenting the code base: https://docs.google.com/document/d/1aYWcStCZySbKTKbHdFzb4n9ZBpBntIbSZsnd99nriuc/edit?usp=sharing TASK DETAIL

[Wikidata-bugs] [Maniphest] T334558: [Analytics] Unique user-agents accessing Wikidata's REST API

2023-05-02 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. @Manuel, in reference to the following open question: > Do we need more detail than months or not? Who are the stakeholders beyond you that we'd need to check with for the proper gradation? Let's also discuss the following in on

[Wikidata-bugs] [Maniphest] T334951: Wikidata Concepts Monitor ETL Migration to Spark3

2023-05-09 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. @Manuel can give a more up to date rundown of our plans for all this. I'll be working on the migration with him :) TASK DETAIL https://phabricator.wikimedia.org/T334951 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailprefer

[Wikidata-bugs] [Maniphest] T334951: Wikidata Concepts Monitor ETL Migration to Spark3

2023-05-09 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Thanks all for your willingness to help! We'll be in touch in June once we've had the initial meetings with @ItamarWMDE. Those are planned for the 13th and 14th :) TASK DETAIL https://phabricator.wikimedia.org/T334951 EMAIL PREFERENC

[Wikidata-bugs] [Maniphest] T321666: Wiktionary Cognate Dashboard is not accessible [timeboxed 0.5 days]

2023-05-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Thank you, @VIGNERON! Appreciate getting pinged here so I can get an overview 😊 We (@Manuel, @ItamarWMDE and I) will be discussing the state of the R dashboards in early June. We'll be able to give a better estimate on this and other related tasks at

[Wikidata-bugs] [Maniphest] T336426: [EPIC] Automate quarterly reporting of defined Wikidata metrics [up to milestone 2]

2023-05-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a subtask: T337245: Wikidata Quarterly Reporting Milestone 2 Aggregations. TASK DETAIL https://phabricator.wikimedia.org/T336426 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Manuel, AndrewTavis_WMDE Cc: AndrewTavis_WMDE

[Wikidata-bugs] [Maniphest] T336426: [EPIC] Automate quarterly reporting of defined Wikidata metrics [up to milestone 2]

2023-05-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T336426 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Manuel, AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Manuel, Aklapper, Astuthiodit_1, BeautifulBold

[Wikidata-bugs] [Maniphest] T336542: Explore potential solutions for analytics notebooks

2023-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T336542 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja

[Wikidata-bugs] [Maniphest] T336542: Explore potential solutions for analytics notebooks

2023-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Note that T336282 <https://phabricator.wikimedia.org/T336282> is related to this as we were investigating adding Git integrations for notebooks :) TASK DETAIL https://phabricator.wikimedia.org/T336542 EMAIL PREFERENCES https://phabricator.wikimed

[Wikidata-bugs] [Maniphest] T336542: Explore potential solutions for analytics notebooks

2023-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T336542 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja

[Wikidata-bugs] [Maniphest] T336542: Explore potential solutions for analytics notebooks

2023-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T336542 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja

[Wikidata-bugs] [Maniphest] T336542: Explore potential solutions for analytics notebooks

2023-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T336542 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja

[Wikidata-bugs] [Maniphest] T336542: Explore potential solutions for analytics notebooks

2023-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T336542 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja

[Wikidata-bugs] [Maniphest] T336542: Explore potential solutions for analytics notebooks

2023-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T336542 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja

[Wikidata-bugs] [Maniphest] T336542: Explore potential solutions for analytics notebooks

2023-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. @Manuel, quick update that PAWS also allows for the HTTPS Git version control. Expected that it would, but I did successfully test it just now. I've updated the task with the findings so far :) TASK DETAIL https://phabricator.wikimedia.org/T336542

[Wikidata-bugs] [Maniphest] T336426: [EPIC] Automate quarterly reporting of defined Wikidata metrics [up to milestone 2]

2023-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T336426 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Manuel, AndrewTavis_WMDE Cc: Lydia_Pintscher, AndrewTavis_WMDE, Manuel, Aklapper, Astuthiodit_1
