[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the task status from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WM

[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs

2024-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the task status from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, AndrewTavis_WMDE, D

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360296 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, ECohen_WMDE, Aklapper, Pamputt, AndrewTavis_WMDE, JeanFred

[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check

2024-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Thanks for taking care of this, @Lucas_Werkmeister_WMDE! We'll be able to close both this and T351072 <https://phabricator.wikimedia.org/T351072> after Tuesday next week if/when the Puppet change is deployed :) TASK DETAIL

[Wikidata-bugs] [Maniphest] T365457: Bring in all Purdue Program PRs and upload Mismatch Finder mismatches

2024-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T365457 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Manuel, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check

2024-05-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. @BTullis, checking in on this as your help in T358311 <https://phabricator.wikimedia.org/T358311> reminded me as it's all related to the same user. Would you be able to remove the `statistics/manifests/wmde/wdcm.pp` file and any related processes (inc

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-05-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Thank you, @BTullis! Ya I wasn't happy with the solution either. Appreciate your willingness to help! TASK DETAIL https://phabricator.wikimedia.org/T358311 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-05-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. I'm realizing also that I don't have admin rights and thus can't move files to your directory. I'll copy these files over to my directory, download them and send you a link to a zipped directory on Google Drive once we have the above figured out. TASK

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-05-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Hi @Manuel, checking further as it's still not clear what you'd like. The double except is confusing. I'll only transfer files from `stat1005`, and could you answer the following questions: 1. Do you want **data files** (.csv, .tsv, etc) __before 2020__

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-05-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Hi @Manuel - sending along a summary of what I'll be getting for you: == stat1004 == Jul 25 2020 Analytics Jun 23 2020 Experiments Jul 25 2020 wdUsagePerPage == stat1005 == All non data files

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Ok then! So the check of the files above is complete as shown by its status. General summaries of each stat machine and HDFS are provided under the subsections above. `stat1005` has some files that @Manuel may find interesting given that they're

[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. So basically removing the wdcm.pp related file on GitHub and its Puppet workflows will close both tasks :) TASK DETAIL https://phabricator.wikimedia.org/T351072 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Ah looking at this, I'm realizing I restated myself as the work that's left in T364965: stat1007 to stat1011 migration pipeline output check <https://phabricator.wikimedia.org/T364965> is a duplicate of what we want to do here :) TASK DETAIL

[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Hey @Arian_Bozorg! Yes, we do still need to check this out. I was thinking that @Lucas_Werkmeister_WMDE and I could discuss this when we chat about what else is needed in T364965: stat1007 to stat1011 migration pipeline output check <ht

[Wikidata-bugs] [Maniphest] T365457: Bring in all Purdue Program PRs and upload Mismatch Finder mismatches

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T365457 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Manuel, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T365457: Bring in all Purdue Program PRs and upload Mismatch Finder mismatches

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Wikidata Analytics (Kanban), Wikidata. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Making this task as a means of saying that there is still work to be done to close out the Purdue Data Mine program

[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. ⚠️ Currently WIP ⚠️ === Going through the files sent by @JAllemandou above <https://phabricator.wikimedia.org/T358311#9648470>. This message will be saved as I go so that I don't lose my progress. If I do find something

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-05-17 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs

2024-05-17 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Note that MR#700 <https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/700> has been opened that has the work for this :) TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-05-17 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Note that MR#700 <https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/700> has been opened that has the work for this :) TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-05-16 Thread AndrewTavis_WMDE
AndrewTavis_WMDE claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T358311 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: brouberol, JAllemandou, MoritzMuehlenhoff, Manuel, Aklapper, AndrewTavis_WMDE

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Confirming that data's still coming in as well. @BTullis, what should we do about statistics/manifests/wmde/wdcm.pp <https://github.com/wikimedia/operations-puppet/blob/production/modules/statistics/manifests/wmde/wdcm.pp>? Remove the file? And cou

[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Quick note that the word used by @BTullis was `disabled` instead of `removed` for the stat1007 timers, so apologies if this caused some confusion. I figure not, but just wanted to be clear :) @BTullis, would you be able to check the journal for them

[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the task status from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benja

[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1

[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE renamed this task from "stat1007 migration output check" to "stat1007 to stat1011 migration pipeline output check". TASK DETAIL https://phabricator.wikimedia.org/T364965 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/pa

[Wikidata-bugs] [Maniphest] T364965: stat1007 migration output check

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Wikidata Analytics (Kanban), Wikidata, Wikidata Dev Team. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Context --- Recently WMF has been migrating from legacy stat servers that are being

[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: June 2024)

2024-05-14 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Sheet updated with the numbers for April. Higher number of user agents but fewer IPs (though IPs are still much higher than in February). TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: June 2024)

2024-05-14 Thread AndrewTavis_WMDE
AndrewTavis_WMDE renamed this task from "[Analytics] Monthly repeating tasks (next: May 2024)" to "[Analytics] Monthly repeating tasks (next: June 2024)". AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL

[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs

2024-05-14 Thread AndrewTavis_WMDE
AndrewTavis_WMDE claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm

2024-05-14 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Hey @brouberol! Just getting back from two weeks off today :) I'll check into this and get back to you all! Thanks for the ping! TASK DETAIL https://phabricator.wikimedia.org/T358311 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations

2024-04-26 Thread AndrewTavis_WMDE
AndrewTavis_WMDE renamed this task from "Generate historical weekly segments of Wikidata item sitelinks segmentations" to "Generate historical weekly segments of Wikidata item sitelink segmentations". TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL

[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelinks segmentations

2024-04-26 Thread AndrewTavis_WMDE
AndrewTavis_WMDE renamed this task from "Generate weekly historical segments of Wikidata item sitelinks segmentations" to "Generate historical weekly segments of Wikidata item sitelinks segmentations". TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-26 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T363583: Generate weekly historical segments of Wikidata item sitelinks segmentations

2024-04-26 Thread AndrewTavis_WMDE
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Wikidata, Wikidata Analytics (Kanban). Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Purpose --- In T362849: [Analytics] Segments of Wikidata's data over time <ht

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-26 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. See T362849_wd_item_sitelink_segments.ipynb <https://gitlab.wikimedia.org/repos/wmde/analytics/-/blob/main/tasks/wikidata/2024/T362849_wd_item_sitelink_segments/T362849_wd_item_sitelink_segments.ipynb?ref_type=heads> for the work to derive the se

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-26 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-26 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Ok, so the new numbers after the change in scope for the max `2024-04-15` snapshot are: items_with_sitelinks: 32,231,861 items_items_with_sitelinks_link_to: 2,980,388 all_other_items: 72,910,679 For documentation, the numbers

[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs

2024-04-26 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Moved this to `In progress` as I'm adding the job to export everything to the published datasets folder to the DAG as I work on the same for T362849 <https://phabricator.wikimedia.org/T362849>. TASK DETAIL https://phabricator.wikimedia.org/T361203

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-25 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. See T363451 <https://phabricator.wikimedia.org/T363451> for the task about bringing back the partition (hopefully via another job). I added a bit about whether we want to maybe turn this job on when WMDE needs historical data. Let me know what you all think

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Another note on this is: if we don't expect to be needing a Wikidata partition of `wmf.mediawiki_wikitext_history` for other tasks, then we could work directly from the XML dump for the data backdate. We wouldn't be able to leverage PySpark for the querying

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a subscriber: JAllemandou. AndrewTavis_WMDE added a comment. Thanks for all of the information, @mpopov! I talked this over in my bi-weekly with @JAllemandou, and would like to bring some further context to this particular situation :) The go to table

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE claimed this task. AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: mpopov, AndrewTavis_WMDE, Manuel, Aklapper

[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items

2024-04-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Summary on your end sounds great, @Ifrahkhanyaree_WMDE! Let me know if sending along some empty new item revisions from 2024 would be helpful :) TASK DETAIL https://phabricator.wikimedia.org/T360761 EMAIL PREFERENCES https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items

2024-04-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Notebook with the work that was done for this is: wmde/analytics/tasks/product_platform/2024/T360761_empty_wikidata_items/T360761_empty_wikidata_items.ipynb <https://gitlab.wikimedia.org/repos/wmde/analytics/-/blob/main/tasks/product_platform/2

[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items

2024-04-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360761 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items

2024-04-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360761 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items

2024-04-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE moved this task from Needs product input to Product verification on the Wikidata Analytics (Kanban) board. AndrewTavis_WMDE added a comment. Further insights on this, and moving it to `Product verification` at this point :) I've now changed the query to a span of bytes

[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items

2024-04-19 Thread AndrewTavis_WMDE
AndrewTavis_WMDE moved this task from In progress to Needs product input on the Wikidata Analytics (Kanban) board. AndrewTavis_WMDE added a comment. The thread on Mattermost <https://mattermost.wikimedia.de/swe/pl/gsr9b485x7geby79t4sg151j7c> for discussing this has a lot of co

[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items

2024-04-19 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360761 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items

2024-04-17 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360761 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items

2024-04-17 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360761 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items

2024-04-17 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Prioritizing this now. Initial exploration of the data sources indicates that we need to use the full `mediawiki_history` rather than `mediawiki_history_reduced` as the latter doesn't have a distinct `page_is_deleted` field for Population B. TASK DETAIL

[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items

2024-04-17 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360761 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T362643: Mismatch Finder gadget: visited link text icon doesn't change color with link

2024-04-16 Thread AndrewTavis_WMDE
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added a project: Wikidata.org. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Note that I did this Phabricator tasks search <https://phabricator.wikimedia.org/search/query/5BIk7a7RSJzT/#R> before making thi

[Wikidata-bugs] [Maniphest] T362641: [MSMF] Button texts are not centered in various places

2024-04-16 Thread AndrewTavis_WMDE
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Mismatch Finder, Wikidata. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Note that I did the following Phabricator search <https://phabricator.wikimedia.org/search/query/FxDRSlmcrOEQ/#R> before w

[Wikidata-bugs] [Maniphest] T362301: [MSMF] Add mismatch file upload scripts to Mismatch Finder repo

2024-04-11 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362301 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, luca.favorido, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T362301: [MSMF] Add mismatch file upload scripts to Mismatch Finder repo

2024-04-11 Thread AndrewTavis_WMDE
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Mismatch Finder, Wikidata, wmde-wikidata-tech. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Context --- A part of the WMDE x Purdue University program where students have been looking

[Wikidata-bugs] [Maniphest] T356659: [QB] Remove references of broken tool from Mismatch Finder and Query Builder

2024-04-11 Thread AndrewTavis_WMDE
AndrewTavis_WMDE renamed this task from "[MSMF] [QB] Remove references of broken tool from Mismatch Finder and Query Builder" to "[QB] Remove references of broken tool from Mismatch Finder and Query Builder". TASK DETAIL https://phabricator.wikimedia.org/T356659 EMAIL

[Wikidata-bugs] [Maniphest] T362151: [SW] The mismatch file description should be more visibly apparent in the Mismatch Finder UI

2024-04-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362151 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Sarai-WMDE, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T362217: Mismatch finder long description modal doesn't close on X press

2024-04-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362217 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1

[Wikidata-bugs] [Maniphest] T362217: Mismatch finder long description modal doesn't close on X press

2024-04-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362217 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1

[Wikidata-bugs] [Maniphest] T362217: Mismatch finder long description modal doesn't close on X press

2024-04-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362217 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1

[Wikidata-bugs] [Maniphest] T362217: Mismatch finder long description modal doesn't close on X press

2024-04-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Mismatch Finder, Wikidata. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION **Steps to replicate the issue**: In looking at the mismatches on Mismatch Finder <https://mismatch-finder.toolforge.

[Wikidata-bugs] [Maniphest] T362151: [SW] The mismatch file description should be more visibly apparent in the Mismatch Finder UI

2024-04-09 Thread AndrewTavis_WMDE
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Mismatch Finder, Wikidata, Wikidata Dev Team. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Problem --- Upon uploading some new mismatches, something that I'm realizing

[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure

2024-04-09 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Note that in checking the `tmp` directory just now, there still are files/directories in there, meaning that parts of the process are likely still running (maybe parts that don't need private data access). We'll be checking this again in a month once

[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: April 2024)

2024-04-08 Thread AndrewTavis_WMDE
AndrewTavis_WMDE renamed this task from "[Analytics] Monthly repeating tasks (next: March 2024)" to "[Analytics] Monthly repeating tasks (next: April 2024)". TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL PREFERENCES https://phabricator.wikimedi

[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: March 2024)

2024-04-03 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Sheet has been updated for March via a query of `wmde.wd_rest_api_metrics_monthly` that's generated by Airflow. Slightly fewer user agents than last month, but IPs doubled. TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: March 2024)

2024-04-03 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs

2024-03-29 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T356659: [MSMF] [QB] Remove references of broken tool from Mismatch Finder and Query Builder

2024-03-28 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356659 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, karapayneWMDE, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T356659: [MSMF] [QB] Remove references of broken tool from Mismatch Finder and Query Builder

2024-03-28 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Updated the description as wmde/wikidata-mismatch-finder#878 <https://github.com/wmde/wikidata-mismatch-finder/pull/878> fixed the problem for Mismatch Finder. At time of writing Curious Facts is still referenced in the Query Builder footer. TASK

[Wikidata-bugs] [Maniphest] T356659: [MSMF] [QB] Remove references of broken tool from Mismatch Finder and Query Builder

2024-03-28 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356659 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, karapayneWMDE, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure

2024-03-28 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE

[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: March 2024)

2024-03-28 Thread AndrewTavis_WMDE
AndrewTavis_WMDE lowered the priority of this task from "Medium" to "Low". TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WM

[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: March 2024)

2024-03-28 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. I've added the numbers for February to the sheet based on the first DAG run and also just went through the query job one final time to check. The queries that are being run by the job are taken directly from the original queries with only a few minor changes

[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs

2024-03-28 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Note that this task is dependent on whether a standardized system that would not require the published datasets is created. Such a system is discussed in T361214: Public dashboard process <https://phabricator.wikimedia.org/T361214>. TASK DETAIL

[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs

2024-03-28 Thread AndrewTavis_WMDE
AndrewTavis_WMDE removed AndrewTavis_WMDE as the assignee of this task. TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, AndrewTavis_WMDE

[Wikidata-bugs] [Maniphest] T360298: [Analytics] Public dashboard pilot

2024-03-28 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Note that I've made T361214: Public dashboard process <https://phabricator.wikimedia.org/T361214> to explain our use case of a standardized public dashboard process :) TASK DETAIL https://phabricator.wikimedia.org/T360298 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs

2024-03-28 Thread AndrewTavis_WMDE
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Wikidata Analytics (Kanban), Wikidata. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION In T341330: [Analytics] Airflow implementation of unique ips accessing Wikidata's REST API metrics <ht

[Wikidata-bugs] [Maniphest] T341330: [Analytics] Airflow implementation of unique ips accessing Wikidata's REST API metrics

2024-03-27 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Merge request <https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/631> has been brought in, and we've successfully deployed!  An output from the new `wmde.wd_rest_api_metrics_monthly` table is: |

[Wikidata-bugs] [Maniphest] T341330: [Analytics] Airflow implementation of unique ips accessing Wikidata's REST API metrics

2024-03-27 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T341330 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T360298: [Analytics] Public dashboard pilot

2024-03-27 Thread AndrewTavis_WMDE
AndrewTavis_WMDE renamed this task from " [Analytics] Public Superset dashboard pilot" to " [Analytics] Public dashboard pilot". AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360298 EMAIL PREFERENCES https://phabricator.wi

[Wikidata-bugs] [Maniphest] T360298: [Analytics] Public Superset dashboard pilot

2024-03-27 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Following a large discussion about this in the `data-engineering-collab` channel on Slack, the general findings are: - The public Superset instance isn't suitable for this at this time and there's no timetable for it to be (see above comments

[Wikidata-bugs] [Maniphest] T360298: [Analytics] Public Superset dashboard pilot

2024-03-27 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Further checks on this: the dashboarding process for the public Superset seems to be based on a few preset databases that have the data from Wikimedia projects (see SQL Lab <https://superset.wmcloud.org/sqllab/>). As of now I'm doubting whether we'd b

[Wikidata-bugs] [Maniphest] T360298: [Analytics] Public Superset dashboard pilot

2024-03-26 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Note that from the most recent discussions with WMF data engineering, there isn't a set workflow for getting information into a place where it can be accessed via the Public Superset instance. We would need to edit the DAG such that we include an export

[Wikidata-bugs] [Maniphest] T348999: Add linter and formatter to wmfdata-python (and link check)

2024-03-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Exciting! I'll play around a bit towards the end of next week and send along a PR with the workflow, docs and changes given the local run warnings. Will let you know if anything comes up before then. Have a nice weekend when it comes along! TASK DETAIL

[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items

2024-03-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T360761 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, Astuthiodit_1, karapayneWMDE

[Wikidata-bugs] [Maniphest] T348999: Add linter and formatter to wmfdata-python (and link check)

2024-03-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. @nshahquinn-wmf, @xcollazo: checking in on this one again. I would have some time in the next two weeks or so to implement a PR workflow check of linting and code formatting. If folks are fine with Ruff <https://github.com/astral-sh/ruff> that'd be e

[Wikidata-bugs] [Maniphest] T341330: [Analytics] Airflow implementation of unique ips accessing Wikidata's REST API metrics

2024-03-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Merge request for this has been sent and can be found here <https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/631> :) Requested WMF's review on this first one, but we'll need to take over from there unless there are pr

[Wikidata-bugs] [Maniphest] T341330: [Analytics] Airflow implementation of unique ips accessing Wikidata's REST API metrics

2024-03-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T341330 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, Astuthiodit_1

[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure

2024-03-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE

[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure

2024-03-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE

[Wikidata-bugs] [Maniphest] T357697: Archive WMDE analytics Gerrit repositories

2024-03-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE closed this task as "Resolved". AndrewTavis_WMDE claimed this task. AndrewTavis_WMDE added a comment. Fantastic! Thank you both again for the help here :) It really is great to be winding down these processes and moving on to the next steps! TASK DETA

[Wikidata-bugs] [Maniphest] T357697: Archive WMDE analytics Gerrit repositories

2024-03-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Thank you both so much! Let me know when the GitHub repos have been deleted and I'll resolve this and update the greater epic. TASK DETAIL https://phabricator.wikimedia.org/T357697 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] T357697: Archive WMDE analytics Gerrit repositories

2024-03-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE edited projects, added Wikidata Analytics (Kanban); removed Wikidata Analytics. TASK DETAIL https://phabricator.wikimedia.org/T357697 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: hashar, brouberol, Manuel

[Wikidata-bugs] [Maniphest] T360436: [MSMF] Add upload file limit to Mismatch Finder documentation

2024-03-20 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Per a suggestion from @noarave, I reran the curl command <https://github.com/wmde/wikidata-mismatch-finder/blob/main/docs/UserGuide.md#example-with-curl> with `-v` at the end for verbose output. Of note: the first line reads `Note: Unnecessa
