[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE changed the task status from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs
AndrewTavis_WMDE changed the task status from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T361203
[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360296
[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check
AndrewTavis_WMDE added a comment. Thanks for taking care of this, @Lucas_Werkmeister_WMDE! We'll be able to close both this and T351072 <https://phabricator.wikimedia.org/T351072> after Tuesday next week if/when the Puppet change is deployed :) TASK DETAIL https://phabricator.wikimedia.org/T364965
[Wikidata-bugs] [Maniphest] T365457: Bring in all Purdue Program PRs and upload Mismatch Finder mismatches
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T365457
[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check
AndrewTavis_WMDE added a comment. @BTullis, checking in on this, as your help in T358311 <https://phabricator.wikimedia.org/T358311> reminded me of it and it's all related to the same user. Would you be able to remove the `statistics/manifests/wmde/wdcm.pp` file and any related processes (now including those on stat1011) as well? TASK DETAIL https://phabricator.wikimedia.org/T364965
[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)
AndrewTavis_WMDE added a comment. Thank you, @BTullis! Ya I wasn't happy with the solution either. Appreciate your willingness to help! TASK DETAIL https://phabricator.wikimedia.org/T358311
[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)
AndrewTavis_WMDE added a comment. I'm realizing also that I don't have admin rights and thus can't move files to your directory. I'll copy these files over to my directory, download them and send you a link to a zipped directory on Google Drive once we have the above figured out. TASK DETAIL https://phabricator.wikimedia.org/T358311
[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)
AndrewTavis_WMDE added a comment. Hi @Manuel, checking further as it's still not clear what you'd like. The double "except" is confusing. I'll only transfer files from `stat1005`. Could you answer the following questions:
1. Do you want **data files** (.csv, .tsv, etc) __before 2020__? (assumption: no)
2. Do you want **data files** __after 2020__? (as of now unclear)
3. Do you want **non data files** (.py, .R, etc) __before 2020__? (as of now unclear)
4. Do you want **non data files** __after 2020__? (assumption: yes)
TASK DETAIL https://phabricator.wikimedia.org/T358311
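The four questions above amount to a two-by-two split of the files: data versus non-data, and before versus after 2020. A minimal sketch of that split follows. It is not the process actually used in this task, and the names `DATA_SUFFIXES`, `CUTOFF`, `classify` and `bucket_files` are hypothetical. It assumes that only `.csv` and `.tsv` count as data files, that a file's age is its modification time, and that "before 2020" means modified before 2020-01-01 UTC.

```python
from datetime import datetime, timezone
from pathlib import Path

# Assumption: only these suffixes count as "data files"; extend as needed.
DATA_SUFFIXES = {".csv", ".tsv"}
# Assumption: "before 2020" means last modified before 2020-01-01 UTC.
CUTOFF = datetime(2020, 1, 1, tzinfo=timezone.utc)


def classify(path: Path, modified: datetime) -> str:
    """Return which of the four buckets a file falls into."""
    kind = "data" if path.suffix.lower() in DATA_SUFFIXES else "non-data"
    period = "before 2020" if modified < CUTOFF else "from 2020 on"
    return f"{kind}, {period}"


def bucket_files(root: Path) -> dict:
    """Walk root recursively and group regular files by bucket name."""
    buckets = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
            buckets.setdefault(classify(path, modified), []).append(path)
    return buckets
```

Calling `bucket_files(Path("some/home/directory"))` on a copy of a home directory would give one list of paths per bucket, which could then be matched against the answers to the four questions.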
[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)
AndrewTavis_WMDE added a comment. Hi @Manuel - sending along a summary of what I'll be getting for you:
== stat1004 ==
Jul 25 2020 Analytics
Jun 23 2020 Experiments
Jul 25 2020 wdUsagePerPage
== stat1005 ==
All non data files
== stat1007 ==
Aug 23 2020 Analytics
Jan 27 2020 Experiments
Aug 23 2020 RScripts
== stat1008 ==
Oct 11 2021 Analytics
Jun 23 2020 R
== HDFS ==
2021-11-02 17:37 /user/goransm/dewiki_revisions
2021-04-11 16:51 /user/goransm/wdtranslationsb
No other files, as everything after 2020 is a data file or ORES related (this is coming in the stat server files anyway).
TSVs, CSVs and data file types will not be included in the transfer. Out of convenience, I'm going to transfer the files into your directory on the given server.
TASK DETAIL https://phabricator.wikimedia.org/T358311
[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)
AndrewTavis_WMDE added a comment. Ok then! So the check of the files above is complete, as shown by the task's status. General summaries of each stat machine and HDFS are provided under the subsections above. `stat1005` has some files that @Manuel may find interesting given that they're for prior tasks of his. Any queries that looked like they could be interesting, or that were in files with interesting-sounding names but turned out not to be interesting, are printed above for documentation. Overall I can say that anything from the above would be easier to redo from scratch via the docs and by checking with WMDE engineers or WMF Data Engineering/Analytics than to go through and re-implement. I personally would not keep anything, and will delete the files I copied over to my `stat1005` directory once this is closed :) Thanks again @JAllemandou for the file lists, and thanks @brouberol for the ping! TASK DETAIL https://phabricator.wikimedia.org/T358311
[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)
AndrewTavis_WMDE added a comment. So basically removing the wdcm.pp related file on GitHub and its Puppet workflows will close both tasks :) TASK DETAIL https://phabricator.wikimedia.org/T351072
[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)
AndrewTavis_WMDE added a comment. Ah, looking at this, I'm realizing I repeated myself, as the work that's left in T364965: stat1007 to stat1011 migration pipeline output check <https://phabricator.wikimedia.org/T364965> is a duplicate of what we want to do here :) TASK DETAIL https://phabricator.wikimedia.org/T351072
[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)
AndrewTavis_WMDE added a comment. Hey @Arian_Bozorg. Yes, we do still need to check this out. I was thinking that @Lucas_Werkmeister_WMDE and I could discuss this when we chat about what else is needed in T364965: stat1007 to stat1011 migration pipeline output check <https://phabricator.wikimedia.org/T364965>. In that one we've now confirmed that the data is coming in from stat1011, so at this point it'd be good to delete statistics/manifests/wmde/wdcm.pp <https://github.com/wikimedia/operations-puppet/blob/production/modules/statistics/manifests/wmde/wdcm.pp> and also remove its workflow from Puppet (I'm just not quite sure if I have access or how to go about the Puppet work). I'm hopeful that another 25min call would be enough to get the work done for both tasks, and I can document it for my learning/our processes and report back. Let me know if sometime later in the week could work for this! TASK DETAIL https://phabricator.wikimedia.org/T351072
[Wikidata-bugs] [Maniphest] T365457: Bring in all Purdue Program PRs and upload Mismatch Finder mismatches
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T365457
[Wikidata-bugs] [Maniphest] T365457: Bring in all Purdue Program PRs and upload Mismatch Finder mismatches
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Wikidata Analytics (Kanban), Wikidata. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Making this task as a means of noting that there is still work to be done to close out the Purdue Data Mine program. Specifically, all pull requests in the repo <https://github.com/Wikidata/Purdue-Data-Mine-2024/pulls> need to be brought in, and the resulting mismatches should be uploaded to Mismatch Finder using upload_mismatches.py <https://github.com/Wikidata/Purdue-Data-Mine-2024/blob/main/upload_mismatches.py>. TASK DETAIL https://phabricator.wikimedia.org/T365457 WORKBOARD https://phabricator.wikimedia.org/project/board/6546/
[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618
[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)
AndrewTavis_WMDE added a comment.

⚠️ Currently WIP ⚠️
===
Going through the files sent by @JAllemandou above <https://phabricator.wikimedia.org/T358311#9648470>. This message will be saved as I go so that I don't lose my progress. If I do find something worth documenting, then I'll also include it below so that this task can serve as a reference for later if need be.

stat1004
None of the files are worth keeping. See descriptions and reasoning below:
total 28
Analytics
└─ NewEditors
└─ adHoc (nothing of interest)
└─ Compaigns
└─ 2019 and 2020 email campaigns with R based analysis (nothing of interest)
└─ WDCM
└─ WDCM_Output
└─ Lots of directories of CSVs (nothing of interest)
└─ WDCM_Scripts
└─ R based scripts that would be archived on Gerrit if they were ever in production (nothing of interest)
└─ Wikidata
└─ misc
└─ Some ad hoc work (nothing of interest)
└─ WD_languagesLandscape
└─ R based scripts that would be archived on Gerrit if they were ever in production (nothing of interest)
└─ WD_ORES_ItemQuality (nothing of interest given Lift Wing migration)
└─ WD_UsageCoverage
└─ R and Python scripts that are doubtless versions of the WDCM UsageCoverage dashboard that's archived on Gerrit (nothing of interest)
Experiments
└─ Empty
_miscWMDE
└─ summerBannerCampaign2017_DataOUT
└─ TSV files (nothing of interest)
└─ TWLBanner_2017
└─ TSV files and simple HQL queries from `wmf.webrequest` for banner campaign hits (nothing of interest, easy to learn as needed)
   Example query:
   SELECT count(*)
   FROM wmf.webrequest
   WHERE uri_host = 'de.wikipedia.org'
     AND uri_query LIKE "$/wiki/Wikipedia:Umfragen/Technische_Wünsche_2017$"
     AND http_method = 'GET'
     AND is_pageview = TRUE
     AND YEAR = 2017 AND MONTH = 6 AND DAY = 1 AND HOUR = 20;
└─ TWLBanner_2017_DataOUT
└─ TSV files (nothing of interest)
_miscWMDE_1004
└─ TWLBanner_2017
└─ One HQL and one TSV file that are similar to the above (nothing of interest)
R
└─ x86_64-pc-linux-gnu-library (nothing of interest)
Research
└─ DydimusZengenene
└─ Note: work to support a researcher (nothing of interest)
└─ _analytics
└─ _data
└─ DydimusZengenene.Rproj
└─ ParseTargetPage.R
wdUsagePerPage
└─ Related to the percentage usage dashboard, so would be archived on Gerrit if they were ever in production (nothing of interest)

stat1005
total 964
Analytics
└─ BotEdits_perProject.ipynb
└─ crontabstat1005.txt
└─ DataModelTerms_20210228_Updates.ipynb
└─ dewiki_NewEds_2021.ipynb
└─ QCF_M2_Test.ipynb
└─ QuratorCuriousFacts_Separators.ipynb
└─ Qurator_M1.ipynb
└─ R
└─ snapshot_query.hql
└─ Untitled1.ipynb
└─ untitled1.txt
└─ Untitled2.ipynb
└─ Untitled3.ipynb
└─ Untitled4.ipynb
└─ Untitled5.ipynb
└─ Untitled.ipynb
└─ untitled.txt
└─ venv
└─ wd_cluster_fetch_items_M2.ipynb
└─ wd_cluster_fetch_items_M3.ipynb
└─ WDCM_ETL_OTHER_TEST.ipynb
└─ WDCM_Statements_Test.ipynb
└─ WD_HumanEditsPerClass_RevisionTags.ipynb
└─ WD_Inequality_Intake.ipynb
└─ WD_Languages_Datamodel_CollectInit.ipynb
└─ WD_Languages_Datamodel_EXP.ipynb
└─ WD_MonthlyEditors.ipynb
└─ WD_Sitelinks_WDAHP_202108.ipynb
└─ wd_statements_HiveQL_Query.hql
└─ WD_Translations.ipynb
└─ WHEIP_exps.ipynb
└─ wikidata_analytics_examples
└─ WikidataRevisions_November2020.csv
└─

stat1006
total 48
misc_projects
└─ myTemp
└─ NewEds
└─ nohup.out
└─ R
└─ RPckg
└─ RScripts
└─ sqlIn
└─ sqlOut
└─ WDCM_Credentials
└─ WDCM_DataIN
└─ WDCM_DataOUT
└─ WDCM_sql
└─

stat1007
total 28
Analytics
└─ crontabstat1007.txt
└─ Experiments
└─ Python3
└─ R
└─ RScripts
└─ venv
└─

stat1008
total 16
Analytics
└─ R
└─ renv
└─ venv
└─

stat1009
total 0

stat1010
---
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849
[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs
AndrewTavis_WMDE added a comment. Note that MR#700 <https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/700> has been opened that has the work for this :) TASK DETAIL https://phabricator.wikimedia.org/T361203
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE added a comment. Note that MR#700 <https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/700> has been opened that has the work for this :) TASK DETAIL https://phabricator.wikimedia.org/T362849
[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)
AndrewTavis_WMDE claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T358311
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849
[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check
AndrewTavis_WMDE added a comment. Confirming that data's still coming in as well. @BTullis, what should we do about statistics/manifests/wmde/wdcm.pp <https://github.com/wikimedia/operations-puppet/blob/production/modules/statistics/manifests/wmde/wdcm.pp>? Remove the file? And could you also remove it from Puppet entirely on stat1011? Anything else? TASK DETAIL https://phabricator.wikimedia.org/T364965
[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check
AndrewTavis_WMDE added a comment. Quick note that the word used by @BTullis was `disabled` instead of `removed` for the stat1007 timers, so apologies if this caused some confusion. I figure not, but just wanted to be clear :) @BTullis, would you be able to check the journal for them and paste the output here so we can check it? On my end as well it seems like I can't access it. TASK DETAIL https://phabricator.wikimedia.org/T364965
[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations
AndrewTavis_WMDE changed the task status from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T363583
[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T363583
[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check
AndrewTavis_WMDE renamed this task from "stat1007 migration output check" to "stat1007 to stat1011 migration pipeline output check". TASK DETAIL https://phabricator.wikimedia.org/T364965
[Wikidata-bugs] [Maniphest] T364965: stat1007 migration output check
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Wikidata Analytics (Kanban), Wikidata, Wikidata Dev Team. Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION

Context
---
Recently WMF has been migrating away from legacy stat servers that are being deprecated, specifically stat1004, 1005, 1006 and 1007. WMDE has a few pipelines that were running on stat1007 and have since been migrated over to stat1011:
- statistics/manifests/wmde/graphite.pp <https://github.com/wikimedia/operations-puppet/blob/production/modules/statistics/manifests/wmde/graphite.pp>
- statistics/manifests/wmde/wdcm.pp <https://github.com/wikimedia/operations-puppet/blob/production/modules/statistics/manifests/wmde/wdcm.pp>

The latter at first glance doesn't appear to do anything: it sets the environment variables and clones, but the rest is `TODO`. The former is more expansive and leads into our Graphite/Grafana workflows.

Further directions
---
> You should be able to find the required files and the clone of https://gerrit.wikimedia.org/g/analytics/wmde/scripts beneath `stat1011:/srv/analytics-wmde`.

The assumption is that they're working, and the timers for stat1007 have been removed.

Goals
---
- Check the pipeline in statistics/manifests/wmde/graphite.pp <https://github.com/wikimedia/operations-puppet/blob/production/modules/statistics/manifests/wmde/graphite.pp> to ensure that everything is working properly after the stat1007 -> stat1011 migration.

TASK DETAIL https://phabricator.wikimedia.org/T364965
[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: June 2024)
AndrewTavis_WMDE added a comment. Sheet updated with the numbers for April. Higher number of user agents, but fewer IPs (though IPs are still much higher than in Feb). TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: June 2024)
AndrewTavis_WMDE renamed this task from "[Analytics] Monthly repeating tasks (next: May 2024)" to "[Analytics] Monthly repeating tasks (next: June 2024)". AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs
AndrewTavis_WMDE claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm
AndrewTavis_WMDE added a comment. Hey @brouberol Just getting back from two weeks off today :) I'll check into this and get back to you all! Thanks for the ping! TASK DETAIL https://phabricator.wikimedia.org/T358311 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: brouberol, JAllemandou, MoritzMuehlenhoff, Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, BTullis, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations
AndrewTavis_WMDE renamed this task from "Generate historical weekly segments of Wikidata item sitelinks segmentations" to "Generate historical weekly segments of Wikidata item sitelink segmentations". TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelinks segmentations
AndrewTavis_WMDE renamed this task from "Generate weekly historical segments of Wikidata item sitelinks segmentations" to "Generate historical weekly segments of Wikidata item sitelinks segmentations". TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T363583: Generate weekly historical segments of Wikidata item sitelinks segmentations
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Wikidata, Wikidata Analytics (Kanban). Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Purpose --- In T362849: [Analytics] Segments of Wikidata's data over time <https://phabricator.wikimedia.org/T362849> we need to calculate historical segments of Wikidata's items based on their relation to sitelinks. Purpose from that ticket: > As Wikidata Product Managers, we would like to understand how different segments of Wikidata's data developed over time, so we can inform our projections. This task would encompass the historical data that's needed to achieve this. Scope - From T362849 <https://phabricator.wikimedia.org/T362849>: > How did the number of Items of the following types develop over time? > > A) Items that contain a sitelink to one of the Wikimedia projects (e.g. about a notable person) > B) Items that are needed to build A (used in A Items for example in a statement or reference; e.g. the non-notable father of that notable person) > C) All other Items - In order to do this, T363451: Add job to create Wikidata partition to wmf.mediawiki_wikitext_history <https://phabricator.wikimedia.org/T363451> was made to recreate the Wikidata partition of wmf.mediawiki_wikitext_history <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Content/Mediawiki_wikitext_history> - Once this task is complete, work can then begin to use this partition to generate all data from when Wikidata was created to the most recent weekly data generated by the DAG created in T362849 <https://phabricator.wikimedia.org/T362849> Desired Output -- - Weekly stats of the number of Items in category A, B and C Acceptance criteria: [ ] Weekly historical breakdowns of populations A, B and C - These would be in the Data Lake and the published datasets --- **Information below this point is filled out by the Wikidata Analytics team.** General Planning Information is filled out by the analytics product manager. 
Assignee Planning - Information is filled out by the assignee of this task. Estimation -- Estimate: Actual: Sub Tasks - Full breakdown of the steps to complete this task: [ ] Step Data to be used --- See Analytics/Data_Lake <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake> for the breakdown of the data lake databases and tables. The following tables will be referenced in this task: - wmf.mediawiki_wikitext_history <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Content/Mediawiki_wikitext_history> Notes and Questions --- Things that came up during the completion of this task, questions to be answered and follow up tasks: - Note TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
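The A/B/C split quoted in the scope above can be illustrated with a small, self-contained sketch. It is not the query used for the task: the input shape (a dict keyed by QID with hypothetical `sitelinks` and `links_to` fields) and the toy items are assumptions made purely for illustration, and it follows the population definitions as quoted in this task description only.

```python
def segment_items(items):
    """Split items into populations A, B and C as described in the task scope.

    `items` maps a QID to a dict with two hypothetical keys:
      "sitelinks": site ids the item links to (empty list if none)
      "links_to":  QIDs used in the item's statements or references
    """
    # A: items with at least one sitelink.
    a = {qid for qid, item in items.items() if item["sitelinks"]}

    # B: items without sitelinks that are used by at least one A item.
    referenced_by_a = set()
    for qid in a:
        referenced_by_a.update(items[qid]["links_to"])
    b = (referenced_by_a & set(items)) - a

    # C: everything else.
    c = set(items) - a - b
    return a, b, c


# Toy data (made up): Q1 has a sitelink and points at Q2; Q3 is unrelated to A.
toy_items = {
    "Q1": {"sitelinks": ["enwiki"], "links_to": ["Q2"]},
    "Q2": {"sitelinks": [], "links_to": []},
    "Q3": {"sitelinks": [], "links_to": ["Q2"]},
}
pop_a, pop_b, pop_c = segment_items(toy_items)
print(len(pop_a), len(pop_b), len(pop_c))
```

Counting the three sets per weekly snapshot would give the "weekly stats of the number of Items in category A, B and C" listed under the desired output.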
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE added a comment. See T362849_wd_item_sitelink_segments.ipynb <https://gitlab.wikimedia.org/repos/wmde/analytics/-/blob/main/tasks/wikidata/2024/T362849_wd_item_sitelink_segments/T362849_wd_item_sitelink_segments.ipynb?ref_type=heads> for the work to derive the segments :) TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE added a comment. Ok, so the new numbers after the change in scope for the max `2024-04-15` snapshot are: items_with_sitelinks: 32,231,861 items_items_with_sitelinks_link_to: 2,980,388 all_other_items: 72,910,679 For documentation, the numbers for the original Population B definition for the min `2024-02-26` snapshot were: items_with_sitelinks: 31,978,738 linked_to_items_with_sitelinks: 75,221,879 all_other_items: 242,565 Status on the rest of this: - The weekly DAG is written and also includes an export to the published datasets repo - I've also included the work for T361203 <https://phabricator.wikimedia.org/T361203> in this - We need to confirm the numbers above and the method that generates them - I'll then rewrite the DAG job that runs the query - Then, for testing, I'll need the table `wmde.wd_item_sitelink_segments_weekly` to be made in HDFS by an admin, and then we can go into production - Should all be done by Tuesday/Wednesday evening after I'm back in a few weeks depending on folks' availability - I'll make a new task for the historic data generation process, which will depend on T363451 <https://phabricator.wikimedia.org/T363451> TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs
AndrewTavis_WMDE added a comment. Moved this to `In progress` as I'm adding the job to export everything to the published datasets folder to the DAG as I work on the same for T362849 <https://phabricator.wikimedia.org/T362849>. TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE added a comment. See {https://phabricator.wikimedia.org/T363451} for the task about bringing back the partition (hopefully via another job). I added a bit about whether we want to maybe turn this job on when WMDE needs historical data. Let me know what you all think on that :) TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE added a comment. Another note on this is: if we don't expect to be needing a Wikidata partition of `wmf.mediawiki_wikitext_history` for other tasks, then we could work directly from the XML dump for the data backdate. We wouldn't be able to leverage PySpark for the querying though, so I worry about how long all of this would take... TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
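For a sense of what working directly from the XML dump without PySpark could look like, here is a minimal streaming sketch using only the Python standard library. The sample document is made up and only mimics the general page/revision layout of a MediaWiki XML export; the function name `iter_revisions` is hypothetical, and nothing here has been run against a real Wikidata dump, where runtime would be the main concern raised above.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(stream):
    """Yield (page_title, revision_timestamp) pairs while streaming through the XML."""
    title = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            timestamp = None
            for child in elem:
                if local_name(child.tag) == "timestamp":
                    timestamp = child.text
            yield title, timestamp
            elem.clear()  # free the revision text once it has been handled
        elif name == "page":
            title = None
            elem.clear()


# Made-up sample standing in for a dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Q1</title>
    <revision><timestamp>2012-10-29T17:00:00Z</timestamp><text>{}</text></revision>
    <revision><timestamp>2012-11-05T09:30:00Z</timestamp><text>{}</text></revision>
  </page>
</mediawiki>"""

revisions = list(iter_revisions(io.BytesIO(SAMPLE)))
print(revisions)
```

The streaming parse keeps memory flat, but it is single-process, which is the reason for the worry about how long a full backdate would take.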
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE added a subscriber: JAllemandou. AndrewTavis_WMDE added a comment. Thanks for all of the information, @mpopov! I talked this over in my bi-weekly with @JAllemandou, and would like to bring some further context to this particular situation :) The go-to table for this would be wmf.wikidata_entity <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Content/Wikidata_entity> for the following reasons: - It has the `sitelinks` column for Population A above - It has the `claims` column for Population B above It thus has everything we need for the given task for future data. One change to the output for this though would be the frequency of the DAG, as `wmf.wikidata_entity` is a weekly data dump, so it'd make sense to do a weekly DAG. If we still want to do a monthly job, then the best option would be to do a DAG that runs on the first Monday of every month (in the docs for `wmf.wikidata_entity` it mentions the `2020-01-20` snapshot, which was a Monday). Now we get to the question of the historical data... This is a situation that cannot be solved at this time given the current makeup of the Data Lake. As mentioned on Mattermost: we currently do not have Wikidata as a partition within wmf.mediawiki_wikitext_history <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Content/Mediawiki_wikitext_history>, so we do not have historical versions of Wikidata items with which we'd be able to rebuild the history. The assumption we're making on this is that the legacy version of these metrics was made using `wmf.mediawiki_wikitext_history` at a time when Wikidata was still an available partition. The change for removing Wikidata from the `wmf.mediawiki_wikitext_history` dump process was made in `2024-02` - see T357859 <https://phabricator.wikimedia.org/T357859> where ~12 of 25 days of the dump generation is for the Wikidata XML dump. This was slowing down metrics delivery for WMF Movements Insights. 
Steps forward on this: - I'll begin work on a DAG based on `wmf.wikidata_entity`, as even if we do get a Wikidata partition within `wmf.mediawiki_wikitext_history`, it would not be used for recent data updates - Are we fine with a weekly DAG? - A decision needs to be made on whether WMDE is requesting Wikidata data to again be an output in the `wmf.mediawiki_wikitext_history` snapshot creation process - The preferred solution here would be to not revert the changes to T357859 <https://phabricator.wikimedia.org/T357859>, but rather make a new job that adds a new partition to the table via the Wikidata XML dump - Reason for this is to ensure that WMF Movements Insights can maintain the current speed of delivery - @JAllemandou has said that bringing the Wikidata partition back is fine if we need it (again, preferably in the above way) - If the request is being made, a new task should be made for it - We'd then do what I'd argue would be a separate task whereby the new `wmf.mediawiki_wikitext_history` Wikidata partition would be used to recompute the historical populations above Let me know what your thoughts are on the above! TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
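The "first Monday of every month" option mentioned above is easy to sanity-check with the standard library. The helper name `first_monday` is hypothetical, and this only computes dates; how the schedule would be expressed in the actual DAG is not covered here.

```python
from datetime import date, timedelta


def first_monday(year, month):
    """Return the date of the first Monday in the given month."""
    first = date(year, month, 1)
    # weekday() is 0 for Monday, so this is the number of days until the next Monday
    # (0 if the month already starts on one).
    return first + timedelta(days=(7 - first.weekday()) % 7)


# The snapshot named in the wmf.wikidata_entity docs, as cited in the comment above.
docs_snapshot = date(2020, 1, 20)
print(docs_snapshot.weekday() == 0)  # True when the snapshot date is a Monday
print(first_monday(2024, 6))
```

A monthly job keyed to `first_monday(year, month)` would always line up with one of the weekly snapshots, assuming those continue to be dated on Mondays.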
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE claimed this task. AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items
AndrewTavis_WMDE added a comment. Summary on your end sounds great, @Ifrahkhanyaree_WMDE! Let me know if sending along some empty new item revisions from 2024 would be helpful :) TASK DETAIL https://phabricator.wikimedia.org/T360761 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items
AndrewTavis_WMDE added a comment. Notebook with the work that was done for this is: wmde/analytics/tasks/product_platform/2024/T360761_empty_wikidata_items/T360761_empty_wikidata_items.ipynb <https://gitlab.wikimedia.org/repos/wmde/analytics/-/blob/main/tasks/product_platform/2024/T360761_empty_wikidata_items/T360761_empty_wikidata_items.ipynb>. Will update this if further work is needed :) TASK DETAIL https://phabricator.wikimedia.org/T360761 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360761 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360761 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items
AndrewTavis_WMDE moved this task from Needs product input to Product verification on the Wikidata Analytics (Kanban) board. AndrewTavis_WMDE added a comment. Further insights on this, and moving it to `Product verification` at this point :) I've now changed the query to a span of bytes that would be allowable for something to be empty. I added 10 bytes to the calculated max for `170`, but also tried with `180` and `190` and the trend of empty on first revision items dropping off is maintained. Basic finding: it used to be way more common, but still does happen today New query is the following: SELECT DISTINCT event_user_text AS editor, substring(event_timestamp, 1, 7) AS event_year_month, page_title AS created_empty_qid FROM wmf.mediawiki_history WHERE wiki_db = 'wikidatawiki' AND page_namespace_is_content = True AND snapshot = '2024-03' AND event_entity = 'revision' AND event_type = 'create' AND page_revision_count = 1 -- Factor in bytes that are within a range small enough to be an empty first edit. AND 148 < revision_text_bytes AND revision_text_bytes < 170 ; Task 1.1 - Number of Items in population A that were created empty: `5,075,471` Task 1.2 - Number of editors who are creating empty items: `27,61` Of the above items, I did a test of `50,000` to see if they were empty on deletion using the `https://www.wikidata.org/wiki/Special:EntityData/` endpoint. `49,579` returned valid JSON responses, and of those `99.65%` were found to be empty. 
I also checked the empty item creation over time, with the following two plots coming based on the above definition of the population in the query (148-170 bytes being "empty"): F48099515: total_empty_qids_created_per_month_v3_definition.png <https://phabricator.wikimedia.org/F48099515> F48099542: total_empty_qids_created_per_month_in_2023_and_2024_v3_definition.png <https://phabricator.wikimedia.org/F48099542> Again, I also tried boosting the max byte size to `180` and `190` and the plots above were not noticeably different. Task 2 - Number of Items in population B that are currently deleted: `44,385` (`0.87%`) I switched around the 3.x tasks a bit with a focus on visualization because, as I said, I basically wasn't seeing ones that were created empty and were still empty. Task 3.1 - no further edits ever on items that are not deleted: `0` (they all have at least one more edit) Query for this: WITH not_deleted_created_empty_qids_v3 AS ( SELECT DISTINCT page_title AS not_deleted_created_empty_qid FROM wmf.mediawiki_history WHERE wiki_db = 'wikidatawiki' AND page_namespace_is_content = True AND snapshot = '2024-03' AND event_entity = 'revision' AND event_type = 'create' AND page_revision_count = 1 -- Factor in bytes that are within a range small enough to be an empty first edit. 
AND 148 < revision_text_bytes AND revision_text_bytes < 170 AND page_is_deleted = False ) SELECT h.page_title AS not_deleted_created_empty_qid, count(h.revision_id) AS number_of_revisions FROM wmf.mediawiki_history AS h JOIN not_deleted_created_empty_qids_v3 AS e ON h.page_title = e.not_deleted_created_empty_qid WHERE h.wiki_db = 'wikidatawiki' AND h.page_namespace_is_content = True AND h.snapshot = '2024-03' AND h.event_entity = 'revision' AND h.event_type = 'create' GROUP BY h.page_title Task 3.2 - at least one additional edit (=the rest): `5,031,086` - Check: `5,031,086 + 44,385 = 5,075,471` New and hopefully a bit more helpful (my assumption) Task 3.3 - graphs of the number of edits the items have had F48100783: not_deleted_empty_on_creation_items_per_edit_amount_max_100_-_v3_definition.png <https://phabricator.wikimedia.org/F48100783> F48100788: number_of_revisions_on_empty_on_creation_items_v3_definition.png <https://phabricator.wikimedia.org/F48100788> Let me know if anything else would be helpful here, @Ifrahkhanyaree_WMDE! TASK DETAIL https://phabricator.wikimedia.org/T360761 WORKBOARD https://phabricator.wikimedia.org/project/board/6546/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___
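The byte-range condition and the reported totals above can be cross-checked with a few lines of Python. The bounds and counts are copied from the comment; the helper name `looks_empty` is hypothetical and simply mirrors the `148 < revision_text_bytes AND revision_text_bytes < 170` filter.

```python
MIN_BYTES, MAX_BYTES = 148, 170  # exclusive bounds taken from the query above


def looks_empty(revision_text_bytes):
    """Mirror of the SQL filter: a first revision this small is treated as 'empty'."""
    return MIN_BYTES < revision_text_bytes < MAX_BYTES


# Counts reported in the comment above.
created_empty = 5_075_471   # Task 1.1
deleted = 44_385            # Task 2
edited_again = 5_031_086    # Task 3.2

# Task 3.2 check: deleted items plus items with further edits cover all created-empty items.
totals_match = deleted + edited_again == created_empty
deleted_share = round(100 * deleted / created_empty, 2)
print(totals_match, deleted_share)
```

Because both bounds are exclusive, revisions of exactly 148 or 170 bytes fall outside the "empty" range.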
[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items
AndrewTavis_WMDE moved this task from In progress to Needs product input on the Wikidata Analytics (Kanban) board. AndrewTavis_WMDE added a comment. The thread on Mattermost <https://mattermost.wikimedia.de/swe/pl/gsr9b485x7geby79t4sg151j7c> for discussing this has a lot of comments on the data restrictions we're dealing with here because there is no text table for Wikidata in the Data Lake. A workaround using `revision_text_bytes` to determine the minimum size that an item could be (i.e. = empty) has been used so far with OK-ish results, but there are definitely drawbacks and it's not exact. What I can say here is that: - There are lots of items being created empty (from one subset `3,540,260`) - They're not normally deleted (from the same subset only `0.95%` were) - They usually receive further edits (I've yet to see an item that was created empty and is still empty, but please note that this is an eye test on ~30 items) Moving this to `Needs product input` for now. A basic thing that can be done that won't take too much time is that I can use a range instead of the case when for determining when an item is empty via the length of its QID and the `revision_text_bytes` size. We would then not be getting empty on creation items 100% of the time, but I could also find the ratio and we could agree on what an acceptable margin of error would be (say `> 90%`). Time estimate on this is 1/2 a day. 
TASK DETAIL https://phabricator.wikimedia.org/T360761 WORKBOARD https://phabricator.wikimedia.org/project/board/6546/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
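The "find the ratio and agree on an acceptable margin of error" idea above could be checked roughly as sketched here. The entity dicts are made-up samples, the key names follow the usual Wikibase entity JSON layout (an assumption), and both helper names and the sample result are hypothetical rather than measured.

```python
ACCEPTABLE_RATIO = 0.90  # the "say > 90%" threshold suggested above

CONTENT_KEYS = ("labels", "descriptions", "aliases", "claims", "sitelinks")


def is_empty_entity(entity):
    """True when the entity has no labels, descriptions, aliases, claims or sitelinks."""
    return not any(entity.get(key) for key in CONTENT_KEYS)


def empty_ratio(flagged_entities):
    """Share of byte-range-flagged entities that are really empty."""
    if not flagged_entities:
        return 0.0
    return sum(is_empty_entity(e) for e in flagged_entities) / len(flagged_entities)


# Made-up sample of entities that a byte-range filter might have flagged.
sample = [
    {"labels": {}, "descriptions": {}, "aliases": {}, "claims": {}, "sitelinks": {}},
    {"labels": {}, "claims": {}},
    {},
    {"labels": {"en": {"language": "en", "value": "x"}}},
]
ratio = empty_ratio(sample)
print(ratio, ratio > ACCEPTABLE_RATIO)
```

With real data, `sample` would be replaced by the first-revision content of a random subset of flagged items, and the byte range could be widened or narrowed until the ratio clears the agreed threshold.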
[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360761 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360761 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360761 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items
AndrewTavis_WMDE added a comment. Prioritizing this now. Initial exploration of the data sources indicates that we need to use the full `mediawiki_history` rather than `mediawiki_history_reduced` as the latter doesn't have a distinct `page_is_deleted` field for Population B. TASK DETAIL https://phabricator.wikimedia.org/T360761 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360761 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362643: Mismatch Finder gadget: visisted link text icon doesn't change color with link
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added a project: Wikidata.org. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Note that I did this Phabricator tasks search <https://phabricator.wikimedia.org/search/query/5BIk7a7RSJzT/#R> before making this task :) **Steps to replicate the issue** (include links if applicable): - Go to https://mismatch-finder.toolforge.org/ - Click on `Random mismatches` - Click on the label and QID header of any element displayed - Click on `Inspect` in the Mismatch Finder gadget with the text `There is/are NUM_MISMATCHES mismatch/es for this item.` - Wait for the page to load such that the link you clicked now has the status visited - Navigate back to the Wikidata item page you were on **What happens?**: You'll see that the link text is colored given the visited status, but the link icon is still the default link text color **What should have happened instead?**: My expectation would be that the icon for the external link would have the same color as the link it's associated with. **Software version** (on `Special:Version` page; skip for WMF-hosted wikis like Wikipedia): Currently deployed version of the gadget. Not sure :) **Other information** (browser name/version, screenshots, etc.): Browser is Firefox 124.0.2 (64-bit). Screenshot of the assumed discoloration is below: F46968414: Screenshot from 2024-04-16 12-54-59.png <https://phabricator.wikimedia.org/F46968414> Minor comment: the link icon doesn't necessarily convey that what the user is clicking on is an external link. Would it make sense to shift the icon over to the right of `Inspect` and use the external link icon - arrow pointing to the top right from a box - for this? 
TASK DETAIL https://phabricator.wikimedia.org/T362643 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, KimKelting, Wikidata-bugs ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362641: [MSMF] Button texts are not centered in various places
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Mismatch Finder, Wikidata. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Note that I did the following Phabricator search <https://phabricator.wikimedia.org/search/query/FxDRSlmcrOEQ/#R> before writing this :) **Steps to replicate the issue**: - Go to https://mismatch-finder.toolforge.org/ **What happens?**: Seems like the buttons on the page don't have their texts centered? See screenshots below: F46965326: Screenshot from 2024-04-16 13-23-44.png <https://phabricator.wikimedia.org/F46965326> F46965344: Screenshot from 2024-04-16 13-23-32.png <https://phabricator.wikimedia.org/F46965344> F46965357: Screenshot from 2024-04-16 13-23-17.png <https://phabricator.wikimedia.org/F46965357> F46965370: Screenshot from 2024-04-16 13-23-06.png <https://phabricator.wikimedia.org/F46965370> I've loaded each of the above screenshots into Figma to check the dimensions and there's extra space beneath the label in all of them except one. For the language selector the space is equal, but then there's a lowercase g, so maybe the text should be a bit lower still? **What should have happened instead?**: The text should be centered. **Software version** (on `Special:Version` page; skip for WMF-hosted wikis like Wikipedia): Currently deployed version of Mismatch Finder. **Other information** (browser name/version, screenshots, etc.): Browser is Firefox 124.0.2 (64-bit). 
TASK DETAIL https://phabricator.wikimedia.org/T362641 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, Mattia_Capozzi_WMDE, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362301: [MSMF] Add mismatch file upload scripts to Mismatch Finder repo
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362301 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, luca.favorido, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, Mattia_Capozzi_WMDE, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362301: [MSMF] Add mismatch file upload scripts to Mismatch Finder repo
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Mismatch Finder, Wikidata, wmde-wikidata-tech. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Context --- A part of the WMDE x Purdue University program where students have been looking for mismatches <https://www.wikidata.org/wiki/Wikidata:Mismatch_Finder/Collaboration/Purdue_Summer_of_Data_2024> has been the creation of scripts to more easily upload mismatch files. These scripts can be found in the root of the wikidata/Purdue-Data-Mine-2024 <https://github.com/Wikidata/Purdue-Data-Mine-2024> repo on GitHub. The files and descriptions of their use are: 1. check_mismatch_file.py <https://github.com/Wikidata/Purdue-Data-Mine-2024/blob/main/check_mismatch_file.py> - Loads a target CSV into a pandas DataFrame - Includes the function `check_mf_formatting` that will check the validity of the file for upload given the Mismatch Finder user guide <https://github.com/wmde/wikidata-mismatch-finder/blob/main/docs/UserGuide.md#creating-a-mismatches-import-file> - Says that the file is ready for upload, or if the file is not valid, steps to fix it are printed - At the start of the process, will also warn the user if the file is larger than the upload file size limit of 10 MB (see next file) 2. split_mismatch_file.py <https://github.com/Wikidata/Purdue-Data-Mine-2024/blob/main/split_mismatch_file.py> - Written in response to the upload limit of 10 MB for the Mismatch Finder API (see T360436 <https://phabricator.wikimedia.org/T360436>) - A path to a CSV is passed, and if the file is greater than the upload limit, then CSV subsets are created in a directory that are below the upload limit - A path to where the subset CSVs should be saved can be passed, and the resulting directory is checked to make sure it only has CSVs - Whether the original CSV should be deleted can also be passed as an argument 3. 
upload_mismatches.py <https://github.com/Wikidata/Purdue-Data-Mine-2024/blob/main/upload_mismatches.py> - A path to a CSV or directory of CSVs is passed - Python `requests` is used to send the upload request (the equivalent of the cURL call), with `r.raise_for_status()` raising an error and the errors being printed if the upload is unsuccessful - Arguments further include the needed access token, a description, the external source, the URL for the external source, and verbosity - Assertions are made to ensure that the arguments are correct Open questions -- I've found the process of using these scripts for uploading mismatches to be much easier than using cURL, where the errors were not returned, or figuring out where all the needed arguments should go within an interface for making the request like Postman. Whether or not the second script should be included in the third is definitely something that should be considered based on end user feedback. Please let me know if there are any questions! TASK DETAIL https://phabricator.wikimedia.org/T362301 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, luca.favorido, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, Mattia_Capozzi_WMDE, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
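The splitting step described for the second script can be illustrated with a short sketch. This is not the code from `split_mismatch_file.py`: the function name `split_csv_text`, its in-memory interface, and the greedy row-by-row strategy are assumptions made for illustration. Only the idea of producing CSV subsets that each stay under a size limit comes from the task description.

```python
import csv
import io


def _encode_row(row):
    """Serialize one CSV row to UTF-8 bytes, with proper quoting."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(row)
    return buffer.getvalue().encode("utf-8")


def split_csv_text(csv_text, max_bytes):
    """Split CSV text into byte chunks that each repeat the header and fit in max_bytes.

    A single row that is larger than the limit still gets its own chunk,
    so such a chunk can exceed max_bytes.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = _encode_row(next(reader))

    chunks, current, size = [], [], len(header)
    for row in reader:
        encoded = _encode_row(row)
        # Start a new chunk when adding this row would cross the limit.
        if current and size + len(encoded) > max_bytes:
            chunks.append(header + b"".join(current))
            current, size = [], len(header)
        current.append(encoded)
        size += len(encoded)

    if current:
        chunks.append(header + b"".join(current))
    return chunks
```

For the 10 MB limit mentioned above, `max_bytes` would be `10 * 1024 * 1024` (or slightly less, to leave headroom), and each returned chunk would be written to its own file in the output directory.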
[Wikidata-bugs] [Maniphest] T356659: [QB] Remove references of broken tool from Mismatch Finder and Query Builder
AndrewTavis_WMDE renamed this task from "[MSMF] [QB] Remove references of broken tool from Mismatch Finder and Query Builder" to "[QB] Remove references of broken tool from Mismatch Finder and Query Builder". TASK DETAIL https://phabricator.wikimedia.org/T356659 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, karapayneWMDE, AndrewTavis_WMDE, luca.favorido, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, Invadibot, maantietaja, Mattia_Capozzi_WMDE, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362151: [SW] The mismatch file description should be more visibly apparent in the Mismatch Finder UI
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362151 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Sarai-WMDE, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, Mattia_Capozzi_WMDE, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362217: Mismatch finder long description modal doesn't close on X press
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362217 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, Mattia_Capozzi_WMDE, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362217: Mismatch finder long description modal doesn't close on X press
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362217 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, Mattia_Capozzi_WMDE, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362217: Mismatch finder long description modal doesn't close on X press
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362217 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, Mattia_Capozzi_WMDE, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362217: Mismatch finder long description modal doesn't close on X press
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Mismatch Finder, Wikidata. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION **Steps to replicate the issue**: In looking at the mismatches on Mismatch Finder <https://mismatch-finder.toolforge.org/results?ids=Q2804125%7CQ1400789%7CQ117225388%7CQ109394737%7CQ22964628%7CQ374855%7CQ6939795%7CQ16541128%7CQ1887363%7CQ6437641%7CQ24959108%7CQ30309997%7CQ109408429%7CQ27533146%7CQ110360032>, I'm seeing a minor bug :ladybug: For the mismatches that have a long description and a read full description element, when you open the modal to view the full description you can only close it with `Confirm` as the `X` in the top right doesn't function on my end. **What happens?**: The close modal `X` receives the focus state when it is clicked. **What should have happened instead?**: The modal should close. **Software version**: Currently deployed version of Mismatch Finder. **Other information**: Browser is Firefox 124.0.2 (64-bit) Screenshot below: F45566218: Screenshot from 2024-04-09 12-38-08.png <https://phabricator.wikimedia.org/F45566218> TASK DETAIL https://phabricator.wikimedia.org/T362217 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, Mattia_Capozzi_WMDE, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362151: [SW] The mismatch file description should be more visibly apparent in the Mismatch Finder UI
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Mismatch Finder, Wikidata, Wikidata Dev Team. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Problem --- Upon uploading some new mismatches, something that I'm realizing is that the description field for the mismatch file isn't very apparent within the Mismatch Finder UI. This doesn't allow for the person uploading the data to provide information about the context of the upload that would help a Wikidata editor fix the mismatches. To me there are as of now two situations for mismatches: 1. It's a simple mismatch and one value should be chosen 2. The mismatch AND other things on the item in question should be addressed Examples are: 1. The mismatch is pretty clear on one property: external says date of birth for a person is 1969, and Wikidata says 1970 2. The mismatch has led to an understanding that there are more problems: external source says date of birth of a person is 1969, Wikidata says 1970, and it's because there's a wrong identifier on the item that is leading to a soccer player also being a chess player so the items need to be split A screenshot of the current UI is: F45354311: Screenshot from 2024-04-09 13-54-12.png <https://phabricator.wikimedia.org/F45354311> F45354315: Screenshot from 2024-04-09 13-54-21.png <https://phabricator.wikimedia.org/F45354315> Having a more apparent description that could also be renamed `Description / Directions` or something along those lines would allow an uploader to provide more context so that issues in the second case could be addressed. Solution There are various ways that the description could be made more apparent. To me marking it also as "directions" in the UI would be helpful, but I'm definitely not suggesting that another field should be added to the upload API. Description and directions should be in one text to simplify the work to be done.
We could also leave the description in the last column and add some spacing between the username for the upload and date above it. Interested to see what UX thinks on this! Open questions -- How to best deal with the space considerations for the Mismatch Finder UI is definitely something that needs to be accounted for. Acceptance criteria --- [ ] The description field is a bit more apparent such that users would be able to see that there might be hints on how to best deal with the mismatch TASK DETAIL https://phabricator.wikimedia.org/T362151 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Sarai-WMDE, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, Mattia_Capozzi_WMDE, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure
AndrewTavis_WMDE added a comment. Note that in checking the `tmp` directory just now, there still are files/directories in there, meaning that parts of the process are likely still running (maybe parts that don't need private data access). We'll be checking this again in a month once the VPS machines are shut down. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: April 2024)
AndrewTavis_WMDE renamed this task from "[Analytics] Monthly repeating tasks (next: March 2024)" to "[Analytics] Monthly repeating tasks (next: April 2024)". TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: March 2024)
AndrewTavis_WMDE added a comment. Sheet has been updated for March via a query of `wmde.wd_rest_api_metrics_monthly` that's generated by Airflow. Slightly fewer user agents than last month, but IPs doubled. TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: March 2024)
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T356659: [MSMF] [QB] Remove references of broken tool from Mismatch Finder and Query Builder
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356659 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, karapayneWMDE, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, Invadibot, maantietaja, Mattia_Capozzi_WMDE, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T356659: [MSMF] [QB] Remove references of broken tool from Mismatch Finder and Query Builder
AndrewTavis_WMDE added a comment. Updated the description as wmde/wikidata-mismatch-finder#878 <https://github.com/wmde/wikidata-mismatch-finder/pull/878> fixed the problem for Mismatch Finder. At time of writing Curious Facts is still referenced in the Query Builder footer. TASK DETAIL https://phabricator.wikimedia.org/T356659 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, karapayneWMDE, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, Invadibot, maantietaja, Mattia_Capozzi_WMDE, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T356659: [MSMF] [QB] Remove references of broken tool from Mismatch Finder and Query Builder
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356659 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, karapayneWMDE, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, Invadibot, maantietaja, Mattia_Capozzi_WMDE, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: March 2024)
AndrewTavis_WMDE lowered the priority of this task from "Medium" to "Low". TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: March 2024)
AndrewTavis_WMDE added a comment. I've added the numbers for February to the sheet based on the first DAG run and also just went through the query job one final time to check. The queries that are being run by the job are taken directly from the original queries with only a few minor changes: For counting the filtered user agents we're doing the following: count( DISTINCT CASE WHEN user_agent NOT LIKE 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/% (KHTML, like Gecko) Chrome/% Safari/%' THEN user_agent END ) AS total_filtered_user_agents, ... instead of: SELECT count(DISTINCT user_agent) AS total_filtered_user_agents ... WHERE AND user_agent NOT LIKE 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/% (KHTML, like Gecko) Chrome/% Safari/%' Within the `WHERE` clause we are further adding `webrequest_source = 'text'` as discussed. This was suggested by WMF data engineering and means that we are not losing any information, but rather that we are querying a subset of the data that still includes our original results. I'll update the numbers for March once the next DAG run is finished at the start of next week! TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
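To check that the `CASE WHEN` form and the `WHERE` form of the filtered count agree, the two can be run side by side on a toy table. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for the Data Lake query engine. The `requests` table, the helper name `count_user_agents`, and the sample user agents are hypothetical. The benefit of the `CASE WHEN` form, as far as this example shows, is that the unfiltered and filtered counts can come from a single pass over the same rows.

```python
import sqlite3

# Pattern copied from the comment above.
CHROME_PATTERN = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/% "
    "(KHTML, like Gecko) Chrome/% Safari/%"
)


def count_user_agents(user_agents):
    """Return (total, filtered via CASE WHEN, filtered via WHERE) distinct user agent counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE requests (user_agent TEXT)")
    conn.executemany(
        "INSERT INTO requests VALUES (?)", [(ua,) for ua in user_agents]
    )

    # One pass: rows matching the pattern become NULL and are ignored by count().
    total, filtered_case = conn.execute(
        """
        SELECT
            count(DISTINCT user_agent),
            count(DISTINCT CASE WHEN user_agent NOT LIKE ? THEN user_agent END)
        FROM requests
        """,
        (CHROME_PATTERN,),
    ).fetchone()

    # Original form: filter the rows first, then count.
    (filtered_where,) = conn.execute(
        "SELECT count(DISTINCT user_agent) FROM requests WHERE user_agent NOT LIKE ?",
        (CHROME_PATTERN,),
    ).fetchone()

    conn.close()
    return total, filtered_case, filtered_where
```

Calling `count_user_agents` with a list that contains one user agent matching the pattern and a few that do not should give identical values for the second and third elements of the result.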
[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs
AndrewTavis_WMDE added a comment. Note that this task is dependent on whether a standardized system that would not require the published datasets is created. Such a system is discussed in T361214: Public dashboard process <https://phabricator.wikimedia.org/T361214>. TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs
AndrewTavis_WMDE removed AndrewTavis_WMDE as the assignee of this task. TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T360298: [Analytics] Public dashboard pilot
AndrewTavis_WMDE added a comment. Note that I've made T361214: Public dashboard process <https://phabricator.wikimedia.org/T361214> to explain our use case of a standardized public dashboard process :) TASK DETAIL https://phabricator.wikimedia.org/T360298 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Aklapper, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Wikidata Analytics (Kanban), Wikidata. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION In T341330: [Analytics] Airflow implementation of unique ips accessing Wikidata's REST API metrics <https://phabricator.wikimedia.org/T341330> WMDE Analytics created its first Airflow DAG and the needed jobs for it. As a requirement for T360298: [Analytics] Public dashboard pilot <https://phabricator.wikimedia.org/T360298> it seems that another step would be needed in order to have the data be on a publicly available dashboard - specifically that we need to add the published datasets <https://analytics.wikimedia.org/published/datasets/> as a target of the jobs such that the data is saved to HDFS and in TSV format in a place where it can be ingested by a dashboarding software like Turnilo <https://wikitech.wikimedia.org/wiki/Analytics/Systems/Turnilo>. TASK DETAIL https://phabricator.wikimedia.org/T361203 WORKBOARD https://phabricator.wikimedia.org/project/board/6546/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
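To make the TSV target described above a bit more concrete, here is a minimal sketch of serializing the monthly metrics as tab-separated values. It writes to a local temporary directory as a stand-in; the real job would need to write to HDFS and sync to the published datasets location, which is not shown. The function name `write_metrics_tsv` and the output file name are hypothetical, and the column names and sample row are the ones reported for the `wmde.wd_rest_api_metrics_monthly` table in this thread.

```python
import csv
import tempfile
from pathlib import Path

# Column names as reported for wmde.wd_rest_api_metrics_monthly in this thread.
COLUMNS = [
    "month",
    "total_user_agents",
    "total_filtered_user_agents",
    "total_ips",
]


def write_metrics_tsv(rows, path):
    """Write metric rows (dicts keyed by COLUMNS) to a TSV file with a header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)


# Sample row taken from the February output reported in this thread.
rows = [
    {
        "month": "2024-02-01",
        "total_user_agents": 458,
        "total_filtered_user_agents": 424,
        "total_ips": 14539,
    }
]

# Hypothetical local path standing in for the published datasets directory.
out_path = Path(tempfile.mkdtemp()) / "wd_rest_api_metrics_monthly.tsv"
write_metrics_tsv(rows, out_path)
tsv_text = out_path.read_text(encoding="utf-8")
print(tsv_text)
```

A header row plus one line per month keeps the file easy to append to on each DAG run and easy for a dashboarding tool to ingest.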
[Wikidata-bugs] [Maniphest] T341330: [Analytics] Airflow implementation of unique ips accessing Wikidata's REST API metrics
AndrewTavis_WMDE added a comment. Merge request <https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/631> has been brought in, and we've successfully deployed! An output from the new `wmde.wd_rest_api_metrics_monthly` table is:

| month | total_user_agents | total_filtered_user_agents | total_ips |
|---|---|---|---|
| 2024-02-01 | 458 | 424 | 14539 |

TASK DETAIL https://phabricator.wikimedia.org/T341330 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T341330: [Analytics] Airflow implementation of unique ips accessing Wikidata's REST API metrics
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T341330 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T360298: [Analytics] Public dashboard pilot
AndrewTavis_WMDE renamed this task from " [Analytics] Public Superset dashboard pilot" to " [Analytics] Public dashboard pilot". AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360298 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Aklapper, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T360298: [Analytics] Public Superset dashboard pilot
AndrewTavis_WMDE added a comment. Following a large discussion about this in the `data-engineering-collab` channel on Slack, the general findings are:
- The public Superset instance isn't suitable for this at this time and there's no timetable for it to be (see above comments)
- A suggestion of putting this information on Wikistats <https://stats.wikimedia.org/#/all-projects> was agreed to be too complex to set up and manage
  - We would need to use AQS 2 (Analytics Query Service) to make a service/API for this
- An initial suggestion from WMDE to target Prometheus with the DAG was decided against
  - It is possible to push data to Prometheus, but there are many complications with this
- A new suggestion is to leverage Turnilo <https://wikitech.wikimedia.org/wiki/Analytics/Systems/Turnilo> for this
  - There is a private instance at turnilo.wikimedia.org <https://turnilo.wikimedia.org/>
  - There are also public instances of this as seen at wiki-search-referrals.wmcloud.org <https://wiki-search-referrals.wmcloud.org/>
    - Wikitech docs for this can be found at wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/referrer_daily/Dashboard <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/referrer_daily/Dashboard>
    - The Turnilo dashboard is hosted on Cloud VPS <https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS>
    - The code for the Turnilo instance can be found at github.com/wikimedia/research-api-endpoint-template/turnilo-druid <https://github.com/wikimedia/research-api-endpoint-template/tree/turnilo-druid>
  - The way this would be achieved is that we would have the published datasets <https://analytics.wikimedia.org/published/datasets/> folder be another target of the DAG jobs, and we'd then ingest this data via the Turnilo instance

This sounds like a good way forward, but it raises the question of who would set up and maintain the Turnilo instance.
A big question is: how often are data pipelines supposed to be public, and would putting it all on a single Turnilo instance work well for our requirements? TASK DETAIL https://phabricator.wikimedia.org/T360298 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Aklapper, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T360298: [Analytics] Public Superset dashboard pilot
AndrewTavis_WMDE added a comment. Further checks on this: the dashboarding process for the public Superset seems to be based on a few preset databases that have the data from Wikimedia projects (see SQL Lab <https://superset.wmcloud.org/sqllab/>). As of now I doubt that we'd be able to have active rights over one of these such that tables we'd generate in Airflow could be added to one and used for visualizations. I've asked in the WMDE data channel if there are people with domain knowledge of Graphite who could help with setting up a process where it would be one of the targets of the Airflow jobs. This seems simpler to me, with the end situation being that we use the main Superset instance for data processes that rely on the data lake/private data access, and then use Grafana for dashboards that are meant to be public facing. TASK DETAIL https://phabricator.wikimedia.org/T360298 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Aklapper, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T360298: [Analytics] Public Superset dashboard pilot
AndrewTavis_WMDE added a comment. Note that from the most recent discussions with WMF data engineering, there isn't a set workflow for getting information into a place where it can be accessed via the public Superset instance. We would need to edit the DAG to include an export step that moves the data to a place where the public instance can access it. This would require some more research. Maybe another thing to consider is whether we'd prefer to have Graphite be the end export location for the data and then make a Grafana dashboard for this? Grafana currently serves as the public-facing data dashboard platform for Wikidata, so it might make sense to leverage it more. TASK DETAIL https://phabricator.wikimedia.org/T360298 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Aklapper, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T348999: Add linter and formatter to wmfdata-python (and link check)
AndrewTavis_WMDE added a comment. Exciting! I'll play around a bit towards the end of next week and send along a PR with the workflow, docs and changes given the local run warnings. Will let you know if anything comes up before then. Have a nice weekend when it comes along! TASK DETAIL https://phabricator.wikimedia.org/T348999 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: nshahquinn-wmf, xcollazo, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, Mohamed-Awnallah, Astuthiodit_1, lbowmaker, BTullis, karapayneWMDE, Invadibot, Ywats0ns, maantietaja, ItamarWMDE, Akuckartz, Mayakp.wiki, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items
AndrewTavis_WMDE claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T360761 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T348999: Add linter and formatter to wmfdata-python (and link check)
AndrewTavis_WMDE added a comment. @nshahquinn-wmf, @xcollazo: checking in on this one again. I would have some time in the next two weeks or so to implement a PR workflow check of linting and code formatting. If folks are fine with Ruff <https://github.com/astral-sh/ruff> that'd be easiest on my end, but also happy to consider others! I'd also suggest adding in a `.vscode/extensions.json` file that would allow us to suggest VS Code extensions like the Ruff extension <https://marketplace.visualstudio.com/items?itemName=charliermarsh.ruff> so people are getting the appropriate warnings during editing. Included would of course also be some documentation on how to run the checks locally before a PR. Let me know if this would be of interest on your end! TASK DETAIL https://phabricator.wikimedia.org/T348999 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: nshahquinn-wmf, xcollazo, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, Mohamed-Awnallah, Astuthiodit_1, lbowmaker, BTullis, karapayneWMDE, Invadibot, Ywats0ns, maantietaja, ItamarWMDE, Akuckartz, Mayakp.wiki, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
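For reference, a minimal sketch of what the suggested `.vscode/extensions.json` could contain, generated here with Python so the content can be checked. The `recommendations` key is the standard VS Code workspace-recommendations field, and `charliermarsh.ruff` is the extension ID from the marketplace link above. Writing into a temporary directory is only a stand-in for the repository root, and none of this reflects what was eventually merged.

```python
import json
import tempfile
from pathlib import Path

# Workspace extension recommendations; VS Code prompts contributors to
# install anything listed here when they open the repository.
extensions_config = {"recommendations": ["charliermarsh.ruff"]}

# Temporary directory used as a stand-in for the repository root.
repo_root = Path(tempfile.mkdtemp())
vscode_dir = repo_root / ".vscode"
vscode_dir.mkdir()

extensions_file = vscode_dir / "extensions.json"
extensions_file.write_text(
    json.dumps(extensions_config, indent=2) + "\n", encoding="utf-8"
)

print(extensions_file.read_text(encoding="utf-8"))
```

With Ruff installed, the local checks before a PR would presumably come down to `ruff check .` for linting and `ruff format --check .` for formatting, though the exact commands would depend on the configuration chosen for the repository.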
[Wikidata-bugs] [Maniphest] T341330: [Analytics] Airflow implementation of unique ips accessing Wikidata's REST API metrics
AndrewTavis_WMDE added a comment. Merge request for this has been sent and can be found here <https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/631> :) Requested WMF's review on this first one, but we'll need to take over from there unless there are problems with it all. TASK DETAIL https://phabricator.wikimedia.org/T341330 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T341330: [Analytics] Airflow implementation of unique ips accessing Wikidata's REST API metrics
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T341330 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE, Danny_Benjafield_WMDE, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE, Danny_Benjafield_WMDE, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T357697: Archive WMDE analytics Gerrit repositories
AndrewTavis_WMDE closed this task as "Resolved". AndrewTavis_WMDE claimed this task. AndrewTavis_WMDE added a comment. Fantastic! Thank you both again for the help here :) Really is great to be winding down these processes and moving onto the next steps! TASK DETAIL https://phabricator.wikimedia.org/T357697 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: hashar, brouberol, Manuel, Aklapper, AndrewTavis_WMDE, Baeisvar52braevincent, Danny_Benjafield_WMDE, Astuthiodit_1, YoutacrsVARs, MajaWiki82, karapayneWMDE, Invadibot, maantietaja, Peteosx1x, ItamarWMDE, Mgagat, Akuckartz, Totolinototo3, Hassoonbxl, Zanziii, Sadisticturd, Nandana, Zylc, Reari, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, Pppery, LawExplorer, _jensen, rosalieper, Scott_WUaS, Luke081515, Wikidata-bugs, aude, Dinoguy1000, Jdforrester-WMF, Mbch331, Jay8g, Krenair ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T357697: Archive WMDE analytics Gerrit repositories
AndrewTavis_WMDE added a comment. Thank you both so much! Let me know when the GitHub repos have been deleted and I'll resolve this and update the greater epic. TASK DETAIL https://phabricator.wikimedia.org/T357697 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: hashar, brouberol, Manuel, Aklapper, AndrewTavis_WMDE, Baeisvar52braevincent, Danny_Benjafield_WMDE, Astuthiodit_1, YoutacrsVARs, MajaWiki82, karapayneWMDE, Invadibot, maantietaja, Peteosx1x, ItamarWMDE, Mgagat, Akuckartz, Totolinototo3, Hassoonbxl, Zanziii, Sadisticturd, Nandana, Zylc, Reari, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, Pppery, LawExplorer, _jensen, rosalieper, Scott_WUaS, Luke081515, Wikidata-bugs, aude, Dinoguy1000, Jdforrester-WMF, Mbch331, Jay8g, Krenair ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T357697: Archive WMDE analytics Gerrit repositories
AndrewTavis_WMDE edited projects, added Wikidata Analytics (Kanban); removed Wikidata Analytics. TASK DETAIL https://phabricator.wikimedia.org/T357697 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: hashar, brouberol, Manuel, Aklapper, AndrewTavis_WMDE, Baeisvar52braevincent, Danny_Benjafield_WMDE, Astuthiodit_1, YoutacrsVARs, MajaWiki82, karapayneWMDE, Invadibot, maantietaja, Peteosx1x, ItamarWMDE, Mgagat, Akuckartz, Totolinototo3, Hassoonbxl, Zanziii, Sadisticturd, Nandana, Zylc, Reari, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, Pppery, LawExplorer, _jensen, rosalieper, Scott_WUaS, Luke081515, Wikidata-bugs, aude, Dinoguy1000, Jdforrester-WMF, Mbch331, Jay8g, Krenair ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T360436: [MSMF] Add upload file limit to Mismatch Finder documentation
AndrewTavis_WMDE added a comment. Per the suggestion from @noarave, I reran the curl command <https://github.com/wmde/wikidata-mismatch-finder/blob/main/docs/UserGuide.md#example-with-curl> with `-v` at the end for verbose output. Of note, the first line of the output is `Note: Unnecessary use of -X or --request, POST is already inferred.`. Aside from that, I still got an empty `"message"` string at the end and nothing indicating that the file size limit was exceeded. The end of the response is: { "message": "" * Connection #0 to host mismatch-finder.toolforge.org left intact }% TASK DETAIL https://phabricator.wikimedia.org/T360436 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: noarave, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, Mattia_Capozzi_WMDE, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org