[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure
AndrewTavis_WMDE added a project: Epic. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE, me, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, BeautifulBold, Suran38, Invadibot, maantietaja, Peteosx1x, NavinRizwi, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Dinoguy1000, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618
[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024
AndrewTavis_WMDE moved this task from In progress to Product verification on the Wikidata Analytics (Kanban) board.

AndrewTavis_WMDE added a comment.

@Manuel and @Lydia_Pintscher, just shared a folder with the two CSVs on Wolke. Let me know if there's anything else needed, and I will set a reminder that they should be deleted on my end in 89 days (they were generated yesterday). Sharing has been disabled on the directory, so if others need access, then let me know :)

TASK DETAIL https://phabricator.wikimedia.org/T366621 WORKBOARD https://phabricator.wikimedia.org/project/board/6546/
[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T366621
[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries
AndrewTavis_WMDE added a comment.

Hi @MarcoSwart 👋 Thanks for the communication here :) I guess I'm a bit confused about how the other one would be used. You're roughly talking about:

| word_that_is_missing_from_a_wiktionary | number_of_wiktionaries_that_do_have_it |
| MOST_MISSING_WORD | 156 |
| NEXT_MOST_MISSING_WORD | 155 |
| ... | ... |

With that we're missing the `Wiktionary` column, so editors wouldn't be able to easily tell whether their Wiktionary needs that word or not. Maybe it can be gotten from another part of the data process. Let me explain :)

What's planned for this data process at this point is two outputs:

- Missing Entries (I miss you ...) as described above <https://phabricator.wikimedia.org/T360296#9879652>
  - per Wiktionary what are the 1,000 most popular missing words
- Most Popular
  - the most popular entries across all Wiktionaries

Maybe Most Popular would serve your interests above? This would be a CSV with, say, the 10,000 or 100,000 (or however many you all would need) most popular entries across all Wiktionaries. All of this would update on a daily basis. Would that work for you?

Please let me know if I'm understanding correctly, by the way! Appreciate your feedback :)

TASK DETAIL https://phabricator.wikimedia.org/T360296
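For readers following along, here is a minimal sketch of how a "missing entries" table with the `Wiktionary` column could be derived. It is only an illustration in plain Python, not the Hive queries or DAGs used for the task; the function name `missing_entries`, the input shape, and the toy data are all assumptions.

```python
from collections import defaultdict


def missing_entries(entries_by_wiktionary, top_n=1000):
    """Return (wiktionary, missing_word, wiktionaries_that_have_it) rows.

    entries_by_wiktionary is a hypothetical mapping of Wiktionary name
    to the words that have an entry there.
    """
    # Count how many Wiktionaries contain each word.
    coverage = defaultdict(int)
    for words in entries_by_wiktionary.values():
        for word in set(words):
            coverage[word] += 1

    rows = []
    for wiktionary, words in entries_by_wiktionary.items():
        have = set(words)
        missing = [(w, c) for w, c in coverage.items() if w not in have]
        # Most widely covered words first; ties broken alphabetically.
        missing.sort(key=lambda wc: (-wc[1], wc[0]))
        rows.extend((wiktionary, w, c) for w, c in missing[:top_n])
    return rows


# Toy data, invented for illustration only.
toy = {
    "enwiktionary": ["water", "house", "tree"],
    "nlwiktionary": ["water", "house"],
    "frwiktionary": ["water"],
}
rows = missing_entries(toy)
for row in rows:
    print(row)
```

Because a word in a row is by definition absent from that row's Wiktionary, the count is the number of other Wiktionaries that have it, which matches the third column described in the thread.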
[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024
AndrewTavis_WMDE added a comment.

@Manuel, my assumption was that you could help any non-analytics PMs or go through the results with them as you have the needed access. Using Google for PII is not something we're supposed to do if it can be avoided, but I have no experience with Wolke. Please let me know if you'd like me to look into Wolke or send the files over Drive.

TASK DETAIL https://phabricator.wikimedia.org/T366621
[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries
AndrewTavis_WMDE added a comment.

Talked further with WMF about this just now. One basic question for the end users: would it be more convenient for you all if the exported datasets were per Wiktionary? There are two options here, with missing entries being used as an example:

1. We export one file that has all missing entries for all Wiktionaries
   - 188,000 rows x 3 columns
   - 188,000 rows = the 1,000 most popular missing entries for each Wiktionary (there are 188 in the data)
   - 3 columns
     - The Wiktionary
     - The word that's missing from it
     - The total of the other Wiktionaries that have it
2. We export 188 CSVs, each of length 1,000 with the above columns

The reason for option 1 or 2 and not both is that we don't want to keep the data in duplicate both in the published datasets directories and in the data lake. Option 1 is easier, but we can figure out Option 2 if that would be your preference. So the baseline question for each option is:

1. If you're only working on one Wiktionary, would you be ok with getting it as a subset from the whole dataset?
2. If you're working on more than one Wiktionary, would you be ok with getting the separate datasets and combining them?

Let us know which would be better for your workflow! And thanks for your continued interest in this. Great talks today about the various options we have 😊

TASK DETAIL https://phabricator.wikimedia.org/T360296
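To make the two baseline questions concrete, here is a rough sketch of what each option would ask of an end user, using only the Python standard library. The column names (`wiktionary`, `missing_word`, `other_wiktionaries_with_word`) and the in-memory sample files are assumptions for illustration; the real export may use different headers.

```python
import csv
import io


def subset_for(combined_csv, wiktionary):
    """Option 1: pull one Wiktionary's rows out of the combined file."""
    reader = csv.DictReader(combined_csv)
    return [row for row in reader if row["wiktionary"] == wiktionary]


def combine(per_wiktionary_csvs):
    """Option 2: stack several per-Wiktionary files into one list of rows."""
    rows = []
    for handle in per_wiktionary_csvs:
        rows.extend(csv.DictReader(handle))
    return rows


# Assumed header; the exported files may name these columns differently.
HEADER = "wiktionary,missing_word,other_wiktionaries_with_word\n"

# Option 1: one combined file (invented sample rows).
combined = io.StringIO(
    HEADER
    + "nlwiktionary,tree,1\n"
    + "frwiktionary,house,2\n"
    + "frwiktionary,tree,1\n"
)
fr_rows = subset_for(combined, "frwiktionary")

# Option 2: one file per Wiktionary (invented sample rows).
nl_file = io.StringIO(HEADER + "nlwiktionary,tree,1\n")
fr_file = io.StringIO(HEADER + "frwiktionary,house,2\n")
both = combine([nl_file, fr_file])

print(len(fr_rows), len(both))
```

Either direction is a few lines of work, so the choice mostly comes down to which files are more convenient to download and keep up to date.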
[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T366621
[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024
AndrewTavis_WMDE added a comment.

I can also prepare a notebook with quick functions to load and explore the data, if that would make the option I suggested a bit easier.

TASK DETAIL https://phabricator.wikimedia.org/T366621
[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024
AndrewTavis_WMDE added a comment.

> Would it be possible to send us a spreadsheet (and schedule it for deletion after 90 days)?

I'd prefer to transfer via the servers if possible given the comment here <https://phabricator.wikimedia.org/T358311#9820450> from WMF Engineering. I'm also not sure how to schedule a spreadsheet for deletion, but can look into this if this would be preferable.

TASK DETAIL https://phabricator.wikimedia.org/T366621
[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024
AndrewTavis_WMDE added a comment.

Base queries for all of this are ready :) Let me know on the above and I'll finalize them.

Re how to send the files: my suggestion would be that I put them into my `stat1010` and then @Manuel can migrate them to his. From there I'll delete my copy and he can delete his once he and @Lydia_Pintscher are done checking them. Suggesting this as I can't move the files into another user's directory myself. Generally from one's root the command would be:

    # The last . is the current directory, and autocomplete should work.
    cp ../andrewtavis-wmde/wikidata/2024/T366621_rest_api_user_agents/FILE_NAME.csv .

Let me know how this sounds!

TASK DETAIL https://phabricator.wikimedia.org/T366621
[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024
AndrewTavis_WMDE added a comment.

Checking on the numbers here really quick: the request is for the top `1000` user agents and then a sample of `1000` user agents, but the total is `1221`. Would an ordered list of all of them make more sense, as we're talking about a sample of 82%? There really isn't going to be a difference between the first two sets. How about an ordered list of all of them, and another ordered list of all who were active in May and not in April?

TASK DETAIL https://phabricator.wikimedia.org/T366621
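As a quick illustration of the arithmetic and of the "active in May and not in April" list, here is a small Python sketch. The user agent strings and request counts are invented placeholders, and the helper names are hypothetical; only the `1000` and `1221` figures come from the comment above.

```python
def sample_share(sample_size, population):
    """Fraction of the population covered by a sample of the given size."""
    return min(sample_size, population) / population


# Figures from the comment: a sample of 1000 out of 1221 user agents.
share = sample_share(1000, 1221)
print(f"{share:.0%}")


def new_agents(current_counts, previous_counts):
    """User agents seen in the current month but not the previous one,
    ordered by request count (descending), then by name."""
    fresh = {
        ua: n for ua, n in current_counts.items() if ua not in previous_counts
    }
    return sorted(fresh.items(), key=lambda item: (-item[1], item[0]))


# Invented placeholder data: user agent -> number of requests.
may = {"agent-a/1.0": 500, "agent-b/2.1": 120, "agent-c/0.3": 40}
april = {"agent-a/1.0": 450}
new_in_may = new_agents(may, april)
print(new_in_may)
```

With a sample that covers roughly four fifths of the population, the top-1000 and sampled sets overlap almost entirely, which is why a single ordered list plus the month-over-month difference carries the same information.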
[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T366621
[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations
AndrewTavis_WMDE added a comment. Status is open as T364045 <https://phabricator.wikimedia.org/T364045> has been resolved :) TASK DETAIL https://phabricator.wikimedia.org/T363583
[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations
AndrewTavis_WMDE changed the task status from "Stalled" to "Open". TASK DETAIL https://phabricator.wikimedia.org/T363583
[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs
AndrewTavis_WMDE added a comment. Unstalled as the plan for the data export has been approved in T365699 <https://phabricator.wikimedia.org/T365699> :) TASK DETAIL https://phabricator.wikimedia.org/T361203
[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs
AndrewTavis_WMDE changed the task status from "Stalled" to "Open". TASK DETAIL https://phabricator.wikimedia.org/T361203
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Items that contain a sitelink to one of the Wikimedia projects over time
AndrewTavis_WMDE added a comment. Unstalled as the table has been created :) TASK DETAIL https://phabricator.wikimedia.org/T362849
[Wikidata-bugs] [Maniphest] T343019: [EPIC] Segments of Wikidata's data over time [up to milestone 3]
AndrewTavis_WMDE changed the status of subtask T362849: [Analytics] Items that contain a sitelink to one of the Wikimedia projects over time from "Stalled" to "Open". TASK DETAIL https://phabricator.wikimedia.org/T343019
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Items that contain a sitelink to one of the Wikimedia projects over time
AndrewTavis_WMDE changed the task status from "Stalled" to "Open". TASK DETAIL https://phabricator.wikimedia.org/T362849
[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries
AndrewTavis_WMDE added a comment.

Hi @MarcoSwart, sorry for changing the status without explanation. I was in a meeting and we were moving things around, but obviously context should have been added.

This is stalled for now as we're waiting for WMF to advise us on the best way forward on migrating data from MariaDB to HDFS. The data processes we need for this cannot be run directly on MariaDB in a sustainable way that's in line with long-term supported data practices, so first we need to migrate the data to the private data cluster, and then our normal workflows take over. This migration is non-standard, and they're looking into how best to support and guide us. By the sounds of it, they're allotting the budget of a Staff Engineer to help with this soon.

The data pipeline and the needed queries are basically done, so what we're waiting on is the process to migrate the data as a final step. From there we'll get the process up and running such that the data will, at the very least, be exported to the published datasets folders <https://analytics.wikimedia.org/published/datasets/> on a daily basis.

As far as a dashboard is concerned, we're also in the midst of looking into a more sustainable solution for presenting information to the public. This is similarly tied to WMF's efforts on this front. For now we hope that an export to the published datasets will suffice such that the community can then take the data and model it as they wish. I'd be happy to help people with simple Python scripts to get the data loaded into data frames and more workable states once that's done!

I'd put an estimate on the data process as end of month if things work out with WMF's resources, but if not then it's August as I'm away for most of July (no later than that though).

Please let me know if you have further questions, and again sorry for the confusion!

TASK DETAIL https://phabricator.wikimedia.org/T360296
[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations
AndrewTavis_WMDE added a comment. Note, work that will unblock this task is being done in T364045: [Bug?] Can't find wikidatawiki on wmf.mediawiki_wikitext_history <https://phabricator.wikimedia.org/T364045>. TASK DETAIL https://phabricator.wikimedia.org/T363583
[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024
AndrewTavis_WMDE claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T366621
[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024
AndrewTavis_WMDE added a comment. Quick note on this: in discussion, something else to check came up, namely those user agents that were present in May 2024 but were not active in April 2024 :) TASK DETAIL https://phabricator.wikimedia.org/T366621
[Wikidata-bugs] [Maniphest] T332899: [EPIC] Migrate selected R-based Wikidata products
AndrewTavis_WMDE changed the status of subtask T360296: [Analytics] Implement data process to identify missing Wiktionary entries from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T332899
[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries
AndrewTavis_WMDE changed the task status from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T360296
[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries
AndrewTavis_WMDE added a comment. There's now a draft for the DAGs <https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/725/diffs#96f15bf21ce9c18b6638c53402e35a2654aeeff6> open on GitLab. There's still lots to do, as WMF wants to sync on the suggestions they'll give me for the MariaDB to HDFS data transfer, but the DAGs are mapped out and the Hive queries they call have been prepared :) TASK DETAIL https://phabricator.wikimedia.org/T360296
[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618
[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)
AndrewTavis_WMDE added a comment. Thanks so much for the support here, @BTullis! I'll update the epic <https://phabricator.wikimedia.org/T356618> with this being done. So close to being finished with all this :) TASK DETAIL https://phabricator.wikimedia.org/T358311
[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360296 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, ECohen_WMDE, Aklapper, Pamputt, AndrewTavis_WMDE, JeanFred, Lydia_Pintscher, MarcoSwart, Manuel, me, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, BeautifulBold, Suran38, karapayneWMDE, Invadibot, maantietaja, Peteosx1x, NavinRizwi, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Dinoguy1000, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries
AndrewTavis_WMDE added a comment. wmde/analytics/hql/airflow_jobs/wiktionary_cognate <https://gitlab.wikimedia.org/repos/wmde/analytics/-/tree/main/hql/airflow_jobs/wiktionary_cognate?ref_type=heads> on GitLab now has all the needed queries for missing entries, most popular entries and comparing Wiktionaries. It was easier to write all three at once rather than lose some context later. Note that these are Hive queries as the goal is to ... I've discussed the further infrastructure needs at length with a data engineer at WMF, with the steps from here being:
- I need to write a PySpark job that gets the `cognate_wiktionary` tables from the MariaDB instance and puts them on HDFS on a daily basis
- This will go in wmde/analytics/spark <https://gitlab.wikimedia.org/repos/wmde/analytics/-/tree/main/spark?ref_type=heads>
- Note that this is relatively uncharted territory (it can be done with current long-term supported tools, but will be a new type of job)
- From there we need a DAG that will eventually include all three processes discussed above
- The reason we'll do a DAG for all three is that each will rely on the PySpark job to migrate the data from MariaDB to HDFS
- We can start with just doing missing entries as an output for this task, and then other tasks can add the other two to the DAG
TASK DETAIL https://phabricator.wikimedia.org/T360296 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, ECohen_WMDE, Aklapper, Pamputt, AndrewTavis_WMDE, JeanFred, Lydia_Pintscher, MarcoSwart, Manuel, me, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, BeautifulBold, Suran38, karapayneWMDE, Invadibot, maantietaja, Peteosx1x, NavinRizwi, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Dinoguy1000, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
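The dependency structure described in the comment above (three downstream processes that each rely on one MariaDB-to-HDFS migration job) can be sketched with the Python standard library. This is only an illustration of the ordering constraint, not the Airflow DAG itself: the task names are hypothetical, and `graphlib` stands in for whatever scheduler is actually used.

```python
from graphlib import TopologicalSorter

# Hypothetical task names. Each key maps to the set of tasks it depends on.
dependencies = {
    "cognate_mariadb_to_hdfs": set(),
    "missing_entries": {"cognate_mariadb_to_hdfs"},
    "most_popular_entries": {"cognate_mariadb_to_hdfs"},
    "compare_wiktionaries": {"cognate_mariadb_to_hdfs"},
}


def run_order(deps):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

Starting with only `missing_entries` and adding the other two later, as suggested above, would just mean adding further keys to the mapping while the migration task stays the single upstream node.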
[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: July 2024)
AndrewTavis_WMDE added a comment. Table has been updated with the new data from the most recent DAG run. Lots more user agents - almost a 3x increase. Noting this for now as maybe grounds for further investigation later, but IPs are also increasing (just not by as much). Note that we need to do some work on the `wmde.wd_rest_api_metrics_monthly` table at this point as directed in T365699: Published datasets data release request for Wikidata REST API metrics <https://phabricator.wikimedia.org/T365699>. Specifically, as of now all the outputs of this table are `bigint` values. As this type of data is classified under users, we need to ensure that data points less than 25 are recorded as `"<25"`. All columns will thus need to be converted to strings. The goal would of course be to avoid any data loss in this process. The query has already been updated locally, and will be changed with the next deploy to add the published datasets as a deployment target (both the DAG and the jobs need to be updated at this point). As the DAG has already been run for this month, I'm going to update the jobs now. Another thing to consider is whether in this update we can also backdate the table with the information from months before the DAG was functional. TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
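The thresholding rule described in the comment above (integer counts below 25 recorded as `"<25"`, everything else kept as a string) could look roughly like the following. This is a minimal sketch in plain Python rather than the actual Hive query; the column names and example values are made up for illustration.

```python
THRESHOLD = 25  # from the comment above: values below this are masked


def mask_small_count(value, threshold=THRESHOLD):
    """Return the count as a string, masking values below the threshold."""
    if value is None:
        return None  # keep missing values missing rather than inventing a count
    if value < threshold:
        return f"<{threshold}"
    return str(value)


# Hypothetical row: column names and numbers are illustrative only.
row = {"total_user_agents": 1342, "total_ips": 24, "total_requests": 25}
masked = {column: mask_small_count(count) for column, count in row.items()}
print(masked)
```

Because the masked column mixes digits with the literal `<25`, it has to be stored as a string, which is why the conversion away from `bigint` is needed. Values at or above the threshold round-trip through `int()` without loss.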
[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: July 2024)
AndrewTavis_WMDE renamed this task from "[Analytics] Monthly repeating tasks (next: June 2024)" to "[Analytics] Monthly repeating tasks (next: July 2024)". AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)
AndrewTavis_WMDE closed this task as "Resolved". AndrewTavis_WMDE claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T351072 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Arian_Bozorg, karapayneWMDE, Aklapper, Lucas_Werkmeister_WMDE, AndrewTavis_WMDE, Michael, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, Djdungti, LawExplorer, _jensen, rosalieper, Scott_WUaS, Izno, Nastoshka, Wikidata-bugs, aude, Dinoguy1000, scfc, Mbch331, Jay8g ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T351070: [EPIC] Clean up Wikidata Grafana cronjobs
AndrewTavis_WMDE closed subtask T351072: Remove the WDCM clone (stats1007) as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T351070 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Michael, Manuel, Aklapper, me, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, BeautifulBold, Suran38, karapayneWMDE, Invadibot, maantietaja, Peteosx1x, NavinRizwi, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Dinoguy1000, Lydia_Pintscher, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check
AndrewTavis_WMDE closed subtask T351072: Remove the WDCM clone (stats1007) as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T364965 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Lucas_Werkmeister_WMDE, BTullis, Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)
AndrewTavis_WMDE added a comment. Perfect, @Lucas_Werkmeister_WMDE! Glad to have this all cleared up :) TASK DETAIL https://phabricator.wikimedia.org/T351072 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Arian_Bozorg, karapayneWMDE, Aklapper, Lucas_Werkmeister_WMDE, AndrewTavis_WMDE, Michael, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, Djdungti, LawExplorer, _jensen, rosalieper, Scott_WUaS, Izno, Nastoshka, Wikidata-bugs, aude, Dinoguy1000, scfc, Mbch331, Jay8g ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check
AndrewTavis_WMDE closed this task as "Resolved". AndrewTavis_WMDE claimed this task. AndrewTavis_WMDE added a comment. Sounds good to me! :) Thanks for the help here, @Lucas_Werkmeister_WMDE and @BTullis! TASK DETAIL https://phabricator.wikimedia.org/T364965 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Lucas_Werkmeister_WMDE, BTullis, Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T321666: Wiktionary Cognate Dashboard is not accessible [timeboxed 0.5 days]
AndrewTavis_WMDE added a comment. Hi @Bicolino34 👋 Thanks for reaching out :) We are still working on tasks related to this dashboard - at least bringing back some of the data processes. TASK DETAIL https://phabricator.wikimedia.org/T321666 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Bicolino34, Lepticed7, XANA000, VIGNERON, AndrewTavis_WMDE, Lydia_Pintscher, WMDE-leszek, Pamputt, MarcoSwart, GoranSMilovanovic, Otourly, ItamarWMDE, Aklapper, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, Akuckartz, Dringsim, Nandana, Lahi, Gq86, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Thibaut120094, Wikidata-bugs, aude, Darkdadaah, Mbch331, Ltrlg ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)
AndrewTavis_WMDE added a comment. Moving this to verification given the work in T364965 <https://phabricator.wikimedia.org/T364965>. Thanks for all of this, @Lucas_Werkmeister_WMDE! Maybe we can resolve this and leave T364965 <https://phabricator.wikimedia.org/T364965> until `stat1007` is deprecated, or resolve both? TASK DETAIL https://phabricator.wikimedia.org/T351072 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Arian_Bozorg, karapayneWMDE, Aklapper, Lucas_Werkmeister_WMDE, AndrewTavis_WMDE, Michael, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, Djdungti, LawExplorer, _jensen, rosalieper, Scott_WUaS, Izno, Nastoshka, Wikidata-bugs, aude, Dinoguy1000, scfc, Mbch331, Jay8g ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check
AndrewTavis_WMDE added a comment. None of the files listed in your comment above <https://phabricator.wikimedia.org/T364965#9838579> look like things we should worry about, @Lucas_Werkmeister_WMDE. Similarly that there's a different commit for this, as to my knowledge `stat1005` was the main server for the related work. So sounds like our work for this is finalized? Do we want to resolve this or keep this open until `stat1005` is fully deprecated? TASK DETAIL https://phabricator.wikimedia.org/T364965 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Lucas_Werkmeister_WMDE, BTullis, Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries
AndrewTavis_WMDE added a comment. I've been asking around about the data source and connecting the tables and have yet to get concrete answers. Based on general assumptions of the names of the tables/columns though, the path forward for getting missing entries for a Wiktionary will be to:
- Start with `cognate_wiktionary.cognate_sites`
- Join to `cognate_wiktionary.cognate_pages` (`cognate_sites.cgsi_key = cognate_pages.cgpa_site`)
- Join to `cognate_wiktionary.cognate_titles` (`cognate_pages.cgpa_title = cognate_titles.cgti_raw_key` - note the use of `cgti_raw_key`)
- Use `cognate_titles.cgti_normalized_key` as a means of checking which Wiktionary entries are shared/missing across projects
Putting this here as documentation :) TASK DETAIL https://phabricator.wikimedia.org/T360296 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, ECohen_WMDE, Aklapper, Pamputt, AndrewTavis_WMDE, JeanFred, Lydia_Pintscher, MarcoSwart, Manuel, me, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, BeautifulBold, Suran38, karapayneWMDE, Invadibot, maantietaja, Peteosx1x, NavinRizwi, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Dinoguy1000, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
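The join path listed above can be tried out on a toy dataset. The sketch below uses an in-memory SQLite database instead of MariaDB or Hive. The join columns come from the comment, while `cgsi_dbname`, `cgti_raw` and all rows are assumptions added only to make the example runnable.

```python
import sqlite3

SCHEMA_AND_DATA = """
CREATE TABLE cognate_sites (cgsi_key INTEGER, cgsi_dbname TEXT);
CREATE TABLE cognate_pages (cgpa_site INTEGER, cgpa_title INTEGER);
CREATE TABLE cognate_titles (cgti_raw_key INTEGER, cgti_normalized_key INTEGER, cgti_raw TEXT);
INSERT INTO cognate_sites VALUES (1, 'enwiktionary'), (2, 'dewiktionary');
INSERT INTO cognate_titles VALUES (10, 100, 'apple'), (11, 101, 'Haus');
INSERT INTO cognate_pages VALUES (1, 10), (1, 11), (2, 11);
"""

# Normalized keys that exist on some other site but not on the target site.
MISSING_SQL = """
SELECT DISTINCT t.cgti_normalized_key
FROM cognate_sites s
JOIN cognate_pages p ON s.cgsi_key = p.cgpa_site
JOIN cognate_titles t ON p.cgpa_title = t.cgti_raw_key
WHERE s.cgsi_dbname != ?
  AND t.cgti_normalized_key NOT IN (
    SELECT t2.cgti_normalized_key
    FROM cognate_sites s2
    JOIN cognate_pages p2 ON s2.cgsi_key = p2.cgpa_site
    JOIN cognate_titles t2 ON p2.cgpa_title = t2.cgti_raw_key
    WHERE s2.cgsi_dbname = ?
  )
ORDER BY t.cgti_normalized_key
"""


def build_demo_db():
    """Create the toy tables in memory and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    return conn


def missing_normalized_keys(conn, dbname):
    """List normalized title keys present elsewhere but absent from dbname."""
    rows = conn.execute(MISSING_SQL, (dbname, dbname)).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
print(missing_normalized_keys(conn, "dewiktionary"))
```

In this toy data the German site lacks the entry with normalized key 100, so that key is reported for `dewiktionary`, and nothing is reported for `enwiktionary`. Whether the real tables behave this way still depends on the unconfirmed assumptions about the schema noted in the comment.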
[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE changed the task status from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs
AndrewTavis_WMDE changed the task status from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360296 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, ECohen_WMDE, Aklapper, Pamputt, AndrewTavis_WMDE, JeanFred, Lydia_Pintscher, MarcoSwart, Manuel, me, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, BeautifulBold, Suran38, karapayneWMDE, Invadibot, maantietaja, Peteosx1x, NavinRizwi, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Dinoguy1000, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check
AndrewTavis_WMDE added a comment. Thanks for taking care of this, @Lucas_Werkmeister_WMDE! We'll be able to close both this and T351072 <https://phabricator.wikimedia.org/T351072> after Tuesday next week if/when the Puppet change is deployed :) TASK DETAIL https://phabricator.wikimedia.org/T364965 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Lucas_Werkmeister_WMDE, BTullis, Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, Isabelladantes1983, Themindcoder, Adamm71, S8321414, Hellket777, LisafBia6531, Astuthiodit_1, 786, Biggs657, karapayneWMDE, Invadibot, maantietaja, Juan90264, Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, Dringsim, Hook696, Kent7301, CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T365457: Bring in all Purdue Program PRs and upload Mismatch Finder mismatches
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T365457 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Manuel, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check
AndrewTavis_WMDE added a comment. @BTullis, checking in on this as your help in T358311 <https://phabricator.wikimedia.org/T358311> reminded me as it's all related to the same user. Would you be able to remove the `statistics/manifests/wmde/wdcm.pp` file and any related processes (including now stat1011) as well? TASK DETAIL https://phabricator.wikimedia.org/T364965 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Lucas_Werkmeister_WMDE, BTullis, Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)
AndrewTavis_WMDE added a comment. Thank you, @BTullis! Ya I wasn't happy with the solution either. Appreciate your willingness to help! TASK DETAIL https://phabricator.wikimedia.org/T358311 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: BTullis, brouberol, JAllemandou, MoritzMuehlenhoff, Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)
AndrewTavis_WMDE added a comment. I'm realizing also that I don't have admin rights and thus can't move files to your directory. I'll copy these files over to my directory, download them and send you a link to a zipped directory on Google Drive once we have the above figured out. TASK DETAIL https://phabricator.wikimedia.org/T358311 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: brouberol, JAllemandou, MoritzMuehlenhoff, Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, BTullis, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)
AndrewTavis_WMDE added a comment. Hi @Manuel, checking further as it's still not clear what you'd like. The double except is confusing. I'll only transfer files from `stat1005`, and could you answer the following questions:
1. Do you want **data files** (.csv, .tsv, etc) __before 2020__? (assumption no)
2. Do you want **data files** __after 2020__? (as of now unclear)
3. Do you want **non data files** (.py, .R, etc) __before 2020__? (as of now unclear)
4. Do you want **non data files** __after 2020__? (assumption yes)
TASK DETAIL https://phabricator.wikimedia.org/T358311 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: brouberol, JAllemandou, MoritzMuehlenhoff, Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, BTullis, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)
AndrewTavis_WMDE added a comment. Hi @Manuel - sending along a summary of what I'll be getting for you:
== stat1004 ==
Jul 25 2020 Analytics
Jun 23 2020 Experiments
Jul 25 2020 wdUsagePerPage
== stat1005 ==
All non data files
== stat1007 ==
Aug 23 2020 Analytics
Jan 27 2020 Experiments
Aug 23 2020 RScripts
== stat1008 ==
Oct 11 2021 Analytics
Jun 23 2020 R
=== HDFS
2021-11-02 17:37 /user/goransm/dewiki_revisions
2021-04-11 16:51 /user/goransm/wdtranslationsb
No other files, as everything after 2020 is a data file or ORES related (this is coming in the stat server files anyway). TSVs, CSVs and data file types will not be included in the transfer. Out of convenience, I'm going to transfer the files into your directory on the given server. TASK DETAIL https://phabricator.wikimedia.org/T358311 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: brouberol, JAllemandou, MoritzMuehlenhoff, Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, BTullis, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
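The rule in the comment above, that TSVs, CSVs and other data file types are left out of the transfer, amounts to a filter on file extensions. A minimal sketch, assuming the data types are recognised purely by suffix. The suffix set and the example paths are hypothetical, and no files are actually copied here.

```python
from pathlib import PurePosixPath

# Assumption: data files are identified by extension only; extend as needed.
DATA_SUFFIXES = {".csv", ".tsv"}


def is_data_file(path):
    """True if the path ends in one of the data-file suffixes (case-insensitive)."""
    return PurePosixPath(path).suffix.lower() in DATA_SUFFIXES


def files_to_transfer(paths):
    """Keep only non-data files such as scripts, preserving the input order."""
    return [path for path in paths if not is_data_file(path)]


# Hypothetical paths for illustration only.
candidates = [
    "Analytics/usage_report.R",
    "Analytics/usage_counts.CSV",
    "Experiments/query.py",
    "Experiments/results.tsv",
]
selected = files_to_transfer(candidates)
print(selected)
```

A date cut-off such as "before 2020" could be layered on top by comparing each file's modification time, but that part is left out because the exact rule was still being clarified in the thread.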
[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)
AndrewTavis_WMDE added a comment. Ok then! So the check of the files above is complete as shown by its status. General summaries of each stat machine and HDFS are provided under the subsections above. `stat1005` has some files that @Manuel may find interesting given that they're for prior tasks of his. Any queries that looked potentially interesting, or that were in files with interesting-sounding names but turned out not to be interesting, are printed above for documentation. Overall I can say that anything from the above would be easier to rebuild from scratch via the docs and checking with WMDE engineers or WMF Data Engineering/Analytics rather than going through and re-implementing it. I personally would not keep anything, and will delete the files I copied over to my `stat1005` directory once this is closed :) Thanks again @JAllemandou for the file lists, and thanks @brouberol for the ping! TASK DETAIL https://phabricator.wikimedia.org/T358311 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: brouberol, JAllemandou, MoritzMuehlenhoff, Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, BTullis, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)
AndrewTavis_WMDE added a comment. So basically removing the wdcm.pp related file on GitHub and its Puppet workflows will close both tasks :) TASK DETAIL https://phabricator.wikimedia.org/T351072 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Arian_Bozorg, karapayneWMDE, Aklapper, Lucas_Werkmeister_WMDE, AndrewTavis_WMDE, Michael, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, Djdungti, LawExplorer, _jensen, rosalieper, Scott_WUaS, Izno, Nastoshka, Wikidata-bugs, aude, Dinoguy1000, scfc, Mbch331, Jay8g ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)
AndrewTavis_WMDE added a comment. Ah looking at this, I'm realizing I restated myself as the work that's left in T364965: stat1007 to stat1011 migration pipeline output check <https://phabricator.wikimedia.org/T364965> is a duplicate of what we want to do here :) TASK DETAIL https://phabricator.wikimedia.org/T351072 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Arian_Bozorg, karapayneWMDE, Aklapper, Lucas_Werkmeister_WMDE, AndrewTavis_WMDE, Michael, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, Djdungti, LawExplorer, _jensen, rosalieper, Scott_WUaS, Izno, Nastoshka, Wikidata-bugs, aude, Dinoguy1000, scfc, Mbch331, Jay8g ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)
AndrewTavis_WMDE added a comment. Hey @Arian_Bozorg 👋 Yes, we do still need to check this out. I was thinking that @Lucas_Werkmeister_WMDE and I could discuss this when we chat about what else is needed in T364965: stat1007 to stat1011 migration pipeline output check <https://phabricator.wikimedia.org/T364965>. In that one we've confirmed now that the data is coming in from stat1011, so at this point it'd be good to delete the statistics/manifests/wmde/wdcm.pp <https://github.com/wikimedia/operations-puppet/blob/production/modules/statistics/manifests/wmde/wdcm.pp> and also remove its workflow from Puppet (just not quite sure if I have access and how to go about the Puppet work). I'm hopeful that another 25min call would be enough to get the work done for both tasks and I can document for my learning/our processes and report back? Let me know if sometime later in the week could work for this! TASK DETAIL https://phabricator.wikimedia.org/T351072 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Arian_Bozorg, karapayneWMDE, Aklapper, Lucas_Werkmeister_WMDE, AndrewTavis_WMDE, Michael, Manuel, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, Djdungti, LawExplorer, _jensen, rosalieper, Scott_WUaS, Izno, Nastoshka, Wikidata-bugs, aude, Dinoguy1000, scfc, Mbch331, Jay8g ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T365457: Bring in all Purdue Porgram PRs and upload Mismatch Finder mismatches
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T365457 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Manuel, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T365457: Bring in all Purdue Porgram PRs and upload Mismatch Finder mismatches
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Wikidata Analytics (Kanban), Wikidata. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Making this task to record that there is still work to be done to close out the Purdue Data Mine program. Specifically, all pull requests in the repo <https://github.com/Wikidata/Purdue-Data-Mine-2024/pulls> need to be brought in, and the resulting mismatches should be uploaded to Mismatch Finder using upload_mismatches.py <https://github.com/Wikidata/Purdue-Data-Mine-2024/blob/main/upload_mismatches.py>. TASK DETAIL https://phabricator.wikimedia.org/T365457 WORKBOARD https://phabricator.wikimedia.org/project/board/6546/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Manuel, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)
AndrewTavis_WMDE added a comment. ⚠️ Currently WIP ⚠️ === Going through the files sent by @JAllemandou above <https://phabricator.wikimedia.org/T358311#9648470>. This message will be saved as I go so that I don't lose my progress 😊 If I do find something worth documenting, then I'll also include it below so that this task can serve as a reference for later if need be.

stat1004

None of the files are worth keeping. See descriptions and reasoning below:

total 28
Analytics
└─ NewEditors
└─ adHoc (nothing of interest)
└─ Compaigns
└─ 2019 and 2020 email campaigns with R based analysis (nothing of interest)
└─ WDCM
└─ WDCM_Output
└─ Lots of directories of CSVs (nothing of interest)
└─ WDCM_Scripts
└─ R based scripts that would be archived on Gerrit if they were ever in production (nothing of interest)
└─ Wikidata
└─ misc
└─ Some ad hoc work (nothing of interest)
└─ WD_languagesLandscape
└─ R based scripts that would be archived on Gerrit if they were ever in production (nothing of interest)
└─ WD_ORES_ItemQuality (nothing of interest given Lift Wing migration)
└─ WD_UsageCoverage
└─ R and Python scripts that are doubtless versions of the WDCM UsageCoverage dashboard that's archived on Gerrit (nothing of interest)
Experiments
└─ Empty
_miscWMDE
└─ summerBannerCampaign2017_DataOUT
└─ TSV files (nothing of interest)
└─ TWLBanner_2017
└─ TSV files and simple HQL queries from `wmf.webrequest` for banner campaign hits (nothing of interest, easy to learn as needed)
Example query: SELECT count(*) FROM wmf.webrequest WHERE uri_host = 'de.wikipedia.org' AND uri_query LIKE "$/wiki/Wikipedia:Umfragen/Technische_Wünsche_2017$" AND http_method = 'GET' AND is_pageview = TRUE AND YEAR = 2017 AND MONTH = 6 AND DAY = 1 and HOUR = 20;
└─ TWLBanner_2017_DataOUT
└─ TSV files (nothing of interest)
_miscWMDE_1004
└─ TWLBanner_2017
└─ One HQL and one TSV file that are similar to the above (nothing of interest)
R
└─ x86_64-pc-linux-gnu-library (nothing of interest)
Research
└─ DydimusZengenene
└─ Note: work to support a researcher (nothing of interest)
└─ _analytics
└─ _data
└─ DydimusZengenene.Rproj
└─ ParseTargetPage.R
wdUsagePerPage
└─ Related to the percentage usage dashboard, so would be archived on Gerrit if they were ever in production (nothing of interest)

stat1005

total 964
Analytics
└─ BotEdits_perProject.ipynb
└─ crontabstat1005.txt
└─ DataModelTerms_20210228_Updates.ipynb
└─ dewiki_NewEds_2021.ipynb
└─ QCF_M2_Test.ipynb
└─ QuratorCuriousFacts_Separators.ipynb
└─ Qurator_M1.ipynb
└─ R
└─ snapshot_query.hql
└─ Untitled1.ipynb
└─ untitled1.txt
└─ Untitled2.ipynb
└─ Untitled3.ipynb
└─ Untitled4.ipynb
└─ Untitled5.ipynb
└─ Untitled.ipynb
└─ untitled.txt
└─ venv
└─ wd_cluster_fetch_items_M2.ipynb
└─ wd_cluster_fetch_items_M3.ipynb
└─ WDCM_ETL_OTHER_TEST.ipynb
└─ WDCM_Statements_Test.ipynb
└─ WD_HumanEditsPerClass_RevisionTags.ipynb
└─ WD_Inequality_Intake.ipynb
└─ WD_Languages_Datamodel_CollectInit.ipynb
└─ WD_Languages_Datamodel_EXP.ipynb
└─ WD_MonthlyEditors.ipynb
└─ WD_Sitelinks_WDAHP_202108.ipynb
└─ wd_statements_HiveQL_Query.hql
└─ WD_Translations.ipynb
└─ WHEIP_exps.ipynb
└─ wikidata_analytics_examples
└─ WikidataRevisions_November2020.csv
└─

stat1006

total 48
misc_projects
└─ myTemp
└─ NewEds
└─ nohup.out
└─ R
└─ RPckg
└─ RScripts
└─ sqlIn
└─ sqlOut
└─ WDCM_Credentials
└─ WDCM_DataIN
└─ WDCM_DataOUT
└─ WDCM_sql
└─

stat1007

total 28
Analytics
└─ crontabstat1007.txt
└─ Experiments
└─ Python3
└─ R
└─ RScripts
└─ venv
└─

stat1008

total 16
Analytics
└─ R
└─ renv
└─ venv
└─

stat1009 to
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs
AndrewTavis_WMDE added a comment. Note that MR#700 <https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/700> has been opened that has the work for this :) TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE added a comment. Note that MR#700 <https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/700> has been opened that has the work for this :) TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)
AndrewTavis_WMDE claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T358311 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: brouberol, JAllemandou, MoritzMuehlenhoff, Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, BTullis, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check
AndrewTavis_WMDE added a comment. Confirming that data's still coming in as well. @BTullis, what should we do about statistics/manifests/wmde/wdcm.pp <https://github.com/wikimedia/operations-puppet/blob/production/modules/statistics/manifests/wmde/wdcm.pp>? Remove the file? And could you also remove it from puppet entirely on stat1011 as well? Anything else? TASK DETAIL https://phabricator.wikimedia.org/T364965 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Lucas_Werkmeister_WMDE, BTullis, Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check
AndrewTavis_WMDE added a comment. Quick note that the word used by @BTullis was `disabled` instead of `removed` for the stat1007 timers, so apologies if this caused some confusion. I figure not, but just wanted to be clear :) @BTullis, would you be able to check the journal for them and paste the output here so we can check it? On my end as well it seems like I can't access it. TASK DETAIL https://phabricator.wikimedia.org/T364965 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Lucas_Werkmeister_WMDE, BTullis, Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations
AndrewTavis_WMDE changed the task status from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check
AndrewTavis_WMDE renamed this task from "stat1007 migration output check" to "stat1007 to stat1011 migration pipeline output check". TASK DETAIL https://phabricator.wikimedia.org/T364965 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Lucas_Werkmeister_WMDE, BTullis, Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T364965: stat1007 migration output check
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Wikidata Analytics (Kanban), Wikidata, Wikidata Dev Team. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Context --- Recently WMF has been migrating away from legacy stat servers that are being deprecated - specifically stat1004, 1005, 1006 and 1007. WMDE has a few pipelines that were running on stat1007 that have since been migrated over to stat1011: - statistics/manifests/wmde/graphite.pp <https://github.com/wikimedia/operations-puppet/blob/production/modules/statistics/manifests/wmde/graphite.pp> - statistics/manifests/wmde/wdcm.pp <https://github.com/wikimedia/operations-puppet/blob/production/modules/statistics/manifests/wmde/wdcm.pp> The latter at first glance doesn't appear to do anything, as it sets the environment variables and clones, but then the rest is `TODO`. The former is more expansive and leads into our Graphite/Grafana workflows. Further directions -- > You should be able to find the required files and the clone of https://gerrit.wikimedia.org/g/analytics/wmde/scripts <https://gerrit.wikimedia.org/g/analytics/wmde/scripts> beneath `stat1011:/srv/analytics-wmde`. The assumption is that they're working, and the timers for stat1007 have been removed. Goals - Check the pipeline in statistics/manifests/wmde/graphite.pp <https://github.com/wikimedia/operations-puppet/blob/production/modules/statistics/manifests/wmde/graphite.pp> to ensure that everything is working properly after the stat1007 -> stat1011 migration. 
TASK DETAIL https://phabricator.wikimedia.org/T364965 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Lucas_Werkmeister_WMDE, BTullis, Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: June 2024)
AndrewTavis_WMDE added a comment. Sheet updated with the numbers for April. Higher number of user agents, but lower IPs (but then IPs still much higher than Feb). TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: June 2024)
AndrewTavis_WMDE renamed this task from "[Analytics] Monthly repeating tasks (next: May 2024)" to "[Analytics] Monthly repeating tasks (next: June 2024)". AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Jdforrester-WMF, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs
AndrewTavis_WMDE claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm
AndrewTavis_WMDE added a comment. Hey @brouberol 👋 Just getting back from two weeks off today :) I'll check into this and get back to you all! Thanks for the ping! TASK DETAIL https://phabricator.wikimedia.org/T358311 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: brouberol, JAllemandou, MoritzMuehlenhoff, Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, BTullis, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations
AndrewTavis_WMDE renamed this task from "Generate historical weekly segments of Wikidata item sitelinks segmentations" to "Generate historical weekly segments of Wikidata item sitelink segmentations". TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelinks segmentations
AndrewTavis_WMDE renamed this task from "Generate weekly historical segments of Wikidata item sitelinks segmentations" to "Generate historical weekly segments of Wikidata item sitelinks segmentations". TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T363583: Generate weekly historical segments of Wikidata item sitelinks segmentations
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Wikidata, Wikidata Analytics (Kanban). Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Purpose --- In T362849: [Analytics] Segments of Wikidata's data over time <https://phabricator.wikimedia.org/T362849> we need to calculate historical segments of Wikidata's items based on their relation to sitelinks. Purpose from that ticket: > As Wikidata Product Managers, we would like to understand how different segments of Wikidata's data developed over time, so we can inform our projections. This task would encompass the historical data that's needed to achieve this. Scope - From T362849 <https://phabricator.wikimedia.org/T362849>: > How did the number of Items of the following types develop over time? > > A) Items that contain a sitelink to one of the Wikimedia projects (e.g. about a notable person) > B) Items that are needed to build A (used in A Items for example in a statement or reference; e.g. the non-notable father of that notable person) > C) All other Items - In order to do this, T363451: Add job to create Wikidata partition to wmf.mediawiki_wikitext_history <https://phabricator.wikimedia.org/T363451> was made to recreate the Wikidata partition of wmf.mediawiki_wikitext_history <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Content/Mediawiki_wikitext_history> - Once this task is complete, work can then begin to use this partition to generate all data from when Wikidata was created to the most recent weekly data generated by the DAG created in T362849 <https://phabricator.wikimedia.org/T362849> Desired Output -- - Weekly stats of the number of Items in category A, B and C Acceptance criteria: [ ] Weekly historical breakdowns of populations A, B and C - These would be in the Data Lake and the published datasets --- **Information below this point is filled out by the Wikidata Analytics team.** General Planning Information is filled out by the analytics product manager. 
Assignee Planning - Information is filled out by the assignee of this task. Estimation -- Estimate: Actual: Sub Tasks - Full breakdown of the steps to complete this task: [ ] Step Data to be used --- See Analytics/Data_Lake <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake> for the breakdown of the data lake databases and tables. The following tables will be referenced in this task: - wmf.mediawiki_wikitext_history <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Content/Mediawiki_wikitext_history> Notes and Questions --- Things that came up during the completion of this task, questions to be answered and follow up tasks: - Note TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
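To make the A/B/C scope above concrete, here is a minimal sketch of how the three populations could be separated. It assumes a toy in-memory representation rather than the actual Data Lake tables: the `Item` class, its field names and the `segment_items` helper are all hypothetical. It also assumes that B only counts Items directly used by an A Item, and that an Item already in A is not counted again in B.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Toy stand-in for a Wikidata Item (hypothetical structure)."""

    qid: str
    # Wikimedia projects this Item has a sitelink to, e.g. {"enwiki"}.
    sitelinks: set[str] = field(default_factory=set)
    # QIDs of Items used in this Item's statements or references.
    referenced_qids: set[str] = field(default_factory=set)


def segment_items(items: list[Item]) -> dict[str, set[str]]:
    """Split Items into populations A, B and C (assumed to be disjoint)."""
    by_qid = {item.qid: item for item in items}

    # A: Items with at least one sitelink.
    pop_a = {item.qid for item in items if item.sitelinks}

    # B: Items without sitelinks that are directly used by an A Item.
    pop_b: set[str] = set()
    for qid in pop_a:
        for ref in by_qid[qid].referenced_qids:
            if ref in by_qid and ref not in pop_a:
                pop_b.add(ref)

    # C: everything else.
    pop_c = set(by_qid) - pop_a - pop_b
    return {"A": pop_a, "B": pop_b, "C": pop_c}


example_items = [
    Item("Q1", sitelinks={"enwiki"}, referenced_qids={"Q2"}),  # notable person
    Item("Q2"),  # non-notable father of Q1
    Item("Q3", referenced_qids={"Q2"}),  # unrelated Item
]
segments = segment_items(example_items)
print({name: sorted(qids) for name, qids in segments.items()})
```

Whether B should also follow chains of references (Items used by B Items, and so on) is a scope question this sketch does not settle.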
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE added a comment. See T362849_wd_item_sitelink_segments.ipynb <https://gitlab.wikimedia.org/repos/wmde/analytics/-/blob/main/tasks/wikidata/2024/T362849_wd_item_sitelink_segments/T362849_wd_item_sitelink_segments.ipynb?ref_type=heads> for the work to derive the segments :) TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE added a comment. Ok, so the new numbers after the change in scope for the max `2024-04-15` snapshot are: items_with_sitelinks: 32,231,861 items_items_with_sitelinks_link_to: 2,980,388 all_other_items: 72,910,679 For documentation, the numbers for the original Population B definition for the min `2024-02-26` snapshot were: items_with_sitelinks: 31,978,738 linked_to_items_with_sitelinks: 75,221,879 all_other_items: 242,565 Status on the rest of this: - The weekly DAG is written and further does include an export to the published datasets repo - I've also included the work for T361203 <https://phabricator.wikimedia.org/T361203> in this - We need to confirm the numbers above and the method that generates them - I'll then rewrite the DAG job that runs the query - Then testing, I'll need the table `wmde.wd_item_sitelink_segments_weekly` to be made in HDFS by an admin, and then we can go into production - Should all be done by Tuesday/Wednesday evening after I'm back in a few weeks depending on folks' availability - I'll make a new task for the historic data generation process, which will depend on T363451 <https://phabricator.wikimedia.org/T363451> TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
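As a quick sanity check on the figures in the comment above, the sketch below sums the three populations per snapshot and derives each population's share of the total. The counts are copied from the comment; the dictionary layout and the `summarize` helper are only illustrative.

```python
# Counts as reported in the comment above, keyed by snapshot.
snapshots = {
    "2024-04-15 (revised scope)": {
        "items_with_sitelinks": 32_231_861,
        "items_items_with_sitelinks_link_to": 2_980_388,
        "all_other_items": 72_910_679,
    },
    "2024-02-26 (original Population B definition)": {
        "items_with_sitelinks": 31_978_738,
        "linked_to_items_with_sitelinks": 75_221_879,
        "all_other_items": 242_565,
    },
}


def summarize(counts: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Return the total number of Items and each population's share of it."""
    total = sum(counts.values())
    shares = {name: value / total for name, value in counts.items()}
    return total, shares


for label, counts in snapshots.items():
    total, shares = summarize(counts)
    print(f"{label}: total = {total:,}")
    for name, share in shares.items():
        print(f"  {name}: {share:.1%}")
```

The totals come out at 108,122,928 Items for the `2024-04-15` snapshot and 107,443,182 for the `2024-02-26` snapshot, so the change in scope mostly moves Items between the second and third populations rather than changing the overall count.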
[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs
AndrewTavis_WMDE added a comment. Moved this to `In progress` as I'm adding the job to export everything to the published datasets folder to the DAG as I work on the same for T362849 <https://phabricator.wikimedia.org/T362849>. TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE added a comment. See {https://phabricator.wikimedia.org/T363451} for the task about bringing back the partition (hopefully via another job). I added a bit about whether we want to maybe turn this job on when WMDE needs historical data. Let me know what you all think on that :) TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE added a comment. Another note on this is: if we don't expect to be needing a Wikidata partition of `wmf.mediawiki_wikitext_history` for other tasks, then we could work directly from the XML dump for the data backdate. We wouldn't be able to leverage PySpark for the querying though, so I worry about how long all of this would take... TASK DETAIL https://phabricator.wikimedia.org/T362849
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE added a subscriber: JAllemandou. AndrewTavis_WMDE added a comment. Thanks for all of the information, @mpopov! I talked this over in my bi-weekly with @JAllemandou, and would like to bring some further context to this particular situation :) The go-to table for this would be wmf.wikidata_entity <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Content/Wikidata_entity> for the following reasons: - It has the `sitelinks` column for Population A above - It has the `claims` column for Population B above It thus has everything we need for the given task for future data. One change to the output for this though would be the frequency of the DAG, as `wmf.wikidata_entity` is a weekly data dump, so it'd make sense to do a weekly DAG. If we still want to do a monthly job, then the best option would be to do a DAG that runs on the first Monday of every month (in the docs for `wmf.wikidata_entity` it mentions the `2020-01-20` snapshot, which was a Monday). Now we get to the question of the historical data... This is a situation that cannot be solved at this time given the current makeup of the Data Lake. As mentioned on Mattermost: we currently do not have Wikidata as a partition within wmf.mediawiki_wikitext_history <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Content/Mediawiki_wikitext_history>, so we do not have historical versions of Wikidata items with which we'd be able to rebuild the history. The assumption we're making on this is that the legacy version of these metrics was made using `wmf.mediawiki_wikitext_history` at a time when Wikidata was still an available partition. Wikidata was removed from the `wmf.mediawiki_wikitext_history` dump process in `2024-02` - see T357859 <https://phabricator.wikimedia.org/T357859>, where ~12 of the 25 days of dump generation were for the Wikidata XML dump. This was slowing down metrics delivery for WMF Movements Insights. 
Steps forward on this: - I'll begin work on a DAG based on `wmf.wikidata_entity`, as even if we do get a Wikidata partition within `wmf.mediawiki_wikitext_history`, it would not be used for recent data updates - Are we fine with a weekly DAG? - A decision needs to be made on whether WMDE is requesting Wikidata data to again be an output in the `wmf.mediawiki_wikitext_history` snapshot creation process - The preferred solution here would be to not revert the changes in T357859 <https://phabricator.wikimedia.org/T357859>, but rather make a new job that adds a new partition to the table via the Wikidata XML dump - Reason for this is to ensure that WMF Movements Insights can maintain the current speed of delivery - @JAllemandou has said that bringing the Wikidata partition back is fine if we need it (again, preferably in the above way) - If the request is being made, a new task should be made for it - We'd then do what I'd argue would be a separate task whereby the new `wmf.mediawiki_wikitext_history` Wikidata partition would be used to recompute the historical populations above Let me know what thoughts are on the above! TASK DETAIL https://phabricator.wikimedia.org/T362849
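To make the "first Monday of every month" option from the comment above concrete, here is a minimal Python sketch that computes that date and checks that a snapshot date falls on a Monday. The function names are hypothetical and this is not the scheduling code of the actual DAG, only the date arithmetic behind the idea.

```python
from datetime import date, timedelta


def first_monday(year: int, month: int) -> date:
    """Return the date of the first Monday in the given month."""
    first_day = date(year, month, 1)
    # date.weekday() returns 0 for Monday, so this is how many days to step forward.
    days_until_monday = (7 - first_day.weekday()) % 7
    return first_day + timedelta(days=days_until_monday)


def is_monday(day: date) -> bool:
    """True when the given date is a Monday."""
    return day.weekday() == 0


# The snapshot dates mentioned in this thread all fall on Mondays.
print(is_monday(date(2020, 1, 20)), is_monday(date(2024, 2, 26)), is_monday(date(2024, 4, 15)))
print(first_monday(2024, 5))
```

A monthly job keyed to this date would pick up one of the weekly `wmf.wikidata_entity` snapshots per month, assuming the snapshots keep landing on Mondays.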
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849
[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time
AndrewTavis_WMDE claimed this task. AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849
[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items
AndrewTavis_WMDE added a comment. Summary on your end sounds great, @Ifrahkhanyaree_WMDE! 😊 Let me know if sending along some empty new item revisions from 2024 would be helpful :) TASK DETAIL https://phabricator.wikimedia.org/T360761
[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items
AndrewTavis_WMDE added a comment. Notebook with the work that was done for this is: wmde/analytics/tasks/product_platform/2024/T360761_empty_wikidata_items/T360761_empty_wikidata_items.ipynb <https://gitlab.wikimedia.org/repos/wmde/analytics/-/blob/main/tasks/product_platform/2024/T360761_empty_wikidata_items/T360761_empty_wikidata_items.ipynb>. Will update this if further work is needed :) TASK DETAIL https://phabricator.wikimedia.org/T360761
[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360761
[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items
AndrewTavis_WMDE moved this task from Needs product input to Product verification on the Wikidata Analytics (Kanban) board. AndrewTavis_WMDE added a comment. Further insights on this, and moving it to `Product verification` at this point :) I've now changed the query to a span of bytes that would be allowable for something to be empty. I added 10 bytes to the calculated max for `170`, but also tried with `180` and `190` and the trend of empty on first revision items dropping off is maintained. Basic finding: it used to be way more common, but still does happen today New query is the following: SELECT DISTINCT event_user_text AS editor, substring(event_timestamp, 1, 7) AS event_year_month, page_title AS created_empty_qid FROM wmf.mediawiki_history WHERE wiki_db = 'wikidatawiki' AND page_namespace_is_content = True AND snapshot = '2024-03' AND event_entity = 'revision' AND event_type = 'create' AND page_revision_count = 1 -- Factor in bytes that are within a range small enough to be an empty first edit. AND 148 < revision_text_bytes AND revision_text_bytes < 170 ; Task 1.1 - Number of Items in population A that were created empty: `5,075,471` Task 1.2 - Number of editors who are creating empty items: `27,61` Of the above items, I did a test of `50,000` to see if they were empty on deletion using the `https://www.wikidata.org/wiki/Special:EntityData/` endpoint. `49,579` returned valid JSON responses, and of those `99.65%` were found to be empty. 
I also checked the empty item creation over time, with the following two plots coming based on the above definition of the population in the query (148-170 bytes being "empty"): F48099515: total_empty_qids_created_per_month_v3_definition.png <https://phabricator.wikimedia.org/F48099515> F48099542: total_empty_qids_created_per_month_in_2023_and_2024_v3_definition.png <https://phabricator.wikimedia.org/F48099542> Again, I also tried boosting the max byte sizes to `180` and `190` and the plots above were not noticeably different. Task 2 - Number of Items in population B that are currently deleted: `44,385` (`0.87%`) I switched around the 3.x tasks a bit with a focus on visualization, as, as I said, I basically wasn't seeing ones that were created empty and were still empty. Task 3.1 - no further edits ever on items that are not deleted: `0` (they all have at least one more edit) Query for this: WITH not_deleted_created_empty_qids_v3 AS ( SELECT DISTINCT page_title AS not_deleted_created_empty_qid FROM wmf.mediawiki_history WHERE wiki_db = 'wikidatawiki' AND page_namespace_is_content = True AND snapshot = '2024-03' AND event_entity = 'revision' AND event_type = 'create' AND page_revision_count = 1 -- Factor in bytes that are within a range small enough to be an empty first edit. 
AND 148 < revision_text_bytes AND revision_text_bytes < 170 AND page_is_deleted = False ) SELECT h.page_title AS not_deleted_created_empty_qid, count(h.revision_id) AS number_of_revisions FROM wmf.mediawiki_history AS h JOIN not_deleted_created_empty_qids_v3 AS e ON h.page_title = e.not_deleted_created_empty_qid WHERE h.wiki_db = 'wikidatawiki' AND h.page_namespace_is_content = True AND h.snapshot = '2024-03' AND h.event_entity = 'revision' AND h.event_type = 'create' GROUP BY h.page_title Task 3.2 - at least one additional edit (=the rest): `5,031,086` - Check: `5,031,086 + 44,385 = 5,075,471` New and hopefully a bit more helpful (my assumption) Task 3.3 - graphs of the number of edits the items have had F48100783: not_deleted_empty_on_creation_items_per_edit_amount_max_100_-_v3_definition.png <https://phabricator.wikimedia.org/F48100783> F48100788: number_of_revisions_on_empty_on_creation_items_v3_definition.png <https://phabricator.wikimedia.org/F48100788> Let me know if anything else would be helpful here, @Ifrahkhanyaree_WMDE! TASK DETAIL https://phabricator.wikimedia.org/T360761 WORKBOARD https://phabricator.wikimedia.org/project/board/6546/
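The figures reported in this comment can be cross-checked, and the "is this item empty" test on a `Special:EntityData` response can be sketched, in a few lines of Python. The comment does not say how emptiness was determined in the 50,000-item test, so the definition below (no labels, descriptions, aliases, claims or sitelinks) is an assumption, and the function and variable names are hypothetical.

```python
# Top-level keys of a Wikibase item that carry content.
# Assumption: an item counts as "empty" when none of these hold any data.
CONTENT_KEYS = ("labels", "descriptions", "aliases", "claims", "sitelinks")


def is_empty_entity(entity: dict) -> bool:
    """True when the entity JSON has no content under any of CONTENT_KEYS.

    Missing keys, {} and [] are all treated as empty.
    """
    return not any(entity.get(key) for key in CONTENT_KEYS)


# Cross-check of the numbers reported in the comment above.
created_empty = 5_075_471        # Task 1.1
currently_deleted = 44_385       # Task 2
with_further_edits = 5_031_086   # Task 3.2

assert currently_deleted + with_further_edits == created_empty

deleted_share = round(100 * currently_deleted / created_empty, 2)
valid_response_share = round(100 * 49_579 / 50_000, 2)
print(deleted_share, valid_response_share)
```

In practice each `entity` would be the object found under `entities` in the JSON returned for a QID; fetching is left out here so the sketch stays runnable offline.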
[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items
AndrewTavis_WMDE moved this task from In progress to Needs product input on the Wikidata Analytics (Kanban) board. AndrewTavis_WMDE added a comment. The thread on Mattermost <https://mattermost.wikimedia.de/swe/pl/gsr9b485x7geby79t4sg151j7c> for discussing this has a lot of comments on the data restrictions we're dealing with here because there is no text table for Wikidata in the Data Lake. A workaround using `revision_text_bytes` to determine the minimum size that an item could be (i.e. = empty) has been used so far with okish results, but there are definitely drawbacks and it's not exact. What I can say here is that: - There are lots of items being created empty (from one subset `3,540,260`) - They're not normally deleted (from the same subset only `0.95%` were) - It's usual that there are edits (I've yet to see an item that was created empty and is still empty, but please note that this is an eye test on ~30 items) Moving this to `Needs product input` for now. A basic thing that can be done that won't take too much time is that I can use a range instead of the `CASE WHEN` for determining when an item is empty via the length of its QID and the `revision_text_bytes` size. We would then not be getting empty on creation items 100% of the time, but I could also find the ratio and we could agree on what an acceptable margin of error would be (say `> 90%`). Time estimate on this is half a day. 
TASK DETAIL https://phabricator.wikimedia.org/T360761 WORKBOARD https://phabricator.wikimedia.org/project/board/6546/
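The range-based approach proposed in the comment above, together with the agreed margin of error, could look roughly like the following Python sketch. All names are hypothetical. The comment itself gives no bounds, so the `148` and `170` used in the usage lines are taken from the later comment on this task, and the sample of verified flags is made up purely for illustration.

```python
def in_empty_byte_range(revision_text_bytes: int, lower: int, upper: int) -> bool:
    """True when a first revision's size lies strictly between the two bounds."""
    return lower < revision_text_bytes < upper


def empty_ratio(is_empty_flags) -> float:
    """Share of sampled candidates that were verified as actually empty."""
    flags = list(is_empty_flags)
    if not flags:
        raise ValueError("sample must not be empty")
    return sum(flags) / len(flags)


def within_margin(ratio: float, threshold: float = 0.90) -> bool:
    """True when the verified ratio exceeds the agreed threshold (say > 90%)."""
    return ratio > threshold


# Usage with illustrative values only.
print(in_empty_byte_range(150, lower=148, upper=170))
sample_flags = [True, True, True, False, True, True, True, True, True, True]
print(empty_ratio(sample_flags), within_margin(empty_ratio(sample_flags)))
```

A range trades exactness for simplicity: some non-empty first revisions will fall inside it, which is why the verified ratio is compared against a threshold rather than assumed to be 100%.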