[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the task status from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper,

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-05-17 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description.

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-05-17 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Note that MR#700 has been opened that has the work for this :)

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description.

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-26 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description.

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-26 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. See T362849_wd_item_sitelink_segments.ipynb for the work to derive the segments :)

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-26 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description.

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-26 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Ok, so the new numbers after the change in scope for the max `2024-04-15` snapshot are: items_with_sitelinks: 32,231,861 items_items_with_sitelinks_link_to: 2,980,388 all_other_items: 72,910,679 For documentation, the numbers for the
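
For orientation, the three counts quoted in the comment above can be turned into shares of all items. This is only a minimal sketch: it assumes the three segments are mutually exclusive and together cover every item in the `2024-04-15` snapshot, which the truncated comment does not confirm. The variable names are illustrative and are not taken from the task's notebook.

```python
# Segment counts as quoted in the comment for the 2024-04-15 snapshot.
segment_counts = {
    "items_with_sitelinks": 32_231_861,
    "items_items_with_sitelinks_link_to": 2_980_388,
    "all_other_items": 72_910_679,
}

# Assumption: the segments do not overlap, so their sum is the item total.
total_items = sum(segment_counts.values())

# Share of each segment relative to that assumed total.
segment_shares = {
    name: count / total_items for name, count in segment_counts.items()
}

print(f"total items: {total_items:,}")
for name, share in segment_shares.items():
    print(f"{name}: {share:.1%}")
```

If the segments turn out to overlap, the total and the shares above would be overstated and should be recomputed from distinct item IDs instead.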

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-25 Thread Manuel
Manuel updated the task description.

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-25 Thread Manuel
Manuel updated the task description.

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-25 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. See {https://phabricator.wikimedia.org/T363451} for the task about bringing back the partition (hopefully via another job). I added a bit about whether we want to maybe turn this job on when WMDE needs historical data. Let me know what you all think on that

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-24 Thread Manuel
Manuel added a comment. About the missing revision history: - Did I understand correctly that we do not have any kind of complete edit history for Wikidata on the data lake? If so, we will need to find a solution for this, as my assumption is that we will need this kind of information

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-24 Thread Manuel
Manuel updated the task description.

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-24 Thread Manuel
Manuel added a comment. Thank you for digging into this: > I'll begin work on a DAG based on wmf.wikidata_entity Sounds good to me! I changed the description accordingly. > Are we fine with a weekly DAG? Sure!

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Another note on this is: if we don't expect to be needing a Wikidata partition of `wmf.mediawiki_wikitext_history` for other tasks, then we could work directly from the XML dump for the data backdate. We wouldn't be able to leverage PySpark for the querying

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a subscriber: JAllemandou. AndrewTavis_WMDE added a comment. Thanks for all of the information, @mpopov! I talked this over in my bi-weekly with @JAllemandou, and would like to bring some further context to this particular situation :) The go-to table for this

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description.

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE claimed this task. AndrewTavis_WMDE updated the task description.

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-22 Thread Manuel
Manuel added a comment. Hi @mpopov, thank you for your input! This confirms what I mentioned already, @AndrewTavis_WMDE: For a similar metric our legacy systems were set up to re-compute the entire history with each new snapshot. This would be the easiest solution in this case as well.

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-19 Thread mpopov
mpopov added subscribers: AndrewTavis_WMDE, mpopov. mpopov added a comment. @AndrewTavis_WMDE asked me for some thoughts/suggestions here :) I started typing out a DM reply but decided some of this stuff would be good to have on public record. > it's not normal that snapshots go

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-18 Thread Manuel
Manuel created this task. Manuel added projects: Wikidata, Wikidata Analytics (Kanban). Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Purpose --- As Wikidata Product Managers, we would like to understand how different segments of Wikidata's data developed