[Wikidata-bugs] [Maniphest] [Updated] T239565: Create reportupdater reports that execute SDC requests

2020-01-07 Thread Maintenance_bot
Maintenance_bot removed a project: Patch-For-Review.

TASK DETAIL
  https://phabricator.wikimedia.org/T239565

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Milimetric, Maintenance_bot
Cc: Abit, Ramsey-WMF, kzimmerman, Addshore, matthiasmullie, gsingers, 
Mayakp.wiki, Ladsgroup, nettrom_WMF, Cparle, Nuria, Milimetric, mpopov, 
4748kitoko, darthmon_wmde, Nandana, JKSTNK, Akovalyov, Lahi, PDrouin-WMF, Gq86, 
E1presidente, Anooprao, SandraF_WMF, GoranSMilovanovic, QZanden, Tramullas, 
Acer, LawExplorer, Salgo60, Silverfish, _jensen, rosalieper, Scott_WUaS, 
Susannaanas, JAllemandou, Jane023, terrrydactyl, Wikidata-bugs, Base, aude, 
Ricordisamoa, Wesalius, Lydia_Pintscher, Fabrice_Florin, Raymond, 
Steinsplitter, Mbch331, jeremyb, Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, 
E.S.A-Sheild, Meekrab2012, joker88john, CucyNoiD, NebulousIris, Gaboe420, 
Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Af420, 
Darkminds3113, Bsandipan, Lordiis, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, 
WSH1906, Lewizho99, Maathavan
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T239565: Create reportupdater reports that execute SDC requests

2020-01-07 Thread gerritbot
gerritbot added a project: Patch-For-Review.



[Wikidata-bugs] [Maniphest] [Updated] T239565: Create reportupdater reports that execute SDC requests

2019-12-19 Thread Maintenance_bot
Maintenance_bot removed a project: Patch-For-Review.



[Wikidata-bugs] [Maniphest] [Updated] T239565: Create reportupdater reports that execute SDC requests

2019-12-19 Thread gerritbot
gerritbot added a project: Patch-For-Review.



[Wikidata-bugs] [Maniphest] [Updated] T239565: Create reportupdater reports that execute SDC requests

2019-12-18 Thread Maintenance_bot
Maintenance_bot removed a project: Patch-For-Review.



[Wikidata-bugs] [Maniphest] [Updated] T239565: Create reportupdater reports that execute SDC requests

2019-12-12 Thread gerritbot
gerritbot added a project: Patch-For-Review.



[Wikidata-bugs] [Maniphest] [Updated] T239565: Create reportupdater reports that execute SDC requests

2019-12-10 Thread matthiasmullie
matthiasmullie added a comment.


  I'm not really sure what number we want to go with, but I can probably help 
clarify what kind of data is in which db tables (and what numbers derived from 
those actually mean)
  
  So, it seems we have 2 completely separate definitions of "structured data":
  
  1. MediaInfo entities created for file pages, with captions and/or statements
  2. Existing "entities" (Wikidata or MediaInfo) pulled in via Lua to enrich 
(file) pages (e.g. given that we know the artwork, we can pull in author 
information etc.)
  
  And possibly, since IMO both are a valid definition of "structured data":
  
  3. A combination of both: files with either a MediaInfo entity and/or other 
(Wikidata/MediaInfo) entities' information included via Lua
  
  **1. MediaInfo entities created for file pages, with captions and/or 
statements: 3 082 976**
  
  We can query for it by counting the number of 'mediainfo' slots, excluding 
deleted pages and empty content:
  
SELECT COUNT(DISTINCT page_id)
# page excludes deleted pages (which are in archive)
FROM page
# joining on page_latest - we only care about most recent
INNER JOIN slots ON slot_revision_id = page_latest
# mediainfo slot must contain actual content
INNER JOIN content ON slot_content_id = content_id AND content_size > 122
INNER JOIN slot_roles ON role_id = slot_role_id AND role_name = 'mediainfo';
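  
  To make the effect of each join easier to see, here is a minimal sketch that runs the same query against a toy in-memory SQLite database. The table and column names come from the query above; the toy rows, the column types and the helper names `build_toy_db` and `count_mediainfo_pages` are made up for illustration, and the real tables have many more columns.

```python
import sqlite3


def build_toy_db():
    """Create a tiny stand-in for the page/slots/content/slot_roles tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER, page_namespace INTEGER, page_latest INTEGER);
        CREATE TABLE slots (slot_revision_id INTEGER, slot_role_id INTEGER, slot_content_id INTEGER);
        CREATE TABLE content (content_id INTEGER, content_size INTEGER);
        CREATE TABLE slot_roles (role_id INTEGER, role_name TEXT);
        INSERT INTO slot_roles VALUES (1, 'main'), (2, 'mediainfo');
        -- page 1: main slot plus a populated mediainfo slot (counted)
        -- page 2: main slot plus a mediainfo slot at the size threshold (skipped)
        -- page 3: main slot only (skipped)
        INSERT INTO page VALUES (1, 6, 101), (2, 6, 102), (3, 6, 103);
        INSERT INTO content VALUES (11, 900), (12, 400), (21, 900), (22, 122), (31, 900);
        INSERT INTO slots VALUES
            (101, 1, 11), (101, 2, 12),
            (102, 1, 21), (102, 2, 22),
            (103, 1, 31);
    """)
    return conn


def count_mediainfo_pages(conn, min_size=122):
    """Count live pages whose latest revision has a mediainfo slot larger than min_size."""
    query = """
        SELECT COUNT(DISTINCT page_id)
        FROM page
        INNER JOIN slots ON slot_revision_id = page_latest
        INNER JOIN content ON slot_content_id = content_id AND content_size > ?
        INNER JOIN slot_roles ON role_id = slot_role_id AND role_name = 'mediainfo'
    """
    return conn.execute(query, (min_size,)).fetchone()[0]
```

  With the toy rows, `count_mediainfo_pages(build_toy_db())` counts only page 1; lowering `min_size` to 0 also counts page 2, which shows what the size filter is doing.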
  
  //AFAICT, there is no easy way to break down stats about MediaInfo entities 
further (e.g. "how many have only captions", "how many have X number of 
statements", ...), not with a simple query on the raw data anyway.
  The entity's actual data is in external store (just like any other wikitext 
page's content) in the form of a JSON blob (so it would have to be deserialized).//
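  
  As a rough idea of what that deserialization step could look like once a blob has been fetched, here is a small sketch. The JSON layout (`labels` keyed by language, `statements` keyed by property) is an assumption modelled on the usual Wikibase entity JSON, the sample blob is hand-written, and `summarize_mediainfo` is a hypothetical helper; fetching from external store is not shown.

```python
import json


def summarize_mediainfo(blob):
    """Return caption and statement counts for one MediaInfo JSON blob.

    Assumed layout: "labels" maps language codes to captions and
    "statements" maps property IDs to lists of statements. Empty maps
    may be serialized as [] instead of {}, so both are treated as empty.
    """
    entity = json.loads(blob)
    labels = entity.get("labels") or {}
    statements = entity.get("statements") or {}
    statement_count = 0
    if isinstance(statements, dict):
        statement_count = sum(len(group) for group in statements.values())
    return {"captions": len(labels), "statements": statement_count}


# Hand-written sample, not real data.
SAMPLE_BLOB = json.dumps({
    "type": "mediainfo",
    "id": "M1",
    "labels": {
        "en": {"language": "en", "value": "A cat"},
        "fr": {"language": "fr", "value": "Un chat"},
    },
    "descriptions": [],
    "statements": {"P180": [{"id": "M1$a"}, {"id": "M1$b"}]},
})

EMPTY_BLOB = json.dumps({
    "type": "mediainfo",
    "id": "M2",
    "labels": [],
    "descriptions": [],
    "statements": [],
})
```

  Running `summarize_mediainfo` over every blob would give the "only captions" vs "N statements" breakdown, at the cost of reading each entity's content.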
  
  **2. Existing "entities" (Wikidata or MediaInfo) pulled in via Lua to enrich 
file pages: 7 936 829**
  
  That data lives in `wbc_entity_usage`, which is a table much like 
`categorylinks` and `templatelinks`.
  It does not contain all existing entities (and their labels etc.); it's just a 
place to store the relationship between entities and the other pages these 
entities are used on (via Lua).
  See T231952#5717638 and Wikibase/Schema/wbc_entity_usage on mw.org for more 
details on this table and what data it holds.
  
  Something like this should give us the total number of file pages that are 
including Wikidata/MediaInfo data via Lua (the overwhelming majority are 
Wikidata entities; it's not been possible to fetch MediaInfo entities via Lua 
until a month ago)
  
SELECT COUNT(DISTINCT page_id)
# page excludes deleted pages (which are in archive)
FROM page
INNER JOIN wbc_entity_usage ON eu_page_id = page_id
# only include file pages and non-sitelink usage
WHERE page_namespace = 6 AND eu_aspect != 'S';
  
  Note that we might want to exclude Wikibase sitelinks usage (recorded in 
this table with `eu_aspect = 'S'`).
  While it's a useful linking of data, I'm not sure these should be considered 
"structured data" usage.
  Their usage is negligible anyway: excluding those (`eu_aspect != 'S'`) 
returns a result of **7 935 849** (so only 980 files have *only* a Wikidata 
sitelink)
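  
  The difference between counting with and without sitelink usage can be sketched on toy data as follows. Column names are the ones used in the query above; the rows, the column types and the helper names `build_usage_db` and `count_file_pages_using_entities` are invented for the example.

```python
import sqlite3


def build_usage_db():
    """Create a tiny stand-in for the page and wbc_entity_usage tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER, page_namespace INTEGER);
        CREATE TABLE wbc_entity_usage (eu_entity_id TEXT, eu_aspect TEXT, eu_page_id INTEGER);
        -- pages 1, 2 and 4 are files (namespace 6), page 3 is a category (namespace 14)
        INSERT INTO page VALUES (1, 6), (2, 6), (3, 14), (4, 6);
        INSERT INTO wbc_entity_usage VALUES
            ('Q1', 'L', 1),  -- file using a label
            ('Q1', 'S', 1),  -- same file, sitelink usage as well
            ('Q2', 'S', 2),  -- file with only sitelink usage
            ('Q3', 'C', 3);  -- category page, ignored by the namespace filter
    """)
    return conn


def count_file_pages_using_entities(conn, include_sitelinks=False):
    """Count file pages with entity usage, optionally counting sitelink-only pages too."""
    query = """
        SELECT COUNT(DISTINCT page_id)
        FROM page
        INNER JOIN wbc_entity_usage ON eu_page_id = page_id
        WHERE page_namespace = 6
    """
    if not include_sitelinks:
        query += " AND eu_aspect != 'S'"
    return conn.execute(query).fetchone()[0]
```

  On the toy rows the two counts differ by exactly one page (page 2), which is the same kind of difference as the 980 sitelink-only files mentioned above.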
  
  //It is somewhat possible to break down stats further (e.g. how many are 
including a label vs how many are including statements), but not in too much 
detail (e.g. a language-specific label aspect such as `L.en` would be 
unreliable for figuring out how much a label is used in a particular language, 
because that language could also be covered by a plain `L`).//
  
  //FYI: there are about 3M more (non-file) pages with structured data via 
Lua, mostly category pages.//
  
  **3. A combination of both: 10 440 129**
  
  This is pretty much just a combination of the above numbers, except that 
there is some overlap: files could have MediaInfo entities as well as other 
data from Wikidata (and having MediaInfo entities is likely going to make it 
easier to fetch other related info, so I'd expect the overlap to grow over time).
  Currently, there's an overlap of (only) ~0.58M files (that have both their 
own MediaInfo entity and existing Wikidata/MediaInfo usage via Lua).
  
  A simple union for both of the above queries should get us that data:
  
SELECT COUNT(*) FROM (
    SELECT DISTINCT page_id
    FROM page
    INNER JOIN slots ON slot_revision_id = page_latest
    INNER JOIN content ON slot_content_id = content_id AND content_size > 122
    INNER JOIN slot_roles ON role_id = slot_role_id AND role_name = 'mediainfo'
    UNION
    SELECT DISTINCT page_id
    FROM page
    INNER JOIN wbc_entity_usage ON eu_page_id = page_id
    WHERE page_namespace = 6
) AS t;
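  
  As a quick sanity check on the three figures, the overlap follows from inclusion-exclusion: files in both sets = (count 1) + (count 2) - (combined count). The sketch below only redoes that arithmetic with the numbers quoted in this comment, using the Lua count that includes sitelinks since the union query does not filter on `eu_aspect`; the function name `overlap` is just for illustration.

```python
def overlap(count_a, count_b, count_union):
    """Size of the intersection of two sets, given their sizes and the size of their union."""
    return count_a + count_b - count_union


MEDIAINFO_FILES = 3_082_976   # 1. files with a non-empty MediaInfo entity
LUA_USAGE_FILES = 7_936_829   # 2. files using entities via Lua (sitelinks included)
COMBINED_FILES = 10_440_129   # 3. files in either set

print(overlap(MEDIAINFO_FILES, LUA_USAGE_FILES, COMBINED_FILES))  # 579676
```

  That is the ~0.58M overlap mentioned above.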
