A related question: how many Wikidata statements are already part of a time series? Let's say this means statements qualified by point in time (P585 <https://www.wikidata.org/wiki/Property:P585>) where there are at least four other statements of the same property with a point-in-time qualifier. This query <https://query.wikidata.org/#%23defaultView%3AAreaChart%0ASELECT%20%3Fst%20%3Fct%20%7B%0A%20%20FILTER%20%28%3Fst%20%3E%205%29%0A%20%20%7B%0A%20%20%20BIND%20%280%20AS%20%3Fct%29%0A%20%20%20BIND%20%280%20AS%20%3Fst%29%0A%20%20%7D%0A%20%20UNION%20%7B%0A%20%20%20%20SELECT%20%3Fst%20%28COUNT%28%2a%29%20as%20%3Fct%29%20%0A%20%20%20%20%7B%0A%20%20%20%20%20%20%3Fitem%20wdt%3AP585%20%3Fvalue%20%3B%20wikibase%3Astatements%20%3Fst%0A%20%20%20%20%7D%0A%20%20%20%20GROUP%20BY%20%3Fst%0A%20%20%20%20ORDER%20BY%20%3Fst%0A%20%20%7D%0A%7D> suggests roughly 800K such statements in all, with around 400 outliers that have over 400 such statements each.
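For anyone who would rather read the query than follow the link: the short sketch below just URL-decodes the fragment of that link back into SPARQL. The variable names are only for illustration. As far as I can read the decoded query, `wdt:P585` matches point in time used as a direct statement and `wikibase:statements` is an item's total statement count, so the figures above are probably best treated as a rough proxy for qualifier usage.

```python
from urllib.parse import unquote

# URL fragment copied from the query link above (everything after "#").
ENCODED_QUERY = (
    "%23defaultView%3AAreaChart%0A"
    "SELECT%20%3Fst%20%3Fct%20%7B%0A"
    "%20%20FILTER%20%28%3Fst%20%3E%205%29%0A"
    "%20%20%7B%0A"
    "%20%20%20BIND%20%280%20AS%20%3Fct%29%0A"
    "%20%20%20BIND%20%280%20AS%20%3Fst%29%0A"
    "%20%20%7D%0A"
    "%20%20UNION%20%7B%0A"
    "%20%20%20%20SELECT%20%3Fst%20%28COUNT%28%2a%29%20as%20%3Fct%29%20%0A"
    "%20%20%20%20%7B%0A"
    "%20%20%20%20%20%20%3Fitem%20wdt%3AP585%20%3Fvalue%20%3B%20wikibase%3Astatements%20%3Fst%0A"
    "%20%20%20%20%7D%0A"
    "%20%20%20%20GROUP%20BY%20%3Fst%0A"
    "%20%20%20%20ORDER%20BY%20%3Fst%0A"
    "%20%20%7D%0A"
    "%7D"
)

# Percent-decoding restores the SPARQL text, including the
# "#defaultView:AreaChart" display hint on its first line.
decoded = unquote(ENCODED_QUERY)
print(decoded)
```

The decoded text can be pasted directly into the query service editor.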
This is common enough, and affects enough high-interest, high-traffic entities, that it would be nice to have a more explicit way of handling it.

One suggestion: a norm of having a single most-recent value for each time-series property, and a time-series property-space used exclusively for historical values. This would support explicitly noting where a time series is intended, allow for cleaner edit histories for that work, allow for including other time-series data that is in active use on Wikipedia, and help optimize queries for the most recent data.

For instance: Iceland <https://www.wikidata.org/wiki/Q189> currently has over 2x as many properties as its main entry needs. It has
* 17 statements of life expectancy,
* 60 statements of population,
* 57 statements for nominal GDP,
* 57 statements for nominal GDP per capita, &c.
(each qualified by point in time, reference)

Instead it could have a single statement for the latest value of each of these (qualified by point in time: *date*, reference: *URL*, and *time-series*: *start date - end date*), and an associated entity like *Q189/historical* could hold the time series, with the ~400 individual historical statements. Most queries and views could touch only the non-time-series statements, reflecting the most common uses of this data on the projects.

SJ

On Fri, Apr 10, 2020 at 10:13 AM Samuel Klein <[email protected]> wrote:

> There are many highly used templates on WP with time-series data about
> COVID spread: cases, tests, health outcomes, by region + per day. Each
> cell has a source and some context (caveats, multiple slightly conflicting
> or time-offset sources, commentary about that data point), and would
> benefit from being explicitly versioned in Wikidata.
>
> What's the right way to capture this in Wikidata - currently, and in the
> future? EN Wikipedia tends to have one footnote about sourcing per
> geography, with occasional footnotes about how some of those sources have
> changed over time. I don't know of any of these templates that are drawing
> from Wikidata.
>
> SJ
>

-- 
Samuel Klein @metasj w:user:sj +1 617 529 4266
_______________________________________________
Wikidata mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata
