A related question: how many WD statements are already part of a time
series?  Let's say this means
   statements qualified by point in time (P585
<https://www.wikidata.org/wiki/Property:P585>)
   where the same item has at least four other statements of that property
with a point-in-time qualifier.
This query
<https://query.wikidata.org/#%23defaultView%3AAreaChart%0ASELECT%20%3Fst%20%3Fct%20%7B%0A%20%20FILTER%20%28%3Fst%20%3E%205%29%0A%20%20%7B%0A%20%20%20BIND%20%280%20AS%20%3Fct%29%0A%20%20%20BIND%20%280%20AS%20%3Fst%29%0A%20%20%7D%0A%20%20UNION%20%7B%0A%20%20%20%20SELECT%20%3Fst%20%28COUNT%28%2a%29%20as%20%3Fct%29%20%0A%20%20%20%20%7B%0A%20%20%20%20%20%20%3Fitem%20wdt%3AP585%20%3Fvalue%20%3B%20wikibase%3Astatements%20%3Fst%0A%20%20%20%20%7D%0A%20%20%20%20GROUP%20BY%20%3Fst%0A%20%20%20%20ORDER%20BY%20%3Fst%0A%20%20%7D%0A%7D>
suggests it's roughly 800K statements in all, with around 400 outliers that
each have over 400 such statements.
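
To make the working definition above concrete, here is a minimal sketch in
Python. It is not the linked query; it only assumes statements have already
been extracted as hypothetical (item, property, point_in_time) tuples, with
point_in_time set to None when the statement has no P585 qualifier.

```python
from collections import Counter

def count_time_series_statements(statements, min_others=4):
    """Count statements that belong to a time series.

    A statement counts if it has a point-in-time qualifier and the same
    item has at least `min_others` other qualified statements of the same
    property (so at least min_others + 1 in the group).
    """
    # Keep only statements that carry a point-in-time qualifier.
    qualified = [(item, prop) for item, prop, when in statements
                 if when is not None]
    group_sizes = Counter(qualified)
    return sum(n for n in group_sizes.values() if n >= min_others + 1)

# Illustrative input only; the years are placeholders, not real data.
sample = [("Q189", "population", year) for year in range(2015, 2020)]
sample += [("Q189", "life expectancy", 2018),
           ("Q189", "life expectancy", 2019),
           ("Q189", "capital", None)]
time_series_count = count_time_series_statements(sample)
```

With this sample, only the five population statements meet the threshold;
the two life-expectancy statements and the unqualified one do not.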

This is common enough, and affects enough high-interest/high-traffic
entities, that it would be nice to have a more explicit way of handling
it.

One suggestion: a norm of having a single most-recent value, for each
time-series property, and a time-series property-space exclusively used for
historical values. This would support explicitly noting where a time series
is intended, allow for cleaner edit histories for that work, allow for
including other time-series data that is in active use on Wikipedia, and
help optimize queries for the most recent data.

For instance: Iceland <https://www.wikidata.org/wiki/Q189> currently has
over 2x as many statements as its main entry needs.  It has
* 17 statements of life expectancy,
* 60 statements of population,
* 57 statements for nominal GDP,
* 57 statements for nominal GDP per capita, &c.
(each qualified by point in time and reference).
Instead it could have a single statement for the latest value of each of
these (qualified by point in time: *date*, reference: *URL*, and
*time-series*: *start date - end date*), and an associated entity like
*Q189/historical* could hold the time series, with the ~400 individual
historical statements.  Most queries and views could touch only the
non-time-series statements, reflecting the most common uses of this data on
the projects.
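
A rough sketch of that split, in Python rather than in any actual Wikibase
data model: the Statement class, the field names, and the
split_latest_and_history function are all hypothetical, and the values are
placeholders rather than real figures. One assumption made here is that the
historical side keeps the full ordered series, including the latest point.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Statement:
    prop: str            # e.g. "population" (hypothetical label)
    value: float
    point_in_time: int   # a year, for simplicity
    reference: str

def split_latest_and_history(statements):
    """Return (latest, history), both keyed by property.

    latest[prop]  -> the most recent statement plus the span of its series
    history[prop] -> the full series, ordered by point in time
    """
    by_prop = defaultdict(list)
    for s in statements:
        by_prop[s.prop].append(s)

    latest, history = {}, {}
    for prop, group in by_prop.items():
        ordered = sorted(group, key=lambda s: s.point_in_time)
        latest[prop] = {
            "statement": ordered[-1],
            # Plays the role of the proposed *time-series* qualifier.
            "time_series": (ordered[0].point_in_time,
                            ordered[-1].point_in_time),
        }
        history[prop] = ordered
    return latest, history

# Placeholder data, not real figures for any item.
stmts = [
    Statement("population", 100.0, 2018, "https://example.org/a"),
    Statement("population", 110.0, 2020, "https://example.org/c"),
    Statement("population", 105.0, 2019, "https://example.org/b"),
    Statement("life expectancy", 80.0, 2019, "https://example.org/d"),
]
latest, history = split_latest_and_history(stmts)
```

A view of the main entity would read only `latest`, while anything that
needs the series would go to `history`, the stand-in here for something
like *Q189/historical*.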

SJ

On Fri, Apr 10, 2020 at 10:13 AM Samuel Klein <[email protected]> wrote:

> There are many highly used templates on WP with time-series data about
> COVID spread: cases, tests, health outcomes, by region + per day.  Each
> cell has a source and some context (caveats, multiple slightly conflicting
> or time-offset sources, commentary about that data point), and would
> benefit from being explicitly versioned in Wikidata.
>
> What's the right way to capture this in Wikidata - currently, and in the
> future?  EN Wikipedia tends to have one footnote about sourcing per
> geography, with occasional footnotes about how some of those sources have
> changed over time.  I don't know of any of these templates that are drawing
> from Wikidata.
>
> SJ
>


-- 
Samuel Klein          @metasj           w:user:sj          +1 617 529 4266
_______________________________________________
Wikidata mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata
