Christopher added a comment.

@Addshore Some progress was made on this in https://phabricator.wikimedia.org/T120166. The only "practical" way to get the statement and reference metrics is to facet the data by property. It is just not possible to run counting queries against the whole database and get any reasonable response time.
This means that any large domain or range metric count should iterate over all 1800+ properties with separate SPARQL calls and then aggregate the numbers. We can do this for the statement -> reference arity with the following query, where $property is a placeholder for each property ID:

    PREFIX wikibase: <http://wikiba.se/ontology#>
    PREFIX wd: <http://www.wikidata.org/entity/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX prov: <http://www.w3.org/ns/prov#>
    PREFIX p: <http://www.wikidata.org/prop/>

    SELECT ?nrefs (COUNT(?wds) AS ?count) WHERE {
      {
        SELECT ?wds (COUNT(DISTINCT(?ref)) AS ?nrefs) WHERE {
          ?item p:$property ?wds .
          OPTIONAL { ?wds prov:wasDerivedFrom ?ref } .
        } GROUP BY ?wds
      }
    } GROUP BY ?nrefs ORDER BY ?nrefs

Would you do this in PHP? If you want to handle this, just let me know; otherwise we could reuse the bulk SPARQL scripts that I have already written in R.

In addition to tracking aggregates, it would also be useful to show all property counts in a table like I did here: http://wdm.wmflabs.org/?t=wikidata_property_usage_count.

TASK DETAIL
  https://phabricator.wikimedia.org/T117234
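
For illustration, here is a rough Python sketch of the "one query per property, then aggregate" loop described above. It is not the existing R script or a proposed PHP implementation. The names build_query, aggregate_reference_counts and run_query are hypothetical, and run_query is assumed to be supplied by the caller: it takes a SPARQL string and returns a mapping from ?nrefs to ?count for that property. The unused prefixes are left out of the template for brevity.

```python
from collections import Counter
from string import Template
from typing import Callable, Dict, Iterable

# The query from the comment, with $property kept as a template placeholder.
QUERY_TEMPLATE = Template("""
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX p: <http://www.wikidata.org/prop/>
SELECT ?nrefs (COUNT(?wds) AS ?count) WHERE {
  {
    SELECT ?wds (COUNT(DISTINCT(?ref)) AS ?nrefs) WHERE {
      ?item p:$property ?wds .
      OPTIONAL { ?wds prov:wasDerivedFrom ?ref } .
    } GROUP BY ?wds
  }
} GROUP BY ?nrefs ORDER BY ?nrefs
""")


def build_query(property_id: str) -> str:
    """Fill the $property placeholder with one property ID, e.g. "P31"."""
    return QUERY_TEMPLATE.substitute(property=property_id)


def aggregate_reference_counts(
    property_ids: Iterable[str],
    run_query: Callable[[str], Dict[int, int]],
) -> Counter:
    """Run one query per property and sum the per-property histograms.

    run_query is a hypothetical callable (an assumption of this sketch)
    that sends the SPARQL string to an endpoint and returns
    {number_of_references: number_of_statements} for that property.
    """
    totals: Counter = Counter()
    for property_id in property_ids:
        histogram = run_query(build_query(property_id))
        for nrefs, count in histogram.items():
            totals[nrefs] += count
    return totals
```

Keeping the endpoint call behind run_query means the aggregation can be checked with canned results before it is pointed at the real query service, and the per-property histograms are still available for a usage table.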
