Christopher added a comment.

@Addshore Some progress was made on this in 
https://phabricator.wikimedia.org/T120166.  The only practical way to get the 
statement and reference metrics is to facet the data by property.  Counting 
queries run against the whole database do not return in any reasonable 
response time.

This means that any metric that counts over a large domain or range should 
iterate over all 1800+ properties with separate SPARQL calls and then aggregate 
the numbers.  We can do this for the statement -> reference arity (the number 
of references attached to each statement) with:

  PREFIX wikibase: <http://wikiba.se/ontology#>
  PREFIX wd: <http://www.wikidata.org/entity/> 
  PREFIX wdt: <http://www.wikidata.org/prop/direct/>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX prov: <http://www.w3.org/ns/prov#>
  PREFIX p: <http://www.wikidata.org/prop/>
  
  SELECT ?nrefs (COUNT(?wds) AS ?count) WHERE {
    {
      SELECT ?wds (COUNT(DISTINCT(?ref)) AS ?nrefs)
      WHERE {
          ?item p:$property ?wds .
          OPTIONAL {?wds prov:wasDerivedFrom ?ref } .
      } GROUP BY ?wds
    }
  } GROUP BY ?nrefs 
  ORDER BY ?nrefs
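
As a rough sketch of the per-property loop described above (Python is used 
purely for illustration; `build_query`, `aggregate_reference_counts` and the 
caller-supplied `run_query` are hypothetical names, not part of any existing 
script), the query can be templated on $property and the per-property 
histograms summed like this:

```python
from collections import Counter
from string import Template

# The query from this comment, trimmed to the prefixes it actually uses.
# $property stays a template placeholder and is filled in per property.
QUERY = Template("""\
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX p: <http://www.wikidata.org/prop/>

SELECT ?nrefs (COUNT(?wds) AS ?count) WHERE {
  {
    SELECT ?wds (COUNT(DISTINCT(?ref)) AS ?nrefs)
    WHERE {
      ?item p:$property ?wds .
      OPTIONAL { ?wds prov:wasDerivedFrom ?ref } .
    } GROUP BY ?wds
  }
} GROUP BY ?nrefs
ORDER BY ?nrefs
""")


def build_query(property_id):
    """Fill the $property placeholder, e.g. with "P31"."""
    return QUERY.substitute(property=property_id)


def aggregate_reference_counts(property_ids, run_query):
    """Sum per-property histograms into one {nrefs: statement count} mapping.

    run_query is an assumption: a caller-supplied function that sends a
    SPARQL string to the endpoint and returns (nrefs, count) pairs.
    """
    totals = Counter()
    for property_id in property_ids:
        for nrefs, count in run_query(build_query(property_id)):
            totals[int(nrefs)] += int(count)
    return dict(sorted(totals.items()))
```

Keeping the HTTP call behind `run_query` means the aggregation step does not 
depend on which client or language ends up issuing the SPARQL requests.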

Would you do this in PHP?  If you want to handle this, just let me know. 
Otherwise, we could reuse the bulk SPARQL scripts that I have already written 
in R.

In addition to tracking aggregates, it would also be useful to show all 
property counts in a table, like I did here: 
http://wdm.wmflabs.org/?t=wikidata_property_usage_count.


TASK DETAIL
  https://phabricator.wikimedia.org/T117234

To: Christopher
Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, 
Wikidata-bugs, aude, Mbch331


