Jheald added a subscriber: Jheald.
Jheald added a comment.

Arity 0, 1 and "more than one" are reasonably easy to query for, at least for 
smaller subsets of the data.

The FILTER NOT EXISTS { ... }, MINUS { ... }, or OPTIONAL { ... } 
FILTER (!bound( ... )) constructions can all be used, at least on smaller 
subsets, to get a set of items for which something is not true.
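
As a minimal sketch of the arity-0 case, the query below lists items in one 
subset that have no value at all for a property. The subset used here 
(wdt:P31 wd:Q5, i.e. instances of human) and the LIMIT are only illustrative 
assumptions, and wdt:PROP is a placeholder to be replaced by a real property:

  SELECT ?item WHERE {
    # restrict to a manageable subset first (illustrative choice)
    ?item wdt:P31 wd:Q5 .
    # keep only items with no value for the property
    FILTER NOT EXISTS { ?item wdt:PROP ?value }
  }
  LIMIT 100

The same pattern should work with MINUS { ?item wdt:PROP ?value } in place of 
the FILTER NOT EXISTS line.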

Obviously this can be inefficient for a dataset where arity 0 is dominant; but 
an alternative approach might just be to count the statements that //are// 
referenced, and subtract from the total number of statements.

The number of referenced statements could in principle be obtained from a query 
like:

  PREFIX prov: <http://www.w3.org/ns/prov#>
  
  SELECT (COUNT(DISTINCT ?stmt) AS ?count) WHERE {  
     ?stmt prov:wasDerivedFrom ?ref 
  }

- though you would need to allow the query more time (and maybe more memory) 
than the default WDQS service allows.
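
The subtraction itself can be sketched in a single query. To keep it 
tractable, this version is restricted to the statements of one property 
rather than the whole database; p:PROP is a placeholder for the 
statement-node form of a real property, and the variable names are just 
illustrative:

  PREFIX prov: <http://www.w3.org/ns/prov#>
  SELECT ?total ?referenced (?total - ?referenced AS ?unreferenced) WHERE {
    # all statements for the property
    { SELECT (COUNT(DISTINCT ?stmt) AS ?total) WHERE { ?item p:PROP ?stmt } }
    # statements for the property that have at least one reference
    { SELECT (COUNT(DISTINCT ?stmt) AS ?referenced) WHERE {
        ?item p:PROP ?stmt .
        ?stmt prov:wasDerivedFrom ?ref
    } }
  }

Each sub-select returns a single row, so joining them gives one row with the 
two counts and their difference.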

A set of items with exactly one value for a property can be obtained by looking 
for a second value, and then excluding any item for which one is present:

  ?item wdt:PROP ?value1
  OPTIONAL {
      ?item wdt:PROP ?value2 .
      FILTER (!sameTerm(?value1, ?value2))
  }
  FILTER (!bound(?value2))
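
An alternative sketch of the same "exactly one value" test, using 
aggregation instead of the OPTIONAL / !bound pattern (again with wdt:PROP as 
a placeholder; I have not compared the two for speed):

  SELECT ?item WHERE {
    ?item wdt:PROP ?value
  }
  GROUP BY ?item
  HAVING (COUNT(DISTINCT ?value) = 1)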

A breakdown of the incidence-frequencies for larger numbers of different values 
can be obtained by using a sub-select:

  SELECT ?nvals (COUNT(DISTINCT ?item) AS ?count) WHERE {
    {
      # inner query: number of distinct values per item
      SELECT ?item (COUNT(DISTINCT ?value) AS ?nvals)
      WHERE {
          ?item wdt:PROP ?value
      } GROUP BY ?item
    }
  } GROUP BY ?nvals ORDER BY ?nvals

Whole-database statistics like the total reference counts may be an exception; 
but for most smaller specific parts of the dataset, I would have thought it 
makes more sense to obtain counts of multivaluedness by querying than by 
actively storing and maintaining a record of the arity in the triplestore.


TASK DETAIL
  https://phabricator.wikimedia.org/T120166

To: Jheald
Cc: Jheald, daniel, Lydia_Pintscher, Aklapper, Christopher, StudiesWorld, 
jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, 
JeroenDeDauw, Mbch331