Jheald added a subscriber: Jheald.
Jheald added a comment.
Arity 0, 1 and "more than one" are reasonably easy to obtain by query, at least
for smaller subsets of the data.
The FILTER NOT EXISTS { ... } or MINUS { ... } or the OPTIONAL { ... }
FILTER (!bound( ... )) constructions can all be used, at least on smaller
subsets, to get a set of items for which something is not true.
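As a minimal sketch of the arity-0 case, the query below lists items in a subset that have no value for a property. wdt:PROP and wd:CLASS are placeholders rather than real identifiers, and restricting by wdt:P31 is only one assumed way of picking a smaller subset; the other two constructions are shown as commented-out alternatives.

```sparql
# Sketch: items in a subset that have no value for wdt:PROP.
# wdt:PROP and wd:CLASS are placeholders, not real identifiers.
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?item WHERE {
  ?item wdt:P31 wd:CLASS .                      # restrict to a smaller subset
  FILTER NOT EXISTS { ?item wdt:PROP ?value }   # keep items with no value

  # Alternative 1, in place of the FILTER NOT EXISTS line:
  #   MINUS { ?item wdt:PROP ?value }
  # Alternative 2, in place of the FILTER NOT EXISTS line:
  #   OPTIONAL { ?item wdt:PROP ?value }
  #   FILTER (!bound(?value))
}
```

All three forms should return the same items here, because the excluded pattern shares ?item with the rest of the query.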
Obviously this can be inefficient for a dataset where arity 0 is dominant; but
an alternative approach might just be to count the statements that //are//
referenced, and subtract from the total number of statements.
The number of referenced statements could in principle be obtained from a query
like:
  PREFIX prov: <http://www.w3.org/ns/prov#>
  SELECT (COUNT(DISTINCT ?stmt) AS ?count) WHERE {
    ?stmt prov:wasDerivedFrom ?ref
  }
though you would need to allow the query more time (and maybe more memory)
than the default WDQS service provides.
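For a single property, the count-and-subtract idea can be sketched in one query. This is only an illustration: p:PROP is a placeholder for a real property, and it assumes the usual Wikidata RDF layout in which p: links an item to its statement node.

```sparql
# Sketch: referenced vs. unreferenced statements for one property.
# p:PROP is a placeholder, not a real property identifier.
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT (COUNT(?stmt) AS ?total)
       (SUM(?hasRef) AS ?referenced)
       (COUNT(?stmt) - SUM(?hasRef) AS ?unreferenced)
WHERE {
  ?item p:PROP ?stmt .
  # 1 if the statement has at least one reference, otherwise 0
  BIND(IF(EXISTS { ?stmt prov:wasDerivedFrom ?ref }, 1, 0) AS ?hasRef)
}
```

Each statement node is matched once, so a plain COUNT is enough for the total, and the unreferenced count falls out as the difference.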
A set of items with exactly one value for a property can be obtained by looking
for a second value, and then excluding any item for which one is present:
  ?item wdt:PROP ?value1 .
  OPTIONAL {
    ?item wdt:PROP ?value2 .
    FILTER (!sameTerm(?value1, ?value2))
  }
  FILTER (!bound(?value2))
A breakdown of the incidence-frequencies for larger numbers of different values
can be obtained by using a sub-select:
  SELECT ?nvals (COUNT(DISTINCT ?item) AS ?count) WHERE {
    {
      SELECT ?item (COUNT(DISTINCT ?value) AS ?nvals)
      WHERE {
        ?item wdt:PROP ?value
      } GROUP BY ?item
    }
  } GROUP BY ?nvals ORDER BY ?nvals
Whole-database statistics like the total reference counts may be an exception;
but for most smaller specific parts of the dataset, I would have thought it
makes more sense to obtain counts of multivaluedness by querying than by
actively storing and maintaining a record of the arity in the triplestore.
TASK DETAIL
https://phabricator.wikimedia.org/T120166
To: Jheald
Cc: Jheald, daniel, Lydia_Pintscher, Aklapper, Christopher, StudiesWorld,
jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles,
JeroenDeDauw, Mbch331