Christopher added a comment.

@Jheald Thank you for your suggestions. What is fairly clear from my research is that counting-type queries over large (or undefined) ranges with an unbound domain are just not possible (without huge resource consumption) when the namespace contains many millions of triples. For example, the query

  PREFIX prov: <http://www.w3.org/ns/prov#>
  SELECT (COUNT(DISTINCT ?stmt) AS ?count) WHERE { ?stmt prov:wasDerivedFrom ?ref }

will not work, even with no query timeout. I have tried it on http://wdm-rdf.wmflabs.org, where it uses all of the 8 GB heap space and crashes Blazegraph.

Of course, there are ways to use SPARQL to post-process/filter manageable result sets (in memory) as you suggest, but this does not seem possible for the 800M+ triples in wdq. By introducing an "arity class property" (like "hasNullReference"), the evaluation on **all** data can be achieved with minimal processing overhead, because the query range is a boolean value and not a variable like "all references".

TASK DETAIL
  https://phabricator.wikimedia.org/T120166

To: Christopher
Cc: Jheald, daniel, Lydia_Pintscher, Aklapper, Christopher, StudiesWorld, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, JeroenDeDauw, Mbch331
