Christopher added a comment.

@Jheald Thank you for your suggestions. What has become fairly clear from my 
research is that counting-type queries over large (or undefined) ranges with an 
unbound domain are simply not feasible, at least not without huge resource 
consumption, when the namespace contains many millions of triples. For example, 
the following query

  PREFIX prov: <http://www.w3.org/ns/prov#>
  
  SELECT (COUNT(DISTINCT ?stmt) AS ?count) WHERE {  
     ?stmt prov:wasDerivedFrom ?ref 
  }

does not complete, even with no query timeout. I tried it on 
http://wdm-rdf.wmflabs.org and it consumed the entire 8 GB heap and crashed 
Blazegraph. Of course, SPARQL can be used to post-process or filter manageable 
result sets in memory, as you suggest, but that does not seem possible for the 
800M+ triples in wdq.

By introducing an "arity class property" (such as "hasNullReference"), 
evaluation over **all** data can be achieved with minimal processing overhead, 
because the query range is a single boolean value rather than a variable like 
"all references".
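
As a rough sketch of what such a count might look like: this assumes a 
hypothetical boolean predicate ex:hasNullReference attached to each statement 
node. Neither that property nor the ex: prefix exists in the current data 
model; both are placeholders for illustration only.

  PREFIX ex: <http://example.org/hypothetical#>

  # Hypothetical property: count statements flagged as having no reference.
  # Predicate and object are both bound to constants, so in principle the
  # engine can answer this from one predicate/object index range rather
  # than enumerating every ?ref.
  SELECT (COUNT(?stmt) AS ?count) WHERE {
     ?stmt ex:hasNullReference true
  }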


TASK DETAIL
  https://phabricator.wikimedia.org/T120166


To: Christopher
Cc: Jheald, daniel, Lydia_Pintscher, Aklapper, Christopher, StudiesWorld, 
jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, 
JeroenDeDauw, Mbch331



_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs