I started a Fuseki server (using the latest 0.2.3-SNAPSHOT build) with a TDB database in the default configuration, and loaded a file with ~500K triples into a graph called <data:input>. Now I'm trying to do some validation on that data. Specifically, I want to find resources that use a property but are not explicitly declared as members of that property's domain:

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT (count(*) AS ?c)
    WHERE {
      GRAPH <data:input> {
        ?p rdfs:domain ?d .
        ?s ?p ?o
        MINUS { ?s a ?d }
      }
    }

(I know that if we're using rdfs:domain, then any subject using that property can be inferred to be a member of that property's domain, but that's beside the point.)

This query doesn't return in any reasonable amount of time (I let it run for about half an hour). So my next step was to eliminate the join in this query by materializing it into a temporary graph:

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    INSERT { GRAPH <data:output> { ?s <temp:typeByDomain> ?d } }
    WHERE { GRAPH <data:input> { ?p rdfs:domain ?d . ?s ?p ?o } }

and then querying against it:

    SELECT (count(*) AS ?c)
    WHERE {
      GRAPH <data:output> { ?s <temp:typeByDomain> ?d }
      MINUS { GRAPH <data:input> { ?s a ?d } }
    }

This query takes about 15 minutes to execute on my machine: still longer than I'd like, but at least it's progress. Next, I tried to eliminate the cost of materializing the entire result set by converting it to an ASK query:

    ASK WHERE {
      GRAPH <data:output> { ?s <temp:typeByDomain> ?d }
      MINUS { GRAPH <data:input> { ?s a ?d } }
    }

This query takes about 5 minutes to complete, which is certainly better than not completing at all, but still slower than I would like.

Is there any way to tune or optimize TDB to better handle this query? As I mentioned, I am using the default TDB configuration (just passing --loc with an empty directory to the fuseki-server script and accepting whatever it gives me). From what I can tell from the online help, most of the performance tuning relates to the ordering of triple patterns within a join. Are there any other suggestions to try?

FWIW, here are the approximate cardinalities of the various query patterns in my dataset:

    ?s ?p ?o: 532,000
    ?p rdfs:domain ?d: 200
    { ?p rdfs:domain ?d . ?s ?p ?o }: 62,000
    { ?s rdf:type ?d }: 37,000
    { ?p rdfs:domain ?d . ?s ?p ?o } MINUS { ?s rdf:type ?d }: 39,000

Thanks,
Alex
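For reference, the MINUS in the queries above removes a left-hand row whenever some right-hand row agrees with it on every shared variable (here ?s and ?d) and at least one variable is shared. Below is a minimal Python sketch of those set semantics, assuming the bindings have already been fetched as plain dictionaries; the function names and the sample data are hypothetical and only illustrate the logic, not how TDB evaluates it internally.

```python
from typing import Dict, List, Sequence

Solution = Dict[str, str]  # variable name -> bound RDF term (as a string)


def compatible(a: Solution, b: Solution) -> bool:
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def minus(left: List[Solution], right: List[Solution]) -> List[Solution]:
    """SPARQL MINUS semantics: drop a left row if some right row is compatible
    with it AND shares at least one variable with it (nested-loop version)."""
    return [
        row for row in left
        if not any(compatible(row, r) and bool(row.keys() & r.keys()) for r in right)
    ]


def minus_hashed(left: List[Solution], right: List[Solution],
                 keys: Sequence[str] = ("s", "d")) -> List[Solution]:
    """Equivalent anti-join when both sides always bind all of `keys` and the
    right side binds nothing else: build a hash set from the right side once,
    then probe it once per left row."""
    seen = {tuple(r[k] for k in keys) for r in right}
    return [row for row in left if tuple(row[k] for k in keys) not in seen]


# Hypothetical rows of { GRAPH <data:output> { ?s <temp:typeByDomain> ?d } }
typed_by_domain = [
    {"s": "ex:alice", "d": "ex:Person"},
    {"s": "ex:bob", "d": "ex:Person"},
    {"s": "ex:acme", "d": "ex:Org"},
]
# Hypothetical rows of { GRAPH <data:input> { ?s a ?d } }
declared_types = [
    {"s": "ex:alice", "d": "ex:Person"},
    {"s": "ex:acme", "d": "ex:Company"},
]

result = minus(typed_by_domain, declared_types)
print(len(result))   # SELECT (count(*) AS ?c) -> 2
print(bool(result))  # ASK -> True
```

The nested-loop version compares every left row with every right row, while the hashed version makes one pass over each side; which strategy (if either) TDB uses for MINUS is not something this sketch claims.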