Hi, I’m running a particular query in a Fuseki server which performs very differently if the data is in a named graph vs. the default graph. I’m wondering if it’s expected to have a large performance hit if a named graph is specified. The dataset consists of ~462 million triples; it’s this dataset with all graphs merged together: https://github.com/INCATools/ubergraph?tab=readme-ov-file#downloads
I have loaded all the triples into a named graph in TDB2 using this command:

tdb2.tdbloader --loc tdb --graph 'http://example.org/ubergraph' ubergraph.nt.gz

My Fuseki config is like this:

[] rdf:type fuseki:Server ;
   ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "120000" ] ;
   fuseki:services ( <#my-service> ) .

<#my-service> rdf:type fuseki:Service ;
   fuseki:name "union" ;
   fuseki:serviceQuery "sparql" ;
   fuseki:serviceReadGraphStore "get" ;
   fuseki:dataset <#dataset> .

<#dataset> rdf:type tdb2:DatasetTDB2 ;
   tdb2:location "tdb" ;
   tdb2:unionDefaultGraph true .

This is my query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX cell: <http://purl.obolibrary.org/obo/CL_0000000>
PREFIX organ: <http://purl.obolibrary.org/obo/UBERON_0000062>
PREFIX abdomen: <http://purl.obolibrary.org/obo/UBERON_0000916>
PREFIX part_of: <http://purl.obolibrary.org/obo/BFO_0000050>
SELECT DISTINCT ?cell ?organ
FROM <http://example.org/ubergraph>
WHERE {
  ?cell rdfs:subClassOf cell: .
  ?cell part_of: ?organ .
  ?organ rdfs:subClassOf organ: .
  ?organ part_of: abdomen: .
  ?cell rdfs:label ?cell_label .
  ?organ rdfs:label ?organ_label .
}

Using the FROM line causes the query to complete in about 40 seconds. Deleting the FROM line allows the query to complete in about 5 seconds.

The reason I was testing this in TDB2 is that I first noticed this behavior with an HDT backend, and wanted to make sure it wasn’t only an HDT issue. If I create a dataset using an HDT graph as the default graph, the query completes in a fraction of a second, but if I use the graph as a named graph the time jumps to about 20 seconds. For both of these scenarios (TDB2 and HDT) there is only a single named graph in the dataset.

Is there any way to improve performance when using FROM in the query?

Thank you,
Jim
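
P.S. In case it helps anyone reproduce the comparison, here is a minimal sketch of how the two query variants could be timed from Python. It is not the exact procedure behind the timings quoted in this message. The endpoint URL is an assumption (Fuseki on its default port 3030, with the "union" service and "sparql" query endpoint from the config), and build_query, time_query and compare_timings are just illustrative helper names.

```python
import time
import urllib.parse
import urllib.request

# Assumption: Fuseki running locally on its default port, exposing the
# "union" service with the "sparql" query endpoint from the config.
ENDPOINT = "http://localhost:3030/union/sparql"
GRAPH = "http://example.org/ubergraph"

QUERY_HEAD = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX cell: <http://purl.obolibrary.org/obo/CL_0000000>
PREFIX organ: <http://purl.obolibrary.org/obo/UBERON_0000062>
PREFIX abdomen: <http://purl.obolibrary.org/obo/UBERON_0000916>
PREFIX part_of: <http://purl.obolibrary.org/obo/BFO_0000050>
SELECT DISTINCT ?cell ?organ
"""

QUERY_BODY = """\
WHERE {
  ?cell rdfs:subClassOf cell: .
  ?cell part_of: ?organ .
  ?organ rdfs:subClassOf organ: .
  ?organ part_of: abdomen: .
  ?cell rdfs:label ?cell_label .
  ?organ rdfs:label ?organ_label .
}
"""


def build_query(use_from):
    """Return the query text, with or without the FROM clause."""
    from_clause = "FROM <" + GRAPH + ">\n" if use_from else ""
    return QUERY_HEAD + from_clause + QUERY_BODY


def time_query(query, endpoint=ENDPOINT):
    """POST the query (form-encoded) and return elapsed seconds."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        response.read()  # drain the full result set before stopping the clock
    return time.perf_counter() - start


def compare_timings():
    """Run both variants once and print how long each took."""
    for use_from in (True, False):
        label = "with FROM" if use_from else "without FROM"
        elapsed = time_query(build_query(use_from))
        print(label + ": " + format(elapsed, ".1f") + " s")
```

Calling compare_timings() against a running server prints one timing per variant. A single run of each includes any cache warm-up, so repeating the call a few times gives a fairer comparison.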
