Hi Lorenz, I have shared a Java project which includes data here: https://drive.google.com/file/d/1MOQXNmTEmJBnzLIgQ3pQViQbiyvTT76q/view?usp=sharing
In GraphServer.java there is a variable USE_RDFS, which you can use to switch RDFS on or off.

While preparing the repro I realized that the performance difference only occurs on more complex queries than I originally thought. The test query is this:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX : <https://stati-cal.com/ontology/al/>
PREFIX aln: <https://stati-cal.com/ontology/al/bnode/>

SELECT (COUNT(1) AS ?cnt)
WHERE {
  ?cmit a :StaticMethod ;
        :name "Commit" .
  ?decl (:possiblyCommittingProceduralFlow)+ ?cmit ;
        a ?procedureOrTrigger .
  FILTER(?procedureOrTrigger in (:Procedure, :Trigger))
  ?owner :contains+ ?decl ;
         a ?ownerType .
  FILTER(?ownerType in (:Table, :TableExtension, :Page, :PageExtension,
                        :Report, :ReportExtension, :Codeunit, :XmlPort,
                        :Query, :ControlAddIn, :Enum, :EnumExtension,
                        :PageCustomization, :Profile, :DotNetPackage,
                        :Interface, :PermissionSet, :PermissionSetExtension,
                        :Entitlement, :DotNet))
  ?decl :localKey ?localkey .
}

(For simplicity, I have used a count instead of selecting ?owner and ?localKey, which is what we use in our application.)

This is how I ran the tests:

curl -v -X POST --header "Content-Type: application/sparql-query" --data-binary @test1.sparql http://localhost:3030/CodeGraph/query

With RDFS enabled, the query runs in about 80 seconds. With RDFS disabled, it takes about 1-2 seconds.

Interestingly, if I leave out the part that begins with ?owner...:

SELECT (COUNT(1) AS ?cnt)
WHERE {
  ?cmit a :StaticMethod ;
        :name "Commit" .
  ?decl (:possiblyCommittingProceduralFlow)+ ?cmit ;
        a ?procedureOrTrigger .
  FILTER(?procedureOrTrigger in (:Procedure, :Trigger))
}

then performance is similar (and good) with and without RDFS.

/Christian

On Mon, 13 May 2024 at 12:04, Lorenz Buehmann <buehm...@informatik.uni-leipzig.de> wrote:

> Hi,
>
> does it mean the ?origin is always bound to a resource in the graph?
> Can you share the whole query maybe?
>
> How long are the sequences in the graph? How many paths starting from a
> node, i.e. what's the out degree in general per node?
>
> Also, would it be possible to share some kind of data for investigation?
>
> In general, the RDFS inference you're using is pretty light-weight,
> running at query eval time - all it does at triple pattern eval time is
> to incorporate in your case the rdfs:subPropertyOf triple from the schema,
> but it might indeed grow at each step on the path
>
> Cheers,
>
> Lorenz
>
> On 13.05.24 09:41, Christian Clausen wrote:
> > In our graph we have :flow properties and need to distinguish different
> > kinds of flows, :flowA and :flowB.
> >
> > We modelled this in RDFS:
> >
> > :flowA rdfs:subPropertyOf :flow
> > :flowB rdfs:subPropertyOf :flow
> >
> > Some of our SPARQL queries use :flow+ and some use :flowA+, always from an
> > origin:
> >
> > ?origin :flowA+ ?result
> >
> > or
> >
> > ?origin :flow+ ?result
> >
> > If we start Fuseki *without* RDFS, the following queries finish in a second
> > or two:
> >
> > ?origin :flowA+ ?result
> > ?origin (:flowA | :flowB)+ ?result
> >
> > If we start Fuseki *with* RDFS, the following queries take about 85 seconds:
> >
> > ?origin :flowA+ ?result
> > ?origin :flow+ ?result
> >
> > What is causing this difference in performance? Are we missing something or
> > should we avoid RDFS for optimal performance? Any other alternatives?
> >
> > Our overall process is:
> >
> > 1. Generate TTL files with :flowA and :flowB properties (not :flow other
> > than implied by rdfs:subPropertyOf)
> > 2. Load with TDB2 loader
> > 3. Start Fuseki (with RDFS vocabulary or not)
> >
> > Here follows the code we use to start Fuseki.
> >
> > Without RDFS:
> >
> > Dataset data = TDB2Factory.connectDataset(options.directory);
> >
> > FusekiServer server = FusekiServer.create()
> >         .port(options.port)
> >         .loopback(true)
> >         .addDataset(options.datasetName, data.asDatasetGraph())
> >         .addEndpoint(options.datasetName, "query", Operation.Query)
> >         // shortestPath
> >         .registerOperation(shortestPathOp, WebContent.contentTypeJSON,
> >                 new ShortestPathService())
> >         .addEndpoint(options.datasetName, "shortestPath", shortestPathOp)
> >         // diagnostics
> >         .verbose(true)
> >         .enablePing(true)
> >         .enableStats(true)
> >         .enableMetrics(true)
> >         .enableTasks(true)
> >         .build();
> >
> > // Start
> > server.start();
> >
> > With RDFS:
> >
> > Dataset data = TDB2Factory.connectDataset(options.directory);
> > Graph vocabulary = RDFDataMgr.loadGraph(options.vocabularyFileName);
> > DatasetGraph dsg = RDFSFactory.datasetRDFS(data.asDatasetGraph(), vocabulary);
> >
> > FusekiServer server = FusekiServer.create()
> >         .port(options.port)
> >         .loopback(true)
> >         .addDataset(options.datasetName, dsg)
> >         .addEndpoint(options.datasetName, "query", Operation.Query)
> >         // shortestPath
> >         .registerOperation(shortestPathOp, WebContent.contentTypeJSON,
> >                 new ShortestPathService())
> >         .addEndpoint(options.datasetName, "shortestPath", shortestPathOp)
> >         // diagnostics
> >         .verbose(true)
> >         .enablePing(true)
> >         .enableStats(true)
> >         .enableMetrics(true)
> >         .enableTasks(true)
> >         .build();
> >
> > // Start
> > server.start();
> >
> --
> Lorenz Bühmann
> Research Associate/Scientific Developer
>
> Email buehm...@infai.org
>
> Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109
> Leipzig | Germany
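
For anyone who would rather time the query from Java than with curl, here is a minimal sketch of the same POST request using only the JDK's built-in HTTP client (Java 11+). The class name QueryTimer and its command-line handling are assumptions for illustration and are not part of the shared project. The endpoint default and the content type are taken from the curl command earlier in this message.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the shared repro project.
final class QueryTimer {

    // Builds the same request as the curl call: a POST whose body is the
    // SPARQL text, sent with Content-Type application/sparql-query.
    static HttpRequest buildRequest(String endpoint, String sparql) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/sparql-query")
                .POST(HttpRequest.BodyPublishers.ofString(sparql))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: java QueryTimer.java <query-file> [endpoint]");
            return;
        }
        String sparql = Files.readString(Path.of(args[0]));
        // Default endpoint assumed from the curl command in this thread.
        String endpoint = args.length > 1 ? args[1] : "http://localhost:3030/CodeGraph/query";

        HttpClient client = HttpClient.newHttpClient();
        long start = System.nanoTime();
        HttpResponse<String> response =
                client.send(buildRequest(endpoint, sparql), HttpResponse.BodyHandlers.ofString());
        long millis = (System.nanoTime() - start) / 1_000_000;

        // Wall-clock time includes HTTP overhead, as it does with curl.
        System.out.println("HTTP " + response.statusCode() + " in " + millis + " ms");
        System.out.println(response.body());
    }
}
```

Run it once with USE_RDFS on and once with it off, for example: java QueryTimer.java test1.sparql. A single run only gives a rough number, so repeating it a few times after the server has warmed up should give a fairer comparison.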