Hi Lorenz, Thanks for your suggestions.
Interestingly, with VALUES instead of FILTER IN, the query performance with RDFS improved to about 11 seconds (was about 80). However, the non-RDFS performance degraded to about 11 seconds (was about 1-2 seconds).

Thanks for the tip on tdb2.tdbstats. I tried it out and, as you sort of predicted, it did not make a difference for the current query.

With the FILTER IN query, I captured the algebras with and without RDFS. I have highlighted the differences (marked with asterisks):

WITHOUT RDFS:

[2024-05-14 09:11:29] exec INFO QUERY
PREFIX : <https://stati-cal.com/ontology/al/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX aln: <https://stati-cal.com/ontology/al/bnode/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT (COUNT(1) AS ?cnt)
WHERE
  { ?cmit rdf:type :StaticMethod ;
          :name "Commit" .
    ?decl (:possiblyCommittingProceduralFlow)+ ?cmit .
    ?decl rdf:type ?procedureOrTrigger
    FILTER ( ?procedureOrTrigger IN (:Procedure, :Trigger) )
    ?owner (:contains)+ ?decl .
    ?owner rdf:type ?ownerType
    FILTER ( ?ownerType IN (:Table, :TableExtension, :Page, :PageExtension, :Report, :ReportExtension, :Codeunit, :XmlPort, :Query, :ControlAddIn, :Enum, :EnumExtension, :PageCustomization, :Profile, :DotNetPackage, :Interface, :PermissionSet, :PermissionSetExtension, :Entitlement, :DotNet) )
  }

[2024-05-14 09:11:29] exec INFO ALGEBRA
(project (?cnt)
  (extend ((?cnt ?.0))
    (group () ((?.0 (count 1)))
      (disjunction
        (assign ((?procedureOrTrigger <https://stati-cal.com/ontology/al/Procedure>))
          (sequence
            *(quadpattern
              (quad <urn:x-arq:DefaultGraphNode> ?cmit <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://stati-cal.com/ontology/al/StaticMethod>)
              (quad <urn:x-arq:DefaultGraphNode> ?cmit <https://stati-cal.com/ontology/al/name> "Commit")
            )
            (graph <urn:x-arq:DefaultGraphNode>
              (path ?decl (path+ <https://stati-cal.com/ontology/al/possiblyCommittingProceduralFlow>) ?cmit))
            (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?decl <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://stati-cal.com/ontology/al/Procedure>))
            (graph <urn:x-arq:DefaultGraphNode>*
              (path ?owner (path+ <https://stati-cal.com/ontology/al/contains>) ?decl))
            (filter (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (= ?ownerType <https://stati-cal.com/ontology/al/Table>) (= ?ownerType <https://stati-cal.com/ontology/al/TableExtension>)) (= ?ownerType <https://stati-cal.com/ontology/al/Page>)) (= ?ownerType <https://stati-cal.com/ontology/al/PageExtension>)) (= ?ownerType <https://stati-cal.com/ontology/al/Report>)) (= ?ownerType <https://stati-cal.com/ontology/al/ReportExtension>)) (= ?ownerType <
https://stati-cal.com/ontology/al/Codeunit>)) (= ?ownerType <https://stati-cal.com/ontology/al/XmlPort>)) (= ?ownerType <https://stati-cal.com/ontology/al/Query>)) (= ?ownerType <https://stati-cal.com/ontology/al/ControlAddIn>)) (= ?ownerType <https://stati-cal.com/ontology/al/Enum>)) (= ?ownerType <https://stati-cal.com/ontology/al/EnumExtension>)) (= ?ownerType <https://stati-cal.com/ontology/al/PageCustomization>)) (= ?ownerType <https://stati-cal.com/ontology/al/Profile>)) (= ?ownerType <https://stati-cal.com/ontology/al/DotNetPackage>)) (= ?ownerType <https://stati-cal.com/ontology/al/Interface>)) (= ?ownerType <https://stati-cal.com/ontology/al/PermissionSet>)) (= ?ownerType <https://stati-cal.com/ontology/al/PermissionSetExtension>)) (= ?ownerType <https://stati-cal.com/ontology/al/Entitlement>)) (= ?ownerType <https://stati-cal.com/ontology/al/DotNet>))
              (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?owner <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?ownerType)))))
        (assign ((?procedureOrTrigger <https://stati-cal.com/ontology/al/Trigger>))
          (sequence
            (quadpattern
              (quad <urn:x-arq:DefaultGraphNode> ?cmit <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://stati-cal.com/ontology/al/StaticMethod>)
              (quad <urn:x-arq:DefaultGraphNode> ?cmit <https://stati-cal.com/ontology/al/name> "Commit")
            )
            (graph <urn:x-arq:DefaultGraphNode>
              (path ?decl (path+ <https://stati-cal.com/ontology/al/possiblyCommittingProceduralFlow>) ?cmit))
            (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?decl <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://stati-cal.com/ontology/al/Trigger>))
            (graph <urn:x-arq:DefaultGraphNode>
              (path ?owner (path+ <https://stati-cal.com/ontology/al/contains>) ?decl))
            (filter (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (= ?ownerType <https://stati-cal.com/ontology/al/Table>) (= ?ownerType <https://stati-cal.com/ontology/al/TableExtension>)) (= ?ownerType <
https://stati-cal.com/ontology/al/Page>)) (= ?ownerType <https://stati-cal.com/ontology/al/PageExtension>)) (= ?ownerType <https://stati-cal.com/ontology/al/Report>)) (= ?ownerType <https://stati-cal.com/ontology/al/ReportExtension>)) (= ?ownerType <https://stati-cal.com/ontology/al/Codeunit>)) (= ?ownerType <https://stati-cal.com/ontology/al/XmlPort>)) (= ?ownerType <https://stati-cal.com/ontology/al/Query>)) (= ?ownerType <https://stati-cal.com/ontology/al/ControlAddIn>)) (= ?ownerType <https://stati-cal.com/ontology/al/Enum>)) (= ?ownerType <https://stati-cal.com/ontology/al/EnumExtension>)) (= ?ownerType <https://stati-cal.com/ontology/al/PageCustomization>)) (= ?ownerType <https://stati-cal.com/ontology/al/Profile>)) (= ?ownerType <https://stati-cal.com/ontology/al/DotNetPackage>)) (= ?ownerType <https://stati-cal.com/ontology/al/Interface>)) (= ?ownerType <https://stati-cal.com/ontology/al/PermissionSet>)) (= ?ownerType <https://stati-cal.com/ontology/al/PermissionSetExtension>)) (= ?ownerType <https://stati-cal.com/ontology/al/Entitlement>)) (= ?ownerType <https://stati-cal.com/ontology/al/DotNet>))
              (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?owner <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?ownerType)))))))))
...

WITH RDFS:

[2024-05-14 09:05:27] exec INFO QUERY
PREFIX : <https://stati-cal.com/ontology/al/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX aln: <https://stati-cal.com/ontology/al/bnode/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT (COUNT(1) AS ?cnt)
WHERE
  { ?cmit rdf:type :StaticMethod ;
          :name "Commit" .
    ?decl (:possiblyCommittingProceduralFlow)+ ?cmit .
    ?decl rdf:type ?procedureOrTrigger
    FILTER ( ?procedureOrTrigger IN (:Procedure, :Trigger) )
    ?owner (:contains)+ ?decl .
    ?owner rdf:type ?ownerType
    FILTER ( ?ownerType IN (:Table, :TableExtension, :Page, :PageExtension, :Report, :ReportExtension, :Codeunit, :XmlPort, :Query, :ControlAddIn, :Enum, :EnumExtension, :PageCustomization, :Profile, :DotNetPackage, :Interface, :PermissionSet, :PermissionSetExtension, :Entitlement, :DotNet) )
  }

[2024-05-14 09:05:27] exec INFO ALGEBRA
(project (?cnt)
  (extend ((?cnt ?.0))
    (group () ((?.0 (count 1)))
      (disjunction
        (assign ((?procedureOrTrigger <https://stati-cal.com/ontology/al/Procedure>))
          (sequence
            *(bgp
              (triple ?cmit <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://stati-cal.com/ontology/al/StaticMethod>)
              (triple ?cmit <https://stati-cal.com/ontology/al/name> "Commit")
            )
            (path ?decl (path+ <https://stati-cal.com/ontology/al/possiblyCommittingProceduralFlow>) ?cmit)
            (bgp (triple ?decl <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://stati-cal.com/ontology/al/Procedure>))
            (path ?owner (path+ <https://stati-cal.com/ontology/al/contains>) ?decl)*
            (filter (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (= ?ownerType <https://stati-cal.com/ontology/al/Table>) (= ?ownerType <https://stati-cal.com/ontology/al/TableExtension>)) (= ?ownerType <https://stati-cal.com/ontology/al/Page>)) (= ?ownerType <https://stati-cal.com/ontology/al/PageExtension>)) (= ?ownerType <https://stati-cal.com/ontology/al/Report>)) (= ?ownerType <https://stati-cal.com/ontology/al/ReportExtension>)) (= ?ownerType <https://stati-cal.com/ontology/al/Codeunit>)) (= ?ownerType <https://stati-cal.com/ontology/al/XmlPort>)) (= ?ownerType <https://stati-cal.com/ontology/al/Query>)) (=
?ownerType <https://stati-cal.com/ontology/al/ControlAddIn>)) (= ?ownerType <https://stati-cal.com/ontology/al/Enum>)) (= ?ownerType <https://stati-cal.com/ontology/al/EnumExtension>)) (= ?ownerType <https://stati-cal.com/ontology/al/PageCustomization>)) (= ?ownerType <https://stati-cal.com/ontology/al/Profile>)) (= ?ownerType <https://stati-cal.com/ontology/al/DotNetPackage>)) (= ?ownerType <https://stati-cal.com/ontology/al/Interface>)) (= ?ownerType <https://stati-cal.com/ontology/al/PermissionSet>)) (= ?ownerType <https://stati-cal.com/ontology/al/PermissionSetExtension>)) (= ?ownerType <https://stati-cal.com/ontology/al/Entitlement>)) (= ?ownerType <https://stati-cal.com/ontology/al/DotNet>))
              (bgp (triple ?owner <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?ownerType)))))
        (assign ((?procedureOrTrigger <https://stati-cal.com/ontology/al/Trigger>))
          (sequence
            (bgp
              (triple ?cmit <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://stati-cal.com/ontology/al/StaticMethod>)
              (triple ?cmit <https://stati-cal.com/ontology/al/name> "Commit")
            )
            (path ?decl (path+ <https://stati-cal.com/ontology/al/possiblyCommittingProceduralFlow>) ?cmit)
            (bgp (triple ?decl <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://stati-cal.com/ontology/al/Trigger>))
            (path ?owner (path+ <https://stati-cal.com/ontology/al/contains>) ?decl)
            (filter (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (|| (= ?ownerType <https://stati-cal.com/ontology/al/Table>) (= ?ownerType <https://stati-cal.com/ontology/al/TableExtension>)) (= ?ownerType <https://stati-cal.com/ontology/al/Page>)) (= ?ownerType <https://stati-cal.com/ontology/al/PageExtension>)) (= ?ownerType <https://stati-cal.com/ontology/al/Report>)) (= ?ownerType <https://stati-cal.com/ontology/al/ReportExtension>)) (= ?ownerType <https://stati-cal.com/ontology/al/Codeunit>)) (= ?ownerType <https://stati-cal.com/ontology/al/XmlPort>)) (= ?ownerType <
https://stati-cal.com/ontology/al/Query>)) (= ?ownerType <https://stati-cal.com/ontology/al/ControlAddIn>)) (= ?ownerType <https://stati-cal.com/ontology/al/Enum>)) (= ?ownerType <https://stati-cal.com/ontology/al/EnumExtension>)) (= ?ownerType <https://stati-cal.com/ontology/al/PageCustomization>)) (= ?ownerType <https://stati-cal.com/ontology/al/Profile>)) (= ?ownerType <https://stati-cal.com/ontology/al/DotNetPackage>)) (= ?ownerType <https://stati-cal.com/ontology/al/Interface>)) (= ?ownerType <https://stati-cal.com/ontology/al/PermissionSet>)) (= ?ownerType <https://stati-cal.com/ontology/al/PermissionSetExtension>)) (= ?ownerType <https://stati-cal.com/ontology/al/Entitlement>)) (= ?ownerType <https://stati-cal.com/ontology/al/DotNet>))
              (bgp (triple ?owner <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?ownerType)))))))))
...

Are the differences expected (quadpatterns/graph vs. bgp)?

/Christian

On Tue, 14 May 2024 at 12:19, Lorenz Buehmann <buehm...@informatik.uni-leipzig.de> wrote:

> Hi Christian,
>
> thanks for sharing a self-contained project.
>
> What happens if you avoid the FILTER IN expression(s), which can be too
> expensive as the filter happens. And maybe use inline data to restrict
> the evaluation to given resources:
>
>
> |PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> PREFIX : <https://stati-cal.com/ontology/al/>
> PREFIX aln: <https://stati-cal.com/ontology/al/bnode/>
> SELECT (COUNT(1) AS ?cnt)
> WHERE
> {
>
>   ?cmit a :StaticMethod ;
>         :name "Commit" .
>   ?decl (:possiblyCommittingProceduralFlow)+ ?cmit ;
>         a ?procedureOrTrigger .
>   VALUES ?procedureOrTrigger {:Procedure :Trigger}
>
>   VALUES ?ownerType {
>     :Table :TableExtension :Page :PageExtension
>     :Report :ReportExtension :Codeunit :XmlPort
>     :Query :ControlAddIn :Enum :EnumExtension
>     :PageCustomization :Profile
>     :DotNetPackage :Interface
>     :PermissionSet :PermissionSetExtension
>     :Entitlement :DotNet
>   }
>   ?owner :contains+ ?decl ;
>          a ?ownerType .
>
>   ?decl :localKey ?localkey .
> }|
>
>
> That should at least be a bit faster if I'm not wrong.
>
> You should also provide the TDB database with some statistics about the
> data. Use tdb2.tdbstats for this to create a stats.opt file which you
> put in the Data-001 directory. This helps the optimizer in reordering of
> triple patterns. Won't work for property paths though, but in general
> it's a good idea to give it a try.
>
>
> Cheers,
> Lorenz
>
> On 14.05.24 10:21, Christian Clausen wrote:
> > Hi Lorenz,
> >
> > I have shared a Java project which includes data here:
> >
> > https://drive.google.com/file/d/1MOQXNmTEmJBnzLIgQ3pQViQbiyvTT76q/view?usp=sharing
> >
> > In GraphServer.java there is a variable USE_RDFS, which you can use to
> > switch between using RDFS and not.
> >
> > In preparing the repro I realized that the performance difference only
> > occurs on more complex queries than what I originally thought.
> >
> > The test query is this:
> >
> > PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
> > PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> > PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
> > PREFIX :<https://stati-cal.com/ontology/al/>
> > PREFIX aln:<https://stati-cal.com/ontology/al/bnode/>
> >
> > SELECT (COUNT(1) AS ?cnt)
> > WHERE
> >   { ?cmit a :StaticMethod ;
> >           :name "Commit" .
> >     ?decl (:possiblyCommittingProceduralFlow)+ ?cmit ;
> >           a ?procedureOrTrigger .
> >     FILTER(?procedureOrTrigger in (:Procedure, :Trigger))
> >     ?owner :contains+ ?decl ;
> >            a ?ownerType .
> >     FILTER(?ownerType in (:Table, :TableExtension, :Page, :PageExtension,
> >                           :Report, :ReportExtension, :Codeunit, :XmlPort,
> >                           :Query, :ControlAddIn, :Enum, :EnumExtension,
> >                           :PageCustomization, :Profile,
> >                           :DotNetPackage, :Interface,
> >                           :PermissionSet, :PermissionSetExtension,
> >                           :Entitlement, :DotNet))
> >     ?decl :localKey ?localkey .
> >   }
> >
> > (For simplicity, I have used count instead of selecting ?owner and
> > ?localKey which we use in our application.)
> >
> > This is how I ran the tests:
> >
> > curl -v -X POST --header "Content-Type: application/sparql-query" \
> >   --data-binary @test1.sparql http://localhost:3030/CodeGraph/query
> >
> > With RDFS enabled, the query runs in about 80 seconds.
> >
> > With RDFS disabled, it takes about 1-2 seconds.
> >
> > Interestingly, if I leave out the part that begins with ?owner...:
> >
> > SELECT (COUNT(1) AS ?cnt)
> > WHERE
> >   { ?cmit a :StaticMethod ;
> >           :name "Commit" .
> >     ?decl (:possiblyCommittingProceduralFlow)+ ?cmit ;
> >           a ?procedureOrTrigger .
> >     FILTER(?procedureOrTrigger in (:Procedure, :Trigger))
> >   }
> >
> > Then performance is similar (and good) with and without RDFS.
> >
> > /Christian
> >
> >
> > On Mon, 13 May 2024 at 12:04, Lorenz Buehmann <buehm...@informatik.uni-leipzig.de> wrote:
> >
> >> Hi,
> >>
> >> does it mean the ?origin is always bound to a resource in the graph? Can
> >> you share the whole query maybe?
> >>
> >> How long are the sequences in the graph? How many paths starting from a
> >> node, i.e. what's the out degree in general per node?
> >>
> >> Also, would it be possible to share some kind of data for investigation?
> >>
> >> In general, the RDFS inference you're using is pretty light-weight,
> >> running at query eval time - all it does at triple pattern eval time is
> >> to incorporate in your case the rdfs:subPropertyOf triple from the schema,
> >> but it might indeed grow at each step on the path.
> >>
> >>
> >> Cheers,
> >>
> >> Lorenz
> >>
> >> On 13.05.24 09:41, Christian Clausen wrote:
> >>> In our graph we have :flow properties and need to distinguish different
> >>> kinds of flows, :flowA and :flowB.
> >>>
> >>> We modelled this in RDFS:
> >>>
> >>> :flowA rdfs:subPropertyOf :flow
> >>> :flowB rdfs:subPropertyOf :flow
> >>>
> >>> Some of our SPARQL queries use :flow+ and some use :flowA+, always from
> >>> an origin:
> >>>
> >>> ?origin :flowA+ ?result
> >>>
> >>> or
> >>>
> >>> ?origin :flow+ ?result
> >>>
> >>> If we start Fuseki *without* RDFS, the following queries finish in a
> >>> second or two:
> >>>
> >>> ?origin :flowA+ ?result
> >>> ?origin (:flowA | :flowB)+ ?result
> >>>
> >>> If we start Fuseki *with* RDFS, the following queries take about 85
> >>> seconds:
> >>>
> >>> ?origin :flowA+ ?result
> >>> ?origin :flow+ ?result
> >>>
> >>> What is causing this difference in performance? Are we missing something
> >>> or should we avoid RDFS for optimal performance? Any other alternatives?
> >>>
> >>> Our overall process is:
> >>>
> >>> 1. Generate TTL files with :flowA and :flowB properties (not :flow, other
> >>>    than implied by rdfs:subPropertyOf)
> >>> 2. Load with TDB2 loader
> >>> 3. Start Fuseki (with RDFS vocabulary or not)
> >>>
> >>> Here follows the code we use to start Fuseki.
> >>>
> >>> Without RDFS:
> >>>
> >>> *Dataset data = TDB2Factory.connectDataset(options.directory);*
> >>>
> >>> FusekiServer server = FusekiServer.create()
> >>>     .port(options.port)
> >>>     .loopback(true)
> >>>     *.addDataset(options.datasetName, data.asDatasetGraph())*
> >>>     .addEndpoint(options.datasetName, "query", Operation.Query)
> >>>     // shortestPath
> >>>     .registerOperation(shortestPathOp, WebContent.contentTypeJSON,
> >>>         new ShortestPathService())
> >>>     .addEndpoint(options.datasetName, "shortestPath", shortestPathOp)
> >>>     // diagnostics
> >>>     .verbose(true)
> >>>     .enablePing(true)
> >>>     .enableStats(true)
> >>>     .enableMetrics(true)
> >>>     .enableTasks(true)
> >>>     .build();
> >>>
> >>> // Start
> >>> server.start();
> >>>
> >>> With RDFS:
> >>>
> >>> *Dataset data = TDB2Factory.connectDataset(options.directory);
> >>> Graph vocabulary = RDFDataMgr.loadGraph(options.vocabularyFileName);
> >>> DatasetGraph dsg = RDFSFactory.datasetRDFS(data.asDatasetGraph(), vocabulary);*
> >>>
> >>> FusekiServer server = FusekiServer.create()
> >>>     .port(options.port)
> >>>     .loopback(true)
> >>>     *.addDataset(options.datasetName, dsg)*
> >>>     .addEndpoint(options.datasetName, "query", Operation.Query)
> >>>     // shortestPath
> >>>     .registerOperation(shortestPathOp, WebContent.contentTypeJSON,
> >>>         new ShortestPathService())
> >>>     .addEndpoint(options.datasetName, "shortestPath", shortestPathOp)
> >>>     // diagnostics
> >>>     .verbose(true)
> >>>     .enablePing(true)
> >>>     .enableStats(true)
> >>>     .enableMetrics(true)
> >>>     .enableTasks(true)
> >>>     .build();
> >>>
> >>> // Start
> >>> server.start();
> >>>
> >> --
> >> Lorenz Bühmann
> >> Research Associate/Scientific Developer
> >>
> >> email buehm...@infai.org
> >>
> >> Institute for Applied Informatics e.V.
(InfAI) | Goerdelerring 9 | 04109
> >> Leipzig | Germany
> >>
> >>
> --
> Lorenz Bühmann
> Research Associate/Scientific Developer
>
> email buehm...@infai.org
>
> Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109
> Leipzig | Germany
>