Hi Christian,

thanks for sharing a self-contained project.

What happens if you avoid the FILTER IN expression(s)? They can be expensive, because the filter is only applied after the candidate bindings have been produced. You could instead use inline data (VALUES) to restrict the evaluation to the given resources:


PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX : <https://stati-cal.com/ontology/al/>
PREFIX aln: <https://stati-cal.com/ontology/al/bnode/>
SELECT (COUNT(1) AS ?cnt)
WHERE
{

  ?cmit a :StaticMethod ;
  :name "Commit" .
  ?decl (:possiblyCommittingProceduralFlow)+ ?cmit ;
         a ?procedureOrTrigger .
  VALUES ?procedureOrTrigger {:Procedure :Trigger}

  VALUES ?ownerType {
    :Table :TableExtension :Page :PageExtension
    :Report :ReportExtension :Codeunit :XmlPort
    :Query :ControlAddIn :Enum :EnumExtension
    :PageCustomization :Profile
    :DotNetPackage :Interface
    :PermissionSet :PermissionSetExtension
    :Entitlement :DotNet
  }
  ?owner  :contains+  ?decl ;
          a ?ownerType .

  ?decl :localKey ?localkey .
}


That should at least be a bit faster if I'm not wrong.
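
To illustrate the idea behind swapping FILTER IN for VALUES, here is a toy model in plain Java. It is not how Jena evaluates either form (and an optimizer may rewrite a FILTER anyway); the class ValuesVsFilter, the in-memory type index, and the :Variable type are made up for illustration. It only contrasts testing every candidate binding against the allowed set with starting from the allowed constants and looking them up in an index:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ValuesVsFilter {
    // Hypothetical "type index": rdf:type object -> subjects.
    static Map<String, List<String>> typeIndex(List<String[]> typeTriples) {
        Map<String, List<String>> index = new HashMap<>();
        for (String[] t : typeTriples) {
            index.computeIfAbsent(t[1], k -> new ArrayList<>()).add(t[0]);
        }
        return index;
    }

    // FILTER-style: enumerate every type triple, then test membership.
    // Returns {matches, triplesTouched}.
    static int[] filterAfterScan(List<String[]> typeTriples, Set<String> allowed) {
        int matches = 0;
        int touched = 0;
        for (String[] t : typeTriples) {
            touched++;
            if (allowed.contains(t[1])) {
                matches++;
            }
        }
        return new int[] {matches, touched};
    }

    // VALUES-style: start from the allowed constants and look each one up.
    // Returns {matches, triplesTouched}.
    static int[] valuesFirst(Map<String, List<String>> index, Set<String> allowed) {
        int matches = 0;
        int touched = 0;
        for (String type : allowed) {
            List<String> subjects = index.getOrDefault(type, List.of());
            touched += subjects.size();
            matches += subjects.size();
        }
        return new int[] {matches, touched};
    }

    public static void main(String[] args) {
        List<String[]> triples = new ArrayList<>();
        triples.add(new String[] {"d1", ":Procedure"});
        triples.add(new String[] {"d2", ":Trigger"});
        // :Variable is an invented type standing in for "everything else".
        for (int i = 0; i < 1000; i++) {
            triples.add(new String[] {"x" + i, ":Variable"});
        }
        Set<String> allowed = Set.of(":Procedure", ":Trigger");
        int[] a = filterAfterScan(triples, allowed);
        int[] b = valuesFirst(typeIndex(triples), allowed);
        System.out.println("filter: matches=" + a[0] + " touched=" + a[1]);
        System.out.println("values: matches=" + b[0] + " touched=" + b[1]);
    }
}
```

Both approaches return the same two matches; the only difference in this toy is how many type triples get looked at along the way (all 1002 versus 2).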

You should also provide the TDB database with some statistics about the data. Use tdb2.tdbstats to create a stats.opt file and put it in the Data-0001 directory of the database. This helps the optimizer reorder triple patterns. It won't help with property paths, but in general it's worth a try.


Cheers,
Lorenz

On 14.05.24 10:21, Christian Clausen wrote:
Hi Lorenz,

I have shared a Java project which includes data here:
https://drive.google.com/file/d/1MOQXNmTEmJBnzLIgQ3pQViQbiyvTT76q/view?usp=sharing

In GraphServer.java there is a variable USE_RDFS, which you can use to
switch between using RDFS and not.

In preparing the repro I realized that the performance difference only
occurs on more complex queries than what I originally thought.

The test query is this:

PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX :<https://stati-cal.com/ontology/al/>
PREFIX aln:<https://stati-cal.com/ontology/al/bnode/>

SELECT (COUNT(1) AS ?cnt)
WHERE
   { ?cmit a :StaticMethod ;
           :name "Commit" .
     ?decl (:possiblyCommittingProceduralFlow)+ ?cmit ;
           a ?procedureOrTrigger .
     FILTER(?procedureOrTrigger in (:Procedure, :Trigger))
     ?owner  :contains+  ?decl ;
             a ?ownerType .
     FILTER(?ownerType in (:Table, :TableExtension, :Page, :PageExtension,
                            :Report, :ReportExtension, :Codeunit, :XmlPort,
                            :Query, :ControlAddIn, :Enum, :EnumExtension,
                            :PageCustomization, :Profile,
                            :DotNetPackage, :Interface,
                            :PermissionSet, :PermissionSetExtension,
                            :Entitlement, :DotNet))
     ?decl :localKey ?localkey .
   }

(For simplicity, I have used count instead of selecting ?owner and
?localKey which we use in our application.)

This is how I ran the tests:

curl -v -X POST --header "Content-Type: application/sparql-query" \
     --data-binary @test1.sparql http://localhost:3030/CodeGraph/query

With RDFS enabled, the query runs in about 80 seconds.

With RDFS disabled, it takes about 1-2 seconds.

Interestingly, if I leave out the part that begins with ?owner...:

SELECT (COUNT(1) AS ?cnt)
WHERE
   { ?cmit a :StaticMethod ;
           :name "Commit" .
     ?decl (:possiblyCommittingProceduralFlow)+ ?cmit ;
           a ?procedureOrTrigger .
     FILTER(?procedureOrTrigger in (:Procedure, :Trigger))
   }

Then performance is similar (and good) with and without RDFS.

/Christian


On Mon, 13 May 2024 at 12:04, Lorenz Buehmann <buehm...@informatik.uni-leipzig.de> wrote:

Hi,

does it mean the ?origin is always bound to a resource in the graph? Can
you share the whole query maybe?

How long are the sequences in the graph? How many paths starting from a
node, i.e. what's the out degree in general per node?

Also, would it be possible to share some kind of data for investigation?

In general, the RDFS inference you're using is pretty lightweight and runs
at query evaluation time. All it does at triple pattern evaluation time is
incorporate, in your case, the rdfs:subPropertyOf triples from the schema,
but the cost might indeed grow at each step on the path.
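
As a rough picture of what that means for a path query, here is a toy model in plain Java. It is an assumption-laden sketch, not Jena's RDFS implementation: the class SubPropertyPath, the single-parent superOf map, and the node names a to d are hypothetical. A pattern on :flow is expanded to :flow plus its sub-properties, and that expanded set is consulted at every step of the + path:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SubPropertyPath {
    // Returns the property itself plus every property that has it as a
    // (direct or indirect) super-property. superOf maps child -> parent.
    static Set<String> expand(String property, Map<String, String> superOf) {
        Set<String> result = new TreeSet<>();
        result.add(property);
        for (String candidate : superOf.keySet()) {
            String current = candidate;
            Set<String> seen = new HashSet<>();
            // Walk up the hierarchy; 'seen' guards against cycles.
            while (current != null && seen.add(current)) {
                if (current.equals(property)) {
                    result.add(candidate);
                    break;
                }
                current = superOf.get(current);
            }
        }
        return result;
    }

    // Nodes reachable from origin via one or more steps of the given
    // property, where each step may use any property in the expansion.
    static Set<String> reachable(String origin, String property,
                                 List<String[]> triples,
                                 Map<String, String> superOf) {
        Set<String> props = expand(property, superOf);
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(origin);
        while (!frontier.isEmpty()) {
            String node = frontier.poll();
            for (String[] t : triples) {
                if (t[0].equals(node) && props.contains(t[1]) && visited.add(t[2])) {
                    frontier.add(t[2]);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, String> superOf = Map.of(":flowA", ":flow", ":flowB", ":flow");
        List<String[]> triples = List.of(
            new String[] {"a", ":flowA", "b"},
            new String[] {"b", ":flowB", "c"},
            new String[] {"c", ":flowA", "d"});
        System.out.println(expand(":flow", superOf));
        System.out.println(reachable("a", ":flow", triples, superOf));
        System.out.println(reachable("a", ":flowA", triples, superOf));
    }
}
```

With this sample data, the :flow+ path from a reaches b, c and d, while :flowA+ stops at b because the next edge is a :flowB. Whether this kind of per-step expansion explains the slowdown in the real dataset is not something the sketch can show.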


Cheers,

Lorenz

On 13.05.24 09:41, Christian Clausen wrote:
In our graph we have :flow properties and need to distinguish different
kinds of flows, :flowA and :flowB.

We modelled this in RDFS:

      :flowA rdfs:subPropertyOf :flow
      :flowB rdfs:subPropertyOf :flow

Some of our SPARQL queries use :flow+ and some use :flowA+, always from an
origin:

      ?origin :flowA+ ?result

or

      ?origin :flow+ ?result

If we start Fuseki *without* RDFS, the following queries finish in a second
or two:

      ?origin :flowA+ ?result
      ?origin (:flowA | :flowB)+ ?result

If we start Fuseki *with* RDFS, the following queries take about 85 seconds:

      ?origin :flowA+ ?result
      ?origin :flow+ ?result

What is causing this difference in performance? Are we missing something, or
should we avoid RDFS for optimal performance? Are there any other alternatives?

Our overall process is:

1. Generate TTL files with :flowA and :flowB properties (not :flow other
than implied by rdfs:subPropertyOf)
2. Load with TDB2 loader
3. Start Fuseki (with RDFS vocabulary or not)

Here follows the code we use to start Fuseki.

Without RDFS:

          // The dataset is served directly from TDB2, without RDFS.
          Dataset data = TDB2Factory.connectDataset(options.directory);

          FusekiServer server = FusekiServer.create()
              .port(options.port)
              .loopback(true)
              .addDataset(options.datasetName, data.asDatasetGraph())
              .addEndpoint(options.datasetName, "query", Operation.Query)
              // shortestPath
              .registerOperation(shortestPathOp, WebContent.contentTypeJSON,
                                 new ShortestPathService())
              .addEndpoint(options.datasetName, "shortestPath", shortestPathOp)
              // diagnostics
              .verbose(true)
              .enablePing(true)
              .enableStats(true)
              .enableMetrics(true)
              .enableTasks(true)
              .build();

          // Start
          server.start();

With RDFS:

          // The TDB2 dataset is wrapped in an RDFS view built from the vocabulary.
          Dataset data = TDB2Factory.connectDataset(options.directory);
          Graph vocabulary = RDFDataMgr.loadGraph(options.vocabularyFileName);
          DatasetGraph dsg = RDFSFactory.datasetRDFS(data.asDatasetGraph(), vocabulary);

          FusekiServer server = FusekiServer.create()
              .port(options.port)
              .loopback(true)
              .addDataset(options.datasetName, dsg)
              .addEndpoint(options.datasetName, "query", Operation.Query)
              // shortestPath
              .registerOperation(shortestPathOp, WebContent.contentTypeJSON,
                                 new ShortestPathService())
              .addEndpoint(options.datasetName, "shortestPath", shortestPathOp)
              // diagnostics
              .verbose(true)
              .enablePing(true)
              .enableStats(true)
              .enableMetrics(true)
              .enableTasks(true)
              .build();

          // Start
          server.start();

--
Lorenz Bühmann
Research Associate/Scientific Developer

email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109
Leipzig | Germany

