Re: RDFS subPropertyOf property path query performance

Christian Clausen Tue, 14 May 2024 01:21:54 -0700

Hi Lorenz,

I have shared a Java project which includes data here:
https://drive.google.com/file/d/1MOQXNmTEmJBnzLIgQ3pQViQbiyvTT76q/view?usp=sharing


In GraphServer.java there is a variable USE_RDFS, which you can use to
switch between using RDFS and not.

In preparing the repro I realized that the performance difference only
occurs on more complex queries than I originally thought.

The test query is this:

PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX :     <https://stati-cal.com/ontology/al/>
PREFIX aln:  <https://stati-cal.com/ontology/al/bnode/>

SELECT (COUNT(1) AS ?cnt)
WHERE
  { ?cmit a :StaticMethod ;
          :name "Commit" .
    ?decl (:possiblyCommittingProceduralFlow)+ ?cmit ;
          a ?procedureOrTrigger .
    FILTER(?procedureOrTrigger in (:Procedure, :Trigger))
    ?owner  :contains+  ?decl ;
            a ?ownerType .
    FILTER(?ownerType in (:Table, :TableExtension, :Page, :PageExtension,
                           :Report, :ReportExtension, :Codeunit, :XmlPort,
                           :Query, :ControlAddIn, :Enum, :EnumExtension,
                           :PageCustomization, :Profile,
                           :DotNetPackage, :Interface,
                           :PermissionSet, :PermissionSetExtension,
                           :Entitlement, :DotNet))
    ?decl :localKey ?localkey .
  }

(For simplicity, I have used COUNT instead of selecting ?owner and
?localkey, which we use in our application.)

This is how I ran the tests:

curl -v -X POST --header "Content-Type: application/sparql-query" \
     --data-binary @test1.sparql http://localhost:3030/CodeGraph/query

With RDFS enabled, the query runs in about 80 seconds.

With RDFS disabled, it takes about 1-2 seconds.
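
For anyone who prefers to time the request from Java rather than curl, here
is a minimal sketch using only the JDK HTTP client (Java 11+). It sends the
same POST as the curl command above and prints the elapsed wall-clock time.
The class name QueryTimer and the method buildRequest are hypothetical and
not part of the shared project; the endpoint URL and the file name
test1.sparql are the ones used above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the shared project.
class QueryTimer {

    // Build the same POST request that the curl command sends.
    static HttpRequest buildRequest(String endpoint, String sparql) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/sparql-query")
                .POST(HttpRequest.BodyPublishers.ofString(sparql))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumes test1.sparql is in the working directory and Fuseki is running.
        String sparql = Files.readString(Path.of("test1.sparql"));
        HttpRequest request =
                buildRequest("http://localhost:3030/CodeGraph/query", sparql);

        HttpClient client = HttpClient.newHttpClient();
        long start = System.nanoTime();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("HTTP " + response.statusCode() + " in " + elapsedMs + " ms");
        System.out.println(response.body());
    }
}
```

The measured time includes HTTP and result serialization overhead, which
should be negligible next to the 1-2 second versus 80 second difference.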

Interestingly, if I leave out the part that begins with ?owner...:

SELECT (COUNT(1) AS ?cnt)
WHERE
  { ?cmit a :StaticMethod ;
          :name "Commit" .
    ?decl (:possiblyCommittingProceduralFlow)+ ?cmit ;
          a ?procedureOrTrigger .
    FILTER(?procedureOrTrigger in (:Procedure, :Trigger))
  }

Then performance is similar (and good) with and without RDFS.
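
For comparison, my original message (quoted below) notes that without RDFS
an explicit alternation such as (:flowA | :flowB)+ finishes in a second or
two. A possible stopgap, then, is to spell out the sub-properties in the path
instead of relying on inference. A minimal sketch of generating such a path
string follows; the class PathAlternation and its method oneOrMore are
hypothetical and not part of the shared project, and the caller is assumed
to supply the complete list of sub-properties as prefixed names.

```java
import java.util.List;

// Hypothetical helper, not part of the shared project.
class PathAlternation {

    // Join prefixed property names into a one-or-more alternation path,
    // e.g. [":flowA", ":flowB"] becomes "(:flowA|:flowB)+".
    static String oneOrMore(List<String> properties) {
        if (properties.isEmpty()) {
            throw new IllegalArgumentException("at least one property is required");
        }
        return "(" + String.join("|", properties) + ")+";
    }

    public static void main(String[] args) {
        String path = oneOrMore(List.of(":flowA", ":flowB"));
        // Prints: ?origin (:flowA|:flowB)+ ?result
        System.out.println("?origin " + path + " ?result");
    }
}
```

This only stands in for rdfs:subPropertyOf on the listed properties; it does
not reproduce any other RDFS entailment.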

/Christian


On Mon, 13 May 2024 at 12:04, Lorenz Buehmann <
buehm...@informatik.uni-leipzig.de> wrote:

> Hi,
>
> does it mean the ?origin is always bound to a resource in the graph? Can
> you share the whole query maybe?
>
> How long are the sequences in the graph? How many paths starting from a
> node, i.e. what's the out degree in general per node?
>
> Also, would it be possible to share some kind of data for investigation?
>
> In general, the RDFS inference you're using is pretty light-weight,
> running at query eval time - all it does at triple pattern eval time is
> to incorporate, in your case, the rdfs:subPropertyOf triples from the
> schema, but it might indeed grow at each step on the path.
>
>
> Cheers,
>
> Lorenz
>
> On 13.05.24 09:41, Christian Clausen wrote:
> > In our graph we have :flow properties and need to distinguish different
> > kinds of flows, :flowA and :flowB.
> >
> > We modelled this in RDFS:
> >
> >      :flowA rdfs:subPropertyOf :flow
> >      :flowB rdfs:subPropertyOf :flow
> >
> > Some of our SPARQL queries use :flow+ and some use :flowA+, always from
> > an origin:
> >
> >      ?origin :flowA+ ?result
> >
> > or
> >
> >      ?origin :flow+ ?result
> >
> > If we start Fuseki *without* RDFS, the following queries finish in a
> > second or two:
> >
> >      ?origin :flowA+ ?result
> >      ?origin (:flowA | :flowB)+ ?result
> >
> > If we start Fuseki *with* RDFS, the following queries take about 85
> > seconds:
> >
> >      ?origin :flowA+ ?result
> >      ?origin :flow+ ?result
> >
> >
> > What is causing this difference in performance? Are we missing something,
> > or should we avoid RDFS for optimal performance? Are there any other
> > alternatives?
> >
> > Our overall process is:
> >
> > 1. Generate TTL files with :flowA and :flowB properties (no :flow triples
> >    other than those implied by rdfs:subPropertyOf)
> > 2. Load with TDB2 loader
> > 3. Start Fuseki (with RDFS vocabulary or not)
> >
> > Here follows the code we use to start Fuseki.
> >
> > Without RDFS:
> >
> >          Dataset data = TDB2Factory.connectDataset(options.directory);
> >
> >          FusekiServer server = FusekiServer.create()
> >              .port(options.port)
> >              .loopback(true)
> >              .addDataset(options.datasetName, data.asDatasetGraph())
> >              .addEndpoint(options.datasetName, "query", Operation.Query)
> >              // shortestPath
> >              .registerOperation(shortestPathOp, WebContent.contentTypeJSON,
> >                                 new ShortestPathService())
> >              .addEndpoint(options.datasetName, "shortestPath", shortestPathOp)
> >              // diagnostics
> >              .verbose(true)
> >              .enablePing(true)
> >              .enableStats(true)
> >              .enableMetrics(true)
> >              .enableTasks(true)
> >              .build();
> >
> >          // Start
> >          server.start();
> >
> > With RDFS:
> >
> >          Dataset data = TDB2Factory.connectDataset(options.directory);
> >          Graph vocabulary = RDFDataMgr.loadGraph(options.vocabularyFileName);
> >          DatasetGraph dsg =
> >              RDFSFactory.datasetRDFS(data.asDatasetGraph(), vocabulary);
> >
> >          FusekiServer server = FusekiServer.create()
> >              .port(options.port)
> >              .loopback(true)
> >              .addDataset(options.datasetName, dsg)
> >              .addEndpoint(options.datasetName, "query", Operation.Query)
> >              // shortestPath
> >              .registerOperation(shortestPathOp, WebContent.contentTypeJSON,
> >                                 new ShortestPathService())
> >              .addEndpoint(options.datasetName, "shortestPath", shortestPathOp)
> >              // diagnostics
> >              .verbose(true)
> >              .enablePing(true)
> >              .enableStats(true)
> >              .enableMetrics(true)
> >              .enableTasks(true)
> >              .build();
> >
> >          // Start
> >          server.start();
> >
> --
> Lorenz Bühmann
> Research Associate/Scientific Developer
>
> Email buehm...@infai.org
>
> Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109
> Leipzig | Germany
>
>
