Re: query performance on named graph vs. default graph

Andy Seaborne Sun, 24 Mar 2024 14:08:53 -0700



On 21/03/2024 00:21, Jim Balhoff wrote:

Hi Lorenz,

These both do speed things up quite a bit, but it prevents matching patterns 
that cross graphs in the case where I include multiple graphs.

Thanks,
Jim

It is the combination of choosing certain graphs and wanting cross-graph patterns that pushes the code into working in a general way. It works in Nodes, and that means string comparisons. That loses the TDB ability to do faster joins using NodeIds, which avoids both string comparisons and retrieving the strings until they are known to be needed for the results.

Is there a reason for not having a union default graph over all the named graphs instead of selecting certain ones? If it is all named graphs, the union is done at the TDB2 level.
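
As a rough sketch of the "all named graphs" case from the query side: Jena recognises the special graph name <urn:x-arq:UnionGraph>, which stands for the union of all named graphs in the dataset. The ex: prefix and the triple patterns below are placeholders for illustration, not taken from the queries in this thread.

```sparql
PREFIX ex: <http://example.org/>

SELECT ?s ?label
WHERE {
  # Jena-specific graph name: the union of all named graphs in the dataset.
  GRAPH <urn:x-arq:UnionGraph> {
    ?s ex:partOf ?o .       # placeholder pattern; may be matched in one named graph
    ?s ex:label  ?label .   # placeholder pattern; may be matched in a different one
  }
}
```

Because the patterns are matched against the union, they can join across graphs. The trade-off is that every named graph is included, so this does not help when only a subset of graphs should be visible to the query.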

You can have a Fuseki setup with two endpoints for the same dataset: one that uses the union default graph and one that does not.


    Andy

On Mar 20, 2024, at 4:28 AM, Lorenz Buehmann 
<buehm...@informatik.uni-leipzig.de> wrote:

Hi,

what about

SELECT *
FROM NAMED <g1>
FROM NAMED <g2>
FROM NAMED  ...
FROM NAMED <gn>
{
   GRAPH ?g {
       ...
   }
}

or

SELECT *
{
  VALUES ?g {<g1> <g2> ... <gn>}
   GRAPH ?g {
     ...
   }
}


does that work better?

On 19.03.24 15:21, Jim Balhoff wrote:

Hi Andy,

On Mar 19, 2024, at 5:02 AM, Andy Seaborne <a...@apache.org> wrote:
Hi Jim,

What happens if you use GRAPH rather than FROM?

WHERE {
   GRAPH <http://example.org/ubergraph> {
     ?cell rdfs:subClassOf cell: .
     ?cell part_of: ?organ .
     ?organ rdfs:subClassOf organ: .
     ?organ part_of: abdomen: .
     ?cell rdfs:label ?cell_label .
     ?organ rdfs:label ?organ_label .
   }
}

This does help. With TDB this is actually faster than using the default graph. 
With the HDT setup it’s about the same (fast). But it doesn’t work that well 
for what I’m trying to do (below).

FROM builds a "view dataset" which is general purpose (e.g. multiple FROM are 
possible) but which is less efficient for basic graph pattern matching. It does not use 
the TDB2 basic graph pattern matcher.

GRAPH restricts to a single graph and the query goes directly to the TDB2 basic 
graph pattern matcher.

----

If there is only one named graph, is there a reason to have it as a named graph? 
Using the default graph and no unionDefaultGraph may be

What I am really trying to do is have a suite of large graphs that I can choose 
to include or not in a particular query, depending on what data sources I want 
to use in the query. I have several HDT files, one for each data source. I set 
this up as a dataset with a named graph for each data file, and was at first 
very happy with how it performed while turning on and off graphs using FROM 
lines. For example I have Wikidata in one HDT file, and it looks like having it 
available doesn’t slow down queries on other graphs when it’s not included. 
However I did see that performance issue in the query I asked about, and found 
it wasn’t related to having multiple graphs loaded; it happens even with just 
that one graph configured.

If I wrote my own server that accepted a list of data source names in a query 
parameter, and then for each request constructed a union model for executing 
the query over the required HDT graphs, would that work any better? Or is that 
basically the same as what FROM is doing?

Thank you,
Jim

--
Lorenz Bühmann
Research Associate/Scientific Developer

Email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109 
Leipzig | Germany
