Re: query performance on named graph vs. default graph

Jim Balhoff Wed, 20 Mar 2024 17:23:56 -0700

Hi Lorenz,

These both do speed things up quite a bit, but they prevent matching patterns 
that cross graphs when I include multiple graphs.
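
To make the cross-graph limitation concrete: with a single GRAPH ?g { ... } block, every triple pattern in the block has to match within the same graph. A minimal sketch of one possible workaround is to give each triple pattern its own graph variable, each restricted to the selected graphs. The prefixes, the ex:part_of property, and the graph IRIs below are placeholders for illustration, not the real dataset names, and I have not measured how this form performs on TDB2 or HDT.

```sparql
# Hypothetical prefixes and graph IRIs, for illustration only.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/>

# DISTINCT, because the same triple may be present in more than one graph.
SELECT DISTINCT ?cell ?organ ?organ_label
WHERE {
  # Each triple pattern gets its own graph variable, so the two
  # triples may come from different graphs (or from the same one).
  VALUES ?g1 { <http://example.org/g1> <http://example.org/g2> }
  VALUES ?g2 { <http://example.org/g1> <http://example.org/g2> }
  GRAPH ?g1 { ?cell ex:part_of ?organ . }
  GRAPH ?g2 { ?organ rdfs:label ?organ_label . }
}
```

This gets verbose quickly, since every triple pattern needs its own GRAPH clause and VALUES list, so it is probably only practical if the query is generated programmatically.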


Thanks,
Jim


> On Mar 20, 2024, at 4:28 AM, Lorenz Buehmann 
> <[email protected]> wrote:
> 
> Hi,
> 
> what about
> 
> SELECT *
> FROM NAMED <g1>
> FROM NAMED <g2>
> FROM NAMED  ...
> FROM NAMED <gn>
> {
>   GRAPH ?g {
>       ...
>   }
> }
> 
> or
> 
> SELECT *
> {
>   VALUES ?g {<g1> <g2> ... <gn>}
>   GRAPH ?g {
>     ...
>   }
> }
> 
> 
> does that work better?
> 
> On 19.03.24 15:21, Jim Balhoff wrote:
>> Hi Andy,
>> 
>>> On Mar 19, 2024, at 5:02 AM, Andy Seaborne <[email protected]> wrote:
>>> Hi Jim,
>>> 
>>> What happens if you use GRAPH rather than FROM?
>>> 
>>> WHERE {
>>>   GRAPH <http://example.org/ubergraph> {
>>>     ?cell rdfs:subClassOf cell: .
>>>     ?cell part_of: ?organ .
>>>     ?organ rdfs:subClassOf organ: .
>>>     ?organ part_of: abdomen: .
>>>     ?cell rdfs:label ?cell_label .
>>>     ?organ rdfs:label ?organ_label .
>>>   }
>>> }
>>> 
>> This does help. With TDB this is actually faster than using the default 
>> graph. With the HDT setup it’s about the same (fast). But it doesn’t work 
>> that well for what I’m trying to do (below).
>> 
>>> FROM builds a "view dataset" which is general purpose (e.g. multiple FROM 
>>> are possible) but which is less efficient for basic graph pattern matching. 
>>> It does not use the TDB2 basic graph pattern matcher.
>>> 
>>> GRAPH restricts to a single graph and the query goes direct to TDB2 basic 
>>> graph pattern matcher.
>>> 
>>> ----
>>> 
>>> If there is only one named graph, is there a reason to have it as a named 
>>> graph? Using the default graph and no unionDefaultGraph may be
>> What I am really trying to do is have a suite of large graphs that I can 
>> choose to include or not in a particular query, depending on what data 
>> sources I want to use in the query. I have several HDT files, one for each 
>> data source. I set this up as a dataset with a named graph for each data 
>> file, and was at first very happy with how it performed while turning on and 
>> off graphs using FROM lines. For example I have Wikidata in one HDT file, 
>> and it looks like having it available doesn’t slow down queries on other 
>> graphs when it’s not included. However I did see that performance issue in 
>> the query I asked about, and found it wasn’t related to having multiple 
>> graphs loaded; it happens even with just that one graph configured.
>> 
>> If I wrote my own server that accepted a list of data source names in a 
>> query parameter, and then for each request constructed a union model for 
>> executing the query over the required HDT graphs, would that work any 
>> better? Or is that basically the same as what FROM is doing?
>> 
>> Thank you,
>> Jim
>> 
>> 
> -- 
> Lorenz Bühmann
> Research Associate/Scientific Developer
> 
> Email [email protected]
> 
> Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109 
> Leipzig | Germany
>
