Re: Counting all triples performance and named graphs

Andy Seaborne Wed, 21 Sep 2022 04:03:52 -0700



On 21/09/2022 08:57, Simon Bin wrote:
> Hi,
>
> we have a data set with 500 million triples. Single named graph, fuseki
> tdb2.
>
> We observe different performance between
>
> select (count(*) as ?cnt) {
>    ?s ?p ?o
> }
>
> ~4 minutes
>
> select (count(*) as ?cnt) {
>    graph <https://data.coypu.org/> { ?s ?p ?o }
> }
>
> ~3 minutes

Caching.

>
> select (count(*) as ?cnt)
> from <https://data.coypu.org/> {
>    ?s ?p ?o
> }
>
> takes forever and longer...?
>
> Especially the last case is surprising, any thoughts?

If you use FROM, a dataset is created for the request. It has one graph
that points to the TDB database, but it isn't itself TDB. Every ?s ?p ?o
is actually read, because access through that dataset creates triples.

If it's direct to TDB, and that includes GRAPH, the count counts rows but does not actually fetch the values of ?s ?p ?o.


BindingTDB/BindingNodeId do lazy evaluation of variable values.

There can be multiple FROM clauses, so the machinery is general and has to cope with that.
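
As a sketch of what multiple FROM means (the second graph IRI below is made up for illustration and is not from the original data set): per the SPARQL spec, the default graph for the query is the merge of all the graphs given in FROM, so the pattern is matched against that combined graph rather than directly against one TDB graph.

```sparql
# Hypothetical example: <https://example.org/other-graph> is an assumed IRI.
# The default graph seen by ?s ?p ?o is the merge of both FROM graphs.
select (count(*) as ?cnt)
from <https://data.coypu.org/>
from <https://example.org/other-graph>
{
   ?s ?p ?o
}
```

With a single named graph, the GRAPH form from the earlier query asks for the same count while staying on the direct TDB path.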

FROM != GRAPH.

    Andy
