On 27/08/2020 09:10, Élie Roux wrote:
Dear all,

I have a dataset with (among other things) about 400,000 triples in the form

(in memory or TDB?)


?a adm:logDate ?d

where ?d is an xsd:dateTime. I'm writing a query to get all the
triples that have a ?d in a certain interval. There are usually very
few of them (around say 200). I'm writing a query that looks like

construct {
     ?va  adm:hasactivityon ?d .
} where {
     ?le adm:logDate ?d .
     FILTER(?d > "2020-08-01T00:00:00"^^xsd:dateTime)
     ?va adm:logEntry ?le .
}

But it's too slow for our purpose (3.5s). I suspect it's conceptually
simple to have a very performant implementation (using an index
dedicated to xsd:dateTime literals that could be queried), but I also
suspect SPARQL doesn't make that kind of performant algorithm easy to
invoke in such a query (which is a mix of a BGP and a filter instead
of a direct call to a performant index).

So a few questions:
- are there other ways to write this query to make it more performant?
Not in ARQ, unless there are fewer adm:logEntry triples.

- my impression is that what I want with time is similar to what
GeoSPARQL provides for space... is there something similar to
GeoSPARQL for time?
- would that kind of performant index require the same type of
mechanism as the jena:text extension?

The access to data triples is (S,P,O) where any of S/P/O can be ANY.
So you have (ANY, adm:logDate, ANY)

Ideally, that would be for TDB:

(ANY, adm:logDate, ANY, start O at "2020-08-01T00:00:00"^^xsd:dateTime)

or generally

(ANY, adm:logDate, ANY, min O, max O)
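
As a rough illustration of what such a bounded access buys, here is a minimal sketch in Python. It is not Jena or TDB code: DateIndex and its range method are hypothetical names, and the index is just a sorted in-memory list standing in for the entries of one predicate ordered by object value. The point is only that a sorted index can jump to min O and stop at max O instead of scanning every adm:logDate triple and filtering.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


class DateIndex:
    """Hypothetical stand-in for one predicate's entries, sorted by object."""

    def __init__(self, pairs):
        # pairs: iterable of (object_value, subject); sorted once up front.
        ordered = sorted(pairs)
        self.keys = [o for o, _ in ordered]
        self.subjects = [s for _, s in ordered]

    def range(self, min_o, max_o):
        """Return (o, s) pairs with min_o <= o <= max_o via binary search."""
        lo = bisect_left(self.keys, min_o)
        hi = bisect_right(self.keys, max_o)
        return list(zip(self.keys[lo:hi], self.subjects[lo:hi]))


# Made-up sample data: (logDate value, log entry name).
index = DateIndex([
    (datetime(2020, 9, 1, 0, 0), "le4"),
    (datetime(2020, 7, 30, 12, 0), "le1"),
    (datetime(2020, 8, 15, 0, 0), "le3"),
    (datetime(2020, 8, 2, 9, 10), "le2"),
])

hits = index.range(datetime(2020, 8, 1), datetime(2020, 8, 31, 23, 59, 59))
for when, entry in hits:
    print(entry, when.isoformat())
```

With 400,000 entries and about 200 matches, the two binary searches touch a handful of positions and the slice reads only the matches, which is the difference between the range access and the scan-plus-FILTER plan.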

There are complications with illegal literals, mixed types, encoding restrictions, etc., but in TDB2 "2020-08-01T00:00:00"^^xsd:dateTime is stored inline in the O slot as binary, so the index is partially sorted for valid data.

There are precision limits in the encoding for xsd:dateTime:
only millisecond accuracy, and timezones must be in units of 15 minutes (which is true for all valid timezones at the time of writing).
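
To make those two limits concrete, here is a small Python sketch. It only restates the constraints described above (whole milliseconds, timezone offset a multiple of 15 minutes); fits_stated_limits is a hypothetical helper, not the TDB2 encoder, and it says nothing about any other rule the real encoding may apply.

```python
from datetime import datetime, timedelta, timezone


def fits_stated_limits(dt: datetime) -> bool:
    """Check the two limits stated above; an illustration, not TDB2's logic."""
    # Python datetimes carry microseconds; require a whole number of ms.
    whole_ms = dt.microsecond % 1000 == 0
    offset = dt.utcoffset()
    if offset is None:
        # No timezone given: only the precision limit applies here.
        return whole_ms
    quarter_hour_tz = offset % timedelta(minutes=15) == timedelta(0)
    return whole_ms and quarter_hour_tz


# Made-up sample values.
nepal = timezone(timedelta(hours=5, minutes=45))
ok = datetime(2020, 8, 1, 0, 0, 0, 123000, tzinfo=nepal)
too_fine = datetime(2020, 8, 1, 0, 0, 0, 123456, tzinfo=timezone.utc)
odd_tz = datetime(2020, 8, 1, tzinfo=timezone(timedelta(minutes=7)))

for value in (ok, too_fine, odd_tz):
    print(value.isoformat(), fits_stated_limits(value))
```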

Invalid terms are not recorded inline. They are recorded faithfully, but it means the abbreviated range scan isn't going to see them.

- is it worth reporting this on the SPARQL 1.2 github repo?

It is an implementation issue, not a language design issue.


Thanks in advance,


    Andy
