Re: shortcut for querying dates fast?

Piotr Nowara Fri, 28 Aug 2020 05:34:38 -0700

Hi,

I'm wondering whether using xsd:long instead of xsd:dateTime, with
timestamps mapped to milliseconds in numerical form, would perform
better.
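As a minimal sketch of that mapping (Python; `to_epoch_millis` is a hypothetical helper, not a Jena API), assuming a dateTime without a timezone is read as UTC, since XSD only partially orders such values against ones that carry a timezone:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_millis(lexical: str) -> int:
    """Map an xsd:dateTime lexical form to milliseconds since the Unix epoch.

    Assumption: no timezone means UTC. A trailing "Z" is rewritten because
    datetime.fromisoformat only accepts it from Python 3.11 on.
    """
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    dt = datetime.fromisoformat(lexical)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return (dt - EPOCH) // timedelta(milliseconds=1)

start = to_epoch_millis("2020-08-01T00:00:00")
print(start)  # 1596240000000
```

Two lexical forms of the same instant map to the same integer, so a FILTER on the xsd:long value compares instants, as long as every value was converted under the same timezone assumption.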


Best,
Piotr

On Fri, 28 Aug 2020 at 13:15 Andy Seaborne <[email protected]> wrote:

>
>
> On 27/08/2020 09:10, Élie Roux wrote:
> > Dear all,
> >
> > I have a dataset with (among other things) about 400,000 triples in the
> > form
>
> (in memory or TDB?)
>
> >
> > ?a adm:logDate ?d
> >
> > where ?d is an xsd:dateTime. I'm writing a query to get all the
> > triples that have a ?d in a certain interval. There are usually very
> > few of them (say, around 200). My query looks like
> >
> > construct {
> >     ?va adm:hasactivityon ?d .
> > } where {
> >     ?le adm:logDate ?d .
> >     FILTER(?d > "2020-08-01T00:00:00"^^xsd:dateTime)
> >     ?va adm:logEntry ?le .
> > }
> >
> > But it's too slow for our purpose (3.5s). I suspect a very performant
> > implementation is conceptually simple (using an index dedicated to
> > xsd:dateTime literals that could be range-queried), but I also suspect
> > SPARQL doesn't make it easy to summon that kind of algorithm in such a
> > query (which mixes a BGP and a FILTER instead of directly calling a
> > performant index).
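The kind of dedicated index described above can be modelled in a few lines. A hedged Python sketch, assuming timestamps have already been mapped to epoch milliseconds; `DateIndex` and `filter_scan` are hypothetical names, and this is not how ARQ or TDB actually evaluate the query:

```python
from bisect import bisect_left

def filter_scan(pairs, lo, hi):
    """Roughly what BGP + FILTER does: test every (subject, time) pair, O(n)."""
    return [s for s, t in pairs if lo <= t < hi]

class DateIndex:
    """Timestamps kept sorted so an interval costs two binary searches, O(log n + k)."""

    def __init__(self, pairs):
        self._entries = sorted((t, s) for s, t in pairs)
        self._keys = [t for t, _ in self._entries]

    def range(self, lo, hi):
        i = bisect_left(self._keys, lo)  # first t >= lo
        j = bisect_left(self._keys, hi)  # first t >= hi (hi is exclusive)
        return [s for _, s in self._entries[i:j]]

# 400,000 synthetic log entries, one per minute (made-up data).
pairs = [(f"le{i}", 1_590_000_000_000 + i * 60_000) for i in range(400_000)]
index = DateIndex(pairs)
lo, hi = 1_613_988_000_000, 1_614_000_000_000
hits = index.range(lo, hi)
print(len(hits))  # 200
```

Both approaches return the same subjects; the scan touches all 400,000 entries, while the index only walks the 200 in range after the binary search.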
> >
> > So a few questions:
> > - are there other ways to write this query to make it more performant?
> Not in ARQ, unless there are fewer adm:logEntry triples.
>
> > - my impression is that what I want with time is similar to what
> > GeoSPARQL provides for space... is there something similar to
> > GeoSPARQL for time?
> > - would that kind of performant index require the same type of
> > mechanism as the jena:text extension?
>
> Access to the data triples is by pattern (S,P,O), where any of S/P/O can be ANY.
> So here you have (ANY, adm:logDate, ANY).
>
> Ideally, for TDB, that would be:
>
> (ANY, adm:logDate, ANY, start O at "2020-08-01T00:00:00"^^xsd:dateTime)
>
> or generally
>
> (ANY, adm:logDate, ANY, min O, max O)
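A toy model of that access pattern, assuming triples are held in one list sorted by (P, O, S), loosely like a POS index; `POSIndex` and `find` are hypothetical names, and this is not TDB's actual index code:

```python
from bisect import bisect_left

class POSIndex:
    """Triples sorted by (predicate, object, subject): one predicate's
    triples are contiguous and ordered by object value."""

    def __init__(self, triples):
        self._rows = sorted((p, o, s) for s, p, o in triples)

    def find(self, p, min_o, max_o):
        """Yield (s, p, o) matching (ANY, p, ANY) with min_o <= o < max_o."""
        # The 2-tuple (p, min_o) sorts just before every row (p, min_o, s).
        i = bisect_left(self._rows, (p, min_o))
        while i < len(self._rows):
            row_p, o, s = self._rows[i]
            if row_p != p or o >= max_o:
                break  # left this predicate's block or passed max O
            yield (s, p, o)
            i += 1
```

In this model, objects of different Python types (int vs str) under the same predicate cannot be compared at all, which loosely echoes the "mixed types" complication Andy mentions.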
>
> There are complications with illegal literals, mixed types, encoding
> restrictions, etc., but in TDB2 "2020-08-01T00:00:00"^^xsd:dateTime is
> stored inline in the O slot as binary, so the index is partially sorted
> for valid data.
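Why a binary inline encoding can keep the index sorted: if a value is packed so that plain byte-wise comparison agrees with numeric order, an index over the bytes is also ordered by value. A generic sketch of one such encoding for signed 64-bit epoch milliseconds (Python; a standard technique, not TDB2's actual bit layout; `encode_millis` and `decode_millis` are hypothetical names):

```python
def encode_millis(ms: int) -> bytes:
    """8-byte big-endian key; adding 2**63 flips the sign bit so negative
    values sort before positive ones under unsigned byte comparison."""
    if not -(1 << 63) <= ms < (1 << 63):
        raise OverflowError("value does not fit in a signed 64-bit integer")
    return (ms + (1 << 63)).to_bytes(8, "big")

def decode_millis(key: bytes) -> int:
    """Inverse of encode_millis."""
    return int.from_bytes(key, "big") - (1 << 63)

values = [1_596_240_000_000, -86_400_000, 0, 42]
by_bytes = sorted(values, key=encode_millis)
print(by_bytes)  # [-86400000, 0, 42, 1596240000000]
```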
>
> There are precision limits in the encoding for XSD dateTime:
> only millisecond accuracy, and timezones must be in units of 15 minutes
> (which is true for all valid timezones at the time of writing).
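The two limits stated above can be written as a check. A rough Python sketch; `fits_inline_limits` is a hypothetical name, the regex only approximates the xsd:dateTime lexical form, and this mirrors the limits as stated rather than TDB2's own code:

```python
import re

# Approximates the xsd:dateTime lexical space; it does not check
# month/day ranges or leap seconds.
_DATETIME = re.compile(
    r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})?"
)

def fits_inline_limits(lexical: str) -> bool:
    """True if the value respects both stated limits: at most millisecond
    precision, and a timezone offset that is a whole number of 15 minutes."""
    m = _DATETIME.fullmatch(lexical)
    if m is None:
        return False
    fraction, tz = m.group(1), m.group(2)
    if fraction and len(fraction.rstrip("0")) > 3:
        return False  # e.g. ".1234" needs sub-millisecond precision
    if tz and tz != "Z":
        offset_minutes = int(tz[1:3]) * 60 + int(tz[4:6])
        if offset_minutes % 15 != 0:
            return False
    return True

print(fits_inline_limits("2020-08-01T00:00:00.123+05:45"))  # True
print(fits_inline_limits("2020-08-01T00:00:00+05:10"))      # False
```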
>
> Invalid terms are not recorded inline. They are still recorded faithfully,
> but the abbreviated range scan isn't going to see them.
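A small model of that consequence, assuming valid values go into a sorted inline list and everything else into a side list; `DateStore` and `parse_millis` are hypothetical names, Python's ISO parser stands in for real validity checking, and this is not TDB2's storage code. An ill-formed literal such as "2020-08-32T10:00:00" is kept, but a range scan over the inline keys never reaches it:

```python
from bisect import bisect_left
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def parse_millis(lexical):
    """Epoch millis for a valid value (no timezone read as UTC), else None."""
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    try:
        dt = datetime.fromisoformat(lexical)
    except ValueError:
        return None
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return (dt - EPOCH) // timedelta(milliseconds=1)

class DateStore:
    def __init__(self, literals):
        self.inline = []       # sorted (millis, lexical) for valid values
        self.out_of_line = []  # invalid literals, stored verbatim
        for lex in literals:
            ms = parse_millis(lex)
            if ms is None:
                self.out_of_line.append(lex)
            else:
                self.inline.append((ms, lex))
        self.inline.sort()

    def range(self, lo, hi):
        """Lexical forms with lo <= millis < hi; only inline entries are seen."""
        i = bisect_left(self.inline, (lo,))
        result = []
        while i < len(self.inline) and self.inline[i][0] < hi:
            result.append(self.inline[i][1])
            i += 1
        return result

store = DateStore(["2020-08-02T10:00:00", "2020-08-32T10:00:00", "2020-07-01T00:00:00"])
print(store.range(1_596_240_000_000, 1_700_000_000_000))  # ['2020-08-02T10:00:00']
print(store.out_of_line)  # ['2020-08-32T10:00:00']
```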
>
> > - is it worth reporting this on the SPARQL 1.2 github repo?
>
> It is an implementation issue, not a language design issue.
>
> >
> > Thanks in advance,
> >
>
>      Andy
>
