Hi,

I'm wondering whether using xsd:long instead of xsd:dateTime, with the timestamps stored numerically as milliseconds, would perform better.
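To make the suggestion concrete, here is a minimal sketch of the mapping from an xsd:dateTime lexical form to an integer millisecond count that could be stored as xsd:long. It is written in Python only for illustration. The function name `datetime_to_epoch_millis` is hypothetical. The sketch assumes milliseconds since the Unix epoch, and it treats a value without a timezone as UTC, which is an assumption because such a value does not denote a single instant. Whether this actually beats inline xsd:dateTime in TDB2 is the open question here, not something the sketch shows.

```python
from datetime import datetime, timezone

def datetime_to_epoch_millis(lexical: str) -> int:
    """Hypothetical helper: xsd:dateTime lexical form -> ms since 1970-01-01T00:00:00Z.

    Assumption: a value with no timezone is treated as UTC.
    Anything finer than a millisecond is truncated (floored).
    """
    # Older Python versions reject a trailing "Z", so spell it as an offset.
    dt = datetime.fromisoformat(lexical.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption, see docstring
    # Integer arithmetic on the timedelta avoids floating-point rounding.
    delta = dt - datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (delta.days * 86_400 + delta.seconds) * 1000 + delta.microseconds // 1000
```

With such a value stored under a hypothetical property such as `adm:logDateMillis`, the filter in the query quoted below would become a plain integer comparison, e.g. `FILTER(?ms > 1596240000000)`, since 2020-08-01T00:00:00Z is 1596240000000 ms after the epoch.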
Best,
Piotr

On Fri, 28 Aug 2020 at 13:15, Andy Seaborne <[email protected]> wrote:
>
>
> On 27/08/2020 09:10, Élie Roux wrote:
> > Dear all,
> >
> > I have a dataset with (among other things) about 400,000 triples in the form
>
> (in memory or TDB?)
>
> >
> > ?a adm:logDate ?d
> >
> > where ?d is an xsd:dateTime. I'm writing a query to get all the
> > triples that have a ?d in a certain interval. There are usually very
> > few of them (say around 200). I'm writing a query that looks like
> >
> > construct {
> >     ?va adm:hasactivityon ?d .
> > } where {
> >     ?le adm:logDate ?d .
> >     FILTER(?d > "2020-08-01T00:00:00"^^xsd:dateTime)
> >     ?va adm:logEntry ?le .
> > }
> >
> > But it's too slow for our purpose (3.5s). I suspect it's conceptually
> > simple to have a very performant implementation (using an index
> > dedicated to xsd:dateTime literals that could be queried), but I also
> > suspect SPARQL doesn't make it possible to summon that kind of algorithm
> > in such a query (which is a mix of a BGP and a filter instead
> > of a direct call to a performant index).
> >
> > So a few questions:
> > - are there other ways to write this query to make it more performant?
>
> Not in ARQ, unless there are fewer adm:logEntry triples.
>
> > - my impression is that what I want with time is similar to what
> > GeoSPARQL provides for space... is there something similar to
> > GeoSPARQL for time?
> > - would that kind of performant index require the same type of
> > mechanism as the jena:text extension?
>
> The access to data triples is (S,P,O) where any of S/P/O can be ANY.
> So you have (ANY, adm:logDate, ANY).
>
> Ideally, for TDB, that would be:
>
> (ANY, adm:logDate, ANY, start O at "2020-08-01T00:00:00"^^xsd:dateTime)
>
> or, more generally,
>
> (ANY, adm:logDate, ANY, min O, max O)
>
> There are complications with illegal literals, mixed types, encoding
> restrictions, etc., but in TDB2 "2020-08-01T00:00:00"^^xsd:dateTime is
> stored inline in the O slot as binary, so the index is partially sorted
> for valid data.
>
> There are precision limits in the encoding for xsd:dateTime:
> only millisecond accuracy, and timezones must be in units of 15 min (which
> is true for all valid timezones at the time of writing).
>
> Invalid terms are not recorded inline. They are recorded faithfully, but
> it means the abbreviated range isn't going to see them.
>
> > - is it worth reporting this on the SPARQL 1.2 GitHub repo?
>
> It is an implementation issue, not a language design issue.
>
> >
> > Thanks in advance,
> >
>
>     Andy
>
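Andy's (ANY, adm:logDate, ANY, min O, max O) access pattern can be pictured with a toy model: if the objects of one predicate are kept sorted, the query's FILTER turns into a binary search plus a contiguous slice instead of a scan over all ~400,000 triples. The Python below is a hypothetical in-memory sketch, not how TDB2 is implemented; `SortedObjectIndex`, `parse_inline` and `scan` are illustrative names. `parse_inline` mirrors the limits Andy lists (millisecond accuracy, timezone offsets in 15-minute units). Sending anything it rejects to a separate `not_inline` list stands in for his remark that non-inline terms are invisible to the abbreviated range, which is an assumption about how the real encoding behaves. Values without a timezone are treated as UTC, another assumption.

```python
import bisect
from datetime import datetime, timedelta, timezone

def parse_inline(lexical):
    """Hypothetical check: return an aware datetime if the value fits the
    sketch's 'inline' limits, else raise ValueError."""
    dt = datetime.fromisoformat(lexical.replace("Z", "+00:00"))  # ValueError if illegal
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: no timezone means UTC
    if dt.microsecond % 1000:
        raise ValueError("finer than millisecond precision")
    if dt.utcoffset() % timedelta(minutes=15):
        raise ValueError("timezone offset is not a multiple of 15 minutes")
    return dt

class SortedObjectIndex:
    """Toy per-predicate index: (object, subject) entries kept sorted by object."""

    def __init__(self):
        self._keys = []        # sorted object values (the "inline" ones)
        self._subjects = []    # subject for each key, in the same order
        self.not_inline = []   # (subject, lexical) pairs the range scan never sees

    def add(self, subject, lexical):
        try:
            value = parse_inline(lexical)
        except ValueError:
            self.not_inline.append((subject, lexical))
            return
        pos = bisect.bisect_right(self._keys, value)
        self._keys.insert(pos, value)
        self._subjects.insert(pos, subject)

    def scan(self, after, until=None):
        """Subjects whose object o satisfies after < o (and o <= until if given)."""
        start = bisect.bisect_right(self._keys, after)
        end = len(self._keys) if until is None else bisect.bisect_right(self._keys, until)
        return self._subjects[start:end]
```

For example, after adding a few log entries, `idx.scan(parse_inline("2020-08-01T00:00:00"))` plays the role of `FILTER(?d > "2020-08-01T00:00:00"^^xsd:dateTime)`, and its cost depends on the number of matches (around 200 here) rather than on the size of the predicate.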
