Re: Inline Values and XSD Time Series

Andy Seaborne Thu, 01 Mar 2018 04:23:58 -0800


On 28/02/18 17:53, Marco Neumann wrote:

thank you, it's less than I hoped for


Concrete example?

but certainly more than what I
can ask for Andy :)

In short, I'd like to get the xsd:dateTime scan out of the SPARQL
FILTER and perform a more efficient range query via a date index, similar to
the jena-spatial implementation.

I am going to take a look at DateRangeField and see how it performs
relative to a standard SPARQL FILTER range query.

best,
Marco


On Tue, Feb 27, 2018 at 5:21 PM, Andy Seaborne wrote:


On 27/02/18 11:41, Marco Neumann wrote:


Hi Andy, (I presume you wrote the following below) could you please
elaborate on the significance of this contribution in TDB?



Hi Marco,

For certain XSD datatypes, the value itself is stored in the NodeId (64 bits
minus the datatype indicator: 56 bits for TDB1, up to 62 bits for TDB2 for
xsd:double). This makes it faster to get the node back out of the database.

If the value does not fit in the bits available, the long form is used. In the
long form, the NodeId is a pointer into the node table, and the node is
stored as its lexical form plus datatype (TDB1: as text; TDB2: in binary, as RDF
Thrift). Strings and URIs are also stored in the long form.
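A minimal sketch of the inline/long-form split, in Java since TDB is a Java system. It assumes a made-up layout (an 8-bit type tag above 56 value bits, with tag 0 meaning "pointer into the node table") and a hypothetical `NodeIdSketch` class whose `List` stands in for the node table. TDB's real NodeId encoding differs in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of inline vs. long-form NodeIds; NOT TDB's actual layout.
// Assumed layout: top 8 bits = type tag, low 56 bits = value (or node-table offset).
final class NodeIdSketch {
    static final int VALUE_BITS = 56;
    static final long VALUE_MASK = (1L << VALUE_BITS) - 1;
    static final long TAG_POINTER = 0L; // long form: low bits index the node table
    static final long TAG_INTEGER = 1L; // inline xsd:integer
    static final long MIN_INLINE = -(1L << (VALUE_BITS - 1));
    static final long MAX_INLINE = (1L << (VALUE_BITS - 1)) - 1;

    // Stand-in for the node table (lexical form + datatype, as text).
    final List<String> nodeTable = new ArrayList<>();

    long encodeInteger(long v) {
        if (v >= MIN_INLINE && v <= MAX_INLINE) {
            // Fits: keep the two's-complement low 56 bits under the integer tag.
            return (TAG_INTEGER << VALUE_BITS) | (v & VALUE_MASK);
        }
        // Does not fit: write the long form to the node table and return a pointer.
        nodeTable.add("\"" + v + "\"^^xsd:integer");
        return (TAG_POINTER << VALUE_BITS) | (nodeTable.size() - 1);
    }

    static boolean isInline(long nodeId) {
        return (nodeId >>> VALUE_BITS) != TAG_POINTER;
    }

    String decode(long nodeId) {
        if (isInline(nodeId)) {
            // No node-table access: sign-extend the 56-bit value straight out of the id.
            long v = (nodeId << (64 - VALUE_BITS)) >> (64 - VALUE_BITS);
            return "\"" + v + "\"^^xsd:integer";
        }
        return nodeTable.get((int) (nodeId & VALUE_MASK));
    }

    public static void main(String[] args) {
        NodeIdSketch s = new NodeIdSketch();
        System.out.println(s.decode(s.encodeInteger(42)));             // inline
        System.out.println(s.decode(s.encodeInteger(Long.MAX_VALUE))); // long form
    }
}
```

The inline branch of `decode` never touches the node table, which is where the speed-up comes from; only values too large for the value bits pay for a lookup.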


"The xsd:dateTime and xsd:date ranges cover about 8000 years from year
zero with a precision down to 1 millisecond. Timezone information is
retained to an accuracy of 15 minutes with special timezones for Z and
for no explicit timezone."



That's the limit for xsd:dateTime in 56 bits.
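One plausible way 56 bits cover that range. The field widths below are an assumption for illustration (Jena's own dateTime encoding may lay the bits out differently), but they show the arithmetic: 13 bits of year give 8192 years; milliseconds within a minute (0 to 59999) fit in 16 bits; XSD timezones run from -14:00 to +14:00, which in 15-minute steps is 113 codes, and with two special codes (Z and "no timezone") that still fits in 7 bits. 13 + 4 + 5 + 5 + 6 + 16 + 7 = 56. The `DateTime56` class and its special codes are hypothetical.

```java
// Hypothetical 56-bit packing of xsd:dateTime fields, high bits to low:
// year(13) | month(4) | day(5) | hour(5) | minute(6) | millis-in-minute(16) | timezone(7)
// These widths are an illustrative assumption, not Jena's documented layout.
final class DateTime56 {
    static final int YEAR_BITS = 13, MONTH_BITS = 4, DAY_BITS = 5, HOUR_BITS = 5,
                     MINUTE_BITS = 6, MILLIS_BITS = 16, TZ_BITS = 7;
    static final int TOTAL_BITS = YEAR_BITS + MONTH_BITS + DAY_BITS + HOUR_BITS
                                + MINUTE_BITS + MILLIS_BITS + TZ_BITS; // 56
    // Assumed special timezone codes, outside the 0..112 range used for offsets.
    static final int TZ_Z = 126, TZ_NONE = 127;

    // Offsets from -14:00 to +14:00 in 15-minute steps map to codes 0..112.
    static int tzFromOffsetMinutes(int offsetMinutes) {
        if (offsetMinutes % 15 != 0 || Math.abs(offsetMinutes) > 14 * 60) {
            throw new IllegalArgumentException("offset not representable: " + offsetMinutes);
        }
        return offsetMinutes / 15 + 56;
    }

    static long pack(int year, int month, int day, int hour, int minute, int millisInMinute, int tz) {
        long v = field(year, YEAR_BITS);
        v = (v << MONTH_BITS) | field(month, MONTH_BITS);
        v = (v << DAY_BITS) | field(day, DAY_BITS);
        v = (v << HOUR_BITS) | field(hour, HOUR_BITS);
        v = (v << MINUTE_BITS) | field(minute, MINUTE_BITS);
        v = (v << MILLIS_BITS) | field(millisInMinute, MILLIS_BITS);
        return (v << TZ_BITS) | field(tz, TZ_BITS);
    }

    static int year(long packed) {
        return (int) (packed >>> (TOTAL_BITS - YEAR_BITS));
    }

    // Reject values that would spill into the neighbouring field.
    private static long field(int value, int bits) {
        if (value < 0 || value >= (1 << bits)) {
            throw new IllegalArgumentException(value + " does not fit in " + bits + " bits");
        }
        return value;
    }

    public static void main(String[] args) {
        long a = pack(2018, 2, 28, 17, 53, 0, TZ_Z);
        long b = pack(2018, 3, 1, 12, 23, 58_000, TZ_Z);
        System.out.println(a < b); // same timezone code: numeric order follows date order
        System.out.println(year(b));
    }
}
```

Putting the year in the high bits means that, for values with the same timezone code, comparing the packed longs compares the dates, which is what a sorted-order range scan relies on.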


https://jena.apache.org/documentation/tdb/architecture.html#inline-values

does this give us enhanced temporal access methods via TDB that are
exposed as property functions in SPARQL?



What exactly are you looking for here? Range queries or a database you can
view at a point in time? ("Temporal database" can mean either.)

You get the same SPARQL FILTER capabilities, but the inline form is faster
(measurably, and by quite a lot) because it does not go to the node table.
Despite caching of the node table, it is still faster to get nodes out of
the DB from the inline form (and I'd like to go faster still).

Point-in-time database:

Not possible in TDB1.
Possible (but not exposed) in TDB2. TDB2 never forgets!

In particular I'd be interested in range queries on xsd:dateTime here
and the possible use of DateRangeField (Solr) alongside jena-spatial.



Range queries: it would be possible to start a range scan in the right
place, because under this design the values are in sorted order.
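A sketch of the difference between a FILTER-style scan and starting "in the right place", assuming the inline dateTime values behave as sortable long keys. Epoch milliseconds stand in for the inline encoding here, and `RangeScanSketch` with its `TreeMap` index is hypothetical, not a TDB or jena-spatial API.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: FILTER-style scan vs. a seek into a sorted dateTime index.
// Epoch millis stand in for an order-preserving inline encoding of xsd:dateTime.
final class RangeScanSketch {
    // Sorted key -> subjects with that value (a list, since timestamps can repeat).
    final NavigableMap<Long, List<String>> index = new TreeMap<>();

    void add(String isoInstant, String subject) {
        long key = Instant.parse(isoInstant).toEpochMilli();
        index.computeIfAbsent(key, k -> new ArrayList<>()).add(subject);
    }

    // Like FILTER(?t >= from && ?t < to): every entry is visited and tested.
    List<String> filterScan(long from, long to) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, List<String>> e : index.entrySet()) {
            if (e.getKey() >= from && e.getKey() < to) {
                out.addAll(e.getValue());
            }
        }
        return out;
    }

    // Index-style: start at 'from' (inclusive) and stop before 'to' (exclusive).
    List<String> rangeScan(long from, long to) {
        List<String> out = new ArrayList<>();
        for (List<String> subjects : index.subMap(from, true, to, false).values()) {
            out.addAll(subjects);
        }
        return out;
    }

    public static void main(String[] args) {
        RangeScanSketch r = new RangeScanSketch();
        r.add("2018-02-27T11:41:00Z", "ex:a");
        r.add("2018-02-28T17:53:00Z", "ex:b");
        r.add("2018-03-01T12:23:58Z", "ex:c");
        long from = Instant.parse("2018-02-28T00:00:00Z").toEpochMilli();
        long to = Instant.parse("2018-03-01T00:00:00Z").toEpochMilli();
        System.out.println(r.rangeScan(from, to));
    }
}
```

Both methods return the same subjects; the difference is that `subMap` seeks straight to the lower bound and stops at the upper one instead of testing every entry, which is the gain a sorted value index offers over a FILTER.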

There is extra complexity with the different datatypes possible: it might need a
"this is a value-centric database" flag so that, e.g., integers, whether xsd:short
or xsd:???, are stored as binary integers, losing the datatype.

In TDB1 that's true; TDB2 does keep the original datatype. Both are valid
choices for different use cases.
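To illustrate the trade-off, here is a hypothetical sketch contrasting a value-centric encoding, where xsd:short 5 and xsd:int 5 end up as the same stored value, with one that spends a few bits remembering which integer datatype was used. The `IntegerInlining` class, its datatype codes and the 48-bit value field are all invented for the example.

```java
import java.util.List;

// Hypothetical sketch contrasting two ways to inline integer-typed literals.
// The datatype codes and the 48-bit value field are invented for illustration.
final class IntegerInlining {
    // Index in this list = assumed datatype code (0 = xsd:integer).
    static final List<String> INT_TYPES =
        List.of("xsd:integer", "xsd:long", "xsd:int", "xsd:short", "xsd:byte");
    static final int VALUE_BITS = 48;
    static final long VALUE_MASK = (1L << VALUE_BITS) - 1;

    // Value-centric: only the number is kept, so every integer type decodes the same way.
    static String roundTripValueCentric(String lexical, String datatype) {
        codeOf(datatype);
        long v = Long.parseLong(lexical);
        return "\"" + v + "\"^^xsd:integer";
    }

    // Datatype-preserving: a few extra bits record which integer type it was.
    static String roundTripPreserving(String lexical, String datatype) {
        long code = codeOf(datatype);
        long v = Long.parseLong(lexical);
        if (v < -(1L << (VALUE_BITS - 1)) || v >= (1L << (VALUE_BITS - 1))) {
            throw new IllegalArgumentException("needs the long form: " + v);
        }
        long encoded = (code << VALUE_BITS) | (v & VALUE_MASK);
        // Decode: the code sits above the value bits; the value is sign-extended.
        int decodedCode = (int) (encoded >>> VALUE_BITS);
        long decoded = (encoded << (64 - VALUE_BITS)) >> (64 - VALUE_BITS);
        return "\"" + decoded + "\"^^" + INT_TYPES.get(decodedCode);
    }

    private static int codeOf(String datatype) {
        int code = INT_TYPES.indexOf(datatype);
        if (code < 0) throw new IllegalArgumentException("not an integer type: " + datatype);
        return code;
    }

    public static void main(String[] args) {
        System.out.println(roundTripValueCentric("5", "xsd:short"));
        System.out.println(roundTripPreserving("5", "xsd:short"));
    }
}
```

The preserving variant pays for the datatype with fewer value bits, so more numbers fall back to the long form; the value-centric variant keeps more range but cannot give back the original datatype.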

Hope that answers your questions,

     Andy



Best,
Marco
