On 27/02/18 11:41, Marco Neumann wrote:
> Hi Andy, (I presume you wrote the following below) could you please
> elaborate on the significance of this contribution in TDB?
Hi Marco,
For certain XSD datatypes, the value itself is stored in the NodeId: 64 bits
minus the datatype indicator, which leaves 56 bits in TDB1 and up to 62 bits
in TDB2 (for xsd:double). That makes it faster to get the node back out of
the database.
If the value does not fit in the bits available, the long form is used. In
the long form, the NodeId is a pointer into the node table, and the node is
stored as its lexical form plus datatype (TDB1: as text; TDB2: in binary, as
RDF Thrift). Strings and URIs are always stored this way.
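To make the inline/long-form split concrete, here is a minimal Python sketch. It is not TDB's code or layout: the 8-bit tag, the tag values and the `NodeStore` class are all hypothetical. They were chosen only to mirror a 64-bit NodeId with a 56-bit payload. Integers that fit are packed into the NodeId itself. Anything else, including strings, goes to a node table, and the NodeId becomes a pointer. The `table_reads` counter shows why decoding inline values is cheaper: they never touch the node table.

```python
# Hypothetical layout: 8-bit type tag in the top byte, 56-bit payload below it.
TAG_BITS = 8
PAYLOAD_BITS = 64 - TAG_BITS                 # 56, as in TDB1
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1

TAG_PTR = 0x00       # payload is an index into the node table (long form)
TAG_INTEGER = 0x01   # payload is a 56-bit two's-complement integer (inline form)


class NodeStore:
    def __init__(self):
        self.node_table = []   # long form: (lexical form, datatype) pairs
        self.table_reads = 0   # how often decoding had to visit the node table

    def _store_long(self, lexical, datatype):
        self.node_table.append((lexical, datatype))
        return (TAG_PTR << PAYLOAD_BITS) | (len(self.node_table) - 1)

    def encode_integer(self, value):
        lo, hi = -(1 << (PAYLOAD_BITS - 1)), (1 << (PAYLOAD_BITS - 1)) - 1
        if lo <= value <= hi:
            # Fits: the value itself becomes the NodeId payload.
            return (TAG_INTEGER << PAYLOAD_BITS) | (value & PAYLOAD_MASK)
        # Too big: fall back to the long form and return a pointer.
        return self._store_long(str(value), "xsd:integer")

    def encode_string(self, text):
        # Strings are never inlined in this sketch.
        return self._store_long(text, "xsd:string")

    def decode(self, node_id):
        tag, payload = node_id >> PAYLOAD_BITS, node_id & PAYLOAD_MASK
        if tag == TAG_INTEGER:
            if payload >> (PAYLOAD_BITS - 1):     # sign bit set: sign-extend
                payload -= 1 << PAYLOAD_BITS
            return (str(payload), "xsd:integer")  # no node table access
        if tag == TAG_PTR:
            self.table_reads += 1
            return self.node_table[payload]
        raise ValueError(f"unknown tag {tag:#x}")


store = NodeStore()
small = store.encode_integer(42)
big = store.encode_integer(1 << 60)
print(store.decode(small), store.table_reads)  # ('42', 'xsd:integer') 0
print(store.decode(big), store.table_reads)    # ('1152921504606846976', 'xsd:integer') 1
```

A real node table is a persistent, cached file structure rather than a Python list. The point here is only the branch in `decode`: inline values skip the lookup entirely.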
> "The xsd:dateTime and xsd:date ranges cover about 8000 years from year
> zero with a precision down to 1 millisecond. Timezone information is
> retained to an accuracy of 15 minutes with special timezones for Z and
> for no explicit timezone."
That's the limit for xsd:dateTime in 56 bits.
> https://jena.apache.org/documentation/tdb/architecture.html#inline-values
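As a sanity check on the quoted ranges, the sketch below packs an xsd:dateTime into exactly 56 bits. The field widths are an assumption picked to match the figures above, not a transcription of TDB's encoding: a 13-bit year, 4-bit month, 5-bit day, 5-bit hour and 6-bit minute, plus 16 bits for milliseconds within the minute and a 7-bit timezone.

The arithmetic behind those widths:

- 2^13 = 8192 gives the "about 8000 years".
- 59,999 ms fits in 16 bits.
- 7 bits hold the 113 quarter-hour offsets from -14:00 to +14:00, plus the two special codes for Z and for no timezone.

The names `pack`, `unpack`, `tz_code`, `TZ_Z` and `TZ_NONE` are hypothetical. Only bit widths are range-checked, not calendar validity.

```python
# Hypothetical field layout for an inline xsd:dateTime, most significant
# field first. Widths are an assumption chosen to match the quoted ranges.
FIELDS = [("year", 13), ("month", 4), ("day", 5), ("hour", 5),
          ("minute", 6), ("ms_of_minute", 16), ("tz", 7)]
TOTAL_BITS = sum(width for _, width in FIELDS)   # 56

TZ_Z = 126     # hypothetical code for an explicit "Z"
TZ_NONE = 127  # hypothetical code for "no timezone given"


def tz_code(tz):
    """Encode None, "Z", or an int offset in minutes (multiple of 15, within +/-14:00)."""
    if tz is None:
        return TZ_NONE
    if tz == "Z":
        return TZ_Z
    if tz % 15 != 0 or not -14 * 60 <= tz <= 14 * 60:
        raise ValueError(f"offset of {tz} minutes is not representable")
    return tz // 15 + 64  # -56..56 quarter-hours -> 8..120


def pack(year, month, day, hour, minute, second, ms, tz):
    fields = {"year": year, "month": month, "day": day, "hour": hour,
              "minute": minute,
              "ms_of_minute": second * 1000 + ms,   # 0..59999
              "tz": tz_code(tz)}
    word = 0
    for name, width in FIELDS:
        value = fields[name]
        if not 0 <= value < (1 << width):   # bit-width check only
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    out = {}
    for name, width in reversed(FIELDS):   # least significant field first
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out


w = pack(2018, 2, 27, 11, 41, 30, 250, "Z")
print(TOTAL_BITS, unpack(w)["year"], unpack(w)["ms_of_minute"])  # 56 2018 30250
```

A year such as 9000 is rejected because it needs more than 13 bits. Such a value would have to fall back to the long form described earlier.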
> Does this give us enhanced temporal access methods via TDB that are
> exposed as property functions in SPARQL?
What exactly are you looking for here? Range queries or a database you can
view at a point in time? ("Temporal database" can mean either.)
You get the same SPARQL capabilities, but the inline form is faster
(measurably, and by quite a lot) because it does not go to the node table.
Even with the node table cached, it is still faster to get nodes out of the
database from the inline form (and I'd like to go faster still).
Point-in-time database:
Not possible in TDB1.
Possible (but not exposed) in TDB2. TDB2 never forgets!
> In particular I'd be interested in range queries on xsd:dateTime here
> and the possible use of DateRangeField (SOLR) alongside jena-spatial.
Range queries - it would be possible to start in the right place for a range
scan because the values are in sorted order under this design.
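Here is a minimal sketch of why sorted order helps. Suppose the fields are packed most significant first: year, then month, and so on. Comparing the packed integers then gives the same answer as comparing the dates. A sorted index can therefore be entered with a binary search and scanned only across the requested range.

The assumptions behind this sketch:

- The `key` packer and `range_scan` helper are hypothetical.
- The timezone is left out. All values are assumed to share one timezone, which sidesteps XSD's partial ordering of dateTimes with and without timezones.
- A Python list with `bisect` stands in for TDB's B+tree.

```python
import bisect

# Hypothetical widths, most significant first: year, month, day, hour, minute, ms.
WIDTHS = (13, 4, 5, 5, 6, 16)


def key(year, month, day, hour=0, minute=0, ms_of_minute=0):
    """Pack fields so that integer order equals chronological order."""
    word = 0
    for value, width in zip((year, month, day, hour, minute, ms_of_minute), WIDTHS):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        word = (word << width) | value
    return word


# A toy index of (packed dateTime, subject) entries, kept sorted by key.
index = sorted([
    (key(2018, 2, 27, 11, 41), "msg:reply"),
    (key(2017, 12, 31), "ev:a"),
    (key(2018, 1, 15), "ev:b"),
    (key(2018, 3, 1), "ev:c"),
])
keys = [k for k, _ in index]


def range_scan(start, end):
    """Subjects with start <= dateTime < end; only that slice is visited."""
    i = bisect.bisect_left(keys, start)   # jump straight to the first match
    j = bisect.bisect_left(keys, end)     # stop before the first key >= end
    return [subject for _, subject in index[i:j]]


print(range_scan(key(2018, 1, 1), key(2018, 3, 1)))  # ['ev:b', 'msg:reply']
```

The key design choice is the field order. If, say, the minute were packed above the year, integer order would no longer match date order and a range scan would have to filter the whole index.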
Handling the different datatypes on insert adds complexity. It might need a
"this is a value-centric database" flag so that, e.g., integers, whether
xsd:short or xsd:???, are stored as binary integers, losing the datatype.
In TDB1 that's true; TDB2 does keep the original datatype. Both are valid
choices for different use cases.
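The two policies can be contrasted in a few lines. This hypothetical Python sketch stores integer literals in one of two ways. The value-centric way drops the datatype, so xsd:short comes back as xsd:integer. The other way keeps the original datatype alongside the value, as the thread describes for TDB2. The `store`/`load` names and the tuple representation are illustrative only.

```python
# Hypothetical sketch of the two storage policies; not TDB code.
INTEGER_TYPES = {"xsd:integer", "xsd:long", "xsd:int", "xsd:short", "xsd:byte"}


def store(lexical, datatype, keep_datatype):
    """Return the stored form of an integer literal under one of two policies."""
    if datatype not in INTEGER_TYPES:
        raise ValueError("this sketch only handles integer datatypes")
    value = int(lexical)
    if keep_datatype:
        return (value, datatype)   # datatype-preserving
    return (value, None)           # value-centric: the datatype is dropped


def load(stored):
    value, datatype = stored
    # With no recorded datatype, the generic xsd:integer is the best we can do.
    return (str(value), datatype or "xsd:integer")


print(load(store("7", "xsd:short", keep_datatype=False)))  # ('7', 'xsd:integer')
print(load(store("7", "xsd:short", keep_datatype=True)))   # ('7', 'xsd:short')
```

The value-centric form is simpler and treats all integers alike for comparison. The preserving form round-trips the original term at the cost of carrying the datatype.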
Hope that answers your questions,
Andy
> Best,
> Marco