Beebs.systap added a comment.

In https://phabricator.wikimedia.org/T88717#1037721, @Manybubbles wrote:

> Looks like we're going to have trouble with some dates too.  xsd:dateTime 
> <http://www.w3.org/TR/xmlschema-2/#dateTime-lexical-representation> supports 
> 13798 million years BCE <https://www.wikidata.org/wiki/Q1#P580> but I think 
> BigData will have trouble with it what with this comment from 
> DateTimeExtension:
>
>   /**
>    * This implementation of {@link IExtension} implements inlining for literals
>    * that represent xsd:dateTime literals.  These literals will be stored as time
>    * in milliseconds since the epoch.  The milliseconds are encoded as an inline
>    * long.
>    */
>   
>
> Not that I've had a chance to test it yet.  It'd be one of the first things 
> imported during a full import of the statements.  I see in the RDF dump on 
> labs it's actually an xsd:gYear type though.


From Bryan Thompson.

The native xsd:dateTime support is based on an int64 value. It does support 
negative int64 values, which is what a date before the epoch is translated 
into.  When xsd:dateTime inlining is used, dates are not entered into the 
dictionary.  They appear as inline values within the statement indices 
instead.  This avoids a dictionary lookup for date materialization.  It also 
lets us use the OSP index for key-range scans on xsd:dateTime values.
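
To make the int64 range concrete, here is a minimal sketch in plain Java. It 
is not the DateTimeExtension code; the class name, the helper names, and the 
use of an average Gregorian year of 365.2425 days are assumptions made for 
illustration only.

```java
import java.time.Instant;

// Illustrative only: not part of the graph database code base.
final class EpochMillisRange {
    // Assumption: average Gregorian year of 365.2425 days.
    static final long MILLIS_PER_YEAR = 31_556_952_000L;

    // Milliseconds since 1970-01-01T00:00:00Z; negative before the epoch.
    static long toEpochMillis(String isoInstant) {
        return Instant.parse(isoInstant).toEpochMilli();
    }

    // Roughly how many years on either side of the epoch a signed long can cover.
    static long maxYearsFromEpoch() {
        return Long.MAX_VALUE / MILLIS_PER_YEAR;
    }

    // True if an offset of this many years from the epoch fits in a long of millis.
    static boolean fitsInLong(long yearsFromEpoch) {
        return Math.abs(yearsFromEpoch) <= maxYearsFromEpoch();
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("1969-12-31T23:59:59Z")); // negative value
        System.out.println(maxYearsFromEpoch());                   // about 292 million
        System.out.println(fitsInLong(-13_798_000_000L));          // 13798 million years BCE
    }
}
```

Under these assumptions a long of milliseconds covers roughly 292 million 
years on either side of the epoch, so ordinary historical dates fit but a 
value of 13798 million years BCE does not.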

If you need to go beyond an int64 value, then the graph database can also 
inline xsd:integer values (BigInteger).  This would allow general cosmology 
dates.
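
As a rough sketch of why an arbitrary-precision integer is enough for such 
dates, the following plain Java uses java.math.BigInteger. It does not use the 
database's inlining API; the class and method names and the average-year 
constant are hypothetical, and it ignores calendar details entirely.

```java
import java.math.BigInteger;

// Illustrative only: shows that BigInteger keeps ordering beyond the long range.
final class CosmologyMillis {
    // Assumption: average Gregorian year of 365.2425 days, in milliseconds.
    static final BigInteger MILLIS_PER_YEAR = BigInteger.valueOf(31_556_952_000L);

    // Approximate offset from the epoch in milliseconds for a signed year offset.
    static BigInteger yearsToMillis(long yearsFromEpoch) {
        return BigInteger.valueOf(yearsFromEpoch).multiply(MILLIS_PER_YEAR);
    }

    // A BigInteger fits in a signed 64-bit long iff it needs at most 63 bits plus sign.
    static boolean fitsInLong(BigInteger value) {
        return value.bitLength() <= 63;
    }

    public static void main(String[] args) {
        BigInteger veryOld = yearsToMillis(-13_798_000_000L);
        BigInteger lessOld = yearsToMillis(-4_540_000_000L);
        System.out.println(fitsInLong(veryOld));            // false
        System.out.println(veryOld.compareTo(lessOld) < 0); // true: earlier sorts first
    }
}
```

The point of the sketch is only that the values remain totally ordered, which 
is the property an index needs for comparisons and key-range scans.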

One limitation of this approach (which is completely optional and which can be 
disabled using AbstractTripleStore.Options.INLINE_DATE_TIMES) is that the 
dateTime is converted to a point value - the timezone information needs to be 
normalized.  This would also be true of any custom inlining scheme developed 
for xsd:integer rather than xsd:long.  The general problem is that xsd:dateTime 
specifies two dimensions (a point in time and a timezone) and allows the 
timezone to be optional.  Oops.  There is no way to translate that into a 
single point, which is what you need to be able to compare values in an index, 
do key-range scans, etc.
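
The loss of timezone information can be illustrated with java.time. This is a 
sketch of the general idea, not of how the database normalizes values; the 
class and method names are hypothetical, and treating a literal without a 
timezone as UTC is an assumption made here, not something the source states.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Illustrative only: collapsing a dateTime lexical form to a single point.
final class DateTimeNormalization {
    // Literal with an explicit offset: the offset is applied, then discarded.
    static long toPoint(String lexicalWithOffset) {
        return OffsetDateTime.parse(lexicalWithOffset).toInstant().toEpochMilli();
    }

    // Literal without an offset: some zone has to be assumed (UTC here).
    static long toPointAssumingUtc(String lexicalWithoutOffset) {
        return LocalDateTime.parse(lexicalWithoutOffset)
                .toInstant(ZoneOffset.UTC)
                .toEpochMilli();
    }

    public static void main(String[] args) {
        long a = toPoint("2015-02-13T10:00:00+01:00");
        long b = toPoint("2015-02-13T09:00:00Z");
        long c = toPointAssumingUtc("2015-02-13T09:00:00");
        // All three collapse to the same point; the original offsets cannot be recovered.
        System.out.println(a == b && b == c);
    }
}
```

Three different lexical forms end up as the same long, which is what makes 
them comparable in an index and also why the original timezone cannot be 
reconstructed from the inline value.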

You have some options.  You can disable dateTime inlining.  This will preserve 
all information.  You can store both the non-inline version (with timezone 
information) and the inline version (by using different predicates). This would 
preserve the opportunity for key-range scans on dates while also preserving 
timezone information. And if necessary you could use an alternative inlining 
scheme for dates in the extreme past or future.

Bryan


TASK DETAIL
  https://phabricator.wikimedia.org/T88717

To: Smalyshev, Beebs.systap
Cc: Jdouglas, Beebs.systap, Aklapper, Manybubbles, jkroll, Smalyshev, 
Wikidata-bugs, aude, GWicke, daniel, JanZerebecki


