Beebs.systap added a comment.

In https://phabricator.wikimedia.org/T88717#1037721, @Manybubbles wrote:
> Looks like we're going to have trouble with some dates too. xsd:dateTime
> <http://www.w3.org/TR/xmlschema-2/#dateTime-lexical-representation> supports
> 13798 million years BCE <https://www.wikidata.org/wiki/Q1#P580> but I think
> BigData will have trouble with it what with this comment from
> DateTimeExtension:
>
> /**
>  * This implementation of {@link IExtension} implements inlining for literals
>  * that represent xsd:dateTime literals. These literals will be stored as time
>  * in milliseconds since the epoch. The milliseconds are encoded as an inline
>  * long.
>  */
>
> Not that I've had a chance to test it yet. It'd be one of the first things
> imported during a full import of the statements. I see in the RDF dump on
> labs it's actually a xsd:gYear type though.

From Bryan Thompson.

The native xsd:dateTime support is based on an int64 value. It does support
negative int64 values, which is what a date before the epoch is translated
into.

When using the xsd:dateTime inlining, dates are not entered into the
dictionary. They appear as inline values within the statement indices
instead. This avoids a dictionary lookup for date materialization. It also
lets us use the OSP index for key-range scans on xsd:dateTime values.

If you need to go beyond an int64 value, then the graph database can also
inline xsd:integer values (BigInteger). This would allow general cosmology
dates.

One limitation of this approach (inlining is completely optional and can be
disabled using AbstractTripleStore.Options.INLINE_DATE_TIMES) is that the
dateTime is converted to a point value - the timezone information needs to be
normalized. This would also be true of any custom inlining scheme developed
for xsd:integer rather than xsd:long.

The general problem is that xsd:dateTime specifies two dimensions (a point in
time and a timezone) and allows the timezone to be optional. Oops.
There is no way to translate that into a single point, which is what you need
to be able to compare values in an index, do key-range scans, etc.

You have some options.

You can disable dateTime inlining. This will preserve all information.

You can store both the non-inline version (with timezone information) and the
inline version (by using different predicates). This would preserve the
opportunity for key range scans on date while also preserving timezone
information.

And if necessary you could use an alternative inlining scheme for dates in
the extreme past or future.

Bryan

TASK DETAIL
  https://phabricator.wikimedia.org/T88717

To: Smalyshev, Beebs.systap
Cc: Jdouglas, Beebs.systap, Aklapper, Manybubbles, jkroll, Smalyshev, Wikidata-bugs, aude, GWicke, daniel, JanZerebecki
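
To make the int64 limit and the timezone normalization discussed in this
message concrete, here is a minimal Java sketch. It does not use the Bigdata
DateTimeExtension code. The class InlineDateSketch, its method names, and the
average-year constant are hypothetical and chosen only for illustration; the
sketch relies solely on java.math.BigInteger and java.time from the JDK.

```java
import java.math.BigInteger;
import java.time.OffsetDateTime;

// Hypothetical illustration only; not the Bigdata inlining implementation.
class InlineDateSketch {
    // Assumption: average Gregorian year of 365.2425 days, in milliseconds.
    static final BigInteger MILLIS_PER_YEAR = BigInteger.valueOf(31_556_952_000L);

    /** Approximate milliseconds since the epoch for a year offset, without overflow. */
    static BigInteger approxEpochMillis(long yearsFromEpoch) {
        return BigInteger.valueOf(yearsFromEpoch).multiply(MILLIS_PER_YEAR);
    }

    /** True if the value can be stored as a signed 64-bit long. */
    static boolean fitsInInt64(BigInteger millis) {
        return millis.bitLength() <= 63;
    }

    /**
     * Collapses an xsd:dateTime lexical form that carries an explicit offset
     * into a single point in time. The original offset is lost in the process.
     */
    static long toPoint(String lexical) {
        return OffsetDateTime.parse(lexical).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        // A date 13798 million years before the epoch does not fit in a long.
        BigInteger bigBang = approxEpochMillis(-13_798_000_000L);
        System.out.println(fitsInInt64(bigBang)); // false

        // A date a few thousand years before the epoch is a negative long.
        BigInteger antiquity = approxEpochMillis(-2_000L);
        System.out.println(fitsInInt64(antiquity)); // true

        // Two lexical forms with different offsets normalize to the same point.
        long a = toPoint("2015-02-13T12:00:00+02:00");
        long b = toPoint("2015-02-13T10:00:00Z");
        System.out.println(a == b); // true
    }
}
```

Under these assumptions a signed 64-bit millisecond count covers only roughly
292 million years on either side of the epoch, which is why a wider type such
as BigInteger would be needed for cosmological dates. The last check shows
what normalization discards: once both values are reduced to a point, the
original offsets can no longer be told apart.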
