Smalyshev added a comment.

> Item 1 will ensure that we can work with the dates as they will most likely 
> be when we enter production.


I'm not sure what the basis for this assertion is. I haven't seen any plans for 
BlazeGraph to move to the new standard in the near term, and the same goes for 
Virtuoso. I will create an issue with BlazeGraph, but even if/when we convince 
them to move, it may take time. Switching would mean that everybody who uses the 
older 1.0 format would no longer get correct results from their RDF, so I'm not 
sure they'd be eager to do it. More realistically, we'd probably have to rely 
on our own date handling eventually. And that only covers BlazeGraph: loading 
the same data into Virtuoso would still produce broken results for every BCE 
date.
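For reference, the BCE breakage comes down to year numbering. XSD 1.0 has no year 0000 (-0001 is 1 BCE). ISO 8601:2000 and XSD 1.1 use astronomical numbering, where 0000 is 1 BCE and -0001 is 2 BCE. Below is a minimal Python sketch of the mapping between the two. The function names are hypothetical and are not part of any existing exporter or triple store.

```python
def xsd11_year_to_xsd10(year: int) -> int:
    """Astronomical year (ISO 8601:2000 / XSD 1.1, 0000 = 1 BCE) -> XSD 1.0 year (no 0000, -0001 = 1 BCE)."""
    # Positive years are identical; every year <= 0 shifts down by one.
    return year if year > 0 else year - 1


def xsd10_year_to_xsd11(year: int) -> int:
    """XSD 1.0 year -> astronomical year. Year 0 does not exist in XSD 1.0."""
    if year == 0:
        raise ValueError("year 0000 is not a valid XSD 1.0 year")
    return year if year > 0 else year + 1


def historical_label(astronomical_year: int) -> str:
    """Human-readable CE/BCE label for an astronomical year."""
    if astronomical_year > 0:
        return f"{astronomical_year} CE"
    return f"{1 - astronomical_year} BCE"


# The same lexical year "-0044" read under each convention:
print(historical_label(-44))                       # read as XSD 1.1 / ISO 8601:2000, prints "45 BCE"
print(historical_label(xsd10_year_to_xsd11(-44)))  # read as XSD 1.0, prints "44 BCE"
```

Any engine that applies 1.0 semantics to data written with 1.1 semantics, or the reverse, therefore shifts every BCE date by one year.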

More importantly, do we have a commitment that the Wikidata data format will be 
changed in the very near future so that all stored dates are actually valid 
proleptic Gregorian? We started this discussion very recently and I haven't 
seen any definite resolution yet. There is even less of a commitment on when the 
data would actually look like this, or on how Julian dates would be represented 
in that case. Are you proposing that, until this happens, the date values in our 
RDF dump for every Julian date, every BCE date, and every date like yyyy-02-31 
should be unusable? I'm not sure how that improves anything.
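To make the Julian case concrete: once a date is expressed as a Julian Day Number, converting it from the Julian calendar to proleptic Gregorian is mechanical. Below is a sketch in Python. It assumes astronomical year numbering (0 = 1 BCE) and uses the standard integer JDN formulas. It is an illustration only, not the code our exporter actually uses. It rejects dates that do not exist in the source calendar, such as yyyy-02-31, instead of silently rolling them over.

```python
def julian_days_in_month(year: int, month: int) -> int:
    # Julian calendar: every year divisible by 4 is a leap year (including year 0).
    if month == 2:
        return 29 if year % 4 == 0 else 28
    return 30 if month in (4, 6, 9, 11) else 31


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian calendar date -> Julian Day Number (integer formula)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def jdn_to_gregorian(jdn: int) -> tuple[int, int, int]:
    """Julian Day Number -> proleptic Gregorian (year, month, day)."""
    a = jdn + 32044
    b = (4 * a + 3) // 146097
    c = a - (146097 * b) // 4
    d = (4 * c + 3) // 1461
    e = c - (1461 * d) // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day


def julian_to_gregorian(year: int, month: int, day: int) -> tuple[int, int, int]:
    if year < -4712:
        raise ValueError("year outside the range these integer formulas are written for")
    if not (1 <= month <= 12 and 1 <= day <= julian_days_in_month(year, month)):
        raise ValueError(f"{year:+05d}-{month:02d}-{day:02d} is not a valid Julian date")
    return jdn_to_gregorian(julian_to_jdn(year, month, day))


print(julian_to_gregorian(1582, 10, 5))  # (1582, 10, 15): the day the Gregorian calendar started
print(julian_to_gregorian(-43, 3, 15))   # (-43, 3, 13), i.e. 44 BCE
```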

> It would be a waste of programming time to work around issues that others are 
> already trying to fix as we speak.


There's no programming time involved: the current model is already implemented 
(and so is Julian handling). Of course, changing it is not a problem, but for 
that we need to know what we're changing it to and how it would work. Assuming 
that everything is ISO 8601:2000 when in fact it is not does not seem like the 
correct approach. BlazeGraph would simply reject invalid dates, which makes them 
useless for queries. Even worse, it would interpret current BCE dates according 
to its own understanding of dates rather than as ISO 8601:2000. That would make 
every query touching 1 BCE and earlier invalid, at least until we implement 
custom date handling. I'm not sure I understand why this is the best option, 
unless we //know// that all internal dates will move to ISO 8601:2000 very soon.

> I agree with you that some dates will not be interpreted as intended, but 
> this is unavoidable


It is true that there will be invalid dates. But you seem to be proposing that 
we ignore this fact and put them in the data, knowing full well that they are 
invalid. The query engine will then either be unable to consume them or, even 
worse, interpret them incorrectly. I am proposing that we try to fix the ones 
we can, so that the query engine can make sense of most of them. For the rest, 
we should just provide a string representation and not claim that some random 
assembly of characters we cannot validate is actually an xsd:dateTime.
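Below is a minimal Python sketch of that policy. It validates the timestamp as a proleptic Gregorian date and emits a typed xsd:dateTime literal (in N-Triples syntax) only if the check passes. Otherwise it falls back to a plain string literal. Several things here are assumptions for illustration: the input shape (+YYYY-MM-DDThh:mm:ssZ) and the function and constant names. Real Wikibase time values also carry precision and a calendar model, and this sketch ignores both. Output years follow ISO 8601:2000 / XSD 1.1 numbering.

```python
import re

# Assumed input shape: optional sign, a year of 4+ digits, then -MM-DDThh:mm:ssZ.
TIMESTAMP_RE = re.compile(r"^([+-]?)(\d{4,})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})Z$")
XSD = "http://www.w3.org/2001/XMLSchema#"


def is_gregorian_leap(year: int) -> bool:
    # Proleptic Gregorian rule on astronomical years (year 0 = 1 BCE is a leap year).
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_month(year: int, month: int) -> int:
    if month == 2:
        return 29 if is_gregorian_leap(year) else 28
    return 30 if month in (4, 6, 9, 11) else 31


def to_rdf_literal(timestamp: str) -> str:
    """Typed xsd:dateTime literal if the value validates, plain string literal otherwise."""
    m = TIMESTAMP_RE.match(timestamp)
    if m:
        sign, y, mo, d, hh, mi, ss = m.groups()
        year = -int(y) if sign == "-" else int(y)
        month, day = int(mo), int(d)
        # Month 1-12, day within the month, hours 00-23, minutes and seconds 00-59.
        if (1 <= month <= 12 and 1 <= day <= days_in_month(year, month)
                and int(hh) < 24 and int(mi) < 60 and int(ss) < 60):
            # XSD lexical year: no leading "+", at least 4 digits, no extra leading zeros.
            year_lex = ("-" if year < 0 else "") + f"{abs(year):04d}"
            return f'"{year_lex}-{mo}-{d}T{hh}:{mi}:{ss}Z"^^<{XSD}dateTime>'
    # Not a valid date: keep the original characters, escaped for an N-Triples string.
    escaped = timestamp.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


print(to_rdf_literal("+2000-02-29T00:00:00Z"))  # typed xsd:dateTime literal
print(to_rdf_literal("+2015-02-31T00:00:00Z"))  # plain string "+2015-02-31T00:00:00Z"
```

Under this policy, a value like +1900-02-29 is also kept as a string, because 1900 is not a Gregorian leap year, even though that date does exist in the Julian calendar.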

> Your finding of "0000" years in our Virtuoso instance is quite peculiar


Virtuoso seems to be able to import invalid dates. I'm not sure whether it's 
actually able to index them (probably not, but I can check). However, other 
tools may reject them or even fail the whole import.


TASK DETAIL
  https://phabricator.wikimedia.org/T94064

To: Smalyshev
Cc: Lydia_Pintscher, Denny, Manybubbles, daniel, mkroetzsch, Smalyshev, 
JanZerebecki, Aklapper, jkroll, Wikidata-bugs, Jdouglas, aude, GWicke



_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs