Smalyshev added a comment.

> Item 1 will ensure that we can work with the dates as they will most likely
> be when we enter production.
I'm not sure what the basis of this assertion is. I haven't seen any plans for BlazeGraph to move to the new standard in the near term, and the same goes for Virtuoso. I will create an issue with BlazeGraph, but even if we convince them to move, it may take time (everybody who uses the older 1.0 format would no longer receive correct results from RDF, so I'm not sure they'd be eager to switch). More realistically, we'd probably have to rely on our own date handling eventually. And that is just BlazeGraph: loading the same data into Virtuoso would still produce broken results for any BCE date.

More importantly, do we have a commitment that the Wikidata data format will change in the very near future so that all stored dates are actually valid proleptic Gregorian? We initiated this discussion very recently, and I haven't seen any definite resolution yet, much less a commitment on when the data would actually look like this (or on how Julian dates would be represented in that case). Do you propose that until this happens, the date values in our RDF dump for every Julian date, every BCE date, and all dates like yyyy-02-31 should be unusable? I'm not sure how that improves anything.

> It would be a waste of programming time to work around issues that others are
> already trying to fix as we speak.

There's no programming time involved: the current model is already implemented (and Julian handling too). Of course, it's not a problem to change it, but for that we need to know what we're changing it to and how it would work. Assuming that everything is ISO 8601:2000 when in fact it is not does not seem like the correct way. BlazeGraph would just reject invalid dates (so they are not useful for queries), and even worse, it would consume current BCE dates not as ISO 8601:2000 but according to its own understanding of dates, which will make all queries touching 1 BCE and earlier invalid, at least until we implement custom date handling.
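To make the BCE mismatch concrete, here is a minimal Python sketch of the two year conventions and of the validity check that catches dates like yyyy-02-31. It assumes that the older XSD 1.0 lexical form has no year 0 (so -0001 means 1 BCE), while ISO 8601:2000 uses astronomical numbering (0000 means 1 BCE). All function names are hypothetical and for illustration only; this is not the code in our current implementation.

```python
import calendar

# Days per month in a non-leap year (index 0 = January).
DAYS_IN_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def bce_to_iso_year(bce_year: int) -> int:
    """Map a historical 'N BCE' year to astronomical numbering (1 BCE -> 0)."""
    if bce_year < 1:
        raise ValueError("BCE years start at 1")
    return 1 - bce_year


def iso_to_xsd10_year(iso_year: int) -> int:
    """Map an astronomical year to the XSD 1.0 convention, which skips year 0."""
    return iso_year if iso_year >= 1 else iso_year - 1


def format_year(year: int) -> str:
    """Render a year with an optional sign and at least four digits."""
    sign = "-" if year < 0 else ""
    return f"{sign}{abs(year):04d}"


def is_valid_date(iso_year: int, month: int, day: int) -> bool:
    """Check a date against the proleptic Gregorian calendar.

    iso_year uses astronomical numbering, so the leap-year rule can be
    applied arithmetically to year 0 and to negative years as well.
    """
    if not 1 <= month <= 12:
        return False
    limit = DAYS_IN_MONTH[month - 1]
    if month == 2 and calendar.isleap(iso_year):
        limit = 29
    return 1 <= day <= limit


if __name__ == "__main__":
    iso = bce_to_iso_year(44)  # 44 BCE
    print("ISO 8601:2000 year:", format_year(iso))
    print("XSD 1.0 year:      ", format_year(iso_to_xsd10_year(iso)))
    print("2015-02-31 valid?  ", is_valid_date(2015, 2, 31))
```

The same historical year gets two different lexical forms, so a store that reads one convention as the other shifts every BCE date by one year.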
I'm not sure I understand why this is the best option, unless we //know// all internal dates move to ISO 8601:2000 very soon.

> I agree with you that some dates will not be interpreted as intended, but
> this is unavoidable

It is true that there will be invalid dates. But you seem to be proposing to just ignore this fact and put them in the data, with full knowledge that they are invalid and either cannot be consumed by the query engine or, even worse, will be interpreted incorrectly by it. I am proposing to try to fix those that we can, so that the query engine can make sense of most of them. For the others, we should just provide a string representation and not claim that some random assembly of characters that we cannot validate is actually xsd:dateTime.

> Your finding of "0000" years in our Virtuoso instance is quite peculiar

Virtuoso seems to be able to import invalid dates. I'm not sure if it's actually able to index them (probably not, but I can check). However, other tools can reject them or even fail the whole import.

TASK DETAIL
  https://phabricator.wikimedia.org/T94064
