revans2 commented on PR #45294: URL: https://github.com/apache/spark/pull/45294#issuecomment-2352962655
I am conflicted on this. I agree with @cloud-fan that changing the behavior to be more like the interpreted code is not ideal in a bug fix release if we are worried about consistency for the user. But at the same time this is totally a corner case, and I cannot think of a real-world situation where truncation is not better than overflowing. The overflow starts to happen sometime in the year 292278994, so effectively no legitimate timestamp should ever hit this. Yes, estimates for the heat death of the universe are after that.

So then I have to think about the cases where this could become a problem for a user. Perhaps detecting and filtering out bad dates/timestamps? I know that this can happen in practice. But the simplest way to do that is to compare the timestamp against an allowed range, and that does not involve casting the timestamp to seconds since the epoch. So it would have to be a case where a user wants seconds since the epoch and is doing the filtering/cleanup after the conversion. But in that case truncation is 100% guaranteed to catch all bad timestamps, whereas overflow could convert a bad timestamp back into a "valid" one. The exception is when your "valid" range extends to the limits of the target type, where the truncated values land.

The only use case I can think of where the overflow is better is if someone is trying to detect the overflow to mark the conversion as bad (essentially looking for a change in the sign of the value after the conversion). But that would also imply that the timestamp itself is good, which I really doubt. Perhaps there are some physics or sci-fi datasets out there where this really would be valid. I just don't know.

For me, I would vote to keep this change as is, but perhaps I have too limited of a view on how this is being used.
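
To make the filtering argument concrete, here is a rough sketch in Python. It is not Spark's implementation: the signed 32-bit target width, the helper names, and the "valid" window are all assumptions picked only to keep the numbers small. It shows how a wrapping cast can turn an out-of-range value into one that passes a range check, while a saturating (truncating) cast cannot, as long as the window does not reach the type's limits.

```python
# Illustrative only: a signed 32-bit target is an assumption, not Spark's actual cast.
INT_MIN, INT_MAX = -(2**31), 2**31 - 1

def wrap_to_int32(value: int) -> int:
    """Two's-complement wraparound, as a plain narrowing cast would behave."""
    return (value + 2**31) % 2**32 - 2**31

def saturate_to_int32(value: int) -> int:
    """Clamp to the representable range instead of wrapping."""
    return max(INT_MIN, min(INT_MAX, value))

# Hypothetical "valid" window in seconds since the epoch:
# 2000-01-01 to 2030-01-01 UTC.
VALID_MIN, VALID_MAX = 946_684_800, 1_893_456_000

def looks_valid(seconds: int) -> bool:
    return VALID_MIN <= seconds <= VALID_MAX

# A bad input chosen so that wraparound lands inside the valid window.
bad_seconds = 2**32 + 1_000_000_000

wrapped = wrap_to_int32(bad_seconds)        # wraps back to 1_000_000_000
saturated = saturate_to_int32(bad_seconds)  # pinned at INT_MAX

print(looks_valid(wrapped))    # the bad value slips through the filter
print(looks_valid(saturated))  # the bad value is caught
```

The sign-flip check mentioned above only works with the wrapping variant: `wrap_to_int32(2**31)` comes out negative, while `saturate_to_int32(2**31)` stays at `INT_MAX`.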
