bart-samwel commented on pull request #28568: URL: https://github.com/apache/spark/pull/28568#issuecomment-630247808
@cloud-fan @MaxGekk FYI, there's also PR #28534, which tries to solve the same problem with explicit functions. To be honest, I'm not a big fan of compatibility flags unless we actually plan to deprecate the old behavior and change the default. Realistically, the next time we can change default behavior is Spark 4.0, which is likely several years out. Until then, throughout the Spark 3.x line, there may be Spark deployments where some query unexpectedly has different semantics than on other deployments. The behavior change also doesn't stick if you port the same workload to another Spark deployment: because the queries don't state explicitly what they mean, and no error is raised, you may silently produce incorrect results after changing deployments.

If anything, I'd be in favor of:

- Doing the thing from PR #28534 (adding `TIMESTAMP_FROM_SECONDS` etc., as sketched below).
- If we really care enough to change the behavior (and hence break existing workloads), using a legacy compatibility flag that disables this CAST by default and lets people choose between the (legacy) Spark behavior and the (new) Hive behavior, with strong advice in the "this is disabled" error message to migrate to the functions above and to leave the setting at "disabled". Then people can still shoot themselves in the foot if they really want to, but at least we told them so.
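For concreteness, here is a minimal sketch (not from either PR) of the contrast the first bullet draws: an implicit `CAST` whose unit is an engine-defined rule versus an explicit conversion function whose unit is named in the query itself. It assumes the `timestamp_seconds` builtin available in Spark 3.1 and later; whether that is the exact name PR #28534 shipped under, and the literal value used, are illustrative assumptions only.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: explicit conversion functions vs. an ambiguous numeric-to-timestamp CAST.
object ExplicitTimestampSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("explicit-timestamp-sketch") // illustrative name
      .master("local[*]")
      .getOrCreate()

    // Implicit: the unit of the numeric value (seconds? milliseconds?) is a
    // property of the engine's cast rules, not of the query text. Spark reads
    // this as seconds since the epoch; other engines may interpret it differently.
    spark.sql("SELECT CAST(1230219000 AS TIMESTAMP) AS ts").show(truncate = false)

    // Explicit: the unit is spelled out in the query, so the semantics travel
    // with the workload to any deployment that supports the function.
    // (timestamp_seconds exists in Spark 3.1+; the comment's proposed
    // TIMESTAMP_FROM_SECONDS spelling is from the PR discussion.)
    spark.sql("SELECT TIMESTAMP_SECONDS(1230219000) AS ts").show(truncate = false)

    spark.stop()
  }
}
```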
