ashrithb opened a new pull request, #53373: URL: https://github.com/apache/spark/pull/53373
### What changes were proposed in this pull request?

In ANSI mode, `avg(timestamp)` and `sum(timestamp)` now fail with a proper type error instead of silently casting to double.

### Why are the changes needed?

Per the SQL standard, AVG is only defined for numeric types. The current behavior (inherited from Hive compatibility) silently casts the timestamp to epoch seconds as a double, which:

1. Returns confusing results (e.g., `-979200.0` for `1969-12-21`)
2. Violates ANSI SQL semantics
3. Differs from other databases such as PostgreSQL, which reject this

### Does this PR introduce any user-facing change?

Yes. In ANSI mode (`spark.sql.ansi.enabled=true`), queries like `SELECT avg(timestamp_col) FROM table` will now fail with:

DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE: The 1st parameter requires the NUMERIC or ANSI INTERVAL type, however timestamp_col has the type TIMESTAMP.

Non-ANSI mode behavior is unchanged (backward compatible).

### How was this patch tested?

- Added unit tests in `AnsiTypeCoercionSuite`
- Added unit tests in `TypeCoercionSuite`
- All existing tests pass

### Was this patch authored or co-authored using generative AI tooling?

No, except for the PR description.
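To make the behavior change concrete, here is a minimal Spark SQL sketch; the table `events` and its TIMESTAMP column `ts` are hypothetical names used only for illustration:

```sql
-- Hypothetical table with a TIMESTAMP column.
-- CREATE TABLE events (ts TIMESTAMP) USING parquet;

SET spark.sql.ansi.enabled = true;

-- Before this change: `ts` was silently cast to epoch seconds (DOUBLE) and averaged.
-- After this change: the query fails analysis with
-- DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE.
SELECT avg(ts) FROM events;

SET spark.sql.ansi.enabled = false;

-- Non-ANSI mode keeps the legacy Hive-compatible behavior and still
-- returns a DOUBLE (the average of the epoch-second values).
SELECT avg(ts) FROM events;
```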
