This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new af6c1ec7c79  [SPARK-43751][SQL][DOC] Document `unbase64` behavior change
af6c1ec7c79 is described below

commit af6c1ec7c795584c28e15e4963eed83917e2f06a
Author: Cheng Pan <cheng...@apache.org>
AuthorDate: Fri May 26 11:33:38 2023 +0800

    [SPARK-43751][SQL][DOC] Document `unbase64` behavior change

    ### What changes were proposed in this pull request?

    After SPARK-37820, `select unbase64("abcs==")` (malformed input) always throws an exception. That PR does not help in this case, since it only improves the error message for `to_binary()`. So `unbase64()`'s behavior for malformed input changed silently after SPARK-37820:

    - before: return a best-effort result, because it uses the [LENIENT](https://github.com/apache/commons-codec/blob/rel/commons-codec-1.15/src/main/java/org/apache/commons/codec/binary/Base64InputStream.java#L46) policy: any trailing bits are composed into 8-bit bytes where possible, and the remainder are discarded.
    - after: throw an exception

    There is no way to restore the previous behavior. To tolerate malformed input, the user should migrate `unbase64(<input>)` to `try_to_binary(<input>, 'base64')`, which returns NULL instead of being interrupted by an exception.

    ### Why are the changes needed?

    Add the behavior change to the migration guide.

    ### Does this PR introduce _any_ user-facing change?

    Yes.

    ### How was this patch tested?

    Manually reviewed.

    Closes #41280 from pan3793/SPARK-43751.

    Authored-by: Cheng Pan <cheng...@apache.org>
    Signed-off-by: Kent Yao <y...@apache.org>
---
 docs/sql-migration-guide.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 80df50273a1..58627801fc7 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -91,6 +91,8 @@ license: |
 - Since Spark 3.3, the precision of the return type of round-like functions has been fixed.
   This may cause Spark to throw `AnalysisException` of the `CANNOT_UP_CAST_DATATYPE` error class when using views created by prior versions. In such cases, you need to recreate the views using ALTER VIEW AS or CREATE OR REPLACE VIEW AS with newer Spark versions.
+ - Since Spark 3.3, the `unbase64` function throws an error for a malformed `str` input. Use `try_to_binary(<str>, 'base64')` to tolerate malformed input and return NULL instead. In Spark 3.2 and earlier, the `unbase64` function returns a best-effort result for a malformed `str` input.
+ - Since Spark 3.3.1 and 3.2.3, for `SELECT ... GROUP BY a GROUPING SETS (b)`-style SQL statements, `grouping__id` returns different values from Apache Spark 3.2.0, 3.2.1, 3.2.2, and 3.3.0. It computes based on user-given group-by expressions plus grouping set columns. To restore the behavior before 3.3.1 and 3.2.3, you can set `spark.sql.legacy.groupingIdWithAppendedUserGroupBy`. For details, see [SPARK-40218](https://issues.apache.org/jira/browse/SPARK-40218) and [SPARK-40562](https:/ [...]

 ## Upgrading from Spark SQL 3.1 to 3.2

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
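
[Editor's note] The LENIENT policy described in the commit message (trailing bits are composed into 8-bit bytes where possible; the remainder are discarded) can be sketched in plain Python. This is an illustrative sketch only: `lenient_unbase64` and `try_unbase64` are hypothetical helper names, not Spark or commons-codec APIs, and `try_unbase64` is merely a rough model of `try_to_binary(<str>, 'base64')` returning NULL (`None` here) for malformed input.

```python
import re

# Standard base64 alphabet (RFC 4648).
B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def lenient_unbase64(s: str) -> bytes:
    """Sketch of the LENIENT policy (pre-Spark-3.3 `unbase64` behavior):
    skip padding and non-alphabet characters, compose the remaining 6-bit
    groups into 8-bit bytes where possible, and discard leftover bits."""
    bits = "".join(format(B64_ALPHABET.index(c), "06b")
                   for c in s if c in B64_ALPHABET)
    usable = len(bits) - len(bits) % 8  # whole bytes only; rest is dropped
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


def try_unbase64(s: str):
    """Rough model (an assumption, not Spark's actual code) of
    `try_to_binary(s, 'base64')`: None (SQL NULL) for malformed input."""
    # Require a length that is a multiple of 4 and padding only at the end.
    if len(s) % 4 != 0 or not re.fullmatch(r"[A-Za-z0-9+/]*={0,2}", s):
        return None
    return lenient_unbase64(s)


# "abcs==" is malformed: four data characters need no '==' padding.
print(lenient_unbase64("abcs=="))  # best-effort result: b'i\xb7,'
print(try_unbase64("abcs=="))      # None, i.e. SQL NULL
print(try_unbase64("YWJj"))        # well-formed input: b'abc'
```

Under this model, `lenient_unbase64` never fails (matching the old silent best-effort behavior), while `try_unbase64` mirrors the documented migration path: malformed input yields NULL rather than an exception.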