This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new af6c1ec7c79  [SPARK-43751][SQL][DOC] Document `unbase64` behavior change
af6c1ec7c79 is described below

commit af6c1ec7c795584c28e15e4963eed83917e2f06a
Author: Cheng Pan <cheng...@apache.org>
AuthorDate: Fri May 26 11:33:38 2023 +0800

    [SPARK-43751][SQL][DOC] Document `unbase64` behavior change

    ### What changes were proposed in this pull request?

    After SPARK-37820, `select unbase64("abcs==")` (malformed input) always throws an exception. That PR does not help in this case, since it only improves the error message for `to_binary()`. So `unbase64()`'s behavior for malformed input changed silently after SPARK-37820:

    - before: return a best-effort result, because it uses the [LENIENT](https://github.com/apache/commons-codec/blob/rel/commons-codec-1.15/src/main/java/org/apache/commons/codec/binary/Base64InputStream.java#L46) policy: any trailing bits are composed into 8-bit bytes where possible, and the remainder are discarded.
    - after: throw an exception

    There is no way to restore the previous behavior. To tolerate malformed input, the user should migrate `unbase64(<input>)` to `try_to_binary(<input>, 'base64')`, which returns NULL instead of being interrupted by an exception.

    ### Why are the changes needed?

    Add the behavior change to the migration guide.

    ### Does this PR introduce _any_ user-facing change?

    Yes.

    ### How was this patch tested?

    Manually reviewed.

    Closes #41280 from pan3793/SPARK-43751.

    Authored-by: Cheng Pan <cheng...@apache.org>
    Signed-off-by: Kent Yao <y...@apache.org>
---
 docs/sql-migration-guide.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 80df50273a1..58627801fc7 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -91,6 +91,8 @@ license: |
 - Since Spark 3.3, the precision of the return type of round-like functions has been fixed.
   This may cause Spark to throw `AnalysisException` of the `CANNOT_UP_CAST_DATATYPE` error class when using views created by prior versions. In such cases, you need to recreate the views using ALTER VIEW AS or CREATE OR REPLACE VIEW AS with newer Spark versions.
+ - Since Spark 3.3, the `unbase64` function throws an error for a malformed `str` input. Use `try_to_binary(<str>, 'base64')` to tolerate malformed input and return NULL instead. In Spark 3.2 and earlier, the `unbase64` function returns a best-effort result for a malformed `str` input.
+ - Since Spark 3.3.1 and 3.2.3, for `SELECT ... GROUP BY a GROUPING SETS (b)`-style SQL statements, `grouping__id` returns different values from Apache Spark 3.2.0, 3.2.1, 3.2.2, and 3.3.0. It computes based on user-given group-by expressions plus grouping set columns. To restore the behavior before 3.3.1 and 3.2.3, you can set `spark.sql.legacy.groupingIdWithAppendedUserGroupBy`. For details, see [SPARK-40218](https://issues.apache.org/jira/browse/SPARK-40218) and [SPARK-40562](https:/ [...]

 ## Upgrading from Spark SQL 3.1 to 3.2

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
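
[Editor's note] The LENIENT policy described in the commit message (trailing bits are composed into 8-bit bytes where possible; the remainder are discarded) can be sketched in plain Python. This is an illustrative sketch only: `lenient_unbase64` and `try_unbase64` are hypothetical helper names, not Spark or commons-codec APIs, and `try_unbase64` is merely a rough model of `try_to_binary(<str>, 'base64')` returning NULL (`None` here) for malformed input.

```python
import re

# Standard base64 alphabet (RFC 4648).
B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def lenient_unbase64(s: str) -> bytes:
    """Sketch of the LENIENT policy (pre-Spark-3.3 `unbase64` behavior):
    skip padding and non-alphabet characters, compose the remaining 6-bit
    groups into 8-bit bytes where possible, and discard leftover bits."""
    bits = "".join(format(B64_ALPHABET.index(c), "06b")
                   for c in s if c in B64_ALPHABET)
    usable = len(bits) - len(bits) % 8  # whole bytes only; rest is dropped
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


def try_unbase64(s: str):
    """Rough model (an assumption, not Spark's actual code) of
    `try_to_binary(s, 'base64')`: None (SQL NULL) for malformed input."""
    # Require a length that is a multiple of 4 and padding only at the end.
    if len(s) % 4 != 0 or not re.fullmatch(r"[A-Za-z0-9+/]*={0,2}", s):
        return None
    return lenient_unbase64(s)


# "abcs==" is malformed: four data characters need no '==' padding.
print(lenient_unbase64("abcs=="))  # best-effort result: b'i\xb7,'
print(try_unbase64("abcs=="))      # None, i.e. SQL NULL
print(try_unbase64("YWJj"))        # well-formed input: b'abc'
```

Under this model, `lenient_unbase64` never fails (matching the old silent best-effort behavior), while `try_unbase64` mirrors the documented migration path: malformed input yields NULL rather than an exception.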