Ngone51 commented on a change in pull request #28311:
URL: https://github.com/apache/spark/pull/28311#discussion_r413618292
##########
File path: docs/sql-migration-guide.md
##########
@@ -81,7 +81,7 @@ license: |
- In Spark version 2.4 and below, you can create a map with duplicated keys
via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of
a map with duplicated keys is undefined: for example, a map lookup respects
the duplicated key that appears first, `Dataset.collect` keeps only the
duplicated key that appears last, `MapKeys` returns duplicated keys, etc. In
Spark 3.0, Spark throws `RuntimeException` when duplicated keys are found.
You can set `spark.sql.mapKeyDedupPolicy` to `LAST_WIN` to deduplicate map
keys with the last-wins policy. Users may still read map values with
duplicated keys from data sources which do not enforce it (for example,
Parquet); in that case, the behavior is undefined.
- - In Spark 3.0, using `org.apache.spark.sql.functions.udf(AnyRef, DataType)`
is not allowed by default. Set `spark.sql.legacy.allowUntypedScalaUDF` to true
to keep using it. In Spark version 2.4 and below, if
`org.apache.spark.sql.functions.udf(AnyRef, DataType)` gets a Scala closure
with a primitive-type argument, the returned UDF returns null if the input
value is null. However, in Spark 3.0, the UDF returns the default value of
the Java type if the input value is null. For example, given `val f =
udf((x: Int) => x, IntegerType)`, `f($"x")` returns null in Spark 2.4 and
below if column `x` is null, and returns 0 in Spark 3.0. This behavior change
is introduced because Spark 3.0 is built with Scala 2.12 by default.
+ - In Spark 3.0, using `org.apache.spark.sql.functions.udf(AnyRef, DataType)`
is not allowed by default. Removing the return type parameter to
automatically switch to a typed Scala udf is recommended. Alternatively, set
`spark.sql.legacy.allowUntypedScalaUDF` to true to keep using it. In Spark
version 2.4 and below, if
`org.apache.spark.sql.functions.udf(AnyRef, DataType)` gets a Scala closure
with a primitive-type argument, the returned UDF returns null if the input
value is null. However, in Spark 3.0, the UDF returns the default value of
the Java type if the input value is null. For example, given `val f =
udf((x: Int) => x, IntegerType)`, `f($"x")` returns null in Spark 2.4 and
below if column `x` is null, and returns 0 in Spark 3.0. This behavior change
is introduced because Spark 3.0 is built with Scala 2.12 by default.
Review comment:
Diff: added `Removing the return type parameter to automatically switch to a
typed Scala udf is recommended.`
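
To illustrate the behavior change for readers of this thread, here is a
minimal sketch, not part of the diff itself. It assumes a local
`SparkSession`; the app name, the one-row DataFrame, and the column name `x`
are made up to mirror the doc's example:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types.IntegerType

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("udf-migration-demo") // hypothetical app name for the sketch
  .getOrCreate()

// One row whose int column `x` is null.
val df = spark.range(1).selectExpr("CAST(NULL AS INT) AS x")

// Untyped udf (explicit DataType). Disallowed by default in Spark 3.0;
// uncomment the legacy flag to keep using it.
// spark.conf.set("spark.sql.legacy.allowUntypedScalaUDF", "true")
val untyped = udf((x: Int) => x, IntegerType)
df.select(untyped(col("x"))).show()
// Spark 2.4 and below: null. Spark 3.0: 0, the Java default for int.

// Typed udf: drop the DataType argument. Spark derives the return type
// from the closure, and a null input stays null.
val typed = udf((x: Int) => x)
df.select(typed(col("x"))).show()

// Related context from the same guide section: duplicated map keys throw
// a RuntimeException in 3.0 unless a dedup policy is set.
spark.conf.set("spark.sql.mapKeyDedupPolicy", "LAST_WIN")
spark.sql("SELECT map(1, 'a', 1, 'b')").show() // keeps 1 -> 'b'
```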