maropu commented on a change in pull request #31859:
URL: https://github.com/apache/spark/pull/31859#discussion_r595777484
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala
##########
@@ -191,9 +194,35 @@ object AnsiTypeCoercion extends TypeCoercionBase {
     case (DateType, TimestampType) => Some(TimestampType)

     // When we reach here, input type is not acceptable for any types in this type collection,
-    // try to find the first one we can implicitly cast.
+    // first try to find all the expected types we can implicitly cast to:
+    // 1. if there are no convertible data types, return None;
+    // 2. if there is only one convertible data type, cast the input to it;
+    // 3. otherwise, if there are multiple convertible data types, find the narrowest common
+    //    data type among them. If there is no such narrowest common data type, return None.
     case (_, TypeCollection(types)) =>
-      types.flatMap(implicitCast(inType, _, isInputFoldable)).headOption
+      // Since Spark contains special objects like `NumericType` and `DecimalType`, which accept
+      // multiple types and are `AbstractDataType` instead of `DataType`, here we use the
+      // conversion results as their representations.
+      val convertibleTypes = types.flatMap(implicitCast(inType, _, isInputFoldable))
+      if (convertibleTypes.isEmpty) {
+        None
+      } else {
+        // Find the narrowest common data type, which can be implicitly cast to all other
+        // convertible types.
+        val narrowestCommonType = convertibleTypes.find { dt =>
+          convertibleTypes.forall { target =>
+            implicitCast(dt, target, isInputFoldable = false).isDefined
+          }
+        }
Review comment:
It seems there is no explicit definition of the `narrowest` data type. I'm not 100% sure about this logic though: does the `find` method (which picks the first matching data type) work correctly for any input?
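To make the question concrete, below is a minimal, self-contained Scala sketch of the `find`-based search in the diff above. The toy `DT` hierarchy, the `canCast` relation, and the `NarrowestCommonTypeSketch` object are all hypothetical stand-ins, not Spark's actual `AnsiTypeCoercion` rules; only the `find`/`forall` search shape mirrors the PR.

```scala
// Hypothetical toy types with widening-only casts; NOT Spark's real coercion rules.
object NarrowestCommonTypeSketch {
  sealed trait DT
  case object IntT extends DT
  case object LongT extends DT
  case object DoubleT extends DT
  case object StringT extends DT

  // Assumed cast relation for this sketch: identity plus numeric widening.
  def canCast(from: DT, to: DT): Boolean = (from, to) match {
    case (a, b) if a == b => true
    case (IntT, LongT) | (IntT, DoubleT) | (LongT, DoubleT) => true
    case _ => false
  }

  // Same shape as the PR: the first candidate that casts to every candidate.
  def narrowest(candidates: Seq[DT]): Option[DT] =
    candidates.find(dt => candidates.forall(t => canCast(dt, t)))

  def main(args: Array[String]): Unit = {
    println(narrowest(Seq(LongT, DoubleT)))    // Some(LongT)
    println(narrowest(Seq(DoubleT, LongT)))    // Some(LongT): order-insensitive here
    println(narrowest(Seq(DoubleT, StringT)))  // None: no common narrowest type
  }
}
```

Under this toy model the result does not depend on the candidates' order, because at most one candidate can cast to all the others. If two candidates were ever mutually castable, though, `find` would simply return whichever appears first in the `TypeCollection`, which seems to be exactly the order-sensitivity the question above is probing.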