HyukjinKwon commented on a change in pull request #22038: [SPARK-25056][SQL] Unify the InConversion and BinaryComparison behavior
URL: https://github.com/apache/spark/pull/22038#discussion_r392801210
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
##########
@@ -491,10 +491,21 @@ object TypeCoercion {
i
}
- case i @ In(a, b) if b.exists(_.dataType != a.dataType) =>
- findWiderCommonType(i.children.map(_.dataType)) match {
- case Some(finalDataType) => i.withNewChildren(i.children.map(Cast(_, finalDataType)))
- case None => i
+ case i @ In(value, list) if list.exists(_.dataType != value.dataType) =>
+ if (conf.getConf(SQLConf.LEGACY_IN_PREDICATE_FOLLOW_BINARY_COMPARISON_TYPE_COERCION)) {
+ findWiderCommonType(list.map(_.dataType)) match {
+ case Some(listType) =>
+ val finalDataType = findCommonTypeForBinaryComparison(value.dataType, listType, conf)
Review comment:
@wangyum, the behaviour for decimals and strings looks good. But what about the other types affected here?
If we interpret `IN` as `=` combined with `OR`, we should also consider the other rules that apply to equality comparisons, for example:
```scala
// For equality between string and timestamp we cast the string to a timestamp
// so that things like rounding of subsecond precision does not affect the comparison.
case p @ Equality(left @ StringType(), right @ TimestampType()) =>
p.makeCopy(Array(Cast(left, TimestampType), right))
case p @ Equality(left @ TimestampType(), right @ StringType()) =>
p.makeCopy(Array(left, Cast(right, TimestampType)))
```
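To make the rationale of that rule concrete, here is a minimal sketch that uses plain `java.sql.Timestamp` rather than Spark's `Cast`. It assumes the timestamp side is rendered as text without a fractional part. In that case a string comparison treats `00:00:00` and `00:00:00.000` as different, while parsing both sides as timestamps treats them as equal. The object and method names are hypothetical.

```scala
import java.sql.Timestamp

// Hypothetical illustration, not Spark code.
object StringVsTimestampEquality {
  // Compare the textual forms directly, which is roughly what casting the
  // timestamp side to a string amounts to when its rendering omits the fraction.
  def equalAsStrings(rendered: String, literal: String): Boolean =
    rendered == literal

  // Parse both sides as timestamps, so differences in how the subsecond
  // part is written ("00:00:00" vs "00:00:00.000") no longer matter.
  def equalAsTimestamps(a: String, b: String): Boolean =
    Timestamp.valueOf(a) == Timestamp.valueOf(b)
}
```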
What do you think about fixing this issue completely rather than fixing cases one by one? I haven't checked the ANSI standard or other DBMSs yet, but I know `IN` can be rewritten as `=` with `OR`. Given that, I suspect the type coercion should be similar too.
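A rough sketch of the "rewrite `IN` as `=` with `OR`" idea on a hypothetical toy expression tree follows. The tree only borrows names like `In`, `EqualTo`, `Or` and `Cast`; it is not Catalyst. The sketch assumes the only coercion rule is the string/timestamp equality rule quoted above. Once `IN` is expressed as equalities, each `=` picks up the equality rules with no `IN`-specific cases.

```scala
// Hypothetical toy model for illustration only: these are NOT Spark's
// Catalyst classes, just stand-ins with similar names.
sealed trait DType
case object BoolT extends DType
case object StringT extends DType
case object TimestampT extends DType

sealed trait Expr { def dataType: DType }
final case class Lit(value: String, dataType: DType) extends Expr
final case class Cast(child: Expr, dataType: DType) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr { def dataType: DType = BoolT }
final case class Or(left: Expr, right: Expr) extends Expr { def dataType: DType = BoolT }
final case class In(value: Expr, list: Seq[Expr]) extends Expr { def dataType: DType = BoolT }

object InAsEqualities {
  // Rewrite `value IN (e1, ..., en)` into `value = e1 OR ... OR value = en`.
  def rewrite(in: In): Expr = {
    require(in.list.nonEmpty, "IN list must not be empty")
    in.list.map(e => EqualTo(in.value, e)).reduceLeft[Expr](Or(_, _))
  }

  // Equality coercion mirroring the quoted rule: when a string is compared
  // with a timestamp, the string side is cast to a timestamp.
  def coerceEquality(cmp: EqualTo): EqualTo =
    (cmp.left.dataType, cmp.right.dataType) match {
      case (StringT, TimestampT) => EqualTo(Cast(cmp.left, TimestampT), cmp.right)
      case (TimestampT, StringT) => EqualTo(cmp.left, Cast(cmp.right, TimestampT))
      case _                     => cmp
    }

  // Apply the equality rule to every `=` produced by the rewrite.
  def coerceAll(e: Expr): Expr = e match {
    case cmp: EqualTo => coerceEquality(cmp)
    case Or(l, r)     => Or(coerceAll(l), coerceAll(r))
    case other        => other
  }
}
```

For a timestamp `ts` and two string literals, `coerceAll(rewrite(In(ts, Seq(s1, s2))))` casts each string to a timestamp. Any rule later added for equality would therefore apply to `IN` automatically.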