Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21331#discussion_r189407673
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
---
@@ -85,6 +87,14 @@ object Canonicalize {
case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r)
case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r)
+ // order the list in the In operator
+ // we can do this only if all the elements in the list are literals with the same datatype
+ case i @ In(value, list)
+ if i.inSetConvertible && list.map(_.dataType.asNullable).distinct.size == 1 =>
--- End diff --
Thanks for your comment @dongjoon-hyun, but I am not sure I agree with you.
What if we have something like `in (array(null, 1), array(1, 2, 3), array(3,
2, 1))`? The first literal would be an array whose elements can be null, while
the others would be arrays whose elements cannot, so in this case we would have
2 distinct datatypes (because of nullability).
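To make the nullability point concrete, here is a minimal, self-contained sketch. It does not use Spark's classes: `DType`, `IntT`, `ArrayT` and `NullabilityDemo` are hypothetical stand-ins, and the `asNullable` defined here only loosely imitates the method called in the diff (it is assumed to relax every nullability flag).

```scala
// Simplified, hypothetical model of data types; these are NOT Spark's real classes.
sealed trait DType {
  // Relax every nullability flag (an assumption about what `asNullable` does in the diff).
  def asNullable: DType
}

case object IntT extends DType {
  def asNullable: DType = this
}

final case class ArrayT(element: DType, containsNull: Boolean) extends DType {
  def asNullable: DType = ArrayT(element.asNullable, containsNull = true)
}

object NullabilityDemo {
  // Types of the three literals: array(null, 1), array(1, 2, 3), array(3, 2, 1).
  val literalTypes: Seq[DType] = Seq(
    ArrayT(IntT, containsNull = true),  // contains a null element
    ArrayT(IntT, containsNull = false),
    ArrayT(IntT, containsNull = false)
  )

  // Comparing the types as written: nullability makes them differ.
  val distinctRaw: Int = literalTypes.distinct.size

  // Comparing after normalizing nullability: only one type remains.
  val distinctNormalized: Int = literalTypes.map(_.asNullable).distinct.size

  def main(args: Array[String]): Unit = {
    println(s"distinct types as written: $distinctRaw")
    println(s"distinct types after asNullable: $distinctNormalized")
  }
}
```

In this model the three literals yield two distinct types as written and a single type once nullability is normalized, which is the situation described above.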
Am I missing something? Thanks.