Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21416#discussion_r190472138
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
---
@@ -219,7 +219,11 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] {
object OptimizeIn extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case q: LogicalPlan => q transformExpressionsDown {
- case In(v, list) if list.isEmpty && !v.nullable => FalseLiteral
+ case In(v, list) if list.isEmpty =>
+ // When v is not nullable, the following expression will be optimized
+ // to FalseLiteral which is tested in OptimizeInSuite.scala
+ If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType))
+ case In(v, list) if list.length == 1 => EqualTo(v, list.head)
--- End diff --
Why would this have any implication for type casting? With this PR, I seem
to get the correct result:
```scala
== Analyzed Logical Plan ==
(CAST(1.1 AS STRING) IN (CAST(1 AS STRING))): boolean, (CAST(1.1 AS INT) = 1): boolean
Project [cast(1.1 as string) IN (cast(1 as string)) AS (CAST(1.1 AS STRING) IN (CAST(1 AS STRING)))#484, (cast(1.1 as int) = 1) AS (CAST(1.1 AS INT) = 1)#485]
+- OneRowRelation
== Optimized Logical Plan ==
Project [false AS (CAST(1.1 AS STRING) IN (CAST(1 AS STRING)))#484, true AS (CAST(1.1 AS INT) = 1)#485]
+- OneRowRelation
== Physical Plan ==
*(1) Project [false AS (CAST(1.1 AS STRING) IN (CAST(1 AS STRING)))#484, true AS (CAST(1.1 AS INT) = 1)#485]
+- Scan OneRowRelation[]
+- Scan OneRowRelation[]
```
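For what it's worth, the null handling in the two rewritten cases can be checked without Spark at all. Below is a minimal sketch, assuming the usual SQL three-valued semantics for `IN` and `=`. It models a nullable value as `Option[Int]` and a SQL boolean as `Option[Boolean]`, with `None` standing for NULL. Every name in it (`InSemanticsSketch`, `sqlEq`, `sqlIn`, `rewrittenIn`) is hypothetical and is not a Catalyst API. It says nothing about type coercion, which the analyzer has already applied by the time the optimizer rule runs.

```scala
// Hypothetical stand-alone model of the rewrite; none of these names are Spark APIs.
object InSemanticsSketch {
  // SQL three-valued boolean: None stands for NULL.
  type SqlBool = Option[Boolean]

  // SQL equality: NULL if either operand is NULL.
  def sqlEq(a: Option[Int], b: Option[Int]): SqlBool =
    for (x <- a; y <- b) yield x == y

  // Assumed SQL IN semantics: NULL value gives NULL; a match gives true;
  // no match but a NULL in the list gives NULL; otherwise false.
  def sqlIn(v: Option[Int], list: Seq[Option[Int]]): SqlBool = v match {
    case None => None
    case Some(x) =>
      if (list.contains(Some(x))) Some(true)
      else if (list.contains(None)) None
      else Some(false)
  }

  // Mirrors the two cases in the diff: the empty list becomes
  // If(IsNotNull(v), false, null) and a one-element list becomes EqualTo.
  def rewrittenIn(v: Option[Int], list: Seq[Option[Int]]): SqlBool = list match {
    case Seq()       => if (v.isDefined) Some(false) else None
    case Seq(single) => sqlEq(v, single)
    case _           => sqlIn(v, list)
  }

  // Compares the rewrite with plain IN over a small set of values and lists.
  def rewriteIsEquivalent: Boolean = {
    val values: Seq[Option[Int]] = Seq(None, Some(1), Some(2))
    val lists: Seq[Seq[Option[Int]]] =
      Seq(Seq()) ++ values.map(Seq(_)) ++ Seq(Seq(Some(1), None))
    values.forall(v => lists.forall(l => rewrittenIn(v, l) == sqlIn(v, l)))
  }
}
```

Under these assumptions, `InSemanticsSketch.rewriteIsEquivalent` holds, so the rewrite preserves the NULL result for a NULL `v` in both the empty-list and single-element cases.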