cfmcgrady commented on a change in pull request #32488:
URL: https://github.com/apache/spark/pull/32488#discussion_r635736572
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -89,10 +89,11 @@ import org.apache.spark.sql.types._
*/
object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
override def apply(plan: LogicalPlan): LogicalPlan =
plan.transformWithPruning(
- _.containsPattern(BINARY_COMPARISON), ruleId) {
+ _.containsAnyPattern(BINARY_COMPARISON, IN), ruleId) {
Review comment:
Not really!
For instance:
```scala
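// Run in spark-shell, where `spark` and `spark.implicits._` are already in scope.
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.InSet
import org.apache.spark.sql.catalyst.plans.logical.Filter

spark.range(50)
  .write
  .mode("overwrite")
  .parquet("/tmp/parquet/t1")

// The set deliberately mixes Int and String literals against a LongType column.
val condition = InSet($"id".expr, Set(1, 2, "4"))
val df = spark.read.parquet("/tmp/parquet/t1")
  .filter(Column(condition))

// Inspect the optimized plan: print the child's data type and the runtime class of each set element.
df.queryExecution.optimizedPlan foreach {
  case f: Filter =>
    val inset = f.condition.asInstanceOf[InSet]
    println(s"InSet.value.dataType: [ ${inset.child.dataType} ]")
    println("InSet.hset.Type: " +
      inset.hset.toArray.map(_.getClass.getCanonicalName).mkString("[ ", ",", " ]"))
  case _ =>
}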
```
Output:
```
InSet.value.dataType: [ LongType ]
InSet.hset.Type: [ java.lang.Integer,java.lang.Integer,java.lang.String ]
```
I also found that:
1. Spark SQL has no syntax for the `InSet` predicate, and
`org.apache.spark.sql.catalyst.dsl.scala` doesn't have one either.
2. The result of this query is incorrect: the row with `id = 4` is missing.
```
actual:    expected:
+---+      +---+
| id|      | id|
+---+      +---+
|  1|      |  1|
|  2|      |  2|
+---+      |  4|
           +---+
```
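One plausible reading of the result above, assuming the membership test ends up as a plain Scala `Set` lookup on the boxed column value (I have not traced the exact code path, so treat this as a sketch): Scala compares boxed numbers by value, so the `Long` ids 1 and 2 match the `Int` literals, while the `String` `"4"` never equals the `Long` 4. The object name and the `toLongSet` helper below are hypothetical and only illustrate what aligning the set with the column type would change.

```scala
object InSetMembershipSketch {
  // Same literals as in the reproduction above: two Ints and one String.
  val mixed: Set[Any] = Set(1, 2, "4")

  // Hypothetical helper (not a Spark API): convert every literal to Long,
  // the data type of the `id` column.
  def toLongSet(values: Set[Any]): Set[Long] = values.map {
    case i: Int    => i.toLong
    case l: Long   => l
    case s: String => s.toLong
  }

  def main(args: Array[String]): Unit = {
    println(mixed.contains(1L))            // true: Int 1 and Long 1L compare equal
    println(mixed.contains(4L))            // false: the String "4" is not the Long 4L
    println(toLongSet(mixed).contains(4L)) // true once the set matches the column type
  }
}
```

This would match the `actual` column above (rows 1 and 2 only) and the `expected` column once the literals are converted to the column type.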