mingjialiu commented on a change in pull request #29564:
URL: https://github.com/apache/spark/pull/29564#discussion_r482172221
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
##########
@@ -54,7 +55,7 @@ case class DataSourceV2Relation(
tableIdent.map(_.unquotedString).getOrElse(s"${source.name}:unknown")
}
- override def pushedFilters: Seq[Expression] = Seq.empty
+ override def pushedFilters: Seq[Filter] = Seq.empty
Review comment:
More explanation on why the type changes from Expression to
org.apache.spark.sql.sources.Filter:
DataSourceV2ScanExec.pushedFilters is defined as a sequence of Expressions, and
Expression equality takes the expression ID into account. So, for example, the
Expression isnotnull(d_day_name#22364) is not considered equal to
isnotnull(d_day_name#22420). Therefore, the right fix is to define and compare
pushedFilters as the Filter class.
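For illustration, a minimal sketch of the equality difference (assuming a Spark
classpath; the column name d_day_name is just an example):

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, IsNotNull => CatalystIsNotNull}
import org.apache.spark.sql.sources.IsNotNull
import org.apache.spark.sql.types.StringType

// Two attribute references with the same name but fresh expression IDs,
// e.g. d_day_name#22364 vs d_day_name#22420 after a relation is re-created.
val a = AttributeReference("d_day_name", StringType)()
val b = AttributeReference("d_day_name", StringType)()

// Catalyst Expression equality includes the expression ID, so the two
// pushed predicates are not equal even though they are semantically the same:
assert(CatalystIsNotNull(a) != CatalystIsNotNull(b))

// The data source Filter is a plain case class keyed only by the column name,
// so the same pushed filter always compares equal:
assert(IsNotNull("d_day_name") == IsNotNull("d_day_name"))
```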
In both Spark 3.0 and the affected Spark 2.4 test suites, Filter is the class
being used, and the above 4 places seem to be the only ones that still declare
pushedFilters as Expression instead of Filter.
(Because pushedFilters is already defined the right way in the above test suite,
SPARK-32708 was not caught by the tests I previously added for SPARK-32609,
another exchange reuse bug.)
The use of Expression was introduced by the PR "[SPARK-23203][SQL] DataSourceV2:
Use immutable logical plan". Based on that PR's description and original intent, I
don't see a compelling reason to keep Expression here.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]