mingjialiu commented on a change in pull request #29564:
URL: https://github.com/apache/spark/pull/29564#discussion_r482172221
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
##########
@@ -54,7 +55,7 @@ case class DataSourceV2Relation(
tableIdent.map(_.unquotedString).getOrElse(s"${source.name}:unknown")
}
- override def pushedFilters: Seq[Expression] = Seq.empty
+ override def pushedFilters: Seq[Filter] = Seq.empty
Review comment:
More explanation on why the type changes from Expression to
org.apache.spark.sql.sources.Filter:
DataSourceV2ScanExec.pushedFilters is defined as a sequence of Expressions, and
Expression equality takes the expression ID into account. So, for example, the
Expression isnotnull(d_day_name#22364) is not considered equal to
isnotnull(d_day_name#22420). Therefore, the right fix is to define and compare
pushedFilters as the Filter class.
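For illustration, a minimal sketch of the equality difference (assuming a Spark
classpath; the column name d_day_name is just an example):

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, IsNotNull => CatalystIsNotNull}
import org.apache.spark.sql.sources.IsNotNull
import org.apache.spark.sql.types.StringType

// Two attribute references with the same name but fresh expression IDs,
// e.g. d_day_name#22364 vs d_day_name#22420 after a relation is re-created.
val a = AttributeReference("d_day_name", StringType)()
val b = AttributeReference("d_day_name", StringType)()

// Catalyst Expression equality includes the expression ID, so the two
// pushed predicates are not equal even though they are semantically the same:
assert(CatalystIsNotNull(a) != CatalystIsNotNull(b))

// The data source Filter is a plain case class keyed only by the column name,
// so the same pushed filter always compares equal:
assert(IsNotNull("d_day_name") == IsNotNull("d_day_name"))
```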
In both Spark 3.0 and the affected Spark 2.4 test suites, Filter is the class
being used, and the above 4 places seem to be the only ones that still declare
pushedFilters as Expression instead of Filter.
(Because pushedFilters is already defined the right way in the above test suite,
SPARK-32708 was not caught by the tests I previously added for SPARK-32609,
another exchange reuse bug.)
The use of Expression was introduced by the PR "[SPARK-23203][SQL] DataSourceV2:
Use immutable logical plan". Based on that PR's description and original intent, I
don't see a compelling reason to keep Expression here.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]