GitHub user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20485#discussion_r165578555
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala ---
    @@ -99,15 +100,22 @@ object PushDownOperatorsToDataSource extends Rule[LogicalPlan] with PredicateHel
     
           case relation: DataSourceV2Relation => relation.reader match {
             case reader: SupportsPushDownRequiredColumns =>
    +          // TODO: Enable the below assert after we make `DataSourceV2Relation` immutable. For now
    +          // it's possible that the mutable reader is being updated by someone else, and we need to
    +          // always call `reader.pruneColumns` here to correct it.
    +          // assert(relation.output.toStructType == reader.readSchema(),
    +          //  "Schema of data source reader does not match the relation 
plan.")
    +
               val requiredColumns = relation.output.filter(requiredByParent.contains)
               reader.pruneColumns(requiredColumns.toStructType)
    +          relation.copy(output = requiredColumns)
    --- End diff --
    
    @rdblue This is the bug I mentioned before. Finally I figured out a way to fix it surgically: always run column pruning even if no column needs to be pruned. This helps us correct the required schema of the reader if it was updated by someone else.
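    For illustration, here is a minimal, self-contained sketch of the idea using hypothetical stand-in types (MutableReader, Relation, and pushRequiredColumns are invented for this sketch and are not the real Spark DataSourceV2 API): the rule calls pruneColumns even when no column actually needs to be pruned, so a reader that was mutated by another rule is reset to match the relation's output.

    // A minimal sketch (hypothetical stand-in types, not the real Spark DataSourceV2 API)
    // of why the rule prunes unconditionally: another rule may have mutated the reader,
    // so re-pruning with the relation's required columns resets the reader's schema.
    object ColumnPruningSketch {

      // Stand-in for a SupportsPushDownRequiredColumns reader: mutable schema state.
      final class MutableReader(private var schema: Seq[String]) {
        def pruneColumns(required: Seq[String]): Unit = schema = required
        def readSchema(): Seq[String] = schema
      }

      // Stand-in for DataSourceV2Relation's logical output plus its reader.
      final case class Relation(output: Seq[String], reader: MutableReader)

      // The surgical fix: call pruneColumns even when requiredColumns == relation.output,
      // so the reader is corrected if someone else updated it.
      def pushRequiredColumns(relation: Relation, requiredByParent: Set[String]): Relation = {
        val requiredColumns = relation.output.filter(requiredByParent.contains)
        relation.reader.pruneColumns(requiredColumns)   // unconditional call
        relation.copy(output = requiredColumns)
      }

      def main(args: Array[String]): Unit = {
        val reader = new MutableReader(Seq("a", "b", "c"))
        reader.pruneColumns(Seq("a"))                   // someone else narrowed the reader
        val relation = Relation(output = Seq("a", "b", "c"), reader = reader)

        // The parent needs all three columns; skipping the "no-op" pruning would leave
        // the reader reporting Seq("a"), which no longer matches the relation plan.
        val pruned = pushRequiredColumns(relation, Set("a", "b", "c"))
        assert(pruned.reader.readSchema() == pruned.output)
        println(s"reader schema = ${pruned.reader.readSchema()}, relation output = ${pruned.output}")
      }
    }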
    


