[GitHub] spark pull request #20831: [SPARK-23614][SQL] Fix incorrect reuse exchange w...

viirya Tue, 20 Mar 2018 20:56:34 -0700

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20831#discussion_r175980243
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
    @@ -169,7 +174,10 @@ case class InMemoryTableScanExec(
       override def outputOrdering: Seq[SortOrder] =
         relation.child.outputOrdering.map(updateAttribute(_).asInstanceOf[SortOrder])
     
    -  private def statsFor(a: Attribute) = relation.partitionStatistics.forAttribute(a)
    +  // When we make canonicalized plan, we can't find a normalized attribute in this map.
    +  // We return a `ColumnStatisticsSchema` for normalized attribute in this case.
    --- End diff --
    
    I tried that at the beginning. However, `partitionFilters` uses
`buildFilter`. Making `partitionFilters` a lazy val doesn't work, because when
we `copy`, the initialization of `InMemoryTableScanExec` will try to
materialize `partitionFilters` in order to copy its value.
    
    Making `partitionFilters` and `buildFilter` methods is not enough either;
we would also need to remove `@transient` from `relation` and
`InMemoryRelation.partitionStatistics`. So I think it isn't worth it, and I
would leave it as is.
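
For readers less familiar with the initialization behaviour discussed here, the following is a minimal plain-Scala sketch, not Spark code. `CopyInitDemo`, `Scan`, `strictFilters`, `lazyFilters` and the two counters are hypothetical names invented for illustration. It only shows the general language rule that a strict `val` in a case class body is initialized by the constructor, and therefore again on every `copy`, while a `lazy val` in plain Scala is initialized on first access. The comment above reports that switching to a lazy val was still not sufficient in `InMemoryTableScanExec` itself; this sketch does not attempt to reproduce that.

```scala
// Hypothetical stand-in for a plan node; this is NOT Spark's InMemoryTableScanExec.
object CopyInitDemo {
  // Counters recording how often each field's initializer runs.
  var strictInits = 0
  var lazyInits = 0

  case class Scan(predicates: Seq[String]) {
    // Strict val: initialized by the constructor, hence again on every copy().
    val strictFilters: Seq[String] = { strictInits += 1; predicates.map(_.trim) }

    // Lazy val: initialized only when it is first read on a given instance.
    lazy val lazyFilters: Seq[String] = { lazyInits += 1; predicates.map(_.trim) }
  }

  def main(args: Array[String]): Unit = {
    val scan = Scan(Seq(" a > 1 "))
    // copy() calls the constructor, so strictFilters is initialized a second time.
    val copied = scan.copy(predicates = Seq(" b < 2 "))
    println(s"strict initializations after copy: $strictInits")
    println(s"lazy initializations before access: $lazyInits")

    // First read of the lazy field on the copied instance triggers its initializer.
    copied.lazyFilters
    println(s"lazy initializations after access: $lazyInits")
  }
}
```

In this toy setting, any lookup performed inside the strict initializer runs against whatever arguments the copied instance receives, which is the general reason a copied plan with rewritten attributes can hit a lookup that the original plan never would.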



