Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21805#discussion_r204212287
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ---
    @@ -50,6 +50,8 @@ case class CachedRDDBuilder(
         tableName: Option[String])(
         @transient private var _cachedColumnBuffers: RDD[CachedBatch] = null) {
     
    +  override def toString: String = s"CachedRDDBuilder($useCompression, $batchSize, $storageLevel)"
    --- End diff --
    
    Yea, I think the output should be the same as the one in v2.3:
    ```
    scala> val df = Seq((1, 2), (3, 4)).toDF("a", "b")
    scala> val testDf = df.join(df, "a").join(df, "a").cache
    scala> testDf.groupBy("a").count().explain
    == Physical Plan ==
    *(2) HashAggregate(keys=[a#309], functions=[count(1)])
    +- Exchange hashpartitioning(a#309, 200)
       +- *(1) HashAggregate(keys=[a#309], functions=[partial_count(1)])
          +- *(1) InMemoryTableScan [a#309]
                +- InMemoryRelation [a#309, b#310, b#314, b#319], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
                      +- *(3) Project [a#60, b#61, b#212, b#217]
                         +- *(3) BroadcastHashJoin [a#60], [a#216], Inner, BuildRight
                            :- *(3) Project [a#60, b#61, b#212]
                            :  +- *(3) BroadcastHashJoin [a#60], [a#211], Inner, BuildRight
                            :     :- *(3) InMemoryTableScan [a#60, b#61]
                            :     :     +- InMemoryRelation [a#60, b#61], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
                            :     :           +- LocalTableScan [a#15, b#16]
                            :     +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
                            :        +- *(1) InMemoryTableScan [a#211, b#212]
                            :              +- InMemoryRelation [a#211, b#212], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
                            :                    +- LocalTableScan [a#15, b#16]
                            +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
                               +- *(2) InMemoryTableScan [a#216, b#217]
                                     +- InMemoryRelation [a#216, b#217], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
                                           +- LocalTableScan [a#15, b#16]
    ``` 
    The output of the current PR is still different from this, so can you fix it to match? @onursatici
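
A minimal sketch of the behaviour requested above, not the actual Spark change: if the builder's `toString` drops the `CachedRDDBuilder(...)` wrapper and prints only the three fields, a relation node that appends it after the attribute list renders the same text as the v2.3 plan lines. `BuilderSketch`, `RelationSketch`, and `ToStringSketch` are hypothetical stand-ins, the `simpleString` method here is just an illustrative name, and the storage level is modelled as a plain `String` so the example runs without Spark on the classpath.

```scala
// Hypothetical stand-in for the builder in the diff; field names follow the
// diff, but storageLevel is a String here instead of Spark's StorageLevel.
final case class BuilderSketch(
    useCompression: Boolean,
    batchSize: Int,
    storageLevel: String,
    tableName: Option[String]) {

  // Print only the fields that the v2.3 plan showed after the attribute
  // list, with no class-name prefix and no tableName.
  override def toString: String = s"$useCompression, $batchSize, $storageLevel"
}

// Hypothetical relation node: renders its attributes, then the builder.
final case class RelationSketch(output: Seq[String], builder: BuilderSketch) {
  def simpleString: String = {
    val attrs = output.mkString(", ")
    s"InMemoryRelation [$attrs], $builder"
  }
}

object ToStringSketch {
  val builder: BuilderSketch = BuilderSketch(
    useCompression = true,
    batchSize = 10000,
    storageLevel = "StorageLevel(disk, memory, deserialized, 1 replicas)",
    tableName = None)

  val rendered: String = RelationSketch(Seq("a#60", "b#61"), builder).simpleString

  // Prints: InMemoryRelation [a#60, b#61], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
  def main(args: Array[String]): Unit = println(rendered)
}
```

With the `CachedRDDBuilder(...)` prefix from the diff left in place, the same line would read `InMemoryRelation [a#60, b#61], CachedRDDBuilder(true, 10000, ...)`, which is the difference from v2.3 being pointed out.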

