Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/21805#discussion_r204212287
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
---
@@ -50,6 +50,8 @@ case class CachedRDDBuilder(
tableName: Option[String])(
@transient private var _cachedColumnBuffers: RDD[CachedBatch] = null) {
+ override def toString: String = s"CachedRDDBuilder($useCompression, $batchSize, $storageLevel)"
--- End diff --
Yeah, I think the output should be the same as the one in v2.3:
```
scala> val df = Seq((1, 2), (3, 4)).toDF("a", "b")
scala> val testDf = df.join(df, "a").join(df, "a").cache
scala> testDf.groupBy("a").count().explain
== Physical Plan ==
*(2) HashAggregate(keys=[a#309], functions=[count(1)])
+- Exchange hashpartitioning(a#309, 200)
   +- *(1) HashAggregate(keys=[a#309], functions=[partial_count(1)])
      +- *(1) InMemoryTableScan [a#309]
            +- InMemoryRelation [a#309, b#310, b#314, b#319], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
                  +- *(3) Project [a#60, b#61, b#212, b#217]
                     +- *(3) BroadcastHashJoin [a#60], [a#216], Inner, BuildRight
                        :- *(3) Project [a#60, b#61, b#212]
                        :  +- *(3) BroadcastHashJoin [a#60], [a#211], Inner, BuildRight
                        :     :- *(3) InMemoryTableScan [a#60, b#61]
                        :     :     +- InMemoryRelation [a#60, b#61], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
                        :     :           +- LocalTableScan [a#15, b#16]
                        :     +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
                        :        +- *(1) InMemoryTableScan [a#211, b#212]
                        :              +- InMemoryRelation [a#211, b#212], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
                        :                    +- LocalTableScan [a#15, b#16]
                        +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
                           +- *(2) InMemoryTableScan [a#216, b#217]
                                 +- InMemoryRelation [a#216, b#217], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
                                       +- LocalTableScan [a#15, b#16]
```
The output of the current PR is still different from this, so can you fix it to match the v2.3 format?
@onursatici
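For illustration only, here is a rough sketch of the v2.3-style rendering I mean: the relation line prints the builder's fields flat (`true, 10000, StorageLevel(...)`) rather than wrapping them in `CachedRDDBuilder(...)`. The names `Attr`, `CachedBuilderSketch`, `RelationSketch` and `simpleString` are hypothetical simplified stand-ins, not the real Spark classes or the actual fix, and the storage level is modeled as a plain string.

```scala
// Hypothetical, simplified stand-ins; these are NOT the real Spark classes.
final case class Attr(name: String, id: Long) {
  // Mimics the name#exprId form seen in the plan, e.g. a#60.
  override def toString: String = s"$name#$id"
}

// Assumption: storageLevel is modeled as a plain String to stay self-contained.
final case class CachedBuilderSketch(
    useCompression: Boolean,
    batchSize: Int,
    storageLevel: String)

final case class RelationSketch(output: Seq[Attr], builder: CachedBuilderSketch) {
  // Print the builder's fields flat, as in the v2.3 plan line,
  // instead of delegating to the builder's own toString.
  def simpleString: String =
    s"InMemoryRelation [${output.mkString(", ")}], " +
      s"${builder.useCompression}, ${builder.batchSize}, ${builder.storageLevel}"
}

object Demo {
  def main(args: Array[String]): Unit = {
    val rel = RelationSketch(
      Seq(Attr("a", 60), Attr("b", 61)),
      CachedBuilderSketch(
        useCompression = true,
        batchSize = 10000,
        storageLevel = "StorageLevel(disk, memory, deserialized, 1 replicas)"))
    println(rel.simpleString)
  }
}
```

With this shape the relation line keeps the flat `..., true, 10000, StorageLevel(...)` layout regardless of how the builder itself is printed.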