GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/19289
[SPARK-22076][SQL] Expand.projections should not be a Stream
## What changes were proposed in this pull request?
Spark built with Scala 2.10 fails on a GROUP BY ROLLUP/CUBE query:
```
spark.range(1).select($"id" as "a", $"id" as "b").write.partitionBy("a").mode("overwrite").saveAsTable("rollup_bug")
spark.sql("select 1 from rollup_bug group by rollup ()").show
```
It can be traced back to https://github.com/apache/spark/pull/15484 , which
made `Expand.projections` a lazy `Stream` for group by cube.
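(For context: a value whose static type is `Seq` can silently be a lazy `Stream`, because `map` on a `Stream` stays lazy. The snippet below is an illustrative sketch with made-up names, not Spark code.)

```
// Illustrative only: static type Seq, runtime type Stream; map keeps it lazy.
val groupingSetIds: Seq[Int] = Stream.from(0).take(4)
val projections: Seq[String] = groupingSetIds.map(i => s"projection-$i")
// The result is still a lazy Stream, so its unevaluated tail (and the closure
// it holds) survives inside whatever object ends up storing `projections`.
println(projections.getClass)
```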
In Scala 2.10, a `Stream`'s unevaluated tail is a closure that captures a lot of surrounding state; in this case it captures the entire query plan, which contains non-serializable parts.
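To make the capture problem concrete, here is a small self-contained sketch using plain JVM serialization and hypothetical class names (not Spark's actual plan classes); on Scala 2.10 the captured state is even larger, because closures there always retain their enclosing instance:

```
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Stand-in for a query plan node with non-serializable parts (hypothetical).
class FakePlan {
  val columns = Seq("a", "b")
  // Lazy: the Stream's unevaluated tail is a thunk whose closure references
  // this FakePlan instance (via `columns`), so serializing the Stream tries
  // to serialize the whole plan.
  def lazyProjections: Stream[Seq[String]] =
    Stream.from(0).take(3).map(i => columns.map(c => s"$c-$i"))
  // Strict: forcing into a List keeps only the computed values.
  def strictProjections: Seq[Seq[String]] = lazyProjections.toList
}

object StreamCaptureDemo {
  def trySerialize(label: String, obj: AnyRef): Unit = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); println(s"$label: serialized fine") }
    catch { case e: java.io.NotSerializableException => println(s"$label: $e") }
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    val plan = new FakePlan
    trySerialize("strict", plan.strictProjections) // only lists and strings
    trySerialize("lazy", plan.lazyProjections)     // drags in `plan` via the thunk
  }
}
```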
This change also benefits the master branch by reducing the serialized size of `Expand.projections`.
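A minimal sketch of the general remedy (the actual patch may differ in detail): materialize the lazily-built projections into strict collections before they are stored on the operator, so no thunks, and none of the state they capture, end up in the serialized plan. The helper name below is hypothetical.

```
// Hypothetical helper, not Spark's actual code: force a possibly lazy
// Seq of projections into strict collections so nothing lazy is retained.
def materializeProjections[T](projections: Seq[Seq[T]]): IndexedSeq[IndexedSeq[T]] =
  projections.map(_.toIndexedSeq).toIndexedSeq
```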
## How was this patch tested?
Manually verified against a Spark build using Scala 2.10.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark bug
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19289.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19289
----
commit 20ea0c4f3ad9c8916f09c344a0278889dc8c95bb
Author: Wenchen Fan <[email protected]>
Date: 2017-09-20T06:47:15Z
Expand.projections should not be a Stream
----