GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/19289

    [SPARK-22076][SQL] Expand.projections should not be a Stream

    ## What changes were proposed in this pull request?
    
    Spark built with Scala 2.10 fails on a group-by-rollup/cube query:
    ```
    spark.range(1).select($"id" as "a", $"id" as "b").write.partitionBy("a").mode("overwrite").saveAsTable("rollup_bug")
    spark.sql("select 1 from rollup_bug group by rollup ()").show
    ```
    
    The failure can be traced back to https://github.com/apache/spark/pull/15484, which made `Expand.projections` a lazy `Stream` for group-by-cube queries.
    
    In Scala 2.10, `Stream` captures a lot of its enclosing scope in the thunks behind its lazy tail; in this case it captures the entire query plan, which contains some un-serializable parts.
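    
    A minimal, self-contained sketch of the capture problem (not Spark code: `QueryPlanStub`, `UnserializablePart` and `StreamCaptureDemo` are made-up names, and the exact capture/serialization behaviour varies across Scala versions, with 2.10 being the most aggressive):
    
    ```scala
    import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
    
    // Stand-in for the un-serializable pieces of the analyzed plan.
    class UnserializablePart
    
    class QueryPlanStub extends Serializable {
      val part = new UnserializablePart
    
      // The mapping closure references `this.part`, and the Stream keeps that
      // closure alive in its lazy tail, so serializing the Stream tries to
      // serialize the whole QueryPlanStub.
      def projectionsAsStream: Stream[Int] =
        Stream.from(0).take(3).map(_ + part.hashCode)
    
      // Forcing the Stream into a List drops the thunks and the captured plan.
      def projectionsAsList: List[Int] = projectionsAsStream.toList
    }
    
    object StreamCaptureDemo extends App {
      private def serialize(o: AnyRef): Unit =
        new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(o)
    
      val plan = new QueryPlanStub
      serialize(plan.projectionsAsList)        // fine: just a List of ints
      try serialize(plan.projectionsAsStream)  // pulls in QueryPlanStub
      catch { case e: NotSerializableException => println(s"failed as expected: $e") }
    }
    ```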
    
    This change also benefits the master branch, since it reduces the serialized size of `Expand.projections`.
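    
    To see the size effect on a plain JVM, here is a rough sketch (again not Spark code; `HeavyOwner` and its ~1 MB ballast are hypothetical stand-ins for a large plan tree, and the exact byte counts will differ):
    
    ```scala
    import java.io.{ByteArrayOutputStream, ObjectOutputStream}
    
    object SerializedSizeDemo extends App {
      def serializedSize(o: AnyRef): Int = {
        val bytes = new ByteArrayOutputStream()
        val oos = new ObjectOutputStream(bytes)
        oos.writeObject(o)
        oos.close()
        bytes.size()
      }
    
      // Serializable but heavy owner, standing in for a big plan tree.
      class HeavyOwner extends Serializable {
        val ballast = Array.fill(1 << 20)(0.toByte)   // ~1 MB payload
        def asStream: Stream[Int] = Stream.from(0).take(3).map(_ + ballast.length)
        def asList: List[Int]     = asStream.toList
      }
    
      val owner = new HeavyOwner
      println(s"forced List: ${serializedSize(owner.asList)} bytes")    // small
      println(s"lazy Stream: ${serializedSize(owner.asStream)} bytes")  // carries the ballast
    }
    ```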
    
    ## How was this patch tested?
    
    Manually verified with a Spark build that uses Scala 2.10.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark bug

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19289.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19289
    
----
commit 20ea0c4f3ad9c8916f09c344a0278889dc8c95bb
Author: Wenchen Fan <[email protected]>
Date:   2017-09-20T06:47:15Z

    Expand.projections should not be a Stream

----

