[GitHub] spark pull request #21783: [SPARK-24799]A solution of dealing with data skew...

marymwu Mon, 16 Jul 2018 03:15:01 -0700

GitHub user marymwu opened a pull request:

    https://github.com/apache/spark/pull/21783


    [SPARK-24799]A solution of dealing with data skew in left,right,inner join

    ## What changes were proposed in this pull request?
    
       For left, right, and inner join statement execution, this solution
mainly divides the partitions where data skew has occurred into several
smaller partitions, so that more tasks can run in parallel and improve
efficiency.
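
The idea described above can be illustrated with a minimal, Spark-free sketch. This is not the code from this patch: the function names, the `threshold` and `num_splits` parameters, and the round-robin split are all assumptions made for illustration. It only shows why splitting the skewed side and joining each piece against the full matching partition of the other side yields the same rows as joining the whole partition at once.

```python
from collections import defaultdict


def split_skewed_partition(rows, num_splits):
    """Split one oversized partition into num_splits smaller chunks (round-robin)."""
    chunks = [[] for _ in range(num_splits)]
    for i, row in enumerate(rows):
        chunks[i % num_splits].append(row)
    return chunks


def inner_join(left_rows, right_rows):
    """Hash join of (key, value) rows on key."""
    index = defaultdict(list)
    for key, value in right_rows:
        index[key].append(value)
    return [(k, lv, rv) for k, lv in left_rows for rv in index.get(k, [])]


def skew_aware_inner_join(left_partition, right_partition, threshold, num_splits):
    """Join one pair of co-partitioned inputs, splitting the left side if it is skewed.

    threshold and num_splits are hypothetical tuning knobs, not settings from the patch.
    """
    if len(left_partition) <= threshold:
        return inner_join(left_partition, right_partition)
    result = []
    for chunk in split_skewed_partition(left_partition, num_splits):
        # Each chunk could be an independent task; they run sequentially here.
        result.extend(inner_join(chunk, right_partition))
    return result


# Hypothetical data: key "a" dominates the left partition.
left = [("a", i) for i in range(10)] + [("b", 100)]
right = [("a", "x"), ("b", "y"), ("c", "z")]
joined = skew_aware_inner_join(left, right, threshold=4, num_splits=3)
```

In this sketch only the left side is split and the right partition is reused for every chunk. That choice stays correct for a left outer join as well, since every left row still sees all of its matches; a right outer join would split the right side instead.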
    
    ## How was this patch tested?
    Unit tests in DatasetSuite.scala
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marymwu/spark branch-2.3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21783.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21783
    
----
commit 2a01c813b6ef7223a489a4bcda3c9e5feb899060
Author: wangsm9 <wangsm9@...>
Date:   2018-07-16T09:48:44Z

    data skew code for spark2.3

----


