GitHub user marymwu opened a pull request:
https://github.com/apache/spark/pull/21783
[SPARK-24799] A solution for dealing with data skew in left, right, and inner joins
## What changes were proposed in this pull request?
For left, right, and inner join statements, this solution divides each
partition in which data skew has occurred into several smaller partitions,
so that more tasks can execute in parallel and the join runs more
efficiently.
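The splitting idea can be illustrated with a minimal, self-contained sketch in plain Scala (no Spark dependency). This is not the code from this pull request. `SkewJoinSketch`, `planTasks`, `innerJoin`, `skewThreshold`, and `numSplits` are all hypothetical names. The sketch assumes rows are `(key, payload)` pairs and that partitions are assigned by a simplified `floorMod` of the key's hash code, not Spark's real Murmur3-based partitioner. A partition whose left side exceeds the threshold is cut into smaller slices, and the matching right-side partition is replicated to every slice so each slice can be joined by its own task.

```scala
// Hypothetical sketch of splitting skewed join partitions; not the PR's implementation.
object SkewJoinSketch {
  type Row = (Int, String) // (join key, payload)

  // Simplified partitioner (assumption): Spark actually uses Murmur3 hashing.
  def partitionOf(key: Int, numPartitions: Int): Int =
    Math.floorMod(key.hashCode, numPartitions)

  // Builds one (leftRows, rightRows) task per partition, except that a skewed
  // partition is split into up to numSplits tasks that share a copy of the right side.
  def planTasks(
      left: Seq[Row],
      right: Seq[Row],
      numPartitions: Int,
      skewThreshold: Int,
      numSplits: Int): Seq[(Seq[Row], Seq[Row])] = {
    val leftParts = left.groupBy(r => partitionOf(r._1, numPartitions))
    val rightParts = right.groupBy(r => partitionOf(r._1, numPartitions))
    (0 until numPartitions).flatMap { p =>
      val l = leftParts.getOrElse(p, Seq.empty)
      val r = rightParts.getOrElse(p, Seq.empty)
      if (l.size > skewThreshold) {
        // Cut the skewed left partition into roughly equal slices.
        val sliceSize = math.ceil(l.size.toDouble / numSplits).toInt
        l.grouped(sliceSize).map(slice => (slice, r)).toList
      } else {
        Seq((l, r))
      }
    }
  }

  // Inner-joins a single task; each task could run as an independent Spark task.
  def innerJoin(task: (Seq[Row], Seq[Row])): Seq[(Int, String, String)] = {
    val (l, r) = task
    val rightByKey = r.groupBy(_._1)
    l.flatMap { case (k, lv) =>
      rightByKey.getOrElse(k, Seq.empty).map { case (_, rv) => (k, lv, rv) }
    }
  }
}
```

For example, with 100 left rows for key `1`, four partitions, a threshold of 10, and `numSplits = 4`, the skewed partition becomes four tasks of 25 rows each. The union of all task outputs matches an unsplit inner join. Splitting the left side while replicating the right side also preserves results for a left outer join. For a right outer join the roles must swap (split the right side), otherwise unmatched right rows would be emitted once per slice.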
## How was this patch tested?
Unit tests in DatasetSuite.scala
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/marymwu/spark branch-2.3
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21783.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21783
----
commit 2a01c813b6ef7223a489a4bcda3c9e5feb899060
Author: wangsm9 <wangsm9@...>
Date: 2018-07-16T09:48:44Z
data skew code for spark2.3
----