[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2018-06-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15297
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15297
  
Can one of the admins verify this patch?



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2017-12-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15297
  
Can one of the admins verify this patch?



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2017-02-09 Thread deswal-ajit

Github user deswal-ajit commented on the issue:

https://github.com/apache/spark/pull/15297
  
hi


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-12-11 Thread YuhuWang2002

Github user YuhuWang2002 commented on the issue:

https://github.com/apache/spark/pull/15297
  
retest this please



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-12-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15297
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-12-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15297
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69990/
Test FAILed.



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-12-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15297
  
**[Test build #69990 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69990/consoleFull)**
 for PR 15297 at commit 
[`99b8305`](https://github.com/apache/spark/commit/99b830584aafb53112b5bdd2d723080fa19baa54).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-12-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15297
  
**[Test build #69990 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69990/consoleFull)**
 for PR 15297 at commit 
[`99b8305`](https://github.com/apache/spark/commit/99b830584aafb53112b5bdd2d723080fa19baa54).



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-11-26 Thread holdenk

Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/15297
  
This is a really big change - and handling skewed data in joins is 
certainly an important consideration - have you considered making a design 
document and running it by the dev list? Maybe something similar to the 
recently proposed Spark Improvement Proposals process?



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-11-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15297
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-11-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15297
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68448/
Test PASSed.



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-11-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15297
  
**[Test build #68448 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68448/consoleFull)**
 for PR 15297 at commit 
[`1bb158b`](https://github.com/apache/spark/commit/1bb158b3035cd4f69dd2f47c26ef1c67bc5e6a6c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-11-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15297
  
**[Test build #68448 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68448/consoleFull)**
 for PR 15297 at commit 
[`1bb158b`](https://github.com/apache/spark/commit/1bb158b3035cd4f69dd2f47c26ef1c67bc5e6a6c).



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15297
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68446/
Test FAILed.



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15297
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-11-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15297
  
**[Test build #68446 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68446/consoleFull)**
 for PR 15297 at commit 
[`8728d33`](https://github.com/apache/spark/commit/8728d334a79f3cb385937d61956ea47d2e9a4650).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-11-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15297
  
**[Test build #68446 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68446/consoleFull)**
 for PR 15297 at commit 
[`8728d33`](https://github.com/apache/spark/commit/8728d334a79f3cb385937d61956ea47d2e9a4650).



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15297
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-11-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15297
  
**[Test build #68442 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68442/consoleFull)**
 for PR 15297 at commit 
[`b60f9bc`](https://github.com/apache/spark/commit/b60f9bc76763a0c149cb32bf8b3ab3f318a86635).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15297
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68442/
Test FAILed.



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-11-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15297
  
**[Test build #68442 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68442/consoleFull)**
 for PR 15297 at commit 
[`b60f9bc`](https://github.com/apache/spark/commit/b60f9bc76763a0c149cb32bf8b3ab3f318a86635).



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-11-08 Thread scwf

Github user scwf commented on the issue:

https://github.com/apache/spark/pull/15297
  
@YuhuWang2002
We should limit the use cases for outer joins.
For a left outer join, such as A left join B, this implementation currently
cannot handle skew in table B. That is because the join result depends on all
of B's data for the same reduce partition, so B cannot be split across
multiple tasks.

Similarly, for a right outer join, such as A right join B, this
implementation currently cannot handle skew in table A. For a full outer
join, the optimization cannot be used at all.
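
The restriction above can be illustrated with a minimal sketch in plain Scala collections. It is not code from the PR: the `leftOuterJoin` helper, the sample rows, and the way the tables are chunked are all assumptions made for illustration. It shows that splitting the preserved side (A) and unioning the partial results reproduces the full left outer join, while splitting B produces a spurious unmatched row.

```scala
object OuterJoinSplitDemo {
  type Row = (Int, String) // (join key, payload); hypothetical sample schema

  // Left outer join on the key: every left row is kept, paired with each
  // matching right payload, or with None when this task sees no match.
  def leftOuterJoin(left: Seq[Row], right: Seq[Row]): Seq[(Int, String, Option[String])] =
    left.flatMap { case (k, l) =>
      val matches = right.filter(_._1 == k).map(_._2)
      if (matches.isEmpty) Seq((k, l, None)) else matches.map(r => (k, l, Some(r)))
    }

  val a: Seq[Row] = Seq((1, "a1"), (2, "a2"))
  val b: Seq[Row] = Seq((1, "b1"), (1, "b2"), (2, "b3"))

  // Reference result computed by a single task.
  val expected = leftOuterJoin(a, b)

  // Split A: each task still sees all of B, so the union is correct.
  val (aPart1, aPart2) = a.splitAt(1)
  val splitA = leftOuterJoin(aPart1, b) ++ leftOuterJoin(aPart2, b)

  // Split B: the first task does not see (2, "b3"), so it wrongly emits
  // (2, "a2", None) even though a2 has a match in the other chunk.
  val (bPart1, bPart2) = b.splitAt(1)
  val splitB = leftOuterJoin(a, bPart1) ++ leftOuterJoin(a, bPart2)

  def main(args: Array[String]): Unit = {
    println(s"split A matches full join: ${splitA == expected}")
    println(s"split B matches full join: ${splitB == expected}")
  }
}
```

The same reasoning applies with the roles swapped for a right outer join, and a full outer join preserves both sides, so neither side can be split this way.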



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-10-25 Thread YuhuWang2002

Github user YuhuWang2002 commented on the issue:

https://github.com/apache/spark/pull/15297
  
I ran some performance tests comparing the skew join algorithm with the
normal join algorithm.
I generated two tables with 1/5 data skew in table S and 1/1 data skew in
table R. Both tables are skewed on the same key.

spark.sql.adaptive.skewjoin.threshold   600
spark.sql.adaptive.shuffle.targetPostShuffleInputSize   500
records: S 1000 rows; R 1 rows
sql:
select count(*) from R,S where rid=sid and sname>'wang9' and rname > 'zhang9';

skew algorithm: 167.695s
normal algorithm: 303.922s

R2_txt has 1 rows without data skew.
sql: select count(*) from R2_txt,S where rid=sid and sname>'wang' and rname > 'zhang9';
skew algorithm: 38.717s
normal algorithm: 114.21s
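
As a small worked calculation, the timings above can be turned into speedup ratios. The snippet below is only arithmetic on the four reported numbers; the object and value names are made up for illustration.

```scala
object SpeedupSketch {
  // Speedup = time of the normal algorithm / time of the skew algorithm.
  def speedup(normalSeconds: Double, skewSeconds: Double): Double =
    normalSeconds / skewSeconds

  // Wall-clock times in seconds, copied from the measurements above.
  val bothSkewed: Double = speedup(303.922, 167.695) // R join S, both skewed
  val oneSkewed: Double  = speedup(114.21, 38.717)   // R2_txt (no skew) join S

  def main(args: Array[String]): Unit = {
    println(f"R join S:      $bothSkewed%.2fx")
    println(f"R2_txt join S: $oneSkewed%.2fx")
  }
}
```

On these numbers the skew algorithm is about 1.8x faster for the first query and just under 3x faster for the second.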




[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-10-24 Thread YuhuWang2002

Github user YuhuWang2002 commented on the issue:

https://github.com/apache/spark/pull/15297
  
The skewed join implementation applies to both DataFrame operations and SQL
statements. In your example you will get 210 output files.
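
The 200 versus 210 figure can be reproduced with a small counting sketch. This is not the PR's actual splitting rule: the `tasksFor` function, the "split into ceil(size / target) tasks once a partition exceeds the threshold" policy, and the partition sizes are all assumptions chosen so that the numbers match the example in this thread, with one output file per final task.

```scala
object TaskCountSketch {
  // Assumed policy: a partition at or below the skew threshold runs as one
  // task; a larger one is split into ceil(size / targetSize) tasks.
  def tasksFor(size: Long, skewThreshold: Long, targetSize: Long): Int =
    if (size <= skewThreshold) 1
    else ((size + targetSize - 1) / targetSize).toInt

  def totalTasks(partitionSizes: Seq[Long], skewThreshold: Long, targetSize: Long): Int =
    partitionSizes.map(tasksFor(_, skewThreshold, targetSize)).sum

  // Hypothetical sizes (arbitrary units): 199 ordinary partitions plus one
  // skewed partition, i.e. spark.sql.shuffle.partitions=200.
  val sizes: Seq[Long] = Seq.fill(199)(100L) :+ 5500L

  // 199 single tasks + 11 tasks for the skewed partition.
  val tasks: Int = totalTasks(sizes, skewThreshold = 600L, targetSize = 500L)

  def main(args: Array[String]): Unit =
    println(s"partitions: ${sizes.size}, tasks (and output files): $tasks")
}
```

Under these assumptions the single skewed partition accounts for the ten extra tasks, and nothing merges them back before the write.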



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-10-24 Thread tgravescs

Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/15297
  
Ok, so are you saying this skewed join implementation doesn't apply to other
dataframe operations, something like:

    // assumes import sqlContext.implicits._ for the $"..." column syntax
    val df_pixels = sqlContext.read.parquet("somefile")
    val df_pixels_renamed = df_pixels.withColumnRenamed("photo_id", "pixels_photo_id")
    val df_meta = sqlContext.read.parquet("somemeta")
    val df = df_meta.as("meta")
      .join(df_pixels_renamed, $"meta.photo_id" === $"pixels_photo_id", "inner")
      .drop("pixels_photo_id")
    df.write.parquet("someoutputfile")

Where normally spark.sql.shuffle.partitions=X would configure the number of
output files. So in my example, if I set spark.sql.shuffle.partitions=200 but
the skewed join uses 210 tasks, what happens? How many output files would I get?



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-10-22 Thread YuhuWang2002

Github user YuhuWang2002 commented on the issue:

https://github.com/apache/spark/pull/15297
  
@tgravescs: Take a join such as: select count(*) from A join B. If
spark.sql.shuffle.partitions=200, we get 200 tasks, each outputting a partial
count. That output is not written to HDFS but cached in Spark. Summing the
output of the 200 tasks gives the correct value. With skew handling, we get
210 tasks, each outputting a partial count, and the next step then processes
those outputs.



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-10-21 Thread tgravescs

Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/15297
  
Ok, so how does that affect the overall job and the number of outputs? I
don't know the internals of Spark SQL, so sorry if I'm missing something
obvious. Basically you will now have multiple tasks where it used to use 1.
So let's say I have spark.sql.shuffle.partitions=200 to start with, and the
skewed join adds tasks to process some skewed partition, so let's say it runs
210 tasks. If I then save that to HDFS, do I get 210 output files, or does it
join those 10 back into 1 again?




[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-10-20 Thread YuhuWang2002

Github user YuhuWang2002 commented on the issue:

https://github.com/apache/spark/pull/15297
  
@tgravescs:
Thank you for your response. When a single reduce task handles a huge amount
of data, it is slow and unstable, so we split one reduce task into multiple
reduce tasks.
Where a single reduce task would compute A join B, we split it into multiple
tasks: task 1 computes A1 join B, task 2 computes A2 join B, and so on. A1 is
the part of A read from a range of map outputs. Spark SQL treats A1 as a
separate partition when processing, so multiple executors can run the tasks,
which spreads the processing load.
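
The splitting described above can be sketched with plain Scala collections. This is a simplified model, not the PR's implementation: the data, the `innerJoin` helper, and the `mapsPerTask` knob are hypothetical. It models one skewed reduce partition whose A side arrives as one block per map task, and checks that joining each range of map outputs (A1, A2, ...) with all of B gives the same rows as one task joining all of A with B.

```scala
object SkewSplitSketch {
  type Row = (String, Int) // (join key, value); hypothetical sample schema

  // Side A of one skewed reduce partition, as one block per map task.
  val mapOutputsOfA: Vector[Seq[Row]] = Vector(
    Seq(("hot", 1), ("hot", 2)),
    Seq(("hot", 3)),
    Seq(("hot", 4), ("cold", 5)),
    Seq(("hot", 6))
  )
  // Side B of the same reduce partition, read in full by every task.
  val b: Seq[Row] = Seq(("hot", 10), ("cold", 20))

  def innerJoin(left: Seq[Row], right: Seq[Row]): Seq[(String, Int, Int)] =
    for ((k, l) <- left; (k2, r) <- right if k == k2) yield (k, l, r)

  // Original plan: one reduce task fetches every map output of A.
  val singleTask: Seq[(String, Int, Int)] = innerJoin(mapOutputsOfA.flatten, b)

  // Split plan: each task fetches a contiguous range of map outputs
  // (A1, A2, ...) and joins it with all of B.
  val mapsPerTask = 2 // assumed knob, for illustration only
  val splitTasks: Vector[Seq[(String, Int, Int)]] =
    mapOutputsOfA.grouped(mapsPerTask).map(range => innerJoin(range.flatten, b)).toVector

  // Union of the per-task results.
  val combined: Seq[(String, Int, Int)] = splitTasks.flatten

  def main(args: Array[String]): Unit =
    println(s"tasks: ${splitTasks.size}, same rows as single task: ${combined == singleTask}")
}
```

Because each split task is an independent partition, the tasks can be scheduled on different executors. The cost is that B is read once per split task.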



[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-10-20 Thread tgravescs

Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/15297
  
I haven't looked through the code in detail, but can you clarify the design
a bit on this? The design pretty much just says we are splitting up the fetch
of the map outputs, but it doesn't say what happens then or how this really
solves the problem.

You say that instead of doing A join B you are splitting it up to do
something like A1 join B + A2 join B + … + An join B. Is it still just one
reduce task fetching it in separate chunks? If so, how does this fix the
problem? Or is it treating each one of those fetches as a separate partition
when processing it?

